Automating Azure Data Factory v2 with Biml

Automating Azure Data Factory v2 with Biml

As you may have seen at PASS Summit 2017 or another event, with the announcement of Azure Data Factory v2 (adf), Biml will natively support adf objects.

Please note, that the native support is currently only available in BimlStudio 2018. If you’re using BimlExpress, you can still generate the JSON for your pipelines, datasets etc. using Biml but you cannot use the newly introduced tags.

If you are lucky enough to be a BimlStudio user, these are the new tags to look forward to:

<Biml xmlns="http://schemas.varigence.com/biml.xsd">
	<DataFactories>
		<DataFactory>
			<LinkedServices></LinkedServices>
			<Datasets></Datasets>
			<Pipelines></Pipelines>
			<Triggers></Triggers>
			<Annotations></Annotations>
        </DataFactory>
    </DataFactories>
</Biml>

Let’s take a deeper look at these.
Annotations work just like everywhere else in Biml so we won’t go into details. The DataFactory tag itself only takes two arguments: Name (required) and an optional logical display folder.
The others are more interesting. As you get full intellisense on all of them, we will not elaborate every single option but will focus on giving you a good overview of the model:

1. LinkedServices
These are your connections (sources and targets) so potential members range from SqlServer over AzureSqlDatabase up to AmazonS3. Basically, everything that you could also configure in adf directly.
It could look like this:
<SqlServer ConnectionString=“myconnstring“ Name=“MyOnPremSQL“/>

2. Datasets
Each Dataset will represent a file, a table or something similar. They are again connection-type specific and will always reference a linked service and also contain the structure of the set. The closest equivalent in Biml is probably a <Table>.

Example:

     <SqlServer LinkedServiceName="MyOnPremSQL" Name=" OnPrem_Person_Person" Table="Person">
	<Structure>
		<Column Name="LastName" DataType="String"/>
		<Column Name="FirstName" DataType="String"/>
	</Structure>
     </SqlServer>

3. Pipelines
Biml currently supports custom activities as well as the copy activity out of the box. Each pipeline will have a name and contain one or more activities. The copy activity will then take one source and one sink, which could look like this:

<Pipelines>
	<Pipeline Name="MyPipeline">
             <Activities>
		<Copy>
                      <SqlServerSource DatasetName="OnPrem_Person_Person"/>
                      <AzureSqlDatabaseSink DatasetName="ASDB_Person_Person"/>
                </Copy>
             </Activities>
	</Pipeline>
</Pipelines>

4. Triggers
Last, but not least, you can (but don’t have to, in case you will only run your pipeline manually) define scheduled and tumbling triggers.

A full configuration could look like this:

<#@ template tier="2" language="VB" optionexplicit="False"#>
<# 
Dim sourceConnection as AstDbConnectionNode = RootNode.DbConnections("Source")
Dim importResults = sourceConnection.GetDatabaseSchema()
#>
<Biml xmlns="http://schemas.varigence.com/biml.xsd">
	<DataFactories>
		<DataFactory Name="Azure Loader">
			<LinkedServices>
				<AzureSqlDatabase Name="SourceData" ConnectionString="YOURCONNSTRING"></AzureSqlDatabase>
				<AzureDataLake Name="AzureDataLake" Url="https://YOURADLNAME.azuredatalakestore.net/webhdfs/v1" ResourceGroup="YOURRG" SubscriptionId="YOURSUB" Tenant="YOURTENANT"/>
			</LinkedServices>
			<Datasets>
				<# for each tbl in importResults.TableNodes #>
				<AzureSqlDatabase Name="ASQLDBSRC_<#=tbl.Name#>" LinkedServiceName="SourceData"  Table="<#=tbl.Schema.Name#>.<#=tbl.Name#>">
					<Structure>
						<# for each  column in tbl.Columns  #>
						<Column Name="<#=column.Name#>" DataType="<#=column.DataType#>" />
						<# next #>
                    </Structure>
				</AzureSqlDatabase>				
				<AzureDataLake Name="ADLS_<#=tbl.Name#>" LinkedServiceName="AzureDataLake" FolderPath="adf/<#=tbl.Name#>/" File="<#=tbl.Name#>.csv.gz">
					<TextFormat ColumnDelimiter="," RowDelimiter="\n" />
					<Compression CompressionType="GZip" CompressionLevel="Optimal" />
				</AzureDataLake>
					<# next #>
            </Datasets>
			<Pipelines>
				<Pipeline Name="LoadDataLake">
					<Activities>
						<# for each tbl in importResults.TableNodes #>
						<Copy Name="Copy <#=tbl.Name #>">
							<AzureSqlDatabaseSource DatasetName="ASQLDBSRC_<#=tbl.Name#>"/>
							<AzureDataLakeSink DatasetName="ADLS_<#=tbl.Name#>" CopyBehavior="PreserveHierarchy" />
						</Copy>
						<#next #>
                    </Activities>
                </Pipeline>
		        </Pipelines>
        </DataFactory>
    </DataFactories>
</Biml>

Of course, you can also have multiple activities and linked services in the same factory. Just add the required objects and tags as necessary.
If you build the solution, your adf JSON files will sit in output\DataFactory.
Deployment at this point runs through PowerShell – once you’ve build the solution and deployed it through PowerShell, all objects will be available (and scheduled, if you added triggers) in your Data Factory.

Any questions or comments? We’d love to hear from you at biml@solisyon.de!

Further Information on Biml in English can be found on our English Biml Page.

Happy Biml’ing!

Facebook