Loading Flat Files with Biml and XML Format Definitions

Loading Flat Files with Biml and XML Format Definitions

Diesen Artikel finden Sie hier auch in deutscher Sprache.

Alright, so far we had a lot of fun with Biml – but we were always loading data from other databases. What if we want to load from a flat file? Can Biml actually do that? OF COURSE!

In this example, we will look at some very basic flat files, generated out of AdventureWorks2014. We will by far not cover all datatypes but you should end up with a good impression of what’s needed to define and load flat files in Biml.

To make this more flexible, we will not only generate the data out of AdventureWorks but also generate the format (in XML) from there. We’ll achieve that with bcp.

Create a directory c:\flatfiles including 2 subfolders – data and format and place the following code into a batch file called “generate_csv.bat” in that directory:

bcp adventureworks2014.person.personphone out Data\PersonPhone.csv -T -t, -c
bcp adventureworks2014.person.emailaddress out Data\PersonEmailaddress.csv -T -t, -c
bcp adventureworks2014.sales.salesorderdetail out Data\salesorderdetail.csv -T -t, -c
bcp adventureworks2014.sales.salesorderheader out Data\salesorderheader.csv -T -t, -c

bcp adventureworks2014.person.personphone format nul -f Format\PersonPhone.xml -T -t, -c -x
bcp adventureworks2014.person.emailaddress format nul -f Format\PersonEmailaddress.xml -T -t, -c -x
bcp adventureworks2014.sales.salesorderdetail format nul -f Format\salesorderdetail.xml -T -t, -c -x
bcp adventureworks2014.sales.salesorderheader format nul -f Format\salesorderheader.xml -T -t, -c -x

Run the batch file. You should now see 4 CSV-files with data in your data-folder as well as the 4 format definitions in the format folder. Feel free to play with that batch file to add more or alternate on the tables you’re exporting.
Also, as we still plan to import those CSV data into SQL Server, create a new database “MyBiml_Flatfiles”.
To finalize our preparations, let’s create a Biml file to point to that database:

<Biml xmlns="http://schemas.varigence.com/biml.xsd">
        	<OleDbConnection Name="Target" ConnectionString="Data Source=localhost;initial catalog=MyBiml_Flatfiles;provider=SQLNCLI11;integrated security=SSPI"></OleDbConnection>
			<Database Name="MyBiml_Flatfiles" ConnectionName="Target"></Database>
			<Schema Name="dbo" DatabaseName="MyBiml_Flatfiles"></Schema>

We now have some data sources set up as well as a target database. Let’s make them talk to each other.
To make use of flat files, Biml will require one or more flat file formats as well as the corresponding flat file connections. A flat file connection is nothing but a link between a flat file format and the path to a flat file. While in real life, you might rather want to loop across multiple files using the same format, we’ll keep it simple for now and assume that we’ll have one file per format (which is the case here anyway due to our AdventureWorks sample data).
For reasons of readability and also the ability to easily reuse our code, we’ll make use of a new extension method called GetFlatFileFormatfromXML. If you have not read our post on creating your own extension methods, you may want to do that first.

This will also allow us to keep the Biml file itself clean and tidy:

<#@ template language="VB" tier="2" #>
<#@ Import namespace="System.io" #>
<#@ code file="../code/FlatFileExtensions.vb" #>
<Biml xmlns="http://schemas.varigence.com/biml.xsd">
<# for each XmlFile as String  in Directory.Getfiles(basepath + "\Format") #>
	<#= XmlFile.GetFlatFileFormatfromXML().GetBiml() #>
<# next #>

Effectively, we will loop through our format directory and create a flat file format from each file using the mentioned extension method.

Our Biml file references a code file “FlatFileExtensions.vb” which will obviously do all the magic for us, so let’s take a look at that file:

Imports Varigence.Biml.Extensions
Imports Varigence.Languages.Biml
Imports Varigence.Languages.Biml.FileFormat
Imports Varigence.Languages.Biml.Table
Imports System.IO
Imports System.Xml
Imports System.Data 
Imports System.Runtime.CompilerServices

Module FlatFileExtension
	Public Basepath As String   = "C:\flatfiles"
	Public Locale As Varigence.Languages.Biml.Cube.Language = Varigence.Languages.Biml.Cube.Language.Lcid1033
	Public Function ToAstTableNode(FlatFile As AstFlatFileFormatNode, Schema As AstSchemaNode) As AstTableNode
		Dim BimlTable As New AstTableNode(Nothing)
		BimlTable.Name = "FF_" + FlatFile.name
		BimlTable.Schema = schema
		For Each flatFileColumn As astflatfilecolumnnode In FlatFile.Columns
			Dim tableColumn As New AstTableColumnNode(Nothing)
			tableColumn.Name = flatFileColumn.Name
			tableColumn.DataType = flatFileColumn.DataType
        	tableColumn.Length = flatFileColumn.Length
        	tableColumn.Precision = flatFileColumn.Precision
        	tableColumn.Scale = flatFileColumn.Scale
        	tableColumn.CodePage = flatFileColumn.CodePage
		Dim annotation As New AstAnnotationNode(Nothing)
		annotation.Tag = "ConnName"
		annotation.Text = FlatFile.Name
 		Return BimlTable
	End Function	
	Public Function ToFlatfileConnection (FlatFileFormat As AstFlatFileFormatNode) As Connection.AstFlatFileConnectionNode
	Dim Connection As New Connection.AstFlatFileConnectionNode(Nothing)
	connection.Name = FlatFileFormat.Name
	connection.FileFormat = FlatFileFormat
	connection.FilePath= Basepath & "\Data\" & FlatFileFormat.name & ".csv"
	Return Connection
	End Function
	Public Function GetFlatFileFormatfromXML (XmlFile As String) As AstFlatFileFormatNode
	Dim FlatFileFormat As New AstFlatFileFormatNode(Nothing)
	Dim xmldoc As New XmlDocument	
	Dim records As XmlNodeList = xmldoc.GetElementsByTagName("RECORD").item(0).childnodes
	Dim rows As XmlNodeList = xmldoc.GetElementsByTagName("ROW").item(0).childnodes 
	Dim row As xmlnode
 	FlatFileFormat.Locale = Locale
	FlatFileFormat.Name =   path.GetFileNameWithoutExtension(XmlFile)
	FlatFileFormat.RowDelimiter =  ConvertDelimiter(records.item(records.count-1 ).attributes("TERMINATOR").value)
	FlatFileFormat.ColumnNamesInFirstDataRow = False
	FlatFileFormat.isunicode = False
	For Each record As xmlnode In records
		row = rows.item(record.attributes("ID").value-1 ) 
		Dim DataType As String = row.attributes("xsi:type").value
		Dim DatatypeID As Integer = ConvertDatatype(datatype)
		Dim Column As New AstFlatFileColumnNode(Nothing)
		column.name = row.attributes("NAME").value
		column.Delimiter = ConvertDelimiter(record.attributes("TERMINATOR").value )
		If DatatypeID= Nothing Then
			' By default, we will make this a string!
			column.DataType = DbType.String
			column.DataType = DatatypeID
		End If
		If DatatypeID= Nothing Then
			' By default, we want out strings to be 1000 Characters
			column.Length = 1000
		ElseIf DatatypeID = dbtype.AnsiString Or DatatypeID = DbType.String Then 
			column.Length = record.attributes("MAX_LENGTH").value 
		End If
		If ConvertDatatype(datatype) = dbtype.VarNumeric Then
			column.Precision = 32
			column.Scale = 16
		End If
		If DatatypeID= Nothing  Then 
			Dim columnannotation As New AstAnnotationNode(Nothing)
			columnannotation.Tag = "Original Datatype"
			columnannotation.Text = datatype
		End If
	Return FlatFileFormat
	End Function
	Public Function ConvertDatatype (CSVType As String) As String
		Select Case CSVType 
			Case  "SQLINT" 
				Return dbtype.Int32
				Return dbtype.int16
			Case  "SQLVARCHAR" 
				Return dbtype.AnsiString
				Return dbtype.DateTime	
			Case "SQLMONEY"
				Return dbtype.Currency
				Return dbtype.Double
				Return DbType.String
				' GUIDs should be interpreted as strings
				Return  DbType.String
			Case Else 
				Return Nothing
		End Select
	End Function
	Public Function ConvertDelimiter (CSVDelimiter As String) As String
		Select Case CSVDelimiter 
			Case  "\r\n" 
				Return "CRLF"	
			Case Else 
				Return CSVDelimiter
		End Select
	End Function
End Module

This code file actually holds way more, than just the extension method:

– 2 variables, one to define the path to our flatfiles as well as to define the locale of these flatfiles (you may need to play with this depending on your local system settings)
– 2 additional extension methods “ToAstTableNode“ and “ToFlatfileConnection” which we’ll both use later
– 2 converter functions “ConvertDatatype“ and “ConvertDelimiter“ – as bcp format files describe data types and delimiters a bit different than Biml would expect it, we need to map them. This also allows us to do some general definitions like, in this case, have all GUID columns rather come in as strings.

Due to the way, that xml format files are set up, we will actually need to load 2 XmlNodeLists (records and rows) and then loop through them. Our extension method will create a new AstFlatFileFormatNode which defines the flat file format, assign a name, text qualifier etc. to it and then loop through the columns from the format definition and add them.

Beware, that some of the settings we’re using are static and might not be ideal for your specific use case!

The combination of our Biml file and this method, which will return an AstFlatFileFormatNode, results in a “File Format” entry in Biml for each xml file in our format directory.

To achieve useful results out of this, we need a table definition and a connection for each of them, which we can easily achieve using the other two extension methods in the code file. Therefore, we can just loop through the flat file formats in another Biml file:

<#@ template language="VB" tier="3" #>
<#@ code file="../code/FlatFileExtensions.vb" #>
<Biml xmlns="http://schemas.varigence.com/biml.xsd">
	<# for each tbl as AstFlatFileFormatNode in rootnode.flatfileformats #>
		 <#= tbl.ToAstTableNode(rootnode.schemas(0)).GetBiml #>
	<# next #>
	<# for each tbl as AstFlatFileFormatNode in rootnode.flatfileformats #>
		<#= tbl.ToFlatfileConnection.GetBiml #>
	 <# next #>

To be able to import data into SQL Server, we will actually need to create the target tables in our database – but there is no difference to previous exercises when loading data from SQL Server.
The code to create those tables should probably look pretty familiar:

<#@ template tier="4" language="VB" #>
<Biml xmlns="http://schemas.varigence.com/biml.xsd">
          <Package Name="01_CreateStaging" ConstraintMode="Parallel" ProtectionLevel="EncryptSensitiveWithUserKey">
                <#	for each table as asttablenode in RootNode.Tables #>				   
	                <ExecuteSQL Name="Create <#=table.Name#>" ConnectionName="Target">
					<#  next #>					 

Same applies for the code to actually populate those tables:

<#@ template tier="5" language="VB" #>
<#@ code file="../code/FlatFileExtensions.vb" #>
<Biml xmlns="http://schemas.varigence.com/biml.xsd">
		<Package Name="02_Populate Tables" ConstraintMode="Parallel"  ProtectionLevel="DontSaveSensitive" DelayValidation="true">
			<# for each table as asttablenode in RootNode.Tables  #>
				<Container Name="Transfer <#= table.gettag("ConnName")#>" ConstraintMode="Linear">
						<ExecuteSQL Name="Truncate" ConnectionName="Target">
							<DirectInput>truncate table <#=table.ScopedName#></DirectInput>
						<Dataflow Name="Load <#=table.gettag("ConnName")#>">
								<FlatFileSource ConnectionName="<#= table.gettag("ConnName") #>" Name="<#= table.name #>" LocaleId="<#= locale #>"/>
								<OleDbDestination Name="Dest" ConnectionName="Target">
									<TableOutput TableName="<#=table.ScopedName#>" />
			<# next #>

The only difference to what you’ve seen in previous posts is the FlatFileSource. It requires a ConnectionName and a Name for the task itself – to play it safe, we also provide a locale for it as well to avoid a locale mismatch.
This is it – you have now defined flat file sources in Biml, created the target tables and loaded the data from those flat files into the tables.

Any questions or comments? We’d love to hear from you at biml@solisyon.de!

Further Information on Biml in English can be found on our English Biml Page.

Happy Biml’ing!