Export to splitted Flatfiles with Biml

Export to splitted Flatfiles with Biml

Diesen Artikel finden Sie hier auch in deutscher Sprache.

This post uses objects and annotations from our previous post "Export to Flatfiles with Biml". Please use the code from that post as a prerequisit.

In the previous post, we’ve exported the whole database to flatfiles with one file per table. But what if we want to split large tables into multiple files? One easy way to do that would be to retrieve the data using OFFSET-FETCH NEXT from SQL Server.

To do that, we need to determine first how many rows our table actually has and calculate the number of files we will have to generate. As usual, there are multiple different ways to do that. Here, we’ll just build a SQL statement in a Biml file (which we’ll later use with CallBimlScript) in which we’ll do a row count on the table which we’ll divide by the maximum number of rows we want per file (notice, how that file returns just a SQL Statement instead of Biml Code!):

<#@ template language="VB" #>
<#@ property name="tablename" type="string" #>
<#@ property name="maxrows" type="string" #>
SELECT(Rowid - 1) * <#= MaxRows #>   AS StartRow
    					, RIGHT(LTRIM(STR(100000 + Rowid)),5) AS FileID
							FROM
							(
							    SELECT CAST(COUNT(*) / <#= MaxRows #>  AS INT) + CASE
																	WHEN COUNT(*) / <#= MaxRows #>.00  = CAST(COUNT(*) / <#= MaxRows #> AS INT)
																	THEN 0
																	ELSE 1
																 END AS FileCount
							    FROM <#= tablename #>
							) AS FileCount
							INNER JOIN
							(
							    SELECT ROW_NUMBER() OVER(ORDER BY id) AS RowID
							    FROM sysobjects 
							) AS RowHelper
							ON RowHelper.RowID &lt;= Filecount

Next is another file to be called using CallBimlScript (which will actually call the first file, so this is another example of calling a Biml script within another callee). As input parameters, this file will take a flat file format as well as the maximum number of rows. As input parameters. It will create a new package for the file format provided, call the previous script to get the list of required files, read it into a dataset and then loop through that set. Using 2 variables evaluating as expressions, we will populate one file at a time using a dynamic select and file name:

<#@ template language="VB" designerbimlpath="Biml/Packages" #>
<#@ property name="FF" type="AstFlatFileFormatNode" #>
<#@ property name="MaxRows" type="string" #>
<Package Name="Export_Split <#=ff.name #>" ConstraintMode="Linear">
    <Variables>
        <Variable DataType="Object" Name="DS" />
        <Variable DataType="String" Name="FileID">00001</Variable>
        <Variable EvaluateAsExpression="true" DataType="String" Name="Filename">"C:\\Flatfiles\\Output\\Splitted\\<#= ff #>_" +  @[User::FileID] + ".csv"</Variable>
        <Variable EvaluateAsExpression="true" DataType="String" Name="SELECT">"SELECT <#= ff.gettag("Sourcecolumns") #> FROM <#= ff.gettag("Sourcetable") #> ORDER BY <#= ff.gettag("Keycolumns") #> OFFSET " + @[User::StartRow] + " ROWS FETCH NEXT <#= MaxRows #> ROWS ONLY"</Variable>
        <Variable DataType="String" Name="StartRow">0</Variable>
    </Variables>
    <Connections>
        <Connection ConnectionName="<#= ff #>">
            <Expressions>
                <Expression ExternalProperty="ConnectionString">@[User::Filename]</Expression>
            </Expressions>
        </Connection>
    </Connections>
    <Tasks>
        <ExecuteSQL Name="Read Dataset" ConnectionName="Source" ResultSet="Full">
            <DirectInput><#= CallBimlScript("11_ExportSplitted_SELECT.biml",ff.gettag("Sourcetable"),maxrows) #></DirectInput>
            <Results>
                <Result Name="0" VariableName="User.DS" />
            </Results>
        </ExecuteSQL>
        <ForEachAdoLoop Name="For Each Loop" SourceVariableName="User.DS">
            <Tasks>
                <Dataflow Name="Export <#= ff #>">
                    <Transformations>
                        <OleDbSource Name="SRC <#= ff.name #>" ConnectionName="Source">
                            <VariableInput VariableName="User.SELECT" />
                        </OleDbSource>
                        <FlatFileDestination ConnectionName="<#= FF #>" Overwrite="true" Name="DEST <#= ff #>"/>
                    </Transformations>
                </Dataflow>
            </Tasks>
            <VariableMappings>
                <VariableMapping Name="0" VariableName="User.StartRow" />
                <VariableMapping Name="1" VariableName="User.FileID" />
            </VariableMappings>
        </ForEachAdoLoop>
    </Tasks>
</Package>

Last but not least, we build a Biml file that will actually call this script for one example table, splitting it at 2500 rows per file:

<#@ template language="VB" tier="3" #>
<Biml xmlns="http://schemas.varigence.com/biml.xsd">
	<Packages>
		<#= CallBimlScript("10_ExportSplitted_Packages.biml",rootnode.fileformats("Person_Person"),2500) #>
    </Packages>
</Biml>

With these settings and a standard AdventureWorks2014 database, where Person.Person holds just under 20.000 Records, this should result in 8 output files.

Any questions or comments? We’d love to hear from you at biml@solisyon.de!

Further Information on Biml in English can be found on our English Biml Page.

Happy Biml’ing!

Facebook