Diesen Artikel finden Sie hier auch in deutscher Sprache.
Have you ever run into the situation, where you logically could run all of your load tasks in SSIS in parallel but the system (source or target) would not support it physically?
Then you probably ended up organizing your dataloads into multiple groups, aiming for ideally equally sized containers.
Doing that manually is a time consuming issue – and it never ends as the timings of your data loads will constantly change.
This post is about showing you, how you can easily you can automate the assignment of your tables to containers using simple T-SQL as well as about automating the load package using Biml.
It is not about the difference in performance between load patterns or how to build staging packages with Biml – there are many other posts and tutorials out there about that, so we’ll strictly focus on how we can optimize our SSIS workload.
First, create a new database (ours is called “Performance”, but you can choose any name that suits your needs; just be sure to adjust the connection string in the Biml code later on).
Within that database, we’ll need some meta tables which we will store in a new schema:
CREATE SCHEMA meta GO CREATE TABLE [meta].[Container] ([ContainerID] [INT] NULL, [TotalCost] [FLOAT] NULL, [Tables] [INT] NULL ) ON [PRIMARY] GO ALTER TABLE [meta].[Container] ADD CONSTRAINT [DF_Container_TotalCost] DEFAULT((0)) FOR [TotalCost] GO ALTER TABLE [meta].[Container] ADD CONSTRAINT [DF_Container_Tables] DEFAULT((0)) FOR [Tables] GO CREATE TABLE [meta].[Tables] ([TableName] [NVARCHAR](50) NULL, [Cost] [FLOAT] NULL, [Container] [INT] NULL ) ON [PRIMARY]
Let’s also populate our meta.tables table with some sample data (this is just fictional data based of AdventureWorks):
INSERT INTO [meta].[Tables] ([TableName] ,[Cost] ,[Container]) VALUES ('Person.Person',1,0) INSERT INTO [meta].[Tables] ([TableName] ,[Cost] ,[Container]) VALUES ('Person.PersonPhone',1,0) INSERT INTO [meta].[Tables] ([TableName] ,[Cost] ,[Container]) VALUES ('Sales.SalesOrderHeader',10,0) INSERT INTO [meta].[Tables] ([TableName] ,[Cost] ,[Container]) VALUES ('Sales.SalesOrderDetail',20,0)
The table meta.containers could technically also be a temporary table. We’ve decided against that so you can see what’s happening behind the scenes.
Let’s focus on the meta.tables table for now. It has three columns:
– TableName – guess what we’ll store in there
– Container – this one will hold the information, which container we want this table to be loaded it, which will be automatically populated by our stored procedure
– Cost – this column will hold the load cost of this specific table. In our opinion, this should ideally be the average time it took SSIS to load this table in the recent past. If you don’t have that information available, it might as well something like the size of this table in either Gigabytes or Rows. The more accurate this column is, the better your results will be.
As pointed out, we will have a stored procedure (or actually, multiple stored procedures) do the magic for us, so let’s also create these:
SP_CreateContainers will simply populate our meta.container table with one row per container:
CREATE PROCEDURE [dbo].[SP_CreateContainers](@MaxContainers INT) AS BEGIN DECLARE @CurrentContainer INT= 1 TRUNCATE TABLE Meta.Container WHILE @CurrentContainer <= @MaxContainers BEGIN INSERT INTO Meta.Container (ContainerId) VALUES (@CurrentContainer) SET @CurrentContainer = @CurrentContainer + 1 END END
SP_UpdateContainerStats will update the statistics of each container (how many tables are in there and what is the total of their cost):
CREATE PROCEDURE [dbo].[SP_UpdateContainerStats] AS BEGIN UPDATE a SET tables = cnt, TotalCost = Cost FROM meta.container a INNER JOIN (SELECT container, COUNT(*) AS CNT, SUM(Cost) AS Cost FROM meta.tables GROUP BY container) b ON a.containerid = b.container END GO
SP_SetContainers is the one doing the actual work. It will:
– Reset the container assignment currently stored in meta.tables
– Call SP_CreateContainers and have it create the number of containers that was passed through using the only parameter: MaxContainers
– Open a cursor that reads all tables from meta.tables, sorted from highest to lowest cost
– For each table, check which container currently has the lowest total cost, put it in that container, update container statistics
Once that is done (for 2 or for 2000 tables), all our tables will have been assigned to a container and their distribution should be fairly equal:
CREATE PROCEDURE [dbo].[SP_SetContainers](@MaxContainers INT) AS BEGIN UPDATE meta.tables SET container = 0 exec [SP_CreateContainers] @maxcontainers Declare @TableName nvarchar(50) Declare @Cost float declare @NextContainer Bigint DECLARE tbl_Cursor CURSOR FOR SELECT TableName,Cost FROM meta.tables order by Cost desc OPEN tbl_Cursor FETCH NEXT FROM tbl_Cursor INTO @TableName,@Cost WHILE @@FETCH_STATUS = 0 BEGIN select @NextContainer= min(containerid) from ( SELECT [ContainerID] ,[TotalCost] ,[Tables],min([TotalCost]) over() MinCost FROM [meta].[Container]) a where [TotalCost] = MinCost update meta.tables set container = @NextContainer where TableName = @TableName exec SP_UpdateContainerStats FETCH NEXT FROM tbl_Cursor INTO @TableName,@Cost END CLOSE tbl_Cursor DEALLOCATE tbl_Cursor exec SP_UpdateContainerStats END
But we will still need our SSIS packages to populate those tables.
Of course, we want to automate the package creation with Biml, so let’s take a look at that.
In C#:
<#@ import namespace="System.Data.SqlClient" #> <#@ import namespace="System.Data" #> <# int MaxContainers = 3; string ConnString = "Data source=localhost; Database=Performance; Integrated Security=SSPI"; SqlConnection conn = new SqlConnection(ConnString); SqlCommand cmd = new SqlCommand("exec [dbo].[SP_SetContainers] " + MaxContainers, conn); conn.Open(); cmd.ExecuteNonQuery(); conn.Close(); #> <Biml xmlns="http://schemas.varigence.com/biml.xsd"> <Packages> <Package Name="Load_Containers" ConstraintMode="Parallel"> <Tasks> <# for (int Container = 1; Container <= MaxContainers; Container++) { #> <Container Name="Load Container <#=Container #>" ConstraintMode="Linear"><Tasks> <# foreach (DataRow row in ExternalDataAccess.GetDataTable("Provider=SQLNCLI11;" + ConnString,"select * from meta.tables where container = " + Container).Rows) { #> <Container Name="Load <#= row["TableName"] #>" ConstraintMode="Linear"> <Tasks> <!-- Your dataload pattern goes here! --> </Tasks> </Container> <# } #> </Tasks></Container> <# } #> </Tasks> </Package> </Packages> </Biml>
Or in VB:
<#@ template language="VB" optionexplicit="False" #> <#@ import namespace="System.Data.SqlClient" #> <#@ import namespace="System.Data" #> <# MaxContainers = 3 ConnString = "Data source=localhost; Database=Performance; Integrated Security=SSPI" Dim conn as new SqlConnection(ConnString) Dim cmd = new SqlCommand("exec [dbo].[SP_SetContainers] " & maxcontainers, conn) conn.Open() cmd.ExecuteNonQuery() conn.Close() #> <Biml xmlns="http://schemas.varigence.com/biml.xsd"> <Packages> <Package Name="Load_Containers" ConstraintMode="Parallel"> <Tasks> <# for Container = 1 to MaxContainers #> <Container Name="Load Container <#=Container #>" ConstraintMode="Linear"><Tasks> <# for each row in ExternalDataAccess.GetDataTable("Provider=SQLNCLI11;" + ConnString,"select * from meta.tables where container = " & container).Rows #> <Container Name="Load <#= Row("TableName")#>" ConstraintMode="Linear"> <Tasks> <!-- Your dataload pattern goes here! --> </Tasks> </Container> <# next #> </Tasks></Container> <# next #> </Tasks> </Package> </Packages> </Biml>
As you can see, our Biml code will hold a variable that defines how many containers we actually want, will run our stored procedure with that parameter and will eventually build a package with the defined number of containers.
Each container will hold the tables that ended up in the respective container. As mentioned before, this post is not about load patterns in general, so the actual code to populate the tables is missing.
Thats’s it!
Any questions or comments? We’d love to hear from you at !
Further Information on Biml in English can be found on our English Biml Page.
Happy Biml’ing!