January 30, 2010

SSIS Fact and Myth: Dynamic Selects in SSIS Packages

Fact: The metadata of the SSIS dataflow task cannot be changed at run time (e.g. changing number or name of columns in source components) *

Myth: You always have to use dataflow tasks when moving data with SSIS.


I often see people hitting roadblocks when using SSIS in scenarios where the source and/or target structures needs to change dynamically at run time. You can see an example of that in this post

Based on my experience, some of these scenarios - including the link above -, require minimal or no data transformation, hence making the usage of dataflow tasks a less appealing story.

Let's see an example.

Scenario:
You need to build a process to export data out of a SQL Server Database into flat files. The process should be able to perform a 'data dump' given the table name(s) or given a query(ies).


Solution:
These requirements can be addressed by using few SSIS control flow tasks:




1. An Execute SQL Task to get the detail of the data dump(s): Source DB instance, destination flat file name and path, source table/query and so on. The execute SLQ task runs a query that retrieves a data set that looks like this:




-- Just a collection of hardcoded values to give the idea
-- of using a data driven approach. I would probably create
-- a table for this.
Select 'AdventureWorks2008' DB, 'Sales.Currency' AS TableNameOrQuery, 'C:\Temp\' FilePath, 'table' As ExportType, 'Currency.txt' OutPutFileName, 'MARINERLAPTOP24\SQL2008' SourceServerName
UNION ALL
Select 'AdventureWorks2008' DB, 'Sales.Store' AS TableNameOrQuery, 'C:\Temp\' FilePath, 'table' As ExportType, 'Sales.txt' OutPutFileName, 'MARINERLAPTOP24\SQL2008' SourceServerName
UNION ALL
Select 'AdventureWorks2008' DB, 'Production.Product' AS TableNameOrQuery, 'C:\Temp\' FilePath, 'table' As ExportType, 'Product.txt' OutPutFileName, 'MARINERLAPTOP24\SQL2008' SourceServerName
UNION ALL
Select 'AdventureWorks2008' DB, '"Select CurrencyCode, Name from AdventureWorks2008.Sales.Currency Where Name <> ''Afghani''"' AS TableNameOrQuery, 'C:\Temp\' FilePath, 'query' As ExportType, 'MyQuery_Currency.txt' OutPutFileName, 'MARINERLAPTOP24\SQL2008' SourceServerName


2. A ForEachLoop container to loop through each of the data dump(s).


3. A Execute process tasks to run BCP utility. You can learn more about BCP in Book on Line



Final thoughts:

  • Although you could argue you could build the same process without using SSIS and I still think SSIS works great here.
  • While BCP is a SQL Server utility, you could use an equivalent utility/approach when dealing with other database engines. E.g. in Oracle you could use SQLPlus with Spool commands.
  • Dataflows are not the only option to move data in SSIS.

* While built-in DataFlow task metadata cannot be altered at run time, with some custom coding, you could generate the packages programatically to achive a similar effect. For more information, see BOL topic

Download the package from here:


Let me know if you have a different approach.

Raf.

6 comments:

  1. Rafel,
    Thanks for your helping post.

    But my requirement is little bit different.I have one mixup query (using lots of temp table) which generate three resultset (or can say three tables). How i can export data into three excel sheets or three flat files.

    Thanks,
    Ramakant

    ReplyDelete
  2. Hi Rafael,
    Nice post, rightly said that in SSIS DFT is not the only way of doing ETL. However doing the ETL using Control Flow tasks does not leverage the SSIS advantages.
    Some time back I had craeted a package to do dynamic ETL where the input file could have dynamic columns, different delimiters and nested data(kind of pseudo XML). I wrote a Script task to iterate through this file row by row and populate it to a Staging table.
    Next had some config tables(not SSIS Config) to set up the requirements from where I would generate the Dynamic SQL to do the ETL in EST.
    Even various validatino was done on the staging table based on specific requirement which was again saved in another configuration table.
    This worked fine for me.
    But this kind of thing actually does not need SSIS it could well be done in .NET. I used SSIS here because SSIS gave me a framework.
    Let me know ur comments. a pretty long comment ;)

    ReplyDelete
  3. Sudeep,
    I could not agree more with you. I think the data flow has earned its own place in ETL processes and I would not force any ETL process to live within the confines of the control flow just to get around of the metedata binding constraints of the dataflow.

    ReplyDelete
  4. Hi,

    I have a similar situation and will appreciate your help in that. We are storing source queries / stored procedure and mapping between source and destination column in a database and trying to create DFT which will execute the source queries and then map columns based on the mapping stored in the database. The number of source columns can vary in different sources. I was able to execute different queries when the source columns name were not changing but not able to find solution when number of source columns or their name are changing.
    Can you suggest any solution for this problem

    Thanks

    SG

    ReplyDelete
  5. shalabh,

    You you describe is exactly what I said you cannot do in SSIS. Please re-read the first 2 lines of the post and the note at the bottom. DFT cannot be altered at run time. Either you use an approach like the one I described or you need to generate packages dynamically, which requires custom coding.

    ReplyDelete
  6. There is a third party tool that you can use to dynamically change your DFT column mappings at run time.

    http://www.cozyroc.com/ssis/data-flow-task

    ReplyDelete

I will love to hear your feedback and comments. Thanks.