How often do you use raw files in your packages?
Last week, I used them for the first time when trying to improve performance in a slow running package containing a data flow with large lookups.
What’s wrong with that package?
The data flow in question was very simple. An OLE DB source reading data from a staging table, a series of lookup transformations to add columns to the rows in transit, an additional lookup to check if the row existed , a conditional split, and 2 destinations, one for new rows and one of rows to be updated. I quickly came to the conclusion that the performance issue was caused by the large size of the 2 of the lookup’s data sets. Let’s call them the evil lookups. When the package ran, I noticed that it ran decently quick until it reached the 20GB of memory mark and the last lookup was still half way caching data, at that point things started to crawl. The picture below shows the layout of the data flow.