Pages

Showing posts with label Netezza. Show all posts
Showing posts with label Netezza. Show all posts

February 9, 2011

SSIS and Netezza: OLE DB Connection Bug in Multi-Threaded Environments

This is a long overdue post. Several months ago, while I was working in a SQL Server to Netezza data migration, I ran into some issues when trying to to load data into Netezza tables via SSIS. First, it was the OLE DB destination acting up and not keeping its settings. While that was a very annoying issue, there was a work around  - you can read about it in my other post.
Later, I ran into what appeared to be a random connectivity issue using Netezza OLE DB provider. The packages would unit test just fine, but some of them would fail when ran concurrently from a ‘parent’ package.
nzoledb_salasr2428 Master Pkg
Master Pkg repro 1












Each time the parent package ran, the result was the same: The first x numbers of package succeeded while the remaining ones failed. Packages would fail randomly as shown in the 2 examples in the picture above. The error messages generated by SSIS and Netezza were not very helpful:

SSIS Error Code DTS_E_OLEDBERROR.  An OLE DB error has occurred. Error code: 0x80004005. 
SSIS Error Code DTS_E_PROCESSINPUTFAILED.  The ProcessInput method on component "OLE_DST NZ Table" (1776) failed with error code 0xC0209029. The identified component returned an error from the ProcessInput method. The error is specific to the component, but the error is fatal and will cause the Data Flow task to stop running.  There may be error messages posted before this with more information about the failure. 

After a lot of troubleshooting I was able to make some observations:
  • The first set of packages, as given by MaxConcurrentExecutables property in parent package, completed successfully. If I ran it in my laptop, with 2 cores, 2 packages would succeed. If I ran it in the server, a 16 core machine, then the first 16 or so packages would succeeded.
  • Watching the activity in Netezza admin console revealed that the first x connections remained open for extra 1-3 minutes after the first set of packages completed.
At that point, I decided to open a ticket with Netezza support. Things were not easy as they did not have a lot of experience with SSIS and their first attempts to reproduce the issue failed. They tried to make the point that I should run one package at the time, by setting MaxConcurrentExecutables=1 in the master package. The thing is that a) I was not willing to take that as an acceptable solution – waste of resources in multi-processor machines. b) the issue was even worse when using that setting.
To make the story short, after 5 months and exchanging 122 emails with Netezza support, they finally were able to reproduce the issue and acknowledge there was an problem with their OLE DB driver. Their final email says:

The issue is in our OLEDB driver relating to error-handling in a multi-threaded environment.The fix will be in patch 5.0.10P8. The release date will be sometime in mid to late December

I have not been able to verify if the issue was actually fixed because our internal operations team does not have a date for applying that patch. And if you are wondering what did I do to get around the problem, I ended up using a script task to pause the execution of.packages for 3 minutes, long enough for the open connections to close, before the next set of packages got executed. This is not by any mean an approach I would put into production, but was good enough for me since this was a 1-time data migration.

Master Pkg with Wait logic

Here are the details of my environment:
• Windows Server 2003 R2 SP2 Enterprise x64 edition
• 64 GB RAM - 16 cores
• SQL Server 2005 SP2 Enterprise edition (64 bit)
• Netezza OLE DB Diver 5.0.8 (but we got same error with 5.0.6 version)

I am wondering how many people is using SSIS to load data into Netezza and running into similar issues?

June 23, 2010

SSIS and Netezza: Loading data using OLE DB Destination

In the first post I provided details about the environment I was working in and the tools I was using when loading data into Netezza tables via SSIS. In the second and third post I showed an odd behavior (bug?) in the OLE DB destination when working with Netezza tables. In this forth post, I will share details about how I am configuring the OLE DB Destinations and connection string when pointing to Netezza tables

After installing the Netezza OLE DB provider (v. 5.0.8), I tested the connection to Netezza from outside of SSIS. For that, I used a trick I learned a while back when troubleshooting connectivity issues with Pervasive PSQL - another no so popular database engine. I created a .udl (Universal Data Link) file that allowed me to look over all the Netezza OLE DB provider properties and test the connection. You can find more about how I did it here: http://msdn.microsoft.com/en-us/library/e38h511e(VS.71).aspx

This how the udl file looks like:





TIP: The nice thing about using an /udl file is that you can open the file in a text editor to see the connection string.



  
The Netezza connection string looks like:

Provider=NZOLEDB;Password=MyPassword;User ID=MyID;Data Source=NZipaddress;Initial Catalog=MyDB;Persist Security Info=True;

Once I confirmed that the driver and the permissions of the Netezza user I was using were all squared out, I was ready to create my first package. Since this was a straight copy of data from SQL Server to Netezza, the logic in the package was nothing really exciting. So, let me jump directly to the settings of the OLE DB destination component.
Netezza OLE DB destination settings
I have found the best performance and the less trouble when using the following settings: 

  • AccessMode = OpenRowSet Using FastLoad 

  • AlwaysUseDefaultCodePage = True 

  •  FastLoadMaxInsertCommitSize = 0   
Notice that FastLoadMaxInsertCommitSize = 0 is the default selection and something that you would typically change if you are loading data into a SQL Server table instead. But for Netezza destiantions, using a value different than zero seems to degrade performance considerably. Ah! and remeber you have to use the OLE DB destiantion propeties window or the advance editor to select "OpenRowSet Using FastLoad".

  Additionally, you can add the Loggin Level and path parameters to the connection string to create a very, VERY verbose log file, which may come handy when troubleshooting issues:

Logging Level=1;Log Path=c:\LogDir;




June 21, 2010

SSIS and Netezza: OLE DB Destination Bug?

This is the second post of my SSIS & Netezza series. In an earlier post I provided details about the environment I was working in. At this time, I would like to make a quick note about an interesting behavior in the OLE DB Destination when pointing to a Netezza table. When you open the OLE DB destination editor, there are no “fast load” options under the data access mode property.

For those not familiar with this property, using ‘fast load’ tells SSIS to attempt the load in batches of rows – typically 10k row batches – as opposed of doing row-by-row load/commit operations, that in general yields  better performance. Now, the odd thing is that if you look into the properties pane of the destination component, then you see 2 extra options for ‘fast load’.


This means that after you set everything else up in inside of the destination editor, you also have to visit the windows pane of the component to setup the access mode property. Annoying, isn’t it? But wait there is more. If you ever have to open the destination component editor again, the access mode property will get reset to the default value ‘Open Rowset’, so you have to go to the properties window of the component one more time to select the fast load option.

I get the same behavior in SSIS 2005 and SSIS 2008 R2 when using the the Netezza OLE DB driver 5.0.8

I also report this as a bug in the MSFT SQL Server connect site. You can add your notes or vote to get this issue fixed - Assuming it's an SSIS issue - here: SSIS: Fast Load option missing from Netezza OLE DB destination 

Are you using SSIS to load Netezza databases? if so, how are you doing it and what issues have you had to work around?

June 10, 2010

SSIS and Netezza: Getting started


First a disclaimer: I am new to Netezza, and I am by no means an expert on it.

The last couple of weeks I have been assisting my new team with a massive data migration (~15 Tb) from SQL Server 2005 to a Netezza 8.0.6 appliance using SSIS 2005 -Yes, I know, I know, that is pretty old version of SSIS, and believe me I am trying to make the case for the upgrade- .
So far, I have been doing mostly easy stuff, just straightforward SSIS packages to read from a SQL Server table and dump the data into its Netezza equivalent. But even at this level of simplicity, I have encountered  issues and glitches along the way – or  just made silly mistakes –.

The Tools

In the SLQ Server side just the regular stuff:  mainly SQL Server Management Studio and BIDS.
In the Netezza side:
  • Netezza Administrator which is the native admin tool. So far, I have used it to monitor sessions, query activity and to do other light admin stuff. 
  • SQuirreL SQL Client. Because Netezza does not have a native GUI for running queries – something I ave been spoiled with-, I decided to install SQuirreL SQL Client, which is an open source "graphical Java program that will allow you to view the structure of a JDBC compliant database, browse the data in tables, issue SQL commands, etc. ". Now, if you enjoy Unix commnad line environments for writing queries, then you can use nzsql, the native SQL command line interpreter.
  • Netezza 5.0.8 OLE DB Driver (provided by Netezza). So far I have been using the OLE DB driver and getting decent results, however I am still troubleshooting an issue where some packages fail if X number of connections have been already made. More about this in a next post.
After completing 50 packages or so I realized this has been one of the few occasions where I am using SSIS to get data into something that is not a SLQ Server DB, and so far, I have learned the hard way that some of the guidelines I have adopted and followed for SQL Server targets not always work when working against Netezza (e.g tweaking the FastLoadMaxCommitSize property so no all rows inserted are committed in one transaction). In the next post(s), I will share some details about the specifics of building, running and troubleshooting the packages against Netezza as well as issues I have run into. So far, this is what I have in mind:
  • Netezza - SSIS Connectivity: OLE DB driver
  • SQL Server –SSIS –Netezza Data type conversion issues
  • Troubleshooting and Debugging SSIS – Netezza Issues
  • Tips when populating Netezza tables via SSIS
If you have created SSIS packages that connects to Netezza, I would like to hear about your experience. What tool do you use to query Netezza? What driver you use within SSIS: ODBC or OLE DB?