Pages

Thursday, May 17, 2012

Data Profiling task in SSIS

Data profiler task can be used for the following reasons.

1. To understand the source data

2. To compare data against the previous load and new load

Data profiling task will create the xml file which gives you entire statics data present in SQL server database / tables. We can use only “ADO.NET” connection for profile purpose, so make sure we have moved our source data from different system like Oracle, Flat file and etc into the SQL server staging table.

DataProfileViewer.exe [Location : C:\Program Files\Microsoft SQL Server\100\DTS\Binn] will be used to view the xml file and see the statistics about the database /table.

Advantages:

1. We use the lot of Select statement to validate the data. Like distinct data and NULL ratio and etc. Profiling task does everything and reduces the time for data analysis

2. We can do it for entire database / specific table, so that validation becomes eaiser.

Disadvantage:

1. This is very expensive, because it scan through the entire table / database. So if we have this task in our regular loading process takes lot of time and resources.

How to configure:

AdventureWorks is the source database for AdventureWorksDW2008 database. So we will use the profiler task against the AdventureWorks and see how the data present there.

2. We can use only the file connection[we can also use the xml file path from the variable] for destination, because it will write only the XML file. Please take a look at the following screen.

clip_image002

3. Click on the “Quick Profile” to configure the database / table to see the statistics

clip_image003

4. Create a new connection and select the <All> to see the statistics for the entire database / or you can select single table. I am selecting “Prodction.Product” table for quick example.

clip_image004

5. On the compute, select the necessary profile and click OK.

6. Now run the SSIS package.

7. AdvWork.xml file will have all the information about the “Product” table in AdventureWorks database.

This is the very huge xml file. Seeing in the notepad will not help. So use the following tool see the profiled information.

DataProfileViewer.exe:

You can find DataProfileViewer.exe in the location “C:\Program Files\Microsoft SQL Server\100\DTS\Binn”.

clip_image006

Thanks,

Gowdhaman

1 comment:

  1. Excellent post. Thanks for the creating this such an informative post.

    ReplyDelete