Pages

Thursday, May 17, 2012

Data Profiling task in SSIS

Data profiler task can be used for the following reasons.

1. To understand the source data

2. To compare data against the previous load and new load

Data profiling task will create the xml file which gives you entire statics data present in SQL server database / tables. We can use only “ADO.NET” connection for profile purpose, so make sure we have moved our source data from different system like Oracle, Flat file and etc into the SQL server staging table.

DataProfileViewer.exe [Location : C:\Program Files\Microsoft SQL Server\100\DTS\Binn] will be used to view the xml file and see the statistics about the database /table.

Advantages:

1. We use the lot of Select statement to validate the data. Like distinct data and NULL ratio and etc. Profiling task does everything and reduces the time for data analysis

2. We can do it for entire database / specific table, so that validation becomes eaiser.

Disadvantage:

1. This is very expensive, because it scan through the entire table / database. So if we have this task in our regular loading process takes lot of time and resources.

How to configure:

AdventureWorks is the source database for AdventureWorksDW2008 database. So we will use the profiler task against the AdventureWorks and see how the data present there.

2. We can use only the file connection[we can also use the xml file path from the variable] for destination, because it will write only the XML file. Please take a look at the following screen.

clip_image002

3. Click on the “Quick Profile” to configure the database / table to see the statistics

clip_image003

4. Create a new connection and select the <All> to see the statistics for the entire database / or you can select single table. I am selecting “Prodction.Product” table for quick example.

clip_image004

5. On the compute, select the necessary profile and click OK.

6. Now run the SSIS package.

7. AdvWork.xml file will have all the information about the “Product” table in AdventureWorks database.

This is the very huge xml file. Seeing in the notepad will not help. So use the following tool see the profiled information.

DataProfileViewer.exe:

You can find DataProfileViewer.exe in the location “C:\Program Files\Microsoft SQL Server\100\DTS\Binn”.

clip_image006

Thanks,

Gowdhaman

SQL server 2008 R2 Best Practice analyzer

To find out the configuration best practices for your SQL server 2008 R2 and 2008, you can use this “Microsoft SQL Server 2008 R2 Best Practices Analyzer(BPA)” using the following link.

http://www.microsoft.com/en-us/download/details.aspx?id=15289

There are two prerequisites, Please download and install prior to install the best practices analyzer. 

    1. Microsoft Baseline configuration Analyzer 2.0
    2. PowerShell V2.0

Once, you done with installation.

Configuration:

Important note: Check with administrator before you run the following command

SQL server BPA can connect to both local and remote sql servers.

  1. Open the Powersheel as an administrator
  2. Enter the following command
    “Enable-PSRemoting”
  3. Enter “Y” to overwrite the PowerSheel configurations

clip_image002

Scan for Best practice:

  1. Open the “Microsoft Baseline configuration Analyzer 2.0” from the start menu.
  2. Select “SQL server 2008 R2 BPA”
  3. Click on the parameters tab and enter the required parameters and Select the required objects like “Database Engine”, “Analysis Services” ,“Integration Services” and Etc.
  4. Wait the for the reports to come.

We can see the Errors and warning with different Components. These information is very useful for configuring you SQL server 2008 R2 and 2008.

clip_image004

following are some use full error messages for your reference.

Error in Engine:

Category: Performance

Source: localhost

Issue: This instance of SQL Server has only one tempdb data file

Impact: Under a heavy load, the tempdb database can become a single point of contention and affect concurrency and performance

Resolution: To reduce contention in the tempdb database, increase the number of tempdb data files, and configure the required startup trace flags.

 

Warning in Analysis Services:

Category: Performance

Source: localhost

Issue: One or more dimensions have all attribute relationships defined as Flexible

Impact: The RelationshipType property determines whether Analysis Services creates rigid or flexible aggregations. After an incremental update on a dimension, flexible aggregations are dropped and must be reprocessed, but rigid aggregations persist

Resolution: Examine the relationship between attributes and define them as Rigid when the relationship between the attributes is not likely to change such as City and State

 

Warning in Integration Services:

Category: Performance

Source: localhost

Issue: The Sysssislog table was found in the master or msdb database

Impact: Integration Services package logging can generate a large volume of lengthy log entries. You may experience decreased performance if you log to a system database, such as the master or msdb database

Resolution: A good practice is to create a separate database for Integration Services logging. You do not have to create the Sysssislog table manually. If the table does not already exist, Integration Services creates this table for you when you select a SQL Server database as your logging destination

 

Thanks,

Gowdhaman