SQL Server's Data Profiling Task Gets the Right Data the First Time
Get some real useful information from SQL Server Integration Services' Data Profiling Task feature.
- By Eric Johnson
SQL Server 2008 Integration Services ships with the wonderful new Data Profiling Task. Much as the name implies, it profiles the data in a given table and returns a ton of useful information. The task gathers statistics for candidate keys, column length, NULL data ratio, data patterns, and much more. All of the results are written out to XML. The task makes more sense when you look at it, so let’s jump in.
The configuration of the task is pretty simple, you select the types of profiles that you want to run and then set the specific options for that profile. In Fig. 1, I selected Column Length Distribution for all columns (*) in the HumanResources.Employee table. This profile will gather length statistics on the columns, such as Minimum and Maximum Length, and provide a distribution of all of the lengths found. On the general page of the task, you also need to provide a destination, which can be a file connection or a variable.
Once you set up the task with all the data profiles you want to run, you simply execute the SSIS package and let it work its magic. If you wrote to a variable, you’re on your own to do something with the data within the SSIS package. On the other hand, if you wrote to a file, the SQL Server 2008 Client Tools come complete with the Data Profile Viewer, which can be found at in the program menu under Microsoft SQL Server 2008 | Integration Services. This is a pretty simple application that allows you to browse the XML file created by the Data Profiling Task.
|Figure 1. Use the Data Profiling Task to do things like gather data on tables you'll need to manage that you may not have seen before. Here's a sample showing how to get column length data for a table called HumanResources.Employee. (Click image to view larger
|Figure 2. Running the Data Profile Task as I specified in Fig. 1 will bring up these results in XML format. (Click image to view larger
Fig. 2 shows the results for the Column Length Distribution profile I ran earlier. There's quite a bit of information returned, and this is just one of the profiles. If you take some time and dig into all the profile types, you'll find that this task provides a large amount of information. If you’re the creative type, you can run this task and write the data to a variable, which can be further queried by your SSIS package.
With a little work, the Data Profiling Task tool is one that can give you lots of information the next time you're presented with a new database that you know nothing about.
Eric Johnson, SQL Server MVP, is the owner of Consortio Services in Colorado Springs providing IT systems management and technology
consulting. He is also the President of the Colorado Springs SQL Server
User Group. He can be contacted at www.consortioservices.com.