Yukon Getting Data Warehousing Overhaul
By Scott Bekker
The next version of Microsoft SQL Server, code-named "Yukon," will have a completely rebuilt engine for extraction, transformation and loading (ETL) of data from one source to another -- a key element of data warehousing and business intelligence operations.
Data Transformation Services (DTS), Microsoft's name for its ETL technology, entered the SQL Server product line with SQL Server 7.0 in 1998 and was incrementally improved in SQL Server 2000.
"We are investing heavily in ETL tools inside of Yukon. We want to ship all the different [Business Intelligence] pieces that customers need, which includes enterprise ETL in the Yukon box," said Tom Rizzo, director of product marketing for SQL Server.
At this point, Microsoft is sticking with the DTS name inside Yukon. But the company is considering a new name for DTS given all the changes going on under the hood.
Microsoft is working to make ETL more scalable, manageable and reliable in Yukon. Restartability of ETL operations is a key improvement on all those fronts. "Let's say you had 10 million rows of data to move. In previous versions, if you had an error after five million rows, you wouldn't be able to fix that error and restart from 5,000,001. You had to start with row No. 1," explains Rizzo. "That was a pain point."
Yukon will include tools for restarting failed operations, along with graphical debugging, so a database administrator can go in and fix the problem or check whether a data cleansing rule is behaving as expected.
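To make the restart idea concrete, here is a minimal sketch of checkpointed loading in Python. It is not Yukon's or DTS's actual mechanism, which Microsoft has not detailed; the function `load_rows` and the `checkpoint` dictionary are hypothetical names used only for illustration.

```python
def load_rows(rows, sink, checkpoint, transform):
    """Copy transformed rows into sink, resuming after the last good row.

    `checkpoint` is a plain dict standing in for durable state (an
    assumption for this sketch); a real ETL tool would persist it.
    """
    start = checkpoint.get("next_row", 0)
    for i in range(start, len(rows)):
        sink.append(transform(rows[i]))   # may raise on bad data
        checkpoint["next_row"] = i + 1    # record progress only after success
    return len(rows) - start              # number of rows moved in this run


# Usage: the third row is bad, so the first run stops partway through.
rows = ["1", "2", "x", "4"]
sink, checkpoint = [], {}
try:
    load_rows(rows, sink, checkpoint, int)
except ValueError:
    rows[2] = "3"                         # the administrator fixes the bad row
load_rows(rows, sink, checkpoint, int)    # resumes at the failed row, not row 1
```

After the second call, `sink` holds all four converted rows, and the two rows loaded before the error were not processed again.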
New APIs and support for .NET languages will make the DTS component of SQL Server more easily extensible to custom rules and third-party apps. "We wanted to create a tool that was even better than the simple scripting [of past versions]," Rizzo said.
The SQL Server team leaned on Microsoft Research for a new feature called "fuzzy lookup." Microsoft officials explain that the feature is for cases where it's difficult to match data in one source to data in another. The new fuzzy lookup assigns a level of confidence to each match. The tool will allow a database administrator to set a threshold, such as 90 percent confidence, at or above which DTS will automatically execute an ETL move.
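The confidence-threshold idea can be sketched in a few lines of Python. This is not the Microsoft Research algorithm, which the company has not described; the sketch swaps in the standard library's `difflib.SequenceMatcher` as a stand-in similarity measure, and `fuzzy_lookup` is a hypothetical helper.

```python
from difflib import SequenceMatcher


def fuzzy_lookup(value, reference, threshold=0.90):
    """Return (best_match, score, auto_accept) for value against reference.

    The score is difflib's similarity ratio between 0.0 and 1.0, used
    here as an assumed stand-in for a match confidence.
    """
    best, best_score = None, 0.0
    for candidate in reference:
        score = SequenceMatcher(None, value.lower(), candidate.lower()).ratio()
        if score > best_score:
            best, best_score = candidate, score
    # Only matches at or above the threshold would be moved automatically.
    return best, best_score, best_score >= threshold


# Usage: a near-identical company name clears a 90 percent threshold.
customers = ["Microsoft Corp.", "Oracle Corp."]
match, score, auto_accept = fuzzy_lookup("Microsoft Corp", customers)
```

A match that falls below the threshold would be held for review rather than moved automatically, which is the decision the administrator's setting controls.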
SQL Server corporate vice president Gordon Mangione's speech at the SQL PASS conference this week was Microsoft's first public indication that DTS was coming in for serious work in Yukon. Yukon is scheduled for release in late 2004.
About the Author
Scott Bekker is editor in chief of Redmond Channel Partner magazine.