UPDATE: Microsoft SQL Server Parallel Data Warehouse Hits Metal Next Month
The parallel data warehouse (PDW) edition of Microsoft's SQL Server 2008 R2 solution will soon see the light of day. It will appear on HP's hardware next month.
The new HP Enterprise Data Warehouse Appliance will be available sometime in mid-December, Microsoft announced on Tuesday at the PASS (Professional Association for SQL Server) Summit event, which is being held this week in Seattle. The product will use Microsoft's SQL Server 2008 R2 Parallel Data Warehouse edition solution, formerly known by its "Madison" code name.
Madison is Microsoft's reworking of the DatAllegro Inc. massively parallel processing product that Microsoft acquired nearly two years ago. It represents the last of Microsoft's SQL Server 2008 R2 editions to become a full-fledged product. The delay in release may reflect Microsoft's caution with a rather complex product.
"There is more to PDW than just a release of [SQL Server] 2008 R2 on hardware," explained Wes Miller, a Directions on Microsoft analyst, in an e-mail from the PASS event. "Tuning and changes within SQL (to partition the data) are there as well."
In addition to HP's product, Microsoft is working with other possible hardware partners. Bull was the second possible hardware vendor mentioned.
Massively Parallel Processing Going Mainstream?
Microsoft is somewhat late to the competition with its massively parallel processing technology in PDW, noted James Kobielus, a senior analyst with Forrester Research. He said that such technologies have already been deployed in products from companies such as Greenplum and Netezza. Those companies have already been scooped up by bigger players. EMC announced the acquisition of Greenplum in July, while IBM indicated in September that it is acquiring Netezza.
"Microsoft is playing catch up," Kobielus said in a phone interview. "What they have rolled out now with parallel data warehousing is a bit late to the game, and they know it." Microsoft has had clustering capabilities in SQL Server for a while, he explained, but the scalability part was lacking, and that's what the DatAllegro technology brings.
Scalability in PDW means handling tens of terabytes of data and then scaling to hundreds of terabytes, according to Microsoft. Neither Microsoft nor HP announced pricing for the product (editor's note: see "Pricing Update" at end of the article), but Kobielus suggested it might be one of the lower-cost options for organizations needing massive scalability.
"Everything I've heard from Microsoft would indicate that the pricing starts at about $11,000 per terabyte of raw data on the data warehouse within PDW, which would put Microsoft in the august company of being one of the cheapest offerings on the market for massively scalable data warehousing."
Massively parallel processing technology is being used by industries such as finance, telecommunications and government, Kobielus said. They may need to aggregate data going back many years. Alternatively, they may need to mine customer experience information by sifting through call-center data or click-stream data from the Web, he explained. At about 50 terabytes to 60 terabytes of data, clustering is needed; thereafter, clustering starts to approach its limits.
"At around 100 terabytes worth of data, traditional clustering is not as scalable or flexible as it needs to be, and that's when you need to move to massively parallel processing," Kobielus explained. "It uses multiple servers, virtualized as if they were one unified data warehousing resource available for BI analytics."
Miller noted that Microsoft's collaboration with HP will provide a "turnkey" data warehouse solution that takes advantage of the hardware for "up to 480 cores."
"Arguably, Oracle, IBM or any other RDBMS (including SQL) could have been used before -- but this is the first time Microsoft has worked with an OEM to tune Windows and SQL to work out of the box on a system with such scale, while also delivering the ease of deployment," Miller stated.
CTP of the Next SQL Server
Microsoft had a few other PASS announcements besides its PDW news. The company has opened up the first community technology preview (CTP) of the next version of SQL Server, which goes by the "Denali" code name.
IT pros who are MSDN and TechNet subscribers can sign up to be part of the Denali CTP at this page.
Microsoft described a few of Denali's new features very briefly in its announcement. One of those features is called "Crescent," which is the code name for an interactive Web-based method of viewing data. "AlwaysOn" is designed to enable high availability for SQL Server. There's a column-store technology, code-named "Apollo," designed to boost the performance of queries. Database and application developers will have a tool, code-named "Juneau," in Visual Studio that supports both SQL Server and SQL Azure. Finally, Microsoft announced "data quality services" to work with Denali that provide tools to "profile, cleanse, match and merge data."
Microsoft is demonstrating some of those new technologies at the PASS Summit, according to Miller.
"Crescent was demonstrated today, and it effectively works to take data from a data warehouse and turn it into information that a typical knowledge worker can quickly and easily work with," he wrote. "In many ways, [it's] the next step for PowerPivot after enabling Excel power users (an important segment, but still just a fraction of the knowledge-worker space)."
Microsoft had one more code name to disclose at the PASS event, namely "Atlanta." It's being billed as a new cloud-based service that enables best practices when configuring SQL Server. This agent will work with both 32-bit and 64-bit versions of SQL Server 2008 or with later versions, according to this Microsoft description.
Finally, Microsoft announced an updated Microsoft Certified Master program for SQL Server users. This highest certification program for SQL Server professionals will be available at testing centers in nine countries. It was previously offered only at Microsoft's campus in Redmond, Wash. Candidates will need to pass just two exams -- a "four-hour Knowledge Exam" and a "six-hour hands-on Lab Exam" that will be available early next year. Microsoft is claiming that this updated MCM cert program will take less time to complete and cost substantially less.
Pricing Update
Microsoft on Wednesday provided the following pricing information on SQL Server 2008 R2 Parallel Data Warehouse edition, attributed to a Microsoft spokesperson:
"SQL Server 2008 R2 Parallel Data Warehouse is priced at $38,255US per processor at Microsoft’s level A pricing. SQL Server 2008 R2 Parallel Data Warehouse appliances will typically have up to 22 processors per rack. Therefore the software price per rack is up to $841,610 USD. SQL Server 2008 R2 Parallel Data Warehouse delivers low TCO for our customers with the price per terabyte starting at $13,228 including estimated hardware cost.
"The HP Enterprise Data Warehouse Appliance planned for mid-December 2010 availability. As customer requirements will differ, pricing will vary based on multiple elements, including the customer’s storage choices. As a starting point, the price of the HP hardware, software, and services for a single data rack and one control rack list for less than $900KUS. Customers may choose higher density disks at a slightly higher price or choose to add up to 4 data racks for significantly more user data capacity.
"The price includes Windows but it does not include Microsoft's PDW software or support services for either Windows or PDW. The PDW licenses and the support for the Microsoft applications must be purchased separately."
Kurt Mackie is senior news producer for the 1105 Enterprise Computing Group.