Clustered Grids Make Their Way Into the Enterprise

Thanks to advances in hardware and operating system design, today’s PCs have the potential to double as computational workhorses for distributed computing efforts. And thanks to the efforts of a host of companies, IT organizations can for the first time expect to use off-the-shelf software and shrink-wrapped operating systems like Windows NT 4.0 or Windows 2000 Professional to exploit the latent computing capabilities of their desktop PCs.

One enterprise customer that’s benefited from advances in the scalability and availability of distributed computing technologies is investment banking firm J.P. Morgan & Co. Inc.

Mike Liberman, vice president and global head of interest rate derivative technology for J.P. Morgan, traditionally relied upon UltraSparc-based servers from Sun Microsystems Inc. to perform such computationally intensive tasks as long-term derivative calculation.

“We use sophisticated mathematical models for calculating the risks of those derivatives, and these mathematical models are extremely intensive in terms of their computational requirements,” he explains.

In Liberman’s account, J.P. Morgan’s existing infrastructure – which was based on Sun UltraEnterprise servers outfitted with up to 20 UltraSparc processors per box – was simply running out of horsepower when it came to keeping up with the computational requirements posed by its derivative-crunching efforts.

“As our business problems kept growing and as our model became more and more sophisticated it put enormous strain on the computing power that we had on hand,” Liberman confirms.

If it couldn’t find another solution, J.P. Morgan faced the unenviable prospect of a seemingly endless – and expensive – series of hardware upgrade cycles. So rather than spending millions to purchase more computing horsepower from Sun, Liberman and J.P. Morgan decided to take a chance on an altogether different approach: “EnFuzion,” a new distributed computing product from TurboLinux Inc. If it worked, EnFuzion would allow J.P. Morgan to exploit the computing power of its hundreds, even thousands, of distributed workstations – most of which ran Windows NT 4.0 or Windows 2000 – to perform supercomputer-like number crunching.

“We’d spent millions of dollars on the hardware already and when you need to double the size it’s very expensive, so we started looking at a more innovative way of addressing those needs,” he confirms.

The technology that makes J.P. Morgan’s derivative crunching effort possible actually falls under the broader rubric of grid computing. Grid computing refers to the clustering of distributed resources – including supercomputers, storage subsystems, conventional desktop PCs and even embedded computing appliance-type devices – into so-called “computational grids,” which can then be exploited in highly parallel distributed computing efforts. Grid computing itself is often grouped under the larger superset of peer-to-peer (P2P) computing.

Enterprise vendors are lining up in support of grid computing. In August 2000, Intel Corp., Hewlett-Packard Co. and IBM Corp. launched a P2P Working Group to champion the interests of grid computing in the enterprise. Also, in July 2000, Sun purchased a distributed computing start-up, called Gridware Inc., that developed software which exploits the resources of idle Unix workstations to perform supercomputer-like computational tasks.

But although P2P advocates remain bullish on the prospects of grid computing in the enterprise, they also caution that it’s not by any stretch of the imagination a one-size-fits-all computing panacea.

“It’s definitely more of a parallelization-type thing, so it’s really for certain types of applications, especially those in the graphics, financial and aerospace industries,” comments Stephen Spector, senior marketing manager with TurboLinux. “If you’re working in these industries, we can go and distribute your computing tasks across a thousand machines, but if you’re working with conventional applications, it’s not really applicable.”

J.P. Morgan’s Liberman, for one, suggests that while it may be possible for IT organizations to exploit the potential of grid computing technologies for conventional applications, such companies will “have to be much more clever when [they] write [their] software.”

Jeff Davis, a senior systems programmer with petroleum giant Amerada Hess Corp., exploits a more traditional variant of distributed computing – low-cost Beowulf Linux clusters. Scalable clustering solutions such as Beowulf and the RS/6000 SP clusters from IBM Corp. have, after all, blazed a trail for today’s grid computing tools in the enterprise.

Amerada Hess has always leveraged a distributed computing solution in one form or another to help it crunch geophysical data, Davis says, so going with a Beowulf Linux cluster running across 200 workstations from Dell Computer Corp. was a no-brainer that ultimately helped save his company approximately $2.5 million.

Amerada Hess could actually have upgraded its existing IBM RS/6000-based SP2 cluster, Davis allows, but ultimately determined that a Beowulf cluster was the way to go from the perspective of not only cost savings, but also of performance and flexibility.

Unlike EnFuzion and other grid computing solutions of its ilk – which typically exploit idle resources across an IT environment – high-performance clusters run on dedicated systems and feature dedicated high-speed interconnects, making them ideal for a variety of computationally intense applications and implementations.

Of course, as Reza Roohalamimi, senior manager of clustering solutions with Dell suggests, you don’t necessarily have to deploy a distributed cluster on dedicated Linux or Unix systems to reap its benefits. After all, Dell also has a very successful Windows NT/2000-based clustering program, Roohalamimi points out, and has worked with customers such as Cornell University's Cornell Theory Center (CTC) to engineer a 256-processor, 64-node Windows 2000 cluster based on 64 of its four-way PowerEdge servers. “When we say ‘Beowulf’ here at Dell, we’re talking about building high-performance clusters using either Windows NT/2000 or Linux and putting them on industry-standard hardware,” Roohalamimi explains.

Distributed clustering solutions typically rely on a technology – dubbed “message passing” – that facilitates communication between nodes. As long as you’ve got message passing software installed on your target workstations and servers, Roohalamimi contends, you can reap the benefits of a Beowulf-type cluster even across conventional 10- or 100Base-T Ethernet networks.

“You can still use it even if you’re going to use a set of workstations that are sitting idle at night, you just need to make sure that you have message passing software installed on these workstations,” Roohalamimi explains. “The physical interconnect plays an important role in this, but you can support these applications even on standard 10- or 100-Mbps Ethernet. If you have a good implementation of message passing, they’ll do just fine.”

There are already a variety of open source message passing implementations available for Linux platforms. And thanks to the efforts of vendors like MPI Software Technology Inc. – which markets a message passing solution, dubbed MPI Pro, for Windows NT/2000 and Linux – IT organizations can also roll out in-house distributed clustering solutions similar to Beowulf.

“It’s possible to use MPI’s technology or a combination of custom-built or other programs to achieve the same effect [as a Beowulf cluster],” agrees Dell’s Roohalamimi. And if you deploy a cluster on Windows 2000, he notes, you can take advantage of the latter’s support for interconnect and cluster management features that in many cases aren’t yet available on Linux. The CTC exploits one such software component – the Cluster Controller, also marketed by MPI Software Technology – to provide parallel batch and interactive job scheduling for its dedicated Windows 2000 computational clusters.

“The bottom line is that it’s easier than ever to deploy scalable Windows or Linux clusters using industry standard hardware,” says Roohalamimi. -- Stephen Swoyer
