While grid computing might sound revolutionary, it's really an incremental step in a long developmental history of computer platforms. In fact, enterprises of all types are already using a simple form of grid computing within their firewalls to economically meet processing needs.
Just as storage-area networks (SANs) virtualised enterprise storage, grid computing virtualises processing power. With it, a process doesn't run on a specific assigned CPU in a particular box; rather, it goes to any available processor on the network. A small task uses a single CPU, while larger jobs are split up and run on multiple CPUs simultaneously. Jobs can be assigned using either a centralised or a peer-to-peer structure. Once limited to university research clusters, grid computing is making inroads into the business community, offering supercomputer performance at PC prices.
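To illustrate the idea in miniature: under centralised scheduling, a dispatcher hands work to whichever processor is free, so a small job occupies one CPU while a large job is cut into pieces that run side by side and are recombined. The sketch below uses Python's process pool as a stand-in for a grid of CPUs; the task itself is a placeholder, not any real grid product's API.

```python
# A toy centralised "grid": jobs go to whichever worker process is
# free, rather than to a fixed, pre-assigned CPU.
from concurrent.futures import ProcessPoolExecutor

def analyse(chunk):
    # Stand-in for a compute-heavy task (e.g. analysing one data slice).
    return sum(x * x for x in chunk)

if __name__ == "__main__":
    small_job = [list(range(10))]                        # one CPU is enough
    big_job = [list(range(i, i + 1000))                  # one large job,
               for i in range(0, 4000, 1000)]            # split four ways

    with ProcessPoolExecutor(max_workers=4) as pool:     # the pool of CPUs
        small_result = list(pool.map(analyse, small_job))
        big_result = sum(pool.map(analyse, big_job))     # recombine pieces
    print(small_result, big_result)
```

The key property is that splitting the big job does not change its answer: summing the partial results equals analysing the whole input at once, which is what makes this class of work grid-friendly.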
At Caprion Pharmaceuticals based in Montreal, Canada, for example, a four-CPU server needs 720 hours to analyse a single biological sample from one of Caprion's eight mass spectrometers. Mixed in with these resource-intensive computational cycles is a constant stream of smaller tasks. The company turned to grid computing to efficiently address the mix of large and small jobs on a single platform.
"Biotechnology changes so rapidly that 12 months from now, we could be doing very different kinds of computing," says Paul Kearney, director of bioinformatics. "We need a flexible computing platform that not only meets today's needs, but the future ones we don't know about yet."
If you feel confused on the subject of grid computing, you're not alone. "People envision grid computing as meaning everyone being connected planetwide," explains Gordon Haff, an analyst at Illuminata, a consulting firm that specialises in large-scale computing. "That is only part of it, and largely a future part."
Grid computing follows the development track of technologies such as the Object Management Group's Common Object Request Broker Architecture (CORBA), Microsoft's Distributed Component Object Model (DCOM) and the Open Group's Distributed Computing Environment (DCE). More recently, the Globus Project (www.globus.org) has developed standards for large-scale computing over the Internet.
"Organisations have been doing grid computing for 10 years," says Stacey Quandt, an analyst at Giga Information Group. "People are talking about it now because of the potential global grid, but many already have grids."
Examining the options
There are three main types of grid: cluster, campus and global. The definitions aren't set in stone. Rather, they represent rough divisions on the continuum of ways in which organisations can share computing power.
Sun Microsystems reports that of the 5,000 organisations currently using its Sun Grid Engine software, about 4,500 (90 per cent) use it for cluster grids, about 450 for campus grids and about 50 for global grids.
Cluster grids consist of linked servers, workstations or both in a single location, serving the same project or business unit. In addition to making more effective use of server resources, such a cluster can also capture unused workstation CPU time for large processing jobs.
Caprion's cluster grid was installed last September in a server room built from scratch using Sun Microsystems' Sun Fire 4800, 3800 and 280R servers running Solaris 8, with a 5TB online Oracle8i database. Sun's Grid Engine distributes tasks among 76 CPUs. It can assign a single job to all of the CPUs for maximum processing speed or run many jobs simultaneously.
The cluster grid has enabled Caprion to take on a job consisting of running several large-scale experiments that will take two to three months to complete.
"We wouldn't have even considered doing those experiments if it required months of putting in a new computing platform or rearranging the hardware," says Kearney. "But the grid software has made it relatively easy to reorganise our computer farm for the job at hand."
Campus computing, which allows several projects or departments in an organisation to share resources scattered across a campus, is the next step up from clusters. For example, the accounting department may tap into the personnel department's server CPUs when running a series of complex financial forecasts, or the engineering department may schedule its large jobs to run at night, using the processing power of all the company's workstations, to have the results by morning. Campus grids use the organisation's network to transfer data between resources in various locations.
Setting up a campus grid introduces technical complexity because of bandwidth and delay problems, as well as organisational challenges.
"You need to take into account the social engineering, not just the technology," Quandt points out. "You're setting up an infrastructure where some departments or users take priority over others."
Quandt also points to financial and security issues. If one department is using another department's servers, for instance, this needs to be reflected in cost analyses. If security requirements vary among departments, policies must be set to accommodate the disparity.
Despite such challenges, some companies have operated grid computing for years. Hartford, Connecticut-based Pratt & Whitney, a division of United Technologies, began using distributed computing in the early 1990s to model jet engines and gas turbines. Relying on physical testing of the hardware proved to be too expensive and too slow.
"Building the hardware for a single test can run into the millions of dollars and take months," says Pete Bradley, associate fellow for high-intensity computing at Pratt & Whitney. "We switched to large-scale numerical simulations."
To boost processing power, the company installed Markham, Ontario-based Platform Computing's Platform LSF software on 5,000 workstations and 150 servers at five locations in the US and Canada. Machines are connected over 100Mbps Ethernet.
The grid mainly executes simulations at night on the unused Sun workstations and desktops. A typical task is assigned to 20 to 40 processors. If there aren't enough processors available at one campus, the activity is transferred to another. Single jobs are never split between two campuses because of line delays.
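A scheduling policy like Pratt & Whitney's, where a job needing N processors runs only at a single campus that has N free, so it is never split across slow wide-area links, can be sketched as a simple placement check. The campus names and CPU counts here are illustrative, not the company's actual topology.

```python
# Sketch of a "whole job on one campus" placement policy: a job is
# assigned to the first campus with enough free processors, and is
# never split across campuses (to avoid wide-area line delays).
def pick_campus(free_cpus, needed):
    """Return a campus that can host the whole job, or None to queue it."""
    for campus, free in free_cpus.items():
        if free >= needed:          # the entire job must fit at one site
            return campus
    return None                     # no site has capacity; job waits

free = {"campus_a": 18, "campus_b": 45, "campus_c": 30}
print(pick_campus(free, 40))   # -> campus_b, the only site with 40+ free CPUs
```

If no campus qualifies, the job queues until capacity frees up rather than being fragmented across sites.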
The grid cut engineering time and development costs in half for jet engine compressor designs. As an added bonus, the software lets Pratt & Whitney execute thousands more simulations than before.
"Grid computing has saved us a fortune by making it unnecessary to purchase dedicated servers for this kind of work," says Bradley. "But more importantly, increased capacity allows us to accelerate technology development."
Some companies develop unique grid strategies. US-based biotechnology firm Cognigen, for example, uses an application developed at the University of California called Nonlinear Mixed Effects Modeling (NONMEM) for analysing pharmaceutical clinical trial results and extrapolating what the results would be in a larger population. Although NONMEM isn't designed for multithreading, Cognigen still uses Sun's Grid Engine to manage multiple jobs simultaneously, sending one job to each processor and notifying the users when each job is done. The company estimates that this has saved each scientist four to five hours per week.
Not worth it for everyone
Grid computing is being successfully applied in thousands of enterprises, but that doesn't mean it works for everyone. While a quick look at CPU usage statistics in most organisations reveals vast untapped computing power, it's not always possible to capture that power economically. In the end, the theoretical ideal of grid computing, in which any computing resource can be used anywhere at any time, isn't physically possible, nor is it necessary.
"There are lots of processors sitting around in microwave ovens and appliances that no one in their right mind would want to attempt to connect to a global grid," says Illuminata's Haff. "The efficiency of grid computing sounds good, but it doesn't make sense for most companies."
It's not worth spending the time and money to set up a grid to save a few nanoseconds on a word processing document, for instance. Number-crunching mega-applications, on the other hand, are ideal for a grid: they can be split into small pieces and run in parallel, are highly repeatable and require minimal disk I/O.
In addition to biotechnology and engineering applications, grid computing is probably of most value in fields such as petroleum exploration, material sciences, crash simulations, financial market modelling, motion picture special effects and weather forecasting.
"Organisations that are using it are the same ones that 15 years ago would have been using Cray computers," says Haff. "By going for a distributed approach, they achieve the same power at lower cost."
What is grid computing?
Grid, or distributed, computing is the aggregating of processing power possessed by different servers or workstations into a single resource. A single large job can be split into smaller pieces and run on several, or several thousand, computers simultaneously, producing supercomputer speed from off-the-shelf hardware.
Campus grid computing: Distributed computing across one organisation, usually in one geographic location.
Clustering: A cross between parallel processing and distributed computing. Networked computers at a single location work together like a parallel machine: the work is split across multiple computers, but they behave as a single parallel system.
Local distributed computing: Distributed computing used on a local level, generally on a local-area network. Local refers to an enclosed space, building or area, such as a university or an office.
Common Object Request Broker Architecture (CORBA): A set of standards from the Object Management Group for distributed technologies and objects. CORBA is platform-neutral and widely used; its technology allows communication between distributed objects. Its closest equivalents are the Open Group's DCE and Microsoft's DCOM.
Parallel processing: A type of distributed computing that distributes the workload across multiple processors within one computer instead of distributing it across multiple computers. It provides better speed and communication than distributed computing, but it's not widely used due to the difficulty of programming for parallel machines.