Computing Clusters: Sometimes You Can't Make It On Your Own

The idea of cluster computing is elegant. Rack up a bunch of off-the-shelf CPUs and get them processing in lockstep to knock out hard research problems. Of course, the details are more complex than that, as two high-performance computing centers share in this article. But the fact is that some of the same components that make up the machine taking up space on your desktop can now be applied to solving scientific and engineering problems that used to be the purview of 'big iron' mainframes.

So what is the difference between computing clusters and the PC sitting in front of you? According to Russ Miller, founding director of the Center for Computational Research (CCR; www.ccr.buffalo.edu) and distinguished professor of computer science and engineering at SUNY-Buffalo, 'If you wanted to predict tomorrow's weather for the southern states in the US (maybe half a dozen states) on your laptop, it would probably take three months.' Thus the need for more intensive computing power.

NCSA's Vast Computing Power
What does a high-performance computing cluster look like? The National Center for Supercomputing Applications (NCSA; www.ncsa.uiuc.edu) currently runs two high-end clusters. One, nicknamed Tungsten, is a 3.0 GHz Xeon-based Dell (www.dell.com) cluster with 2,480 processors and 3GB of memory per node. This system, the center's Web site claims, is the 10th fastest supercomputer in the world. The other, named Mercury, was built by IBM and consists of 1,176 processors with between 4 and 12 gigabytes of memory per node. These clusters are physically huge. According to John Towns, senior associate director, Persistent Infrastructure Directorate at NCSA, the Dell setup has five rows with nine racks in each row. It took several deliveries by multiple trucks to get all of its components to the center.

The two clusters, along with an SGI Altix 3000 and an IBM p690, make up the four major systems that allow NCSA to serve academic researchers in the US with computing projects that require massive parallelism. (All four are nicknamed after metals.) Also part of the environment is a mass storage system named UniTree, which serves all the computers and can archive three petabytes of data. (There's room to grow; it currently holds less than two petabytes.)

Towns estimates that 'in excess of 400 activities' are going on within his directorate currently. But among the larger-scale activities, those funded at half a million dollars or more a year, only about six projects are happening at any point in time.

One such endeavor in which NCSA is involved is called LEAD (Linked Environment for Atmospheric Discovery; www.lead.ou.edu). The objective, he explains, is to develop a next-generation environment for weather research and prediction, 'so that we understand not only whether it's going to rain next week, but if we see conditions developing that would indicate, say, a tornado, [we can] help emergency management services to evacuate the counties that are going to be affected.'

One of five original centers opened as part of the National Science Foundation's Supercomputer Centers Program in 1986, NCSA is on the campus of the University of Illinois at Urbana-Champaign. Before the founding of these US-based resources, supercomputing resources existed only within the Department of Defense (www.defenselink.mil) and the Department of Energy (www.energy.gov). Nothing was available to academic researchers seeking access to 'unclassified' equipment. Many researchers resorted to doing their work abroad, primarily in Germany, according to Towns, to get access to these types of systems.

The explosive growth of the cluster approach is an almost-inevitable result of more powerful commodity-priced computers, the blossoming of the open source community (which provides operating systems, management tools, and many of the applications), and a breaking away from centralized control of computing power.

The latter, especially, shouldn't be underestimated. Although the computing resources are maintained and managed in a single location, access to them is an almost-democratic undertaking at some cluster computing sites, such as CCR. All details of the systems are publicly displayed and updated constantly on the center's Web site, including the performance of individual systems, which jobs are running on particular nodes, the status of the queue for computing activities, and even the comings and goings of staff, researchers, students, and visitors at various parts of the center as broadcast on Webcams.

A Quick History of Cluster Computing

In its essence, cluster computing hasn't really changed much since its introduction in 1993 with the launch of projects like Beowulf. Donald Becker and Thomas Sterling met at MIT with a common interest: to figure out how to use commodity-based (read: low-cost) hardware as an alternative to large supercomputers. According to Phil Merkey, an assistant professor in mathematics and computer science at Michigan Technological University, the prototype system consisted of 16 DX4 processors connected by Ethernet. The machine was 'an instant success,' and the idea of systems built from commodity off-the-shelf parts 'quickly spread through NASA and into academic and research communities.'

At the National Center for Supercomputing Applications, John Towns recalls working first with the computer science department at the University of Illinois at Urbana-Champaign and then with vendors, most notably IBM, to design and build a cluster consisting of 1,024 nodes using Pentium III chips. NCSA's goal was to find a 'high performance solution.' The result was nicknamed Platinum by the center. IBM then turned around and productized the cluster, which became the IBM Cluster 1300.

The Racks of CCR
Says CCR's Miller, 'Our knee-jerk reaction to any request is to say, "Yes."' The mission of the resources at CCR is to support high-end computing, visualization, data storage, and networking to 'enable discovery in the 21st century.' Miller says his group supports work in computational science and engineering for the campus, as well as a 'whole host of endeavors, whether they're at the university, corporate partners, local companies, local government agencies or what have you.' That includes a recent animation project for MTV2 (www.mtv2.com), which had contracted with a Buffalo company that, in turn, contracted with the center; hosting workshops for local high school students to learn about computational chemistry and bioinformatics; and other work that will benefit the local community.

Then there's the research, currently about 140 different projects, ranging from vaccine development, to magnetic properties modeling, to predicting better ways to make earthquake-resistant material.

One project Miller describes, which runs on the clusters, is an attempt to understand water flow. 'We have a group in civil engineering who are modeling the Great Lakes and the Great Lakes region,' says Miller. 'What they've done is develop algorithms that can take advantage of massive parallelism... [to] get a better and finer understanding of the Great Lakes and how the water moves and how it flows than was ever possible before, by orders of magnitude. For example, you can imagine somebody putting a contaminant in the Great Lakes and you say to yourself, "OK, depending on how people are watering their lawns, taking showers, washing their cars in Buffalo, Rochester, Syracuse and so on, when and how bad will the contaminants be when they hit New York City?"'

CCR has multiple Dell clusters, but the two largest are Joplin, with 264 nodes running Red Hat Linux (www.redhat.com), and U2, the newest system, with 1,024 nodes whose main memory varies from 2 to 8 gigabytes per node. U2 also runs Red Hat Linux, though a different version. Miller has the honor of naming the machines, and the current naming scheme honors inductees to the Rock and Roll Hall of Fame. The front ends to the system are named Bono (vocalist for U2) and The Edge (guitarist); two administrative servers that help keep track of and monitor the system are called Larry (drums) and Adam (bass) because, he says, they 'keep everything running.'

Currently, only about two-thirds of U2 is powered up at any one time, since the system is still going through testing. Once the center gets the older system, Joplin, updated with the same version of Red Hat that U2 is running, says Miller, it'll be integrated into U2's queuing system. 'The [capacity] computing jobs will be automatically routed to what was called Joplin, those nodes; the [capability] computing jobs will be automatically routed to U2; and it'll be transparent to the users. It'll be one queue that's essentially an Intel (www.intel.com) Pentium line of chips.' The only change for the users, explains Miller, is that they'll need to answer a couple of extra questions about their projects up front, such as 'whether the code they need to run is 32-bit or 64-bit compatible, which means it can run on both places, or if it's just 32-bit or just 64-bit. And we'll be able to route things to the appropriate nodes.'
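What that routing could look like on the scheduler side is sketched below in Python. This is only an illustration of the idea Miller describes: the pool names, the Job record, and the tie-breaking rule for "runs on both" jobs are hypothetical, not CCR's actual configuration.

```python
# Hypothetical routing rule for a merged queue: 32-bit-only jobs go to the
# older capacity nodes, 64-bit-only jobs go to U2, and jobs that can run on
# both go wherever there is currently more room. Names are illustrative only.
from dataclasses import dataclass

@dataclass
class Job:
    name: str
    arch: str  # "32", "64", or "both", as answered by the user up front

def route(job: Job, free_joplin_nodes: int, free_u2_nodes: int) -> str:
    if job.arch == "32":
        return "joplin-pool"          # capacity-computing nodes
    if job.arch == "64":
        return "u2-pool"              # capability-computing nodes
    # Compatible with both: favor whichever pool has more free nodes.
    return "joplin-pool" if free_joplin_nodes >= free_u2_nodes else "u2-pool"

print(route(Job("lake-model", "both"), free_joplin_nodes=40, free_u2_nodes=12))
```

The point of the sketch is the transparency Miller mentions: the user answers one question about architecture, and the single queue decides where the job lands.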

Getting Access to the Power
For gaining access, CCR has 'basic minimal requirements,' says Miller. The user describes the project and the required resources in terms of storage and compute power. 'We support research, discovery, and scholarship that requires high-end computational resources. That's obviously a moving target.' In fact, he says, every six months or so the center redefines what it means by 'high-end' in terms of data, networking, visualization, or computing requirements. If a project requires fewer than, say, 16 processors running concurrently, Miller and his team will probably kick the request back to the individual to take back to his or her lab, department, or school for handling. An advisory committee evaluates troublesome proposals, but most of the time the decisions are 'obvious.'

At NCSA, the process for gaining access is more formal. The proposal process is modeled after the NSF process, says Towns, and involves, for large requests, a peer review performed by a national review committee that meets quarterly. These are proposals requiring in excess of 200,000 CPU hours per year. Smaller requests, from 10,000 to 20,000 hours of time, are considered 'development accounts' for start-up projects. Reviewed and awarded continuously, the smaller accounts allow researchers to try out their applications on the system and understand performance characteristics in preparation for submitting larger proposals.

What's a CPU hour in cluster terms? According to Towns, it's equivalent to one hour of time on a single processor. Since these are dual-processor nodes, there is a total of 2,480 processors on Tungsten. If a project is running on 64 nodes, which is 128 processors, and it runs for one hour, the user has accumulated 128 service units, or CPU hours.
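As a back-of-the-envelope illustration of that accounting, here is a minimal sketch in Python. The service_units helper is hypothetical, but the formula (nodes used, times processors per node, times wall-clock hours) follows directly from Towns' example.

```python
# Hypothetical helper mirroring the service-unit accounting described above:
# charge = nodes used x processors per node x wall-clock hours.

def service_units(nodes: int, procs_per_node: int, hours: float) -> float:
    """Return the CPU hours (service units) charged for a cluster job."""
    return nodes * procs_per_node * hours

# The example from the text: 64 dual-processor nodes running for one hour.
print(service_units(nodes=64, procs_per_node=2, hours=1.0))  # 128.0

# Rough wall-clock time those same 64 nodes could run against the 200,000
# CPU-hour threshold for large requests mentioned above.
print(200_000 / (64 * 2))  # 1562.5 hours, roughly 65 days
```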

Neither organization charges its academic users for the time they use on the clusters. In the case of NCSA, Towns says, researchers are granted allocations of time as part of grant awards, and their usage is deducted from those allocations.

CCR considers its clusters part of the 'university infrastructure,' says Miller, 'to support leading-edge science research.'

Both centers do collect funds from academic users in cases where compute time has been budgeted into their grant proposals, which helps cover expanded demands on staff time; compensation also comes from commercial users of the resources.

The NCSA has about 100 staff members and another '15 or 20' graduate students working in its center to provide 24/7 support for its community of users. CCR has a technical team of 13, consisting of operations people and computational scientists. The former, system administrators, keep the systems running, and the latter work closely with users to figure out, for example, what applications are needed for a particular project or to help optimize code.

Miller estimates that about half of the applications running on the CCR computers are off the shelf: code that has been paid for, or is freeware or shareware. The other half is 'home-grown.' In the case of NCSA, Towns says, 'By and large [the majority of our users are faculty researchers] using applications they've developed to solve the problems that they're attacking.'

What It Takes To Work in Academic Computing

Ever wondered what it takes to work in a high-performance computing environment? According to John Towns, senior associate director, Persistent Infrastructure Directorate at NCSA, there's no other job that can prepare you; you arrive with particular qualifications and then get on-the-job training.

At NCSA, to obtain a position on the academic professional staff, you're required to have a Bachelor of Science degree. Frequently, staff members have worked in a research environment, possibly as graduate students. Their backgrounds may include hard sciences, engineering, or computer science. All have 'an affinity for computers' and an interest in the high end, whether that is 'compute systems, networks, visualization, or data storage,' Towns says. Frequently, they don't keep normal hours, which is an advantage in an environment that runs 24 hours a day.

Shared-Memory Machines vs. Clusters
One misunderstanding that can crop up about clusters is that they replace old-style mainframe-type or shared-memory computers. In reality, each setup is suited to a specific type of computing work. 'Gene sequencing is fairly trivially parallelized. You can spread it across a cluster and use the resources well,' Towns says. In other words, every processor is, practically speaking, running a separate copy of the application, and no processor needs to talk much to the others.

CCR's Miller refers to this as 'capacity computing': 'In order for a scientist to solve a certain problem, they may need to run a thousand or ten thousand simulations, each of which is best run on a single CPU... It's the aggregate of all those results that will solve their scientific problems.'
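A minimal sketch of that capacity-computing pattern appears below, assuming a hypothetical simulate() function whose runs are completely independent. On a real cluster each run would typically be its own batch job rather than a local process, but the shape of the problem is the same: many separate copies, combined only at the end.

```python
# Minimal sketch of capacity computing: many independent simulations whose
# results are aggregated only at the end. simulate() stands in for real
# scientific code; it never needs to communicate with the other runs.
from multiprocessing import Pool
import random

def simulate(seed: int) -> float:
    """One independent simulation run."""
    rng = random.Random(seed)
    return sum(rng.random() for _ in range(100_000)) / 100_000

if __name__ == "__main__":
    with Pool() as pool:                            # one worker per local CPU
        results = pool.map(simulate, range(1_000))  # a thousand runs
    print("aggregate result:", sum(results) / len(results))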

Another class of programs runs as a single application. As Towns explains it, 'Imagine that you're running simulations, and... you create a grid that represents a space and something happens in that space. Often where something interesting is happening, you need to redefine the spacing on the grid to accurately represent what's [going on]. A class of applications has been developed that in a dynamic way redefines the spacing in the grid where it needs to... If you try to represent the entire grid at the finest resolution, you don't have a big enough memory machine to do it. What you do is refine it where it's necessary... You have some nodes that have a lot of work to do and some that don't. In a shared memory system, you can easily redistribute that work among the processors, so you can keep them all busy together, and move the application along much more quickly.'
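The dynamic refinement Towns describes can be sketched in one dimension as below. The field() function and the 0.1 'interesting enough to split' threshold are made-up placeholders, but the idea of spending resolution only where the solution changes rapidly carries over.

```python
# Toy 1-D adaptive refinement: subdivide a cell only where the sampled field
# changes rapidly, instead of refining the whole grid uniformly.
import math

def field(x: float) -> float:
    """Stand-in for a simulated quantity with a sharp feature near x = 0.5."""
    return math.tanh(50 * (x - 0.5))

def refine(lo: float, hi: float, depth: int = 0, max_depth: int = 8):
    """Return a list of (lo, hi) cells, split wherever the field jumps."""
    jump = abs(field(hi) - field(lo))
    if depth >= max_depth or jump < 0.1:  # smooth enough: keep this cell coarse
        return [(lo, hi)]
    mid = 0.5 * (lo + hi)                 # interesting: split it and recurse
    return refine(lo, mid, depth + 1, max_depth) + refine(mid, hi, depth + 1, max_depth)

cells = refine(0.0, 1.0)
print(len(cells), "cells; finest cell width:", min(hi - lo for lo, hi in cells))
```

The payoff Towns points to comes after the split: the finely divided cells pile up around the interesting feature, and on a shared-memory machine the extra work they represent can be redistributed among processors without shipping data over a network.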

TeraGrid

A lot of research in grid computing is currently taking place. The TeraGrid is an effort by the National Science Foundation (www.nsf.gov) to build and deploy the world's largest distributed infrastructure for open scientific research. In practical terms, that means developing a common environment and interface that gives the user community access to a rather diverse set of high-performance computing resources, including clusters. In some cases it also involves hooking up the physical systems so they can be used in conjunction with each other, and providing similar environments so that researchers can move between systems more easily.

As John Towns, senior associate director, Persistent Infrastructure Directorate at NCSA, explains, many researchers have multi-stage applications whose stages require different kinds of computing architectures. 'So the TeraGrid is attempting to facilitate the use of these multiple architectures, often sited at different locations, to support their research efforts.'

Learn more about the TeraGrid at www.teragrid.org. Learn more about grid computing at www.ggf.org.

Computer Years
Interestingly, both types of large systems have a fairly short life, about three to five years, usually toward the lower end, according to Towns. After that, the maintenance and operational costs for the hardware become too high and it's time to 'simply buy a new system.' NCSA plans for a doubling of computational resources every one to two years, and that growth is matched by demand. That means the center is in some stage of acquiring new equipment every year.

NCSA expects to submit a proposal for either a $15 million or a $30 million system, with responses to the solicitation due in February. The winning bidder (Towns says there are typically only three or four companies bidding) will be required to have its solution in a 'substantive production state' by about March 2007. From there, he says, 'there's the whole procurement process, deployment, testing, and putting it into production.'

The installation and testing process has many stages and is time-consuming. In the case of Tungsten, the Dell cluster at NCSA, the center received the hardware over the course of two months. During that time, it was arriving on large trucks, being unpacked, being set up, and then configured and tested. But that was preceded by several months of software and hardware testing in-house at Dell.

Once the installation at the client site took place, NCSA did a lot of applications testing to verify that the system actually worked. Then, before it went to production state, the center opened up the equipment to what Towns calls the 'friendly user period.' This lasts about two months and allows the general user community to compile and test their applications. There's no charge to their time allocations for this, but users also need to understand that the system might be unstable until all the issues are worked out. 'It's a good way for us to shake down the system before we go to production,' says Towns.

For the cluster installation at CCR, Dell was the prime contractor, and it subcontracted aspects of the project to other vendors, including Myricom (www.myri.com) for Myrinet network communications, EMC (www.emc.com) for storage arrays, Force10 Networks (www.force10networks.com) for switch/routers, and IBRIX (www.ibrix.com) for file storage. When the system was delivered, the prime contractor orchestrated the installation, 'making sure the right people from the right vendor showed up at the right time,' says Miller. Why Dell? When CCR went out to bid, Miller recalls, 'We met with all the vendors and had discussions, and Dell was clearly head and shoulders above anything we were looking for at that point in time.'

Managing clusters is mostly automated through job schedulers, batch schedulers, and other resource managers and monitors. At CCR, for example, 'Larry' and 'Adam,' the aforementioned administrative nodes, monitor all of the cluster nodes and 'continually ask them, "Are you still up? Are you still running? Are you healthy?"' says Miller. When problems arise, such as a file system filling up or a network link going down, the human system administrators get notified. If a node ceases to work or a power supply 'explodes,' he says, the job scheduler will continue scheduling, but not on that node.
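A stripped-down sketch of that monitoring loop is below. The node names, the one-minute polling interval, and the ping-style liveness check are assumptions standing in for whatever Larry and Adam actually run; a production monitor would also check disks, memory, temperatures, and daemons, and would drain failed nodes in the scheduler rather than just print an alert.

```python
# Stripped-down health-check loop: poll every node, alert the human admins on
# failure, and track dead nodes so the scheduler stops placing jobs on them.
import subprocess
import time

NODES = [f"node{i:04d}" for i in range(1, 1025)]  # hypothetical node names
unhealthy = set()

def node_is_up(host: str) -> bool:
    """Single ping as a liveness probe."""
    result = subprocess.run(["ping", "-c", "1", "-W", "2", host],
                            capture_output=True)
    return result.returncode == 0

while True:
    for host in NODES:
        if node_is_up(host):
            unhealthy.discard(host)
        elif host not in unhealthy:
            unhealthy.add(host)
            # In practice this would page an administrator and drain the node.
            print(f"ALERT: {host} is down; remove it from scheduling")
    time.sleep(60)  # then ask again: "Are you still up? Are you healthy?"
```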

The Edge Keeps Moving on High-Performance Computing
Clusters are by no means the final word in high-performance computing. As Towns points out, each research community 'has a set of applications that have different system architecture requirements in order to execute and perform well.'

Recently, the NSF issued a solicitation for systems totaling $30 million. In performance terms, according to Miller, that's 'roughly 10 times U2's speed.' Then, four years from now, they want it to be '100 to a thousand times what U2's speed is.' As he points out, 'There's simply no shortcut in terms of solving some of these big, what they used to call "grand challenge," problems without big machines. If you're looking at whatever it may be, the physics of the universe, or biochemical processes in the brain, or analyzing the spread of infections... they just require massive amounts of computing power.'

From humble beginnings as commodity devices, the equipment that once existed only on the desktop will keep proving its mettle in dazzling displays of high-performance cluster computing.

Resources

Beowulf, the home of one of the original cluster projects:
www.beowulf.org

Dell Campus Architecture
www.dell4hied.com/solutions_detail.php?si=188&cn=1

Dell High Performance Computing Clusters
www1.us.dell.com/content/topics/global.aspx/solutions/en/clustering_hpcc?c=us&cs=555&l=en&s=biz

EMC
www.emc.com

Force10 Networks
www.force10networks.com

IBM
www-03.ibm.com/servers/eserver/clusters

IBRIX
www.ibrix.com

Myricom
www.myri.com

National Center for Supercomputing Applications (NCSA)
www.ncsa.uiuc.edu

SUNY-Buffalo Center for Computational Research
www.ccr.buffalo.edu
