Computing Clusters: Sometimes You Can't Make It On Your Own
- By Dian Schaffhauser
- 11/18/05
The idea of cluster computing is elegant. Rack up a
bunch of off-the-shelf CPUs and get them processing in lockstep to knock
out hard research problems. Of course, the details are more complex than that,
as two high-performance computing centers explain in this article. But the fact is
that some of the same components that make up the machine taking up space on
your desktop can now be applied to solving scientific and engineering problems
that were once the purview of 'big iron' mainframes.
So what is the difference between computing clusters and
the PC sitting in front of you? According to Russ Miller, founding director of
the Center for Computational Research (CCR; www.ccr.buffalo.edu) and distinguished professor of
computer science and engineering at SUNY-Buffalo,
'If you wanted to predict tomorrow's weather for the southern states in the
US-maybe half a dozen states-on your laptop, it would probably take three
months.' Thus the need for more intensive computing power.
NCSA's Vast Computing Power
What does a high-performance computing cluster look like? The National Center
for Supercomputing Applications (NCSA;
www.ncsa.uiuc.edu)
currently runs two high-end clusters. One, nicknamed Tungsten, is a 3.0 GHz Xeon
Dell (www.dell.com) cluster with 2,480 processors
and 3 GB of memory per node. This system, the center's Web site claims, is the 10th
fastest supercomputer in the world. The other, named Mercury, was built by IBM
and consists of 1,176 processors with between 4 gigabytes and 12 gigabytes of
memory per node. These clusters are physically huge. According to John Towns,
senior associate director, Persistent Infrastructure Directorate at NCSA, the
Dell setup has five rows with nine racks in each row. It took several truck
deliveries to get all of its components to the center.
The two clusters, along with an SGI Altix 3000 and an IBM p690, make up the
four major systems that allow NCSA to serve academic researchers in the US with
computing projects that require massive parallelism. (All four are nicknamed
after metals.) Also part of the environment is a mass storage system
named UniTree, which serves all the computers and has the capacity to archive
three petabytes of data. (There's room to grow; it currently holds less than
two petabytes.)
Towns estimates that 'in excess of 400 activities' are going on within
his directorate currently. But of larger-scale activities-those funded at half
a million dollars or more a year-there are about six projects happening at any
point in time.
One such endeavor in which NCSA is involved is called LEAD (Linked Environment
for Atmospheric Discovery; www.lead.ou.edu).
The objective, he explains, is to develop a next-generation environment for
research and prediction of weather-'so that we understand not only whether
it's going to rain next week, but if we see conditions developing that would
indicate, say, a tornado, [we can] help emergency management services to evacuate
the counties that are going to be affected.'
One of five original centers opened as part of the National Science Foundation's
Supercomputer Centers Program in 1986, NCSA is on the campus of the University
of Illinois at Urbana-Champaign. Before these US-based centers were founded,
the only supercomputing resources existed within the Department of Defense (www.defenselink.mil)
and the Department of Energy (www.energy.gov).
Nothing was available to academic researchers seeking access to 'unclassified'
equipment. Many researchers resorted to doing their work abroad (primarily in Germany,
according to Towns) to get access to these types of systems.
The explosive growth of the cluster approach is an almost-inevitable result
of more powerful commodity-priced computers, the blossoming of the open source
community (which provides operating systems, management tools, and many of the
applications), and a breaking away from centralized control of computing power.
The importance of the latter, especially, is hard to overstate. Although the computing resources
are maintained and managed in a single location, access to them is an almost-democratic
undertaking at some cluster computing sites, such as CCR. All details of the
systems are publicly displayed and updated constantly on the center's Web site-including
the performance of individual systems, which jobs are running on particular
nodes, the status of the queue for computing activities, and even the comings
and goings of staff, researchers, students, and visitors at various parts
of the center, as broadcast on Webcams.
A Quick History of Cluster Computing
In its essence cluster computing hasn't really changed much since its introduction
in 1993 with the launch of projects like Beowulf. Donald Becker and Thomas Sterling
met at MIT with a common interest: to figure out how to use commodity-based
(read: low-cost) hardware as an alternative to large supercomputers. According
to Phil Merkey, an assistant professor in Mathematics and Computer Science at
Michigan Technological University, the prototype system consisted of 16 DX4
processors connected by Ethernet. The machine was 'an instant success'
and the idea of commodity off the shelf-based systems 'quickly spread through
NASA and into academic and research communities.'
At the National Center for Supercomputing Applications, John Towns recalls working
with the computer science department at the University of Illinois at Urbana-Champaign,
then with vendors-most notably IBM-to design and build a cluster consisting
of 1,024 nodes using Pentium III chips. NCSA's goal was to find a 'high performance
solution.' The result was nicknamed Platinum by the center. IBM then turned
around and productized the cluster, which became the IBM Cluster 1300.
The Racks of CCR
Says CCR's Miller, 'Our knee-jerk reaction to any request is to say, 'Yes.''
The mission of the resources at CCR is to support high-end computing, visualization,
data storage and networking to 'enable discovery in the 21st century.'
Miller says his group supports work in computational science and engineering for
the campus, as well as a 'whole host of endeavors, whether they're at the
university, corporate partners, local companies, local government agencies or
what have you.' That includes a recent animation project for MTV2 (
www.mtv2.com)
(which had contracted with a Buffalo company that, in turn, contracted with the
center), workshops where local high school students learn about computational
chemistry and bioinformatics, and other work that benefits the local
community.
Then there's the research, currently about 140 different projects, ranging
from vaccine development, to magnetic properties modeling, to predicting better
ways to make earthquake-resistant material.
One project Miller describes that is being run on the clusters is trying to
understand water flow. 'We have a group in civil engineering who are modeling
the Great Lakes and the Great Lakes region,' says Miller. 'What they've
done is develop algorithms that can take advantage of massive parallelism...
[to] get a better and finer understanding of the Great Lakes and how the water
moves and how it flows than was ever possible before, by orders of magnitude.
For example, you can imagine somebody putting a contaminant in the Great Lakes
and you say to yourself, 'OK, depending on how people are watering their lawns,
taking showers, washing their cars in Buffalo, Rochester, Syracuse and so on,
when and how bad will the contaminants be when they hit New York City?''
CCR has multiple Dell clusters, but the two largest ones are Joplin, with 264
nodes running Red Hat Linux (www.redhat.com),
and U2, the newest system, with 1,024 nodes (with main memory varying from 2
to 8 gigabytes per node). U2 also runs Red Hat Linux,
though a different version. Miller has the honor of naming the machines, and
the current naming scheme honors inductees to the Rock and Roll Hall of Fame.
The front ends to the system are named Bono (vocalist for U2) and The Edge (guitarist);
two administrative servers that help keep track of and monitor the system are
called Larry (drums) and Adam (bass) because, he says, they 'keep everything
running.'
Currently, only about two-thirds of U2 is powered up at any one time, since
the system is still going through testing. Once the center gets the older system,
Joplin, updated with the same version of Red Hat that U2 is running, says Miller,
it'll be integrated into U2's queuing system. 'The [capacity] computing
jobs will be automatically routed to what was called Joplin-those nodes; the
[capability] computing jobs will be automatically routed to U2; and it'll be
transparent to the users. It'll be one queue that's essentially an Intel (www.intel.com)
Pentium line of chips.' The only change for the users, explains Miller,
is that they'll need to answer a couple of extra questions about their projects
up front-such as 'whether the code they need to run is 32-bit or 64-bit
compatible, which means it can run on both places-or if it's just 32-bit or
just 64-bit. And we'll be able to route things to the appropriate nodes.'
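To make that routing concrete, here is a hedged sketch, in Python, of how a single queue might steer jobs using the extra questions Miller mentions; the field names and the routing rule are our own reading of his description, not CCR's actual scheduler configuration.

# A conceptual sketch of one-queue routing based on the submission questions
# Miller describes. Field names and the rule itself are illustrative assumptions.
from dataclasses import dataclass

@dataclass
class Job:
    name: str
    arch: str            # "32-bit", "64-bit", or "both"
    capability: bool     # True for large, tightly coupled (capability) runs

def route(job):
    if job.capability or job.arch == "64-bit":
        return "U2 nodes"                 # capability work and 64-bit-only code
    return "former Joplin nodes"          # capacity work that runs on 32-bit nodes

print(route(Job("lake-model", "both", False)))         # former Joplin nodes
print(route(Job("big-simulation", "64-bit", True)))    # U2 nodes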
Getting Access to the Power
To gain access, CCR has 'basic minimal requirements,' says Miller.
The user describes the project and required resources in terms of storage and
compute power.
'We support research, discovery, and scholarship that requires
high-end computational resources. That's obviously a moving target.' In
fact, he says, every six months or so, the center redefines what it means by
'high-end' in terms of data, networking, visualization, or computing
requirements. If a project requires fewer than, say, 16 processors running concurrently,
Miller and his team will probably kick the request back to the individual to
take back to his or her lab, department, or school for handling. An advisory
committee evaluates troublesome proposals, but most of the time, the decisions
are 'obvious.'
At NCSA, the process for gaining access is more formal. The proposal process
is modeled after the NSF process, says Towns, which involves-for large requests-a
peer review performed by a national review committee that meets quarterly. These
are proposals requiring in excess of 200,000 CPU hours per year. Smaller requests-from
10,000 to 20,000 hours of time-are considered 'development accounts'
for start-up projects. Reviewed and awarded continuously, the smaller accounts
allow researchers to try out their applications on the system and understand
performance characteristics in preparation for submitting larger proposals.
What's a CPU hour in cluster terms? According to Towns, it's equivalent to
one hour of time on a single processor. Since these are dual-processor
nodes, there's a total of 2,480 processors on Tungsten. If a project is running
on 64 nodes, which is 128 processors, and it runs for one hour, the user has
accumulated 128 service units or CPU hours.
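For readers who want to see the arithmetic, here is a minimal Python sketch; the function name is invented, but the numbers come straight from Towns' Tungsten example.

# Service units (CPU hours) as described above: one unit equals one processor
# running for one hour. The function is illustrative, not NCSA's accounting tool.
def service_units(nodes, processors_per_node, wall_clock_hours):
    return nodes * processors_per_node * wall_clock_hours

# 64 dual-processor nodes (128 processors) running for one hour:
print(service_units(64, 2, 1))  # 128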
Neither organization charges its academic users for the time they use on the
clusters. In the case of NCSA, Towns says, they're granted allocations of time
as part of grant awards, and their usage is billed against those allocations.
CCR considers its clusters part of the 'university infrastructure,'
says Miller, 'to support leading-edge science research.'
Both centers also bring in funds from academic users who have budgeted compute
time into their grant proposals to help cover expanded demands on
staff time; compensation also comes from commercial users that make use of the
resources.
The NCSA has about 100 staff members and another '15 or 20' graduate
students working in its center to provide 24/7 support for its community of
users. CCR has a technical team of 13, consisting of operations people and computational
scientists. The former, system administrators, keep the systems running, and
the latter work closely with users to figure out, for example, what applications
are needed for a particular project or to help optimize code.
Miller estimates that about half of the applications running on the CCR computers
are off the shelf: code that has been paid for, or that is freeware or shareware. The
other half is 'home-grown.' In the case of NCSA, Towns says, 'By
and large [the majority of our users are faculty researchers] using applications
they've developed to solve the problems that they're attacking.'
What It Takes To Work in Academic Computing
Ever wondered what it takes to work in a high-performance computing environment?
According to John Towns, senior associate director, Persistent Infrastructure
Directorate at NCSA, there's no other job that can prepare you-you arrive with
particular qualifications, then get on-the-job training.
At NCSA, to obtain a position on the academic professional staff, you're required
to have a Bachelor of Science degree. Frequently, staff members have worked in a
research environment, possibly as a graduate student. Their backgrounds may
include hard sciences, or engineering, or computer science. All have 'an
affinity for computers' and an interest in the high end-whether that is
'compute systems, networks, visualization, or data storage,' Towns
says. Frequently, they don't work normal hours, which is an advantage in an
environment that runs 24 hours a day.
Shared-Memory Machines vs. Clusters
One misunderstanding that can crop up about clusters is that they replace the
old-style mainframe-type or large shared-memory computers. In reality, each setup is
advantageous for a specific type of computing work. 'Gene sequencing is
fairly trivially parallelized. You can spread it across a cluster and use the
resources well,' Towns says. In other words, every processor is, practically
speaking, running a separate copy of the application, and no processor needs
to talk much to the others.
CCR's Miller refers to this as 'capacity computing.' 'In order for
a scientist to solve a certain problem, they may need to run a thousand or ten
thousand simulations, each of which is best run on a single CPU... It's the
aggregate of all those results that will solve their scientific problems.'
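As a toy illustration of that capacity model (and emphatically not code from CCR), the Python sketch below farms out many independent single-CPU simulations and aggregates their results at the end; run_simulation and its parameters are hypothetical.

# Toy 'capacity computing': many independent runs, no communication between them,
# with the results aggregated afterward. run_simulation is a hypothetical stand-in.
import random
from multiprocessing import Pool

def run_simulation(seed):
    rng = random.Random(seed)
    return sum(rng.random() for _ in range(1000)) / 1000   # one independent result

if __name__ == "__main__":
    with Pool() as pool:                      # each worker acts like one CPU
        results = pool.map(run_simulation, range(1000))
    print("aggregate of all runs:", sum(results) / len(results))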
Another class of programs runs as a single application. As Towns explains it,
'Imagine that you're running simulations, and... you create a grid that
represents a space and something happens in that space. Often where something
interesting is happening, you need to redefine the spacing on the grid to accurately
represent what's [going on]. A class of applications has been developed that
in a dynamic way redefines the spacing in the grid where it needs to... If you
try to represent the entire grid at the finest resolution, you don't have a
big enough memory machine to do it. What you do is refine it where it's necessary...
You have some nodes that have a lot of work to do and some that don't. In a
shared memory system, you can easily redistribute that work among the processors,
so you can keep them all busy together, and move the application along much
more quickly.'
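A rough sketch may help picture that refinement. The one-dimensional Python toy below splits only the cells where an entirely invented error estimate says something interesting is happening, rather than refining the whole grid at the finest resolution.

# Toy adaptive refinement in one dimension: split only the cells whose estimated
# error is high. The error estimate and threshold are made up for illustration.
def refine(cells, error_estimate, threshold=0.1):
    refined = []
    for left, right in cells:
        if error_estimate(left, right) > threshold:
            mid = (left + right) / 2
            refined.extend([(left, mid), (mid, right)])   # finer spacing here
        else:
            refined.append((left, right))                 # keep coarse spacing
    return refined

# Pretend the interesting feature sits near x = 0.5.
cells = [(i / 4, (i + 1) / 4) for i in range(4)]
estimate = lambda left, right: 1.0 if left <= 0.5 <= right else 0.01
print(refine(cells, estimate))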
TeraGrid
A lot of research in grid computing is currently taking place. The TeraGrid
is an effort by the National Science Foundation (www.nsf.gov)
to build and deploy the world's largest distributed infrastructure for open scientific
research. In practical terms, that means developing a common environment and interface
that gives the user community access to a rather diverse set of high-performance computing resources-including
clusters. In some cases it also involves hooking up the physical systems to be
used in conjunction with each other and providing environments that are similar
so that researchers can move between systems more easily.
As John Towns, senior associate director, Persistent Infrastructure Directorate
at NCSA, explains, many researchers have multi-stage applications that require
different kinds of computing architectures to solve. 'So the TeraGrid is
attempting to facilitate the use of these multiple architectures-often sited
at different locations-to support their research efforts.'
Learn more about the TeraGrid at www.teragrid.org. Learn more about grid computing
at www.ggf.org.
Computer Years
Interestingly, both types of large systems have a fairly short life-about three
to five years, usually on the lower end, according to Towns. After that, the
maintenance and operational costs for the hardware become too high and it's
time to 'simply buy a new system.' The NCSA looks at a doubling of
computational resources every one to two years, and that's matched by demand.
That means the center is in some stage of acquiring new equipment every year.
NCSA expects to submit a proposal looking for either a $15 million or $30 million
system, with responses due in February. The winning bidder (Towns says there
are typically only three or four companies bidding) will be required to have
its solution in 'substantive production state' by about March 2007.
From there, he says, 'there's the whole procurement process, deployment,
testing, and putting it into production.'
The installation and testing process has many stages and is time-consuming.
In the case of Tungsten, the Dell cluster at NCSA, the center received the hardware
over the course of two months. During that time, it was arriving on large trucks,
being unpacked, being set up, and then configured and tested. But that was preceded
by several months of software and hardware testing in-house at Dell.
Once the installation at the client site took place, NCSA did a lot of applications
testing to verify that the system actually worked. Then, before it went to production
state, the center opened up the equipment to what Towns calls the 'friendly
user period.' This lasts about two months and allows the general user community
to compile and test their applications. There's no charge to their time allocations
for this, but they also need to understand that the system might be unstable until all
the issues are worked out. 'It's a good way for us to shake down the system
before we go to production,' says Towns.
For the cluster installation at CCR, Dell was the prime contractor, and it
subcontracted aspects of the project to other vendors, including Myricom
(www.myri.com) for Myrinet network communications,
EMC (www.emc.com) for storage arrays,
Force 10 Networks (www.force10networks.com)
for switch/routers, and IBRIX (www.ibrix.com)
for file storage. When the hardware was delivered, the prime contractor orchestrated the installation,
'making sure the right people from the right vendor showed up at the right
time,' says Miller. Why Dell? When CCR went out to bid, Miller recalls,
'We met with all the vendors and had discussions, and Dell was clearly
head and shoulders above anything we were looking for at that point in time.'
Managing clusters is mostly automated through job schedulers, batch schedulers,
and other resource managers and monitors. At CCR, for example, 'Larry'
and 'Adam,' the aforementioned administrative nodes, monitor all of
the cluster nodes and 'continually ask them, 'Are you still up? Are you
still running? Are you healthy?'' says Miller. When problems arise-a file
system gets full or a network link goes down-the human system administrators
get notified. If a node ceases to work or a power supply 'explodes,'
he says, the job scheduler will continue scheduling, but not on that node.
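Conceptually, that polling looks something like the Python sketch below; the node names, the ping check, and the alert are hypothetical stand-ins for what real batch systems and monitoring tools provide.

# A highly simplified health-check loop in the spirit of 'Are you still up?'
# Node names, the ping test, and the alert are illustrative assumptions.
import subprocess
import time

NODES = ["node%03d" % i for i in range(1, 9)]    # hypothetical node names
offline = set()

def is_healthy(node):
    result = subprocess.run(["ping", "-c", "1", "-W", "1", node],
                            capture_output=True)
    return result.returncode == 0

while True:
    for node in NODES:
        if not is_healthy(node) and node not in offline:
            offline.add(node)         # the scheduler should stop placing jobs here
            print("ALERT: %s is down; notify the system administrators" % node)
    time.sleep(60)                    # ask again in a minute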
The Edge Keeps Moving in High-Performance Computing
Clusters are by no means the final word in high-performance computing. As Towns
points out, each research community 'has a set of applications that have
different system architecture requirements in order to execute and perform well.'
Recently, the NSF issued a solicitation for systems totaling $30 million. In
performance terms, according to Miller, that's 'roughly 10 times U2's speed.'
Then four years from now they want it to be '100 to a thousand times what
U2's speed is.' As he points out, 'There's simply no shortcut in terms
of solving some of these big-what they used to call 'grand challenge'-problems
without big machines. If you're looking at whatever it may be-the physics of
the universe, or biochemical processes in the brain, or analyzing the spread
of infections... they just require massive amounts of computing power.'
From humble beginnings as commodity devices, equipment that once only existed
on the desktop will continue proving its mettle in dazzling displays of high-performance
computing clusters.