2009 Campus Technology Innovators: High-Performance Computing
By Mary Grush and Matt Villano
THE PURDUE DIAGRID TEAM, left to right: Andrew Howard, Phillip Cheeseman, John Campbell,
David Braun, Preston Smith, Carol Song.
Innovator: Purdue University
At Purdue University (IN), the demand for computing by science
and engineering faculty has increased at a far faster rate
than the budget for new computing hardware. Meanwhile,
most computers, even multimillion-dollar supercomputers, are
only in use about half of the time. By capturing these unused
cycles, DiaGrid provides millions of hours of computation that
would otherwise be wasted, without additional technology or
facilities purchases. (DiaGrid began in 2004 as a Purdue
West Lafayette campus system known as BoilerGrid, and was
renamed in 2008 with the addition of several other campuses,
including Indiana University, the University of Notre
Dame (IN), Indiana State University, Purdue's Calumet
and North Central regional campuses, and Indiana University-
Purdue University Fort Wayne.)
The idea of reclaiming wasted computing cycles by putting
idle machines to work in a distributed computing grid is not new.
The notion was even popularized by SETI@home, which recruited
ordinary home computers to join in the hunt for extraterrestrials
while their owners slept. But no other grid project has attempted
to pool the wide variety of hardware systems represented in DiaGrid.
Among the resources tapped: computers in campus labs,
offices, server rooms, and high-performance research computing
clusters, running a variety of operating systems. Now at more
than 24,000 processors (and growing) across multiple campuses,
the sheer size of the pool also sets DiaGrid apart. It provided
more than 16 million hours of computation in 2008.
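To put those two figures side by side, here is a rough back-of-the-envelope sketch of what share of the pool's theoretical capacity 16 million hours represents. It assumes, purely for illustration, that all 24,000 processors were available for the whole of 2008; the pool was actually growing during the year, so the true share was likely higher. The function name and the constant are hypothetical, not part of DiaGrid or Condor.

```python
# Illustrative arithmetic only; assumes a constant pool size for all of 2008.
HOURS_IN_2008 = 366 * 24  # 2008 was a leap year


def harvested_fraction(cpu_hours, processors, hours_in_period=HOURS_IN_2008):
    """Return delivered compute hours as a share of total processor-hours."""
    capacity = processors * hours_in_period
    return cpu_hours / capacity


# Figures quoted in the article: 16 million hours, 24,000 processors.
fraction = harvested_fraction(16_000_000, 24_000)
print(f"{fraction:.1%}")  # prints 7.6%
```

Under that simplifying assumption, the hours delivered in 2008 amount to well under a tenth of the pool's nominal capacity, which suggests how much idle time remains to be captured.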
DiaGrid is based on Condor, free open source software
developed at the University of Wisconsin that supports
high-throughput computing on large collections of distributed,
cross-platform computing resources. It also relies on Cycle
Computing's CycleServer tool for many of the administrative
aspects of managing and using a Condor pool, as well as
Batch System Pro from PBS GridWorks for scheduling jobs.
And DiaGrid takes advantage of high-speed connectivity via
I-Light, the fiber-optic state network connecting Indiana
campuses, along with national research networks such as
Internet2 and National LambdaRail.
DiaGrid has been used at Purdue in a variety of demanding
research projects, such as imaging the structure of viruses at
near-atomic resolutions; simulating the Oort Cloud in an effort
to understand the early stages of the solar system's formation;
projecting the reliability of Indiana's electrical supply; and
modeling the spread of water pollutants. Other applications
have included a system to help create a virtual version of a
pharmacy clean room for training student pharmacists, and a
fly-through animation of a proposed satellite city that
could serve as a refuge for Istanbul, Turkey, in the event
of a catastrophic earthquake. DiaGrid provides computational
resources to researchers on both the Open Science Grid and
the TeraGrid.
Currently, the centralized equivalent of DiaGrid would
be a cluster supercomputer costing more than $3 million,
taking up 2,000 square feet of floor space, and ranking
among the top 100 supercomputers worldwide. And DiaGrid
provides its compute power entirely from existing
computing resources that would otherwise be wasted.
Project lead John Campbell, associate vice president for
information technology at Purdue, has DiaGrid's next
foreseeable goal in sight: to add more partners and reach
a pool size of 100,000 processors in 2009.
Gerry McCartney, Purdue's vice president for information
technology and chief information officer, says
DiaGrid will continue to build and expand. "We named
this national computing grid DiaGrid after the type of
girder arrangement used in modern skyscrapers,"
McCartney says. "It's an apt metaphor. We're building a
computing infrastructure that scientists and engineers can
use to make monumental discoveries. DiaGrid is a new,
national resource for research. Experiments will be
conducted using this computing grid that could not have
been done before."
Mary Grush is Editor and Conference Program Director, Campus Technology.
Matt Villano is senior contributing editor of this publication.