High-Performance Computing - Purdue University

At Purdue University (IN), demand for computing from science and engineering faculty has grown far faster than the budget for new computing hardware. Meanwhile, most computers, even multimillion-dollar supercomputers, are in use only about half the time. DiaGrid captures these unused cycles, providing millions of hours of computation that would otherwise be wasted, without any additional technology or facilities purchases. (DiaGrid began in 2004 as a Purdue West Lafayette campus system known as BoilerGrid. It was renamed in 2008 when several other campuses joined, including Indiana University, the University of Notre Dame (IN), Indiana State University, Purdue’s Calumet and North Central regional campuses, and Indiana University-Purdue University Fort Wayne.)

The idea of reclaiming wasted computing cycles by putting idle machines to work in a distributed computing grid is not new. SETI@home popularized it by recruiting ordinary home computers to join the hunt for extraterrestrials while their owners slept. But no previous grid project had attempted to pool as wide a variety of hardware as DiaGrid does. Among the resources tapped are computers in campus labs, offices, server rooms, and high-performance research computing clusters, running a variety of operating systems. With more than 24,000 processors (and growing) across multiple campuses, DiaGrid also stands apart for the sheer size of its pool. It provided more than 16 million hours of computation in 2008.

DiaGrid is based on Condor, free, open-source software developed at the University of Wisconsin that supports high-throughput computing on large collections of distributed, cross-platform computing resources. It also relies on Cycle Computing’s CycleServer tool for much of the administration and day-to-day use of a Condor pool, and on Batch System Pro from PBS GridWorks for job scheduling. DiaGrid also takes advantage of high-speed connectivity via I-Light, the fiber-optic state network connecting Indiana campuses, along with national research networks such as Internet2 and National LambdaRail.

DiaGrid has been used at Purdue in a variety of demanding research projects, such as imaging the structure of viruses at near-atomic resolution, simulating the Oort Cloud to understand the early stages of the solar system’s formation, projecting the reliability of Indiana’s electrical supply, and modeling the spread of water pollutants. Other applications have included a system to help create a virtual version of a pharmacy clean room for training student pharmacists, and a fly-through animation of a proposed satellite city that could serve as a refuge for Istanbul, Turkey, in the event of a catastrophic earthquake. DiaGrid provides computational resources to researchers on both the Open Science Grid and the TeraGrid.

Today, a centralized equivalent of DiaGrid would be a cluster supercomputer costing more than $3 million, occupying 2,000 square feet of floor space, and ranking among the top 100 supercomputers worldwide. Yet DiaGrid draws all of its computing power from existing resources that would otherwise sit idle. Project lead John Campbell, associate vice president for information technology at Purdue, has set DiaGrid’s next goal: adding more partners and reaching a pool of 100,000 processors in 2009.

Gerry McCartney, Purdue’s vice president for information technology and chief information officer, says DiaGrid will continue to grow. “We named this national computing grid DiaGrid after the type of girder arrangement used in modern skyscrapers,” McCartney says. “It’s an apt metaphor. We’re building a computing infrastructure that scientists and engineers can use to make monumental discoveries. DiaGrid is a new, national resource for research. Experiments will be conducted using this computing grid that could not have been done before.”