IT Infrastructure and Systems

Purdue University

By centrally managing its supercomputing clusters, this Big Ten research university provides more computing power to more faculty, more efficiently and at lower cost.

“It would be simple to buy more equipment in order to deliver more resources, or conversely to reduce our IT budget in tough economic times by buying less equipment. But to reduce costs while simultaneously providing one of the nation’s leading cyberinfrastructures for research requires new ways of working.”

That’s Gerry McCartney, vice president of information technology and CIO at Purdue University (IN), speaking about the institution’s Community Cluster Program—a centralized supercomputing cluster operation that represents a game-changing way of providing high-performance computing power more efficiently, to more users, at a lower cost.

The program, developed by ITaP (Information Technology at Purdue), pools the nodes purchased by individual departments or researchers into one very large central cluster. The end result is a cluster with a total compute power greater than any single user could purchase, significantly lower costs, and much higher utilization of resources.

Purdue’s first large community cluster, named Steele, was built in 2008, and the second, a larger cluster called Coates, was built in 2009. Together, Coates and Steele saved about $1.3 million in hardware costs (given the group pricing structure); they deliver more than 160 teraflops of processing power; and in 2009 alone they ran more than 6.9 million jobs in nearly 67 million compute hours. A third cluster, Rossmann, coming into production in August 2010, will increase capacity by another 50 percent. Purdue’s cluster approach has resulted in two TOP500-class supercomputers that distinguish the institution as one of the largest supercomputer centers run exclusively by a university.

Vendor & Product Details
AMD: amd.com
Cfengine: cfengine.org
Chelsio Communications: chelsio.com
Cisco Systems: cisco.com
HP: hp.com
Red Hat: redhat.com

The storage, internetworking fabric, racking, and cooling—everything required to run the clusters—are provided by central computing. ITaP staff also administer and maintain the supercomputers, handling end-user support, software updates, security, data storage, and backups. Coates is currently the world’s largest all-10 Gigabit Ethernet academic cluster, consisting of 985 8- and 16-core HP systems with AMD 2380 or 8380 processors connected with Cisco and Chelsio Communications networking equipment.

By carefully balancing compute hours among many users, ITaP keeps these valuable central computing resources busy an impressive 95 percent of the time. Faculty investors always have access to their nodes, of course, but when they are not using them, their nodes are available to others at Purdue or are shared with external researchers through the TeraGrid and Open Science Grid distributed computing infrastructures. More than 14 million compute hours on the clusters were allocated to off-campus researchers in 2009.
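The sharing policy—owners always get their own nodes first, and idle owner nodes then serve the wider community—can be sketched in a few lines of Python. This is a simplified, hypothetical illustration (the `allocate` function and node names are invented for this example); Purdue’s production clusters use a full batch scheduler, not code like this.

```python
# Simplified sketch of a community-cluster sharing policy (hypothetical).
# Owners claim their own idle nodes first; any nodes still idle are then
# handed out to community requests.

def allocate(nodes, requests):
    """nodes: {node_id: owner}; requests: list of (user, n_nodes).
    Returns {user: [node_ids granted]}."""
    free = dict(nodes)  # node_id -> owner, not yet assigned
    grants = {}
    # Pass 1: each requester is served from nodes they own.
    for user, count in requests:
        own = [n for n, owner in free.items() if owner == user][:count]
        for n in own:
            free.pop(n)
        grants[user] = own
    # Pass 2: unmet demand is filled from whatever remains idle.
    for user, count in requests:
        need = count - len(grants[user])
        extra = list(free)[:need]
        for n in extra:
            free.pop(n)
        grants[user].extend(extra)
    return grants

nodes = {"n1": "alice", "n2": "alice", "n3": "bob", "n4": "bob"}
print(allocate(nodes, [("alice", 1), ("carol", 2)]))
# alice gets one of her own nodes; carol is served from idle nodes.
```

The key design point mirrors the program’s pitch: ownership guarantees access, while idle capacity is never wasted.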

Purdue and ITaP have a strong track record of finding ways to leverage computing power that would otherwise go unused. The institution was recognized with a 2009 Campus Technology Innovators award for DiaGrid, a distributed computing strategy that harvests idle CPU time from desktops in offices and laboratory computers at Purdue and other campuses, and applies the reclaimed compute hours to research computing (see campustechnology.com/articles/2009/07/22/campus-technology-innovators-awards-2009-high-performance-computing.aspx).

ITaP uses Red Hat Kickstart, Cfengine, and a range of other open source tools to automate the process of building the clusters and installing the nodes inexpensively and efficiently. But ITaP’s use of human resources is even more impressive: Coates and Steele were each built in a day, using a unique “high-tech barn raising” approach. Through the efforts of a very large team of well-coordinated volunteers, clusters stand up fast, with the first applications being run on stacks already completed in the morning, long before the pizza arrives. (For more on the team’s speedy builds, see CT’s recent interview with McCartney: “A High-tech Barn Raising,” campustechnology.com/articles/2010/06/09/a-high-tech-barn-raising.aspx.)
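For flavor, an unattended node install of the kind Kickstart automates might be driven by a file along these lines. This is an illustrative sketch with hypothetical values (the install server URL and package choices are invented); the article does not publish Purdue’s actual configuration.

```
# Illustrative Kickstart fragment for an unattended cluster-node install
# (hypothetical values; not Purdue's actual configuration)
install
text
reboot
url --url=http://install.example.edu/rhel/os
lang en_US.UTF-8
keyboard us
network --bootproto=dhcp
clearpart --all --initlabel
autopart

%packages
@base
openssh-server

%post
# Hand the freshly installed node off to Cfengine for ongoing configuration
/usr/sbin/cfagent -q
```

With a file like this served over the network, a node can go from bare metal to a configured cluster member with no keyboard time at all—which is what makes a one-day build of nearly a thousand machines feasible.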
