Purdue University

By centrally managing its supercomputing clusters, this Big Ten research university provides more computing power to more faculty, more efficiently and at lower cost.

“It would be simple to buy more equipment in order to deliver more resources, or conversely to reduce our IT budget in tough economic times by buying less equipment. But to reduce costs while simultaneously providing one of the nation’s leading cyberinfrastructures for research requires new ways of working.”

That’s Gerry McCartney, vice president of information technology and CIO at Purdue University (IN), speaking about the institution’s Community Cluster Program—a centralized, supercomputing cluster operation that represents a game-changing way of providing high-performance computing power more efficiently, to more users, at a lower cost.

ITaP (Information Technology at Purdue) developed the program around a simple idea: pool the nodes purchased by individual departments or researchers to build one very large central cluster. The end result is a cluster with more total compute power than any single user could purchase, at significantly lower cost and with much higher utilization of resources.

Purdue’s first large community cluster, named Steele, was built in 2008, and the second, a larger cluster called Coates, was built in 2009. Together, Coates and Steele saved about $1.3 million in hardware costs (given the group pricing structure), deliver more than 160 teraflops of processing power, and in 2009 alone ran more than 6.9 million jobs in nearly 67 million compute hours. A third cluster, Rossmann, coming into production in August 2010, will increase capacity by another 50 percent. Purdue’s cluster approach has resulted in two TOP500-class supercomputers that distinguish the institution as one of the largest supercomputer centers run exclusively by a university.

Vendor & Product Details
AMD: amd.com
Cfengine: cfengine.org
Chelsio Communications: chelsio.com
Cisco Systems: cisco.com
HP: hp.com
Red Hat: redhat.com

Central computing provides the storage, internetworking fabric, racking, and cooling, all of which are required to run the cluster. ITaP staff also administer and maintain the supercomputers, handling end-user support, software updates, security, data storage, and backups. Coates is currently the largest entirely 10 gigabit Ethernet academic cluster in the world, consisting of 985 8- and 16-core HP systems with AMD 2380 or 8380 processors connected with Cisco and Chelsio Communications networking equipment.

By carefully balancing compute hours among many users, these valuable central computing resources are kept busy an impressive 95 percent of the time. Faculty investors always have access to their nodes, of course, but when they are not using them, their nodes are available to others at Purdue or are shared with external researchers through the TeraGrid and Open Science Grid distributed computing infrastructures. More than 14 million compute hours on the clusters were allocated to off-campus researchers in 2009.
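The sharing arrangement described above can be illustrated with a small sketch. This is not ITaP's scheduler or its actual policy; it is a simplified model, assuming one request per user per round, in which owners are served first up to the number of nodes they purchased and any idle nodes are then lent out in request order. The names `Request` and `allocate` and the lab names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Request:
    user: str          # hypothetical user or lab name
    nodes_wanted: int  # nodes requested in this scheduling round


def allocate(owned: dict[str, int], requests: list[Request]) -> dict[str, int]:
    """Grant nodes for one scheduling round (illustrative policy only).

    Assumes at most one request per user. Owners are first served up to
    the number of nodes they purchased; nodes left idle are then lent,
    in request order, to anyone whose demand is still unmet.
    """
    total = sum(owned.values())
    granted = {r.user: 0 for r in requests}

    # Pass 1: owners always have access to their own share.
    for r in requests:
        share = owned.get(r.user, 0)
        granted[r.user] = min(r.nodes_wanted, share)

    idle = total - sum(granted.values())

    # Pass 2: lend idle nodes to unmet demand, first come, first served.
    for r in requests:
        if idle == 0:
            break
        extra = min(r.nodes_wanted - granted[r.user], idle)
        granted[r.user] += extra
        idle -= extra

    return granted


owned = {"lab_a": 8, "lab_b": 4}
requests = [
    Request("lab_a", 2),     # owner using only part of its share
    Request("lab_b", 6),     # owner wanting more than it bought
    Request("external", 5),  # off-campus user with no purchased nodes
]
result = allocate(owned, requests)
utilization = sum(result.values()) / sum(owned.values())
print(result, f"{utilization:.0%}")
```

In this toy round, the six nodes that lab_a leaves idle are picked up by lab_b and the external user, so every node is busy. A production scheduler would also have to handle queueing, job run times, and returning borrowed nodes to their owners, none of which is modeled here.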

Purdue and ITaP have a strong track record of finding ways to leverage computing power that would otherwise go unused. The institution was recognized with a 2009 Campus Technology Innovators award for DiaGrid, a distributed computing strategy that harvests idle CPU time from desktops in offices and laboratory computers at Purdue and other campuses, and applies the reclaimed compute hours to research computing (see campustechnology.com/articles/2009/07/22/campus-technology-innovators-awards-2009-high-performance-computing.aspx).

ITaP uses Red Hat Kickstart, Cfengine, and a range of other open source tools to automate the process of building the clusters and installing the nodes inexpensively and efficiently. But ITaP’s use of human resources is even more impressive: Coates and Steele were each built in a day, using a unique “high-tech barn raising” approach. Through the efforts of a very large team of well-coordinated volunteers, clusters stand up fast, with the first applications being run on stacks already completed in the morning, long before the pizza arrives. (For more on the team’s speedy builds, see CT’s recent interview with McCartney: “A High-tech Barn Raising,” campustechnology.com/articles/2010/06/09/a-high-tech-barn-raising.aspx.)

About the Authors

Meg Lloyd is a Northern California-based freelance writer.

David Raths is a Philadelphia-based freelance writer focused on information technology. He writes regularly for several IT publications, including Healthcare Innovation and Government Technology.
