A High-Tech Barn Raising
A Q & A with Purdue's Gerry McCartney
At Purdue University, the Community Cluster Program is changing the way high-performance computers are built and maintained. By pooling their resources and contributing nodes to a centralized facility, faculty and researchers have access to much greater computing power that’s maintained for them by central computing staff. And they are invited to join about 300 people who roll up their sleeves and help construct a supercomputer in a single day’s “barn raising” event. Campus Technology asked VP of IT/CIO Gerry McCartney how it all works…
Campus Technology: It’s intriguing that you can have an old-fashioned “barn raising” to put a high-tech supercomputer together. What is a barn raising event like? What do you do with 300 volunteers?
Gerry McCartney: You make sure you’re really well organized beforehand! And let me be clear, the day of the barn raising is a little bit of a stunt. It’s done basically to show people that we can pull off complicated logistical activities in a way they can understand. We could just talk about managing systems and things that nobody understands except people inside the business--but [by coming to the barn raising] people see that mobilizing a large group like this, and actually having everybody working effectively, requires a level of organization that most central computing organizations aren’t perceived to have. And it’s a really good morale event for the IT staff.
CT: Can you contrast that with a more usual way to build a supercomputer?
GM: Every research university in the country that wants one has a cluster. But the characteristic of those clusters in general is that they are relatively small in terms of the number of buyers who come into them. So, it might be four chemical engineers or six people from structural biology… and they clump their money together and buy a machine between themselves. Part of the secret sauce here with us [in central computing] is just the difference in scale. We have about 80-something faculty involved in our program, each investing their own research funds.
And typically when big machines are ordered, they kind of come dribbling in… so you might see machinery sitting in boxes, still not installed, that arrived three months ago. We decided that, rather than have that happen--and because we were using other people’s money--we wanted to be able to say, “The day things arrive, you’re going to be running your jobs.”
CT: It sounds like a big commitment.
GM: It is. It’s a big, institutional deal. The president comes over, the provost comes over… So, the first thing is to make an event out of something that might have been a source of embarrassment in the past. And that’s the ability to stand up a machine. And once we have a rack built [the first rack of perhaps 20 to be finished during the day], we start running jobs on it, literally by 9:00 am the day of the barn raising event.
CT: So, how do you manage all this?
GM: The way we do it is to have one or two really good people to manage the project and all the logistics. And we do make an event of it. We put up a tent, feed everybody pizza or something… it’s often a warm day--June or July in Indiana can be pretty warm. It feels like a real barn raising.
Last year we had representatives from a number of other universities come and help us, from Indiana University, Iowa, Michigan, and Michigan State. So we don’t intend to keep any of this secret, and we’re happy to put together a little tutorial on project management. Anybody who is interested is welcome to the information.
CT: How would you characterize the overall reaction of the research community at Purdue (and maybe beyond) to the community clusters? Is this something they end up rallying around--do they have an actual sense of community about the clusters?
GM: Well I don’t know that there’s necessarily a sense of community in any activist sense, but what there is, increasingly, is a lot of people who didn’t have anything to do with central computing now coming to us and saying, “This looks pretty reasonable, how do I get involved, how do I invest in this?” And we haven’t lost any customers, either. So we’ve lost no customers and we’ve gained quite a few.
Are there people on campus still running their own machines? Absolutely, that will continue for a while I’m sure. But I think what we show them is that we offer a superior product at a lower price. And where we win that argument is not through an administrator like me telling them that, but when colleagues look at them and say “Are you still running your own cluster?” Things change very quickly subsequent to that.
CT: If you had to estimate, what would you say is the percentage of HPC resources on campus that are held centrally, versus those run separately around campus? What is the trend?
GM: That’s an interesting question. I’m going to guess about half are central now, as a reasonable number. When I started here in this job three years ago, our central research computers weren’t anything to write home about. We’ve since built two very large [TOP500-class] machines, Steele [a Dell cluster, built 2008] and Coates [an HP cluster, built 2009], by sharing the cost.
CT: How are the costs shared?
GM: The faculty investors are paying for all the CPU nodes, and we pay for everything else: storage, the interconnect fabric, systems administration, and the racking. What we pay tends to be about half the cost. And I assured the investors that they could always come and get their nodes out.
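[Editor’s note: The short Python sketch below is our own illustration of the cost split McCartney describes--faculty investors buy the CPU nodes, and central computing’s share of everything else comes to roughly the same amount again. The node price and investor groups are made-up numbers for illustration, not Purdue figures.]

```python
# Illustrative sketch only: hypothetical numbers for a community-cluster cost split,
# following the arrangement described in the interview (faculty buy CPU nodes;
# central IT covers storage, interconnect fabric, administration, and racking,
# which tends to be about half of the total).

node_price = 2_500          # assumed cost of one CPU node, in dollars (hypothetical)
nodes_bought = {            # hypothetical faculty investors and their node counts
    "chemical_engineering": 64,
    "structural_biology": 48,
    "physics": 32,
}

faculty_share = sum(count * node_price for count in nodes_bought.values())

# Per the interview, central computing's share is roughly half the total cost,
# i.e. roughly equal to the faculty node spend.
central_share = faculty_share
total_cost = faculty_share + central_share

print(f"Faculty investment in nodes: ${faculty_share:,}")
print(f"Central computing share:     ${central_share:,}")
print(f"Total cluster cost:          ${total_cost:,}")
```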
CT: And this summer’s cluster is named?
GM: Rossmann--which we’ll build in July.
CT: Now that Coates and Steele have been in operation for some time, you certainly have some statistics to provide insight into the success of these initiatives. I’ve heard they are in use 95 percent of the time, and that at full power they give you more than 160 teraflops. Of course Rossmann will only add to that.
GM: Yes. And researchers ran more than 6.9 million jobs and used nearly 67 million compute hours--more than 14 million of those went to off-campus researchers--on Coates and Steele in 2009 alone. I think we’re now the largest HPC facility in the country that isn’t one of the national centers.
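[Editor’s note: The snippet below is our own back-of-envelope arithmetic on the 2009 usage figures McCartney quotes; the derived quantities are rough estimates, not statistics provided by Purdue.]

```python
# Back-of-envelope reading of the 2009 usage figures quoted above.
# The derived quantities (average cores busy, off-campus share, hours per job)
# are our own arithmetic, not numbers from the interview.

compute_hours_2009 = 67_000_000      # "nearly 67 million compute hours"
off_campus_hours = 14_000_000        # "more than 14 million ... off-campus"
jobs_2009 = 6_900_000                # "more than 6.9 million jobs"

hours_in_year = 365 * 24             # 8,760 hours

avg_cores_busy = compute_hours_2009 / hours_in_year
off_campus_share = off_campus_hours / compute_hours_2009
avg_hours_per_job = compute_hours_2009 / jobs_2009

print(f"Average cores kept busy all year: ~{avg_cores_busy:,.0f}")   # roughly 7,600
print(f"Share of hours used off campus:   ~{off_campus_share:.0%}")  # roughly 21%
print(f"Average compute hours per job:    ~{avg_hours_per_job:.1f}") # roughly 9.7
```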
[Editor’s note: Gerard McCartney is Vice President of Information Technology and CIO, and the Olga Oesterle England Professor of Information Technology at Purdue University. Purdue’s Community Cluster Program will be recognized as a Campus Technology 2010 Innovator at the Campus Technology 2010 conference, July 19-22 in Boston.]