High-Performance Computing

High-Performance Happy

More and more universities are now centralizing their high-performance computing resources — benefiting not only IT departments, but the researchers, too.

Traditionally, the high-performance computing systems used to conduct research at universities have amounted to silos of technology scattered across the campus and falling under the purview of the researchers themselves. But a growing number of universities are now taking over the management of those systems and creating central HPC environments—a move that is returning benefits in time, money, and resources for both the university and its researchers.

Henry Neeman, director of the University of Oklahoma Supercomputing Center for Education & Research (OSCER), puts it plainly: “I’ve been seeing a growing trend in centralized HPC for two reasons: capability and practicality.” He explains, “When it comes to capability, you have to consider: What is the largest job you can run on a given machine? There are particular large jobs you can’t run on a system that doesn’t have a lot of capability. And there are the practicalities regarding cooling, space, power, and labor. If you have dozens of systems dedicated to HPC, you can’t just stick them in the closet anymore.”

Notably, the movement of cyberinfrastructure (which includes high-performance computing, computer clusters, and the underlying network) to central management has been gathering speed as more universities make research a vital part of their institutional identity. To that point, the July 2006 report IT Engagement in Research, issued by the Educause Center for Applied Research, highlights the importance that some universities are placing on research. “Many universities have made public bets that they will break into the top echelons of research institutions, and this has set off an arms race to find new sources of funding, to construct new research centers, and to attract star researchers with proven grant-magnet abilities,” the study maintains.

As part of that drive to compete on a research level, many universities are seeking to attach themselves to regional and national research initiatives such as the Texas Internet Grid for Research and Education (TIGRE) and the National LambdaRail project. Having central management of the university’s cyberinfrastructure helps facilitate such pairings by pooling resources and creating a massive computing environment far more impressive—and useful—than separate clusters distributed throughout the campus could be. In addition, central management provides constant monitoring and upkeep that a smaller, privately owned cluster might not enjoy.

[Chart: How many full-time equivalent (FTE) staff within central IT are currently assigned to the support of research?]

Indeed, the oversight aspect of the issue is not a small one. “The most important element left out of most IT plans is the human element,” acknowledges Brian Voss, Louisiana State University CIO. “If we don’t have people to help us use the technology, we don’t get the most bang for the buck and it’s pretty much useless.”

Putting Power Behind the Research

Still, university IT directors don’t capriciously undertake centralizing management of the institutional cyberinfrastructure. Rather, most such initiatives are mandated by the CIO, with the aim of building out a hefty HPC environment that will enable the university to take a leading role in research, or at least approach such a position. An institution’s ability to attract and empower a CIO with experience in this direction can be key.

Jim Bottum, vice provost for computing & IT and CIO at Clemson University (SC), was brought on six months ago specifically to lead the charge to build such an HPC environment. Formerly CIO and VP for computing at Purdue (IN), Bottum also served as executive director of the National Center for Supercomputing Applications at the University of Illinois. “I was hired by Clemson to come and build a high-performance computing environment because I’ve been in the business for a while,” he admits, a bit coyly.

In his six months on the job, Bottum has orchestrated Clemson’s membership in the Open Science Grid, a consortium of universities, national laboratories, scientific collaborations, and software developers that utilizes 1,000 desktops in student labs for certain applications. He also has directed the College of Engineering and Science to move its clusters to the university’s center, and he has been busy buying the big iron for the center. In addition, says Bottum, as part of the Clemson University International Center for Automotive Research (CU-ICAR), Clemson will host CU-ICAR’s 10-teraflop system along with automaker BMW.

Bottum reports that Clemson will boast more than 20,000 square feet of centralized high-performance computing space when the center is completely outfitted. “And we have tremendous expansion capabilities,” he adds, disclosing that by summer 2007 the center will house 12 to 15 teraflops. “We are putting significant money into this project and that will include rearchitecting the campus network. My charge was to build an aggressive infrastructure and get involved in national initiatives, and I feel like we are making some progress,” he says.

Another goal: Bottum wants to connect Clemson to the National Science Foundation’s TeraGrid, a research supercomputing project that boasts more than 102 teraflops of computing capability, and more than 15 petabytes of online and archival data storage distributed among nine partner sites.

Centralization: Aggressive or Organic?

Clearly, Clemson is being aggressive in its pursuit of research, while other universities have taken a somewhat more organic approach. Texas Tech University, for example, has had a central HPC environment of sorts since the late 1990s, set up specifically to facilitate a major visualization project. Once funding ran out in 2001, however, the university began to look at ways to set up the resources for use by the entire campus community.

“When we first took it over, we had to look closely at the HPC program and analyze who was using it,” says Sam Segran, Texas Tech CIO. “We took a business approach and discovered not only who was using it, but who wasn’t—and if not, then why. What we discovered was that most colleges were not really using the system for visualization [the original intended use]; on the computing side, researchers don’t have the skill set to do visualization,” he says. “But the high computing—the pure data-crunching, multi-teraflop computing—that’s where we found a lot of interest. Researchers wanted to do a lot of that type of computing in a short amount of time.”

Based on that knowledge, the university set up a grid computing network for such high computing, purchased a Dell cluster, and is in the process of tripling its capacity to close to 5 teraflops. In addition, Texas Tech developed a community cluster, which five researchers have bought into. The concept behind the community cluster: The IT department manages the researchers’ systems and the institution matches the researchers’ investment, dollar for dollar. Researchers are guaranteed a certain number of nodes, and they can also use idle nodes whenever they need to, greatly improving their research throughput, says Segran.
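The community-cluster arrangement described above can be sketched as a simple allocation model. This is a minimal illustration only—the class, researcher names, and per-node cost are assumptions for the example, not Texas Tech’s actual figures or scheduling software:

```python
COST_PER_NODE = 4000  # assumed hardware cost per node, in dollars (illustrative)

class CommunityCluster:
    """Toy model of a 'community cluster' buy-in scheme: the institution
    matches each researcher's investment dollar for dollar, researchers
    are guaranteed the nodes they bought, and idle nodes can be borrowed."""

    def __init__(self):
        self.guaranteed = {}  # researcher -> nodes their buy-in entitles them to
        self.in_use = {}      # researcher -> nodes currently allocated to them

    def buy_in(self, researcher, dollars):
        # Institution matches dollar for dollar, so the buy-in purchases
        # twice as many nodes as the researcher's own funds would.
        nodes = (dollars * 2) // COST_PER_NODE
        self.guaranteed[researcher] = self.guaranteed.get(researcher, 0) + nodes
        self.in_use.setdefault(researcher, 0)
        return nodes

    def total_nodes(self):
        return sum(self.guaranteed.values())

    def idle(self):
        return self.total_nodes() - sum(self.in_use.values())

    def request(self, researcher, nodes):
        # Grant up to the requested count from whatever is idle. (A real
        # scheduler would also preempt borrowers so that an owner can
        # always reclaim their guaranteed share.)
        grant = min(nodes, self.idle())
        self.in_use[researcher] = self.in_use.get(researcher, 0) + grant
        return grant

    def release(self, researcher, nodes):
        self.in_use[researcher] = max(self.in_use.get(researcher, 0) - nodes, 0)
```

Under this model, a researcher who invests $40,000 gets 20 guaranteed nodes (their money plus the institutional match), yet can run a 25-node job whenever at least 5 of the other participants’ nodes sit idle—the pooling effect Segran describes.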

[Chart: Presence of Research Unit in Central IT, by Institutional Mission]

At Louisiana State, HPC efforts are comanaged by Voss’ department and by Ed Seidel, director of the university’s Center for Computation and Technology—a department that was created in 2001 as part of the Louisiana Governor’s Information Technology Initiative to advance the use of information technology in higher education and scientific research. The center has more than 26 teraflops running on eight different machines from Dell, IBM, and Atipa Technologies. Voss maintains that co-management makes sense from both planning and operations perspectives. “Seidel is Captain Kirk, and I am Mr. Scott,” he quips.

But the truth is that while the importance of research at LSU was always well recognized and understood, it remained a separate entity from the administration until Voss came on board in 2005. “I understood the role that HPC plays [in research as a whole], and so I started taking over the operational structure,” he says. “I believe in the importance of all elements of an IT environment.” What’s more, he says, “The value of research beyond advancing science is underrated: Research is a feeder line for teaching.”

At Princeton University (NJ), the desire to enhance the institution’s research reputation, coupled with the foresight to understand where technology in general was headed, prompted the school to rethink its earlier strategy.

“When I came on board in 2001, there was little central support of IT services,” recalls Betty Leydon, CIO and VP for information technology. “So we started canvassing the faculty and asking what they needed. We quickly realized the then-current model of individual research clusters was inefficient.” After gaining research faculty acceptance of a progressive, central IT management model (and pooling her department’s financial resources with those of individual faculty members and, later, faculty groups and individual colleges), in 2005 Princeton was able to purchase an IBM Blue Gene computer.

“When we started asking faculty members to contribute their research money toward purchasing this system, it was obvious we were doing something right because they said ‘Yes!’ and we got all the money we needed to buy Blue Gene,” Leydon recounts. “After we received the machine, everything grew outward from there. Now, researchers are advancing their work more quickly because by pooling their resources, everyone has gotten more resources than they would have been able to get otherwise, on their own.” Princeton has since been able to fund an additional two Dell clusters for research, using the same funding method, Leydon reports. The university’s HPC power is now up to 15.5 teraflops. “Once you get a model that works,” she advises, “it grows by itself. We’ve also been able to purchase centrally shared storage the same way. Faculty members used their research dollars to purchase this because those who are using it see the value.”

It’s All About the Resources

Increased HPC capacity is but one advantage researchers have realized from central management. They have also discovered that the more mundane tasks of providing proper cooling, power, security, and IT support are no longer their problem, leaving them with more resources to devote to pure research.

Researchers at the University of Oklahoma are taking advantage of the central HPC resources there (currently at 7.7 teraflops across two clusters, and scheduled to increase to 12.2 teraflops later this year), because “it is difficult to justify purchasing hardware when researchers know it’s already available at the center and they can use the money to hire another grad student instead,” OSCER Director Neeman says.

And at Indiana University, “The notion of central management sprang from the researchers not having the ability, time, or expertise to run a large cluster on their own,” points out David Hancock, high-performance computing and research manager. “They were instead depending on other staff or graduate students to do the management, and there was such a high turnover rate that it wasn’t efficient. So we convinced them it was in their own interest to turn over management.” In addition to IT support and management—as well as the infrastructure necessary to run the systems—IU offers what Hancock refers to as “streamlined operations.”

“Researchers contribute the [grant] funds, and we make the purchases on their behalf,” he explains. “What we try to sell is centralized management and the provision of access to dedicated time if needed,” he adds. “Most of the researchers have signed on, and in some cases, some of them are willing to offer opportunistic use to other researchers.” It’s not a difficult sell to most researchers, Hancock maintains: Indiana has more than 20 teraflops on its Big Red cluster alone. And, “Each new cluster we get enables researchers to extend and take their research further than they could before,” he emphasizes.

For its part, Clemson is selling the idea of central management as one less headache for the researcher, Bottum says. “Users receive the benefits of 24/7 service, professional systems administration, security, backup—things they would expect in a data center—so the faculty and students can focus on doing research.”

Indeed, support for systems has been a major factor in getting researchers to sign on with central management. Explains Neeman, “In technology, you have two choices: ‘established,’ also known as obsolete; and ‘emerging,’ also known as broken. HPC is a ‘broken’ technology business, so you need to have full-time technology professionals to keep it going. At Oklahoma, we have professionals whose sole job it is to keep HPC resources working.”

That seems to be a trend among many universities. According to the ECAR report, 43 percent of responding institutions that consider themselves research-intensive have a research IT unit, and 47 percent of responding institutions that consider themselves balanced between academics and research have a research IT unit. Still, in some cases, the idea of central management has not been an easy sell.

“Researchers sometimes think that if high-performance computing is managed by IT, the money for it will end up being cannibalized for use by the administration, not for the HPC buildout,” LSU’s Voss explains.

Says Bottum: “At Clemson, we’ve had to go out and build credibility with the faculty. But we are not forcing the issue; faculty members are getting pressure from the deans and directors to turn over their systems.”

Texas Tech’s Segran concurs. “Some researchers don’t see how we can manage their systems and they will still be able to do the research they need,” he says. “So we work with them to get the right equipment and still be within their parameters.”

Oklahoma, in contrast, developed its central IT management at the behest of its research community, which was clamoring for a robust facility. “It didn’t take much selling; in fact, [the facility] was created in large part as a result of faculty groundswell to make it happen, so an internal HPC group was formed,” Neeman recalls. “That creation coincided with the arrival of a new CIO, Dennis Aebersold, who had a strong interest in HPC and made it a priority for the IT department. That’s how OSCER came about.”

Reputation and Vision

Having a state-of-the-art research facility not only helps researchers as they strive for better results and the granting of ever-more-advanced research projects, it also helps the university cement its reputation as a world-class institution. A central IT management strategy can play a crucial role in making that happen.

Tips for building an HPC environment

Jim Bottum, vice provost for computing & IT and CIO at Clemson University, shares his top considerations for HPC from the ground up.

  1. User base. Know what your users’ needs are and tailor your architecture to meet them.
  2. Facilities. Do the due diligence to assess HPC’s potential impact in all areas, including not just space but also power and cooling.
  3. Architecture. Build a balanced environment.
  4. Support. Decide what business you are in (e.g., hardware and systems administration only; application enablement and tuning; or environments), and staff accordingly.
  5. Leverage. Determine what you can leverage (such as national facilities) so that you do not duplicate efforts and are able to stretch your resources.

According to Leydon at Princeton, “A large part of our mission today is research, and, traditionally, Princeton has not been a large research university. The fact that we have been able to bring on these new systems and advance our research capabilities has been extraordinary in helping recruit new faculty.”

Texas Tech’s Segran agrees. “We now have to make the case to administrators that while we may not realize an immediate fiscal benefit to this approach, it will lead to better research opportunities and better grants, and ultimately will help the university in its standing and reputation.” In fact, centralization can put real power behind such internal marketing, argue campus IT execs. “We have noticed that for universities in which the HPC unit doesn’t report to the CIO, there are CIOs who don’t have a sympathetic ear,” Segran says, adding, “IT management of HPC takes more than just an effort to educate the researchers; there has to be buy-in on both sides. For us, the relationships are there, and it’s getting better.”

But there’s even more to making central IT management work, according to Princeton’s Leydon. Farsightedness is crucial. “You have to understand the landscape and where the technology is going. Plus, you need to link research and instruction. These things are integral to making the right decisions.” And the “right” decisions are part and parcel of a CIO with vision. As Leydon puts it: “It’s simply not an accident that, in many universities, central computing is now taking a larger role in supporting research.”

