High-Performance Computing
High-Performance Happy
- By Charlene O’Hanlon
- 04/01/07
More and more universities are now centralizing their high-performance computing resources — benefiting not only IT departments, but the researchers, too.
Traditionally, the high-performance computing systems used to conduct
research at universities have amounted to silos of technology
scattered across the campus and falling under the purview of the
researchers themselves. But a growing number of universities are
now taking over the management of those systems and creating central
HPC environments—a move that is returning benefits in time,
money, and resources for both the university and its researchers.
Henry Neeman, director of the University of Oklahoma Supercomputing
Center for Education & Research (OSCER), puts it plainly: “I’ve
been seeing a growing trend in centralized HPC for two reasons: capability
and practicality.” He explains, “When it comes to capability, you
have to consider: What is the largest job you can run on a given machine?
There are particular large jobs you can’t run on a system that doesn’t have
a lot of capability. And there are the practicalities regarding cooling,
space, power, and labor. If you have dozens of systems dedicated to HPC,
you can’t just stick them in the closet anymore.”
Notably, the movement of cyberinfrastructure (which includes
high-performance computing, computer clusters, and the underlying
network) to central management has been gathering speed as more
universities make research a vital part of their institutional identity. On point, the July
2006 report IT Engagement in Research, issued by the Educause Center for
Applied Research, highlights the importance that some universities are placing on research. “Many universities have made public bets that they will break into
the top echelons of research institutions,
and this has set off an arms race to find
new sources of funding, to construct new
research centers, and to attract star
researchers with proven grant-magnet
abilities,” the study maintains.
As part of that drive to compete on a
research level, many universities are
seeking to attach themselves to regional
and national research initiatives such as
the Texas Internet Grid for Research and
Education (TIGRE) and the National LambdaRail project.
Having central management of the university’s
cyberinfrastructure helps facilitate
such pairings by pooling resources
and creating a massive computing environment
that is more impressive, and more useful,
than separate clusters distributed
throughout the campus would be. In
addition, central management provides
constant monitoring and upkeep that a
smaller, privately owned cluster might
not enjoy.
[Chart: How many full-time equivalent (FTE) staff within central IT are currently assigned to the support of research?]
Indeed, the oversight aspect of the
issue is not a small one. “The most
important element left out of most IT
plans is the human element,” acknowledges
Brian Voss, Louisiana State University
CIO. “If we don’t have people to
help us use the technology, we don’t get
the most bang for the buck and it’s pretty
much useless.”
Putting Power Behind the Research
Still, university IT directors don’t
capriciously undertake centralizing
management of the institutional cyberinfrastructure.
Rather, most such initiatives
are mandated by the CIO, with the
aim of building out a hefty HPC environment
that will enable the university to
take a leading role in research, or at least
approach such a position. An institution’s
ability to attract and empower a CIO with
experience in this direction can be key.
Jim Bottum, vice provost for computing
& IT and CIO at Clemson University
(SC), was brought on six months
ago specifically to lead the charge to
build such an HPC environment. Formerly
CIO and VP for computing at
Purdue (IN), Bottum also served as
executive director of the National Center
for Supercomputing Applications at
the University of Illinois. “I was hired
by Clemson to come and build a high-performance
computing environment
because I’ve been in the business for a
while,” he admits, a bit coyly.
In his six months on the job, Bottum
has orchestrated Clemson’s membership
in the Open Science Grid, a consortium of universities,
national laboratories, scientific
collaborations, and software developers
that utilizes 1,000 desktops in student
labs for certain applications. He also has
directed the College of Engineering and
Science to move its clusters to the university’s
center, and he has been busy buying
the big iron for the center. In addition,
says Bottum, as part of the Clemson University
International Center for Automotive
Research (CU-ICAR), Clemson will
host CU-ICAR’s 10-teraflop system
along with automaker BMW.
Bottum reports that Clemson will
boast more than 20,000 square feet of
centralized high-performance computing
space when the center is completely
outfitted. “And we have tremendous
expansion capabilities,” he adds, disclosing
that by summer 2007 the center
will house 12 to 15 teraflops or more.
“We are putting significant money into
this project and that will include rearchitecting
the campus network. My
charge was to build an aggressive infrastructure
and get involved in national initiatives,
and I feel like we are making
some progress,” he says.
Another goal: Bottum wants to connect
Clemson to the National Science Foundation’s TeraGrid, a research supercomputing
project that boasts more than
102 teraflops of computing capability,
and more than 15 petabytes of online and
archival data storage distributed among
nine partner sites.
Centralization: Aggressive or Organic?
Clearly, Clemson is being aggressive in
its pursuit of research, while other universities
have taken a somewhat more
organic approach. Texas Tech University,
for example, has had a central HPC
environment of sorts since the late 1990s,
set up specifically to facilitate a major
visualization project. Once funding ran
out in 2001, however, the university began
to look at ways to set up the resources for
use by the entire campus community.
“When we first took it over, we had to
look closely at the HPC program and analyze
who was using it,” says Sam Segran,
Texas Tech CIO. “We took a business
approach and discovered not only who
was using it, but who wasn’t—and if not,
then why. What we discovered was that
most colleges were not really using the
system for visualization [the original
intended use]; on the computing side,
researchers don’t have the skill set to do
visualization,” he says. “But the high
computing—the pure data-crunching,
multi-teraflop computing—that’s where
we found a lot of interest. Researchers
wanted to do a lot of that type of computing
in a short amount of time.”
Based on that knowledge, the university
set up a grid computing network for such
high computing, purchased a Dell cluster,
and is in the process of
tripling its capacity to close to 5 teraflops.
In addition, Texas Tech developed a community
cluster, which five researchers
have bought into. The concept behind the
community cluster: The IT department
manages the researchers’ systems and
the institution matches the researchers’
investment, dollar for dollar. Researchers
are guaranteed a certain number of nodes
and they can use unused nodes whenever
they need to, greatly improving their output
abilities, says Segran.
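The community-cluster arrangement Segran describes can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the article does not say how Texas Tech schedules or accounts for nodes, so the function names (`matched_budget`, `max_nodes_now`), the researcher labels, and every dollar and node figure below are hypothetical.

```python
def matched_budget(investment):
    """Purchasing power when the institution matches a researcher's
    investment dollar for dollar, as the article describes."""
    return investment * 2


def max_nodes_now(name, guaranteed, in_use):
    """Upper bound on the nodes `name` could hold right now:
    the nodes already held plus every node currently sitting idle.

    Assumption (not from the article): the cluster size equals the sum
    of all guaranteed shares, and idle nodes are open to any member.
    """
    total_nodes = sum(guaranteed.values())
    idle_nodes = total_nodes - sum(in_use.values())
    return in_use.get(name, 0) + idle_nodes


# Hypothetical shares for three researchers who bought into the cluster.
guaranteed = {"lab_a": 8, "lab_b": 8, "lab_c": 16}
# Hypothetical snapshot of current usage.
in_use = {"lab_a": 8, "lab_b": 2, "lab_c": 4}

print(matched_budget(50_000))                      # researcher's money, doubled
print(max_nodes_now("lab_a", guaranteed, in_use))  # own nodes plus idle nodes
```

In this snapshot, lab_a is using all 8 of its guaranteed nodes but could expand onto the 18 nodes the other labs are leaving idle. That is the output gain Segran points to. A real scheduler would also need a rule for handing borrowed nodes back when their owner needs them, which this sketch leaves out.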
[Chart: Presence of Research Unit in Central IT, by Institutional Mission]
At Louisiana State, HPC efforts are co-managed
by Voss’ department and by Ed
Seidel, director of the university’s Center
for Computation and Technology—a
department that was created in 2001 as
part of the Louisiana Governor’s Information
Technology Initiative to advance
the use of information technology in
higher education and scientific research.
The center has more than 26 teraflops
running on eight different machines from
Dell, IBM, and Atipa Technologies. Voss
maintains that co-management makes
sense from both planning and operations
perspectives. “Seidel is Captain Kirk, and
I am Mr. Scott,” he quips.
But the truth is that while the importance
of research at LSU was always well recognized
and understood, it remained a
separate entity from the administration
until Voss came on board in 2005. “I
understood the role that HPC plays [in
research as a whole], and so I started taking
over the operational structure,” he
says. “I believe in the importance of all
elements of an IT environment.” What’s
more, he says, “The value of research
beyond advancing science is underrated:
Research is a feeder line for teaching.”
At Princeton University (NJ), the
desire to enhance the institution’s
research reputation, coupled with the
foresight to understand where technology
in general was headed, prompted the
school to rethink its earlier strategy.
“When I came on board in 2001, there
was little central support of IT services,”
recalls Betty Leydon, CIO and VP for
information technology. “So we started
canvassing the faculty and asking what
they needed. We quickly realized the
then-current model of individual research
clusters was inefficient.” After gaining
research faculty acceptance of a progressive,
central IT management model
(and pooling her department’s financial
resources with those of individual faculty
members and, later, faculty groups
and individual colleges), in 2005 Princeton
was able to purchase an IBM Blue
Gene computer.
“When we started asking faculty
members to contribute their research
money toward purchasing this system, it
was obvious we were doing something
right because they said ‘Yes!’ and we got all the money we needed to buy Blue
Gene,” Leydon recounts. “After we
received the machine, everything grew
outward from there. Now, researchers
are advancing their work more quickly
because by pooling their resources,
everyone has gotten more resources
than they would have been able to get
otherwise, on their own.” Princeton has
since been able to fund an additional
two Dell clusters for research, using the
same funding method, Leydon reports.
The university’s HPC power is now up
to 15.5 teraflops. “Once you get a model
that works,” she advises, “it grows by
itself. We’ve also been able to purchase
centrally shared storage the same way.
Faculty members used their research
dollars to purchase this because those
who are using it see the value.”
It’s All About the Resources
Increased HPC capacity is but one
advantage researchers have realized from
central management. They have also discovered
that the more mundane tasks of
providing proper cooling, power, security,
and IT support are no longer their
problem, leaving them with more
resources to devote to pure research.
Researchers at the University of
Oklahoma are taking advantage of
the central HPC resources there
(currently at 7.7 teraflops across two
clusters, and scheduled to increase
to 12.2 teraflops later this year),
because “it is difficult to justify purchasing
hardware when researchers
know it’s already available at the
center and they can use the money to
hire another grad student instead,”
OSCER Director Neeman says.
And at Indiana University, “The
notion of central management sprang
from the researchers not having the
ability, time, or expertise to run a
large cluster in their own manner,”
points out David Hancock, high-performance
computing and research
manager. “They were instead depending
on other staff or graduate students
to do the management, and there was
such a high turnover rate that it wasn’t
efficient. So we convinced them it
was in their own interest to turn over management.”
In addition to IT support and
management—as well as the infrastructure
necessary to run the systems—IU
offers what Hancock refers to as “streamlined
operations.”
“Researchers contribute the [grant]
funds, and we make the purchases on
their behalf,” he explains. “What we try
to sell is centralized management and
the provision of access to dedicated
time if needed,” he adds. “Most of the
researchers have signed on, and in some
cases, some of them are willing to offer
opportunistic use to other researchers.”
It’s not a difficult sell to most researchers,
Hancock maintains: Indiana has more
than 20 teraflops on its Big Red cluster
alone. And, “Each new cluster we get
enables researchers to extend and take
their research further than they could
before,” he emphasizes.
For its part, Clemson is selling the
idea of central management as one less
headache for the researcher, Bottum
says. “Users receive the benefits of 24/7
service, professional systems administration,
security, backup—things they
would expect in a data center—so the
faculty and students can focus on doing
research.”
Indeed, support for systems has been a
major factor in getting researchers to sign
on with central management. Explains
Neeman, “In technology, you have two
choices: ‘established,’ also known as
obsolete; and ‘emerging,’ also known as
broken. HPC is a ‘broken’ technology
business, so you need to have full-time
technology professionals to keep it
going. At Oklahoma, we have professionals
whose sole job it is to keep HPC
resources working.”
That seems to be a trend among many
universities. According to the ECAR
report, 43 percent of responding institutions
that consider themselves research-intensive
have a research IT unit, and 47
percent of responding institutions that
consider themselves balanced between
academics and research have a research
IT unit. Still, in some cases, the idea of
central management has not been an
easy sell.
“Researchers sometimes think that if
high-performance computing is managed
by IT, the money for it will end up
being cannibalized for use by the
administration, not for the HPC
buildout,” LSU’s Voss explains.
Says Bottum: “At Clemson, we’ve
had to establish credibility with the
faculty, so we’ve had to go out and
build that credibility. But we are not
forcing the issue; faculty members are
getting pressure from the deans and
directors to turn over their systems.”
Texas Tech’s Segran concurs.
“Some researchers don’t see how we
can manage their systems and they
will still be able to do the research
they need,” he says. “So we work with
them to get the right equipment and
still be within their parameters.”
Oklahoma, in contrast, developed
its central IT management at the
behest of its research community,
which was clamoring for a robust
facility. “It didn’t take much selling;
in fact, [the facility] was created in
large part as a result of faculty
groundswell to make it happen, so
an internal HPC group was formed,”
Neeman recalls. “That creation coincided
with the arrival of a new CIO,
Dennis Aebersold, who had a strong
interest in HPC and made it a priority
for the IT department. That’s how
OSCER came about.”
Reputation and Vision
Having a state-of-the-art research facility
not only helps researchers as they
strive for better results and the granting
of ever-more-advanced research projects,
it also helps the university cement its reputation
as a world-class institution. A
central IT management strategy can play
a crucial role in making that happen.
Tips for building an HPC environment
Jim Bottum, vice provost for computing & IT
and CIO at Clemson University, shares his top
considerations for HPC from the ground up.
- User base. Know what your users’ needs are
and tailor your architecture to meet them.
- Facilities. Do the diligence to assess HPC’s
potential impact in all areas, including not
just space but also power and cooling.
- Architecture. Build a balanced environment.
- Support. Decide what business you are in
(e.g., hardware and systems administration
only; application enablement and tuning; or
environments), and staff accordingly.
- Leverage. Determine what you can leverage
(such as national facilities) so that you do
not duplicate efforts and are able to stretch
your resources.
According to Leydon at Princeton, “A
large part of our mission today is
research, and, traditionally, Princeton
has not been a large research university.
The fact that we have been able to bring
on these new systems and advance our
research capabilities has been extraordinary
in helping recruit new faculty.”
Texas Tech’s Segran agrees. “We now
have to make the case to administrators
that while we may not realize an immediate
fiscal benefit to this approach, it
will lead to better research opportunities
and better grants, and ultimately will
help the university in its standing and
reputation.” In fact, centralization can
put real power behind such internal marketing,
argue campus IT execs. “We
have noticed that for universities in
which the HPC unit doesn’t report to the
CIO, there are CIOs who don’t have a
sympathetic ear,” Segran says, adding,
“IT management of HPC takes more
than just an effort to educate the
researchers; there has to be buy-in on
both sides. For us, the relationships are
there, and it’s getting better.”
But there’s even more to making central
IT management work, according to
Princeton’s Leydon. Farsightedness is
crucial. “You have to understand the
landscape and where the technology is
going. Plus, you need to link research
and instruction. These things are integral
to making the right decisions.” And
the “right” decisions are part and parcel
of a CIO with vision. As Leydon puts it:
“It’s simply not an accident that, in
many universities, central computing is
now taking a larger role in supporting
research.”