Computing Clusters continued, page 2 of 3

11/21/2005

The NCSA has about 100 staff members and another "15 or 20" graduate students working in its center to provide 24/7 support for its community of users. CCR has a technical team of 13, consisting of operations people and computational scientists. The former, system administrators, keep the systems running, and the latter work closely with users to figure out, for example, what applications are needed for a particular project or to help optimize code.

Miller estimates that about half of the applications running on the CCR computers are off-the-shelf code that has been paid for or is freeware or shareware. The other half is "home-grown." In the case of NCSA, Towns says, "By and large [the majority of our users are faculty researchers] using applications they've developed to solve the problems that they're attacking."

What It Takes To Work in Academic Computing

Ever wondered what it takes to work in a high-performance computing environment? According to John Towns, senior associate director, Persistent Infrastructure Directorate at NCSA, there's no other job that can prepare you: you arrive with particular qualifications, then get on-the-job training.

At NCSA, to obtain a position on the academic professional staff, you're required to have a Bachelor's in Science. Frequently, staff members have worked in a research environment, possibly as a graduate student. Their backgrounds may include hard sciences, engineering, or computer science. All have "an affinity for computers" and an interest in the high end, whether that is "compute systems, networks, visualization, or data storage," Towns says. Frequently, they don't work normal hours, which is an advantage in an environment that runs 24 hours a day.

Shared-Memory Machines vs. Clusters

One misunderstanding that can crop up about clusters is that they replace the old-style mainframe-type or mass-storage computers. In reality, each setup suits a specific type of computing work. "Gene sequencing is fairly trivially parallelized. You can spread it across a cluster and use the resources well," Towns says. In other words, every processor is, practically speaking, running a separate copy of the application, and no processor needs to talk much to the others.

CCR's Miller refers to this as "capacity computing." "In order for a scientist to solve a certain problem, they may need to run a thousand or ten thousand simulations, each of which is best run on a single CPU... It's the aggregate of all those results that will solve their scientific problems."
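To make the "capacity computing" pattern concrete, here is a minimal Python sketch of many independent runs farmed out to a pool of workers, with only the final results combined. It is an illustration, not anything CCR or NCSA is described as running: the `simulate` function (a toy Monte Carlo estimate of pi), the seed-per-run scheme, and the choice of a mean as the aggregate are all assumptions made for the example.

```python
import random
from concurrent.futures import ProcessPoolExecutor
from statistics import mean


def simulate(seed, samples=20_000):
    """One independent run: a toy Monte Carlo estimate of pi.

    Hypothetical stand-in for a real single-CPU simulation.
    """
    rng = random.Random(seed)  # private generator, so runs share no state
    hits = 0
    for _ in range(samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            hits += 1
    return 4.0 * hits / samples


def run_campaign(seeds, executor_cls=ProcessPoolExecutor, workers=4):
    """Farm the runs out to a worker pool, then aggregate the results."""
    with executor_cls(max_workers=workers) as pool:
        # Each run is handed to whichever worker is free; the runs never
        # communicate with each other. Results come back in seed order.
        results = list(pool.map(simulate, seeds))
    return mean(results), results


if __name__ == "__main__":
    estimate, runs = run_campaign(range(100))
    print(f"{len(runs)} runs, aggregate estimate {estimate:.4f}")
```

Because the runs exchange no data, adding workers (or cluster nodes) scales the throughput almost linearly, which is why this kind of workload fits a cluster well.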

Another class of programs runs as a single application. As Towns explains it, "Imagine that you're running simulations, and... you create a grid that represents a space and something happens in that space. Often where something interesting is happening, you need to redefine the spacing on the grid to accurately represent what's [going on]. A class of applications has been developed that in a dynamic way redefines the spacing in the grid where it needs to... If you try to represent the entire grid at the finest resolution, you don't have a big enough memory machine to do it. What you do is refine it where it's necessary... You have some nodes that have a lot of work to do and some that don't. In a shared memory system, you can easily redistribute that work among the processors, so you can keep them all busy together, and move the application along much more quickly."
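The idea Towns describes can be sketched in a few lines of Python, under heavy simplifying assumptions. The one-dimensional `field` function, the refinement rule (split a cell when the field changes by more than a tolerance across it), and the two work-counting helpers are hypothetical choices made for illustration; production adaptive-grid codes are far more involved. The sketch shows two things: refining only where needed uses far fewer cells than a uniformly fine grid, and the refined cells pile up in a few regions of space, so a fixed split by region leaves some workers overloaded while a shared pool of cells can be divided evenly.

```python
import math


def field(x):
    """Toy quantity with a sharp front near x = 0.5 (hypothetical stand-in)."""
    return math.tanh(50.0 * (x - 0.5))


def refine(lo, hi, tol=0.05, max_depth=10, depth=0):
    """Halve a cell wherever the field changes by more than tol across it."""
    if depth < max_depth and abs(field(hi) - field(lo)) > tol:
        mid = 0.5 * (lo + hi)
        return (refine(lo, mid, tol, max_depth, depth + 1)
                + refine(mid, hi, tol, max_depth, depth + 1))
    return [(lo, hi)]


def adaptive_grid(coarse_cells=8, **kwargs):
    """Start from a coarse grid on [0, 1] and refine each cell as needed."""
    width = 1.0 / coarse_cells
    cells = []
    for i in range(coarse_cells):
        cells += refine(i * width, (i + 1) * width, **kwargs)
    return cells


def cells_per_region(cells, regions=4):
    """Work per node if each node statically owns an equal slice of space."""
    counts = [0] * regions
    for lo, hi in cells:
        owner = min(int(0.5 * (lo + hi) * regions), regions - 1)
        counts[owner] += 1
    return counts


def cells_per_worker_shared(cells, workers=4):
    """Work per processor if any processor may take any cell."""
    base, extra = divmod(len(cells), workers)
    return [base + (1 if w < extra else 0) for w in range(workers)]


if __name__ == "__main__":
    cells = adaptive_grid()
    finest = min(hi - lo for lo, hi in cells)
    print("adaptive cells:", len(cells))
    print("uniform cells at finest spacing:", round(1.0 / finest))
    print("static split by region:", cells_per_region(cells))
    print("shared pool split:", cells_per_worker_shared(cells))
```

Counting cells is only a rough proxy for work, and the even split in `cells_per_worker_shared` models the outcome of redistribution rather than the mechanism; on a real shared-memory machine the processors would pull cells from common memory as they become free.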