How Network Management Speeds Research at Baylor College of Medicine

Although the research environment in which he works is highly complex, systems administrator Justin King has an uncomplicated goal for his infrastructure: "To simplify it as much as possible from an end user standpoint."

King works as the sole IT person for the Human Neuroimaging Laboratory in the Department of Neuroscience at Baylor College of Medicine in Houston, TX. This private medical school is one of the few places on earth where scientists have simultaneous access to more than one functional Magnetic Resonance Imaging (fMRI) scanner for their research. In fact, in early December, after major reconstruction, the lab added three scanners to the two that have been in place since 2002.


Installation of one of the scanners

King described the giant fMRI machines this way: "It's a huge, super-conducting magnet. They allow us to look inside a subject's brain by just wiggling the water molecules."


Research topics have ranged from understanding how trust works in economic exchange to why people might prefer Coca-Cola over Pepsi.

The brainchild of Dr. P. Read Montague, Jr., a professor of neuroscience, the lab employs about 16 people, King said, including postdoctoral and graduate students, developers, scanner technicians, and administrative staff.

The scanners apply magnetic fields to identify areas of activity and create images of the working brain. "For example, as you listen to music, the region of the brain that processes auditory input will 'light up,'" said King. "Oversimplified, that's what the MRI exploits as we're doing our experiments."

A Piece of the Network
The college has a central enterprise server department that handles services, including the overall network setup and management. "For other stuff, we need to be more flexible than the corporate environment allows," said King. "That's just part of research." The lab--one of the biggest on campus--needs to change its server and storage infrastructure frequently to address the demands of its research projects. Those fast reconfigurations fall into King's job description.


Justin King, systems administrator, Human Neuroimaging Laboratory, Baylor College of Medicine

The lab has about 30 servers, running Microsoft Windows Server 2003 and CentOS, an open source Linux distribution, primarily on Dell hardware. About a third of those servers are attached to a storage area network (SAN) running controllers from IBM and EMC; the SAN switches are from QLogic. King also manages about 100 desktop computers, loaded with either Windows XP or Red Hat Enterprise Linux 4 Desktop, and he has seen a recent influx of Macs.

Anything of a sensitive nature, according to King, sits behind two-factor biometric access: the user enters an access ID, then has his or her hand analyzed for further authentication.

The fMRI machines themselves aren't connected directly to the network. Made by Siemens, the machines link to the network through gateway PCs connected to each scanner and running a specialized version of Windows XP.

Time-consuming Research
When a scientist performs an experiment, a vast amount of raw data is generated in the form of medical images. To make sense of that data, the researcher first needs to massage the images.

That preprocessing step, said King, involves doing a "normalization" of the images. "It pushes and pulls the images so that they overlay on top of a 'perfect' brain. This way, we can compare results across people. If you've got 15 people in a study, everybody's brain is a little different, so you have to normalize the brain to a 'perfect' standard."
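The lab leaves the real work to specialized software, but the underlying operation is a resampling: estimate a transform that "pushes and pulls" each subject's voxels onto the standard template, then interpolate. Below is a minimal Python sketch of that resampling step, assuming the transform has already been estimated; the function name and arguments are illustrative stand-ins, not the lab's actual pipeline.

    import numpy as np
    from scipy.ndimage import affine_transform

    def normalize_to_template(subject_vol, affine, template_shape):
        """Resample one subject's 3-D scan into 'perfect'-brain template space.

        subject_vol    -- 3-D array of voxel intensities from the scanner
        affine         -- 3x4 matrix mapping template voxel coordinates to
                          subject voxel coordinates (the estimated 'push and pull')
        template_shape -- voxel dimensions of the standard template
        """
        matrix, offset = affine[:, :3], affine[:, 3]
        # For each template voxel, pull the interpolated value from subject space
        return affine_transform(subject_vol, matrix, offset=offset,
                                output_shape=template_shape, order=1)

    # Once every subject sits in the same space, voxel-by-voxel comparison
    # across 15 brains is a plain array operation, e.g.:
    # group_mean = np.mean([normalize_to_template(v, a, shape)
    #                       for v, a in subjects], axis=0)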

Previously, the normalization work was done on a desktop PC, one study subject at a time, in sequence. The scans for those 15 study subjects would require a total of about 15 hours of preprocessing time before the analysis could begin.

A couple of years ago, the lab figured out that a bit of automation could speed up that part of the work. It purchased a 32-node, 64-processor cluster from IBM to run Statistical Parametric Mapping (SPM), open source software especially well suited to this type of analysis. "Now, you just go into our Web interface, pick people you want, submit it, and off you go," said King. "An hour later, [the information] is preprocessed, and you're ready to start analyzing. The computationally heavy part is off-loaded."
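The speedup comes from a basic property of the workload: each subject's preprocessing is independent of the others, so the jobs can run side by side instead of back to back. The lab's actual setup dispatches SPM jobs to the IBM cluster through that web interface; the sketch below uses Python's standard process pool merely to show the shape of the fan-out.

    from concurrent.futures import ProcessPoolExecutor

    def preprocess(subject_id):
        """Stand-in for the roughly one-hour normalization run for one subject."""
        ...

    if __name__ == "__main__":
        subjects = [f"subj{i:02d}" for i in range(1, 16)]

        # Old way: one desktop, one subject at a time -- about 15 hours in total.
        # for s in subjects:
        #     preprocess(s)

        # Cluster way: fan the independent jobs out across workers, so wall-clock
        # time collapses toward the length of a single job (about an hour).
        with ProcessPoolExecutor(max_workers=len(subjects)) as pool:
            results = list(pool.map(preprocess, subjects))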

Another challenge: managing storage. A single research project can require terabytes of storage space. What King didn't want was a researcher getting midway through a project and discovering that he or she had run out of space.

"I'd rather come to them and say, 'You're at 95 percent full on this project. Is there a way to remove some stuff? Or do we need to add some more space to this project?'" he said. "These scientific types, they want to work when they want to work. IT shouldn't get in the way of their efforts--it should facilitate them. They shouldn't have to spend hours to move [files] between their local machines and the network so they have space to do things."

Improving Network Management
That desire for proactive management of network resources led King to try out OpenNMS, an open source network management program. The problem, he said, was that the program leaned heavily on SNMP. "What you ended up with was a whole bunch of false positives, a whole bunch of alerts. That's not something I needed at two in the morning."

It also required what King calls a "fair amount" of customization. "You'd need to get into the guts of it and tear things apart."

He realized that to configure the software properly would probably require a couple of dedicated months. So he sold management on the idea of finding an alternative solution that would cost less than two months of his time.

A casual mention on slashdot.org led King to check out Hyperic HQ, open source systems and application management software. He tried a free version and could see "immediate value." Calling it "ridiculously easy to install," King was sold. He contacted the company and negotiated a purchase of the enterprise version--about $500 per "platform." A platform, said King, is a term coined by the company, and it's roughly equivalent to two physical CPUs.

"It gives me just what I need, not more than I need," King explained. "It took me two hours to set up the whole environment. Literally, two hours." Now, he said, he has a really good view of what's going on, without being overwhelmed with system data. "It gives me the ability to be way out ahead of whatever's coming at me."


Hyperic's monitoring functionality

Also, the caliber of the alert fits the situation. When there's a catastrophic problem--such as a switch or storage controller going bad--the software will inform him. "I have it set up to text message me at any time. But if one of my clustered web servers goes down in the middle of the night, it'll send me email because the other one is still working."
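In Hyperic, that behavior lives in the alert configuration; the policy itself boils down to matching the delivery channel to the real urgency of the failure. A sketch of the routing logic, with hypothetical resource names:

    REDUNDANT_SERVICES = {"web-1", "web-2"}  # clustered pair; one can fail safely

    def route_alert(resource, peers_still_up):
        """Pick a delivery channel that matches how urgent the failure really is."""
        if resource in REDUNDANT_SERVICES and peers_still_up:
            return "email"  # degraded but still serving: it can wait until morning
        return "sms"        # bad switch, bad storage controller: wake me up now

    assert route_alert("web-1", peers_still_up=True) == "email"
    assert route_alert("san-controller-a", peers_still_up=False) == "sms"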

Another advantage: It's agent-based. "It doesn't go out and try to blast the whole network," King said. "You can say, 'I'm going to put it on this server.' The agent has the intelligence to say, 'I'm going to start looking for processes on the machine. If I identify something that I recognize, I'm going to gather metrics on that.'" If it's a domain controller, for instance, it'll gather statistics for LDAP connections, directory requests and the like. "It's all about what information do I want and what is too much?"
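That discovery model (inspect the local host, recognize what's running, collect only the relevant metrics) is easy to picture in code. The sketch below approximates it with the psutil library; the service-to-collector table is invented for illustration and is not Hyperic's internal mapping.

    import psutil

    # Map a recognizable process name to the metric collector it should enable.
    KNOWN_SERVICES = {
        "slapd":    "ldap_metrics",      # e.g. LDAP connections, directory requests
        "postgres": "postgres_metrics",
        "httpd":    "apache_metrics",
    }

    def discover_local_services():
        """Agent-side discovery: look only at this host, never sweep the network."""
        enabled = set()
        for proc in psutil.process_iter(["name"]):
            collector = KNOWN_SERVICES.get(proc.info["name"])
            if collector:
                enabled.add(collector)
        return enabled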

Finally, the network analysis provided by Hyperic encompasses open source products, a real draw for a lab that uses lots of open source software. "Most of our database-driven applications run off a PostgreSQL backend," King said. "This includes our primary data collection database, which is somewhere in the neighborhood of 300 gigabytes." As King pointed out, while many products--especially those from the largest vendors--will tell the sys admin whether a database is available, few will actually dive into the tables and spell out specifics.
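Diving into the tables means querying the database's own statistics rather than just pinging its port. PostgreSQL exposes per-table activity through built-in views such as pg_stat_user_tables; the sketch below (using the psycopg2 driver, with a hypothetical connection string) shows the kind of specifics a monitor can surface.

    import psycopg2

    def table_level_stats(dsn):
        """Go beyond 'is it up?': pull per-table row counts and scan activity."""
        with psycopg2.connect(dsn) as conn, conn.cursor() as cur:
            cur.execute("""
                SELECT relname, n_live_tup, seq_scan, idx_scan
                FROM pg_stat_user_tables
                ORDER BY n_live_tup DESC
                LIMIT 10
            """)
            return cur.fetchall()

    # e.g. table_level_stats("dbname=collection host=dbserver user=monitor")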

King said he's hoping a future version will automate configuration when new hardware is added to his network. For example, currently when a new drive is added, he has to tell the software to add it to the alert group that warns him when a disk is 95 percent full. He'd prefer that the software recognize the type of hardware being added and configure it with parameters that match similar devices.
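The behavior King is wishing for amounts to template matching: classify the new device, then copy the alert parameters already used for hardware of that type. A sketch of that hoped-for step, with invented template values:

    # Default alert parameters by device type; a new disk inherits the disk template.
    ALERT_TEMPLATES = {
        "disk":   {"metric": "percent_used", "threshold": 0.95, "channel": "email"},
        "switch": {"metric": "port_errors",  "threshold": 1,    "channel": "sms"},
    }

    def auto_configure(device_name, device_type, alert_groups):
        """Classify the new device and reuse the parameters of similar hardware."""
        template = ALERT_TEMPLATES.get(device_type)
        if template is None:
            raise ValueError(f"no template for {device_type}; configure manually")
        alert_groups.setdefault(device_type, []).append({device_name: template})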

But he said he's content until then. "Most importantly, it does exactly what I need it to do. And that's really valuable."

That's crucial, he said, because the lab is only growing. With the addition of the new scanners, Baylor has committed to hiring additional faculty, all with their own funding and expectations of using the lab like a service center. Said King, "I went from one boss to five in the last year."

Those new hires will also bring in their own hardware, which will challenge the lab's technical infrastructure. King isn't worried. "I can throw a Hyperic agent on there. Now I know exactly what to expect. I don't have to worry about trying to figure out what happened to all the disk space."
