Computing Clusters: Sometimes You Can't Make It On Your Own

11/18/2005

As John Towns, senior associate director of the Persistent Infrastructure Directorate at NCSA, explains, many researchers have multi-stage applications that require different kinds of computing architectures to solve. 'So the TeraGrid is attempting to facilitate the use of these multiple architectures, often sited at different locations, to support their research efforts.'

Learn more about the TeraGrid at www.teragrid.org. Learn more about grid computing at www.ggf.org.

Computer Years
Interestingly, both types of large systems have a fairly short life: about three to five years, usually on the lower end, according to Towns. After that, the maintenance and operational costs for the hardware become too high and it's time to 'simply buy a new system.' The NCSA looks at a doubling of computational resources every one to two years, and that's matched by demand. That means the center is in some stage of acquiring new equipment every year.
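To get a feel for what those numbers imply, here is a back-of-the-envelope sketch in Python. It assumes smooth exponential growth between doublings, which is a simplification and not something Towns states; the function name growth_factor is made up for the illustration.

```python
def growth_factor(years: float, doubling_period_years: float) -> float:
    """Factor by which capacity grows over `years` if it doubles
    every `doubling_period_years` (assumes smooth exponential growth)."""
    return 2 ** (years / doubling_period_years)


# System lifetimes of three and five years, doubling every one or two years.
for lifetime in (3, 5):
    for period in (1, 2):
        factor = growth_factor(lifetime, period)
        print(f"lifetime {lifetime} yr, doubling every {period} yr: {factor:.1f}x")
```

Under these assumptions, by the time a system is retired the center's computational resources, and the demand matching them, would have grown somewhere between roughly 3 and 32 times, which is consistent with a center that is always in the middle of buying something.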

NCSA expects to submit a proposal looking for either a $15 million or $30 million system, with responses due in February. The winning bidder (Towns says there are typically only three or four companies bidding) will be required to have its solution in 'substantive production state' by about March 2007. From there, he says, 'there's the whole procurement process, deployment, testing, and putting it into production.'

The installation and testing process has many stages and is time-consuming. In the case of Tungsten, the Dell cluster at NCSA, the center received the hardware over the course of two months. During that time, equipment arrived on large trucks and was unpacked, set up, configured, and tested. But that was preceded by several months of software and hardware testing in-house at Dell.

Once the installation at the client site took place, NCSA did a lot of applications testing to verify that the system actually worked. Then, before it went to production state, the center opened up the equipment for what Towns calls the 'friendly user period.' This lasts about two months and allows the general user community to compile and test their applications. There's no charge to their time allocations for this, but users need to understand that the system might be unstable until all the issues are worked out. 'It's a good way for us to shake down the system before we go to production,' says Towns.

For the cluster installation at CCR, Dell was the prime contractor, and it subcontracted aspects of the project to other vendors, including Myricom (www.myri.com) for Myrinet network communications, EMC (www.emc.com) for storage arrays, Force 10 Networks (www.force10networks.com) for switch/routers, and IBRIX (www.ibrix.com) for file storage. When the system was delivered, the prime contractor orchestrated the installation, 'making sure the right people from the right vendor showed up at the right time,' says Miller. Why Dell? When CCR went out to bid, Miller recalls, 'We met with all the vendors and had discussions, and Dell was clearly head and shoulders above anything we were looking for at that point in time.'


