Caltech and Partners Set Data-Transfer World Record

A team of physicists, computer scientists, and network engineers led by the California Institute of Technology (Caltech), with partners from a number of other universities and science organizations, set new records for sustained data transfer among storage systems during the SuperComputing 2008 conference held in Austin, TX.

The effort achieved a bidirectional peak throughput of 114 Gbps and a sustained data flow of more than 110 Gbps among clusters of servers at the conference itself as well as Caltech, Michigan, CERN in Geneva, Fermilab in Batavia, Brazil, Korea, Estonia, and locations in the USLHCNet network in Chicago, New York, Geneva, and Amsterdam. The demonstration was intended to show that a single well-designed, well-configured rack of servers is capable of saturating the highest-speed wide-area network links in production use today, which have a capacity of 40 Gbps in each direction.

The setup, which took three days to build, used a dozen 10-Gbps wide-area network links to feed data to the event and 14 different providers to maintain connections to external servers. The equipment included two Cisco 6500E series switch-routers; a hundred 10-gigabit Ethernet server interfaces provided by Myricom and Intel; two Fibre Channel S2A9900 storage platforms provided by DataDirect Networks, outfitted with 8-Gbps host bus adapters from QLogic; and five X4500 and X4540 disk servers from Sun Microsystems. The computational nodes consisted of 32 widely available dual-motherboard Supermicro servers housing 128 quad-core Xeon processors on 64 motherboards with a like number of 10-GbE interfaces, as well as Seagate SATA II disks providing 128 terabytes of storage.

A key element in the demonstration was Fast Data Transfer (FDT), an open-source Java application based on TCP, developed by the Caltech team in collaboration with the Politehnica Bucharest team. FDT runs on major platforms and works by streaming data across an open TCP socket, so that a large data set composed of thousands of files, as is typical in high-energy physics applications, can be sent or received at full speed, without the network transfer restarting between files and without any packets being lost. FDT works with Caltech's MonALISA system to monitor the capability of the storage systems and the network path in real time, and it sends data out to the network at a moderated rate matched to the measured capacity of long-range network paths.
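The idea of streaming many files across one open connection, rather than opening a new transfer for each file, can be illustrated with a minimal Python sketch. This is not FDT's code or its wire format: the length-prefixed header, the function names, and the sample file names are all assumptions made for illustration, and a local socket pair stands in for a real long-distance TCP connection.

```python
import socket
import struct
import threading

# Hypothetical framing: 2-byte name length, then 8-byte payload length.
HEADER = struct.Struct("!HQ")


def send_files(sock, files):
    """Send every (name, payload) pair back to back over one open socket."""
    for name, payload in files:
        encoded = name.encode("utf-8")
        sock.sendall(HEADER.pack(len(encoded), len(payload)))
        sock.sendall(encoded)
        sock.sendall(payload)
    # Close only the sending direction to mark the end of the stream.
    sock.shutdown(socket.SHUT_WR)


def recv_exact(sock, n):
    """Read up to n bytes; returns fewer only if the peer closed early."""
    buf = bytearray()
    while len(buf) < n:
        chunk = sock.recv(min(65536, n - len(buf)))
        if not chunk:
            break
        buf.extend(chunk)
    return bytes(buf)


def recv_files(sock):
    """Read framed files from the socket until the sender ends the stream."""
    received = {}
    while True:
        header = recv_exact(sock, HEADER.size)
        if not header:
            return received  # clean end of stream between two files
        if len(header) < HEADER.size:
            raise EOFError("connection closed in the middle of a header")
        name_len, size = HEADER.unpack(header)
        name = recv_exact(sock, name_len).decode("utf-8")
        payload = recv_exact(sock, size)
        if len(payload) < size:
            raise EOFError("connection closed in the middle of a file")
        received[name] = payload


def demo():
    """Push 50 small in-memory 'files' through a single connection."""
    left, right = socket.socketpair()  # stand-in for one TCP connection
    files = [(f"run_{i:04d}.dat", bytes([i % 256]) * 1000) for i in range(50)]
    sender = threading.Thread(target=send_files, args=(left, files))
    sender.start()
    received = recv_files(right)
    sender.join()
    left.close()
    right.close()
    return files, received
```

Because the connection stays open for the whole data set, the sender never pays the cost of setting up and ramping up a new connection between files. The sketch leaves out everything that makes a production tool fast, such as parallel streams, disk I/O scheduling, and rate moderation.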

FDT was combined with an optimized Linux kernel, known as the "UltraLight kernel," provided by Shawn McKee, and the FAST TCP protocol stack developed by Steven Low, professor of computer science and electrical engineering at Caltech, to reach its sustained throughput level of 14.3 Gbps with a single rack of servers, limited by the speed of the disks.

"This achievement is an impressive example of what a focused network and storage system effort can accomplish," said McKee. McKee is a research scientist in the University of Michigan department of physics and leader of the UltraLight network technical group involved with an experiment taking place on the world's largest particle accelerator, located at CERN. "It is an important step towards the goal of delivering a highly capable end-to-end network-aware system and architecture that meet the needs of next-generation e-science."

About the Author

Dian Schaffhauser is a former senior contributing editor for 1105 Media's education publications THE Journal, Campus Technology and Spaces4Learning.
