Caltech and Partners Set Data-Transfer World Record

A team of physicists, computer scientists, and network engineers led by the California Institute of Technology (Caltech), with partners from a number of other universities and science organizations, set new records for sustained data transfer among storage systems during the SuperComputing 2008 conference held in Austin, TX.

The effort achieved a bidirectional peak throughput of 114 Gbps and a sustained data flow of more than 110 Gbps among clusters of servers at the conference itself as well as Caltech, Michigan, CERN in Geneva, Fermilab in Batavia, Brazil, Korea, Estonia, and locations in the USLHCNet network in Chicago, New York, Geneva, and Amsterdam. The demonstration was intended to show that a single well-designed, well-configured rack of servers is capable of saturating the highest-speed wide-area network links in production use today, which have a capacity of 40 Gbps in each direction.

The setup, which took three days to build, used a dozen 10-Gbps wide-area network links to feed data to the event and 14 different providers to maintain connections to external servers. The equipment included two Cisco 6500E series switch-routers; a hundred 10-Gigabit Ethernet server interfaces provided by Myricom and Intel; two Fibre Channel S2A9900 storage platforms provided by DataDirect Networks, outfitted with 8-Gbps host bus adapters from QLogic; and five X4500 and X4540 disk servers from Sun Microsystems. The computational nodes consisted of 32 widely available dual-motherboard Supermicro servers housing 128 quad-core Xeon processors on 64 motherboards with a like number of 10-GbE interfaces, as well as Seagate SATA II disks providing 128 terabytes of storage.

A key element in the demonstration was Fast Data Transfer (FDT), an open-source Java application based on TCP, developed by the Caltech team in collaboration with the Politehnica Bucharest team. FDT runs on major platforms and works by streaming data across an open TCP socket, so that a large data set composed of thousands of files, as is typical in high-energy physics applications, can be sent or received at full speed, without the network transfer restarting between files and without any packets being lost. FDT works with Caltech's MonALISA system to monitor the capability of the storage systems and the network path in real time, and it sends data out to the network at a moderated rate matched to the measured capacity of long-range network paths.
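
For readers curious about what "streaming many files across one open socket" can look like in practice, here is a minimal sketch in Python. It is not FDT's code or wire format (FDT is a Java application, and its internals are not described here); the length-prefixed framing, the function names, and the in-memory sample files are all assumptions made for illustration. The point it shows is simply that a connection opened once can carry any number of files back to back, so the transfer never restarts between files.

```python
import socket
import struct
import threading

# Hypothetical frame header: file-name length (2 bytes), payload length (8 bytes).
HEADER = struct.Struct("!HQ")


def send_files(sock, files):
    """Send every (name, payload) pair over one already-open socket."""
    for name, payload in files:
        encoded = name.encode("utf-8")
        sock.sendall(HEADER.pack(len(encoded), len(payload)))
        sock.sendall(encoded)
        sock.sendall(payload)
    # Half-close the connection so the receiver sees end-of-stream.
    sock.shutdown(socket.SHUT_WR)


def recv_exact(sock, count):
    """Read exactly `count` bytes; return fewer only if the peer closed."""
    chunks = []
    remaining = count
    while remaining > 0:
        chunk = sock.recv(min(remaining, 65536))
        if not chunk:
            break
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)


def recv_files(sock):
    """Receive framed files until the sender closes its side."""
    received = {}
    while True:
        header = recv_exact(sock, HEADER.size)
        if len(header) < HEADER.size:
            break
        name_len, size = HEADER.unpack(header)
        name = recv_exact(sock, name_len).decode("utf-8")
        received[name] = recv_exact(sock, size)
    return received


def demo():
    """Push 50 small in-memory 'files' through a single local connection."""
    files = [
        (f"event_{i:04d}.dat", bytes([i % 256]) * (1000 + i))
        for i in range(50)
    ]
    left, right = socket.socketpair()
    sender = threading.Thread(target=send_files, args=(left, files))
    sender.start()
    received = recv_files(right)
    sender.join()
    left.close()
    right.close()
    return files, received


if __name__ == "__main__":
    sent, got = demo()
    print(f"sent {len(sent)} files, received {len(got)} over one connection")
```

The sketch uses a local socket pair rather than a wide-area link, and it leaves out everything that made the record possible in practice, such as disk I/O, parallel streams, and the rate moderation driven by real-time monitoring.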

FDT was combined with an optimized Linux kernel, known as the "UltraLight kernel," provided by Shawn McKee, and the FAST TCP protocol stack developed by Steven Low, professor of computer science and electrical engineering at Caltech, to reach its sustained throughput level of 14.3 Gbps with a single rack of servers, limited by the speed of the disks.

"This achievement is an impressive example of what a focused network and storage system effort can accomplish," said McKee. McKee is a research scientist in the University of Michigan department of physics and leader of the UltraLight network technical group involved with an experiment taking place on the world's largest particle accelerator, located at CERN. "It is an important step towards the goal of delivering a highly capable end-to-end network-aware system and architecture that meet the needs of next-generation e-science."

About the Author

Dian Schaffhauser is a former senior contributing editor for 1105 Media's education publications THE Journal, Campus Technology and Spaces4Learning.
