Caltech and Partners Set Data-Transfer World Record

A team of physicists, computer scientists, and network engineers led by the California Institute of Technology (Caltech)--with partners from a number of other universities and science organizations--set new records for sustained data transfer among storage systems during the SuperComputing 2008 conference held in Austin, TX.

The effort achieved a bidirectional peak throughput of 114 Gbps and a sustained data flow of more than 110 Gbps among clusters of servers at the conference itself as well as Caltech, Michigan, CERN in Geneva, Fermilab in Batavia, Brazil, Korea, Estonia, and locations in the USLHCNet network in Chicago, New York, Geneva, and Amsterdam. The demonstration was intended to show that a well designed and configured single rack of servers is capable of saturating the highest-speed wide-area network links in production use today, which have a capacity of 40 Gbps in each direction.

The setup, which took three days to build, used a dozen 10-Gbps wide-area network links to feed data to the event and 14 different providers to maintain connections to external servers, as well as equipment encompassing two Cisco 6500E series switch-routers, and a hundred 10 gigabit Ethernet server interfaces provided by Myricom and Intel, two fiber channel S2A9900 storage platforms provided by DataDirect Networks outfitted with 8 Gbps host bus adapters from QLogic, along with five X4500 and X4540 disk servers from Sun Microsystems. The computational nodes consisted of 32 widely available dual-motherboard Supermicro servers housing 128 quad-core Xeon processors on 64 motherboards with a like number of 10-GbE interfaces, as well as Seagate SATA II disks providing 128 terabytes of storage.

A key element in the demonstration was Fast Data Transfer (FTD), an open-source Java application based on TCP, developed by the Caltech team in collaboration with the Politehnica Bucharest team. FTD runs on major platforms and works by streaming data across an open TCP socket, so that a large data set composed of thousands of files, as is typical in high-energy physics applications, can be sent or received at full speed, without the network transfer restarting between files, and without any packets being lost. FDT works with Caltech's MonALISA system to dynamically monitor the capability of the storage systems, as well as the network path, in real time, and sends data out to the network at a moderated rate that is matched to the capacity (measured in real time) of long-range network paths.

FDT was combined with an optimized Linux kernel, known as the "UltraLight kernel," provided by Shawn McKee, and the FAST TCP protocol stack developed by Steven Low, professor of computer science and electrical engineering at Caltech, to reach its sustained throughput level of 14.3 Gbps with a single rack of servers, limited by the speed of the disks.

"This achievement is an impressive example of what a focused network and storage system effort can accomplish," said McKee. McKee is a research scientist in the University of Michigan department of physics and leader of the UltraLight network technical group involved with an experiment taking place on the world's largest particle accelerator, located at CERN. "It is an important step towards the goal of delivering a highly capable end-to-end network-aware system and architecture that meet the needs of next-generation e-science."

About the Author

Dian Schaffhauser is a former senior contributing editor for 1105 Media's education publications THE Journal, Campus Technology and Spaces4Learning.

Featured

  • computer with a red warning icon on its screen, surrounded by digital grids, glowing neural network patterns, and a holographic brain

    Report Highlights Security Risks of Open Source AI

    In these days of rampant ransomware and other cybersecurity exploits, security is paramount to both proprietary and open source AI approaches — and here the open source movement might be susceptible to some inherent drawbacks, such as use of possibly insecure code from unknown sources.

  • The AI Show

    Register for Free to Attend the World's Greatest Show for All Things AI in EDU

    The AI Show @ ASU+GSV, held April 5–7, 2025, at the San Diego Convention Center, is a free event designed to help educators, students, and parents navigate AI's role in education. Featuring hands-on workshops, AI-powered networking, live demos from 125+ EdTech exhibitors, and keynote speakers like Colin Kaepernick and Stevie Van Zandt, the event offers practical insights into AI-driven teaching, learning, and career opportunities. Attendees will gain actionable strategies to integrate AI into classrooms while exploring innovations that promote equity, accessibility, and student success.

  • a professional worker in business casual attire interacting with a large screen displaying a generative AI interface in a modern office

    Study: Generative AI Could Inhibit Critical Thinking

    A new study on how knowledge workers engage in critical thinking found that workers with higher confidence in generative AI technology tend to employ less critical thinking to AI-generated outputs than workers with higher confidence in personal skills.

  • university building with classical columns and a triangular roof displayed on a computer screen, surrounded by minimalist tech elements like circuit lines and abstract digital shapes

    Pima Community College Launches New Portal for a Unified Digital Campus Experience

    Arizona's Pima Community College is elevating the digital campus experience for students, faculty, and staff with a new portal built on the Pathify digital engagement platform.