New Texas A&M Supercomputer 'Grace' Goes Online in December


Texas A&M University is getting a new supercomputer. The latest system, which goes online for researchers in December, will be roughly 20 times more powerful than the supercomputer it's replacing. Named "Grace" in memory of Rear Admiral Grace Hopper, it will replace "Ada" (named for Ada Lovelace), which has served as the lead supercomputer at the university's High Performance Research Computing center since 2014.

As the university explained in a press release, supercomputer computing power is measured as "flops," or "floating point operations per second." A floating-point operation is any calculation that involves numbers with decimal points. One trillion flops equal a teraflop; a thousand teraflops equal a petaflop. While Ada can process up to 337 teraflops, Grace will handle up to 6.2 petaflops.
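The unit arithmetic above can be checked with a short sketch. The two peak figures are the ones quoted in the press release. The constant and function names are illustrative only and do not come from the university.

```python
# Unit sizes as defined above: "flops" = floating-point operations per second.
TERAFLOP = 1e12              # one trillion flops
PETAFLOP = 1e3 * TERAFLOP    # a thousand teraflops

# Peak figures quoted in the article (names here are hypothetical).
ada_peak = 337 * TERAFLOP
grace_peak = 6.2 * PETAFLOP


def speedup(new_peak: float, old_peak: float) -> float:
    """Return how many times larger new_peak is than old_peak."""
    return new_peak / old_peak


ratio = speedup(grace_peak, ada_peak)
print(f"Grace / Ada peak ratio: {ratio:.1f}x")
```

On these numbers the ratio comes out a little above 18, which is in line with the rounded "20 times" figure. Peak flops is only a theoretical ceiling, so the gain on real workloads may differ.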

The new capacity will support Texas A&M research in numerous fields, including drug design, materials science, artificial intelligence and machine learning, geosciences, fluid dynamics, biomedical applications, biophysics, genetics, quantum computing, data analytics, population informatics and autonomous vehicles.

According to Honggao Liu, executive director of the center, the supercomputer will enhance the university's research capabilities and competitiveness and enable researchers "to keep pace with current trends in research computing technologies." The extra capacity is needed for a growing number of research projects. In the last four years, the center has seen a doubling of its user base, from about 1,300 in 2016 to more than 2,600 this year.

"HPRC has a mission to infuse computational and data analysis technologies into the research and creative activities of every academic discipline at Texas A&M," Liu noted. "We support compute- and data-intensive workloads and enable researchers to use cutting-edge processor, accelerator and data analytic technologies to solve complex research problems. In this era of converged demand for advanced computing resources, a new supercomputer like Grace is needed to support complex workflows and allow researchers to continue in their pursuit of discoveries and inventions."

The system combines technology from several companies. While Dell is the primary vendor, the CPUs come from Intel, the storage system from DataDirect Networks (DDN) and the graphics processing units and InfiniBand interconnect network from NVIDIA.

Specifications include:

  • Dell EMC PowerEdge servers with 2nd Gen Intel Xeon Scalable processors, making up 800 regular compute nodes, five login nodes and six management nodes;
  • 100 double-precision NVIDIA A100 compute nodes, eight single-precision NVIDIA T4 GPU compute nodes, and nine single-precision NVIDIA RTX 6000 Tensor Core GPUs;
  • An NVIDIA Mellanox HDR100 InfiniBand network; and
  • 5.12 petabytes of high-performance DDN EXAScaler ES7990X storage running the EXAScaler parallel filesystem.

Each node is outfitted with dual 2nd-generation Intel Xeon Scalable 24-core 3.0GHz processors and 384GB of DDR4 3200MHz memory.

Eight large memory nodes have four 2nd-generation Intel Xeon Scalable 20-core 2.5 GHz processors and 3.072 terabytes of DDR4 3200MHz memory.
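As a rough illustration, the per-node figures above can be multiplied out into aggregate CPU core and memory totals. This sketch assumes the "each node" configuration applies to the 800 regular compute nodes, ignores the GPU, login and management nodes, and uses decimal units (1TB = 1,000GB). The class and variable names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NodeGroup:
    """A set of identically configured nodes (illustrative model only)."""
    count: int             # number of nodes in the group
    sockets: int           # CPUs per node
    cores_per_socket: int  # cores per CPU
    memory_gb: int         # memory per node, in GB

    @property
    def total_cores(self) -> int:
        return self.count * self.sockets * self.cores_per_socket

    @property
    def total_memory_tb(self) -> float:
        # Decimal conversion: 1 TB = 1000 GB (an assumption of this sketch).
        return self.count * self.memory_gb / 1000


# Assumption: the dual-socket spec describes the 800 regular compute nodes.
regular = NodeGroup(count=800, sockets=2, cores_per_socket=24, memory_gb=384)
# 3.072 terabytes per large memory node, expressed in GB.
large_memory = NodeGroup(count=8, sockets=4, cores_per_socket=20, memory_gb=3072)

for name, group in [("regular", regular), ("large memory", large_memory)]:
    print(f"{name}: {group.total_cores} cores, {group.total_memory_tb:.1f} TB memory")
```

Under those assumptions the regular nodes account for most of the CPU cores, while each large memory node carries eight times the memory of a regular node.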

Funding for Grace came from Texas A&M as well as the university's Research Development Fund, Health Science Center, Engineering Experiment Station, Transportation Institute and several individual faculty members in the Colleges of Engineering and Science.

About the Author

Dian Schaffhauser is a former senior contributing editor for 1105 Media's education publications THE Journal, Campus Technology and Spaces4Learning.
