U Buffalo Deploys Netezza Data Warehouse Appliance To Speed Research

The University at Buffalo has deployed Netezza's largest data warehouse appliance to support scientific research. The new system's first query returned in two seconds against a 1.2 billion-row test database.

"The University at Buffalo is excited to be leading this collaborative effort," said Bruce Holm, senior vice-provost and executive director of the New York State Center of Excellence for Bioinformatics and Life Sciences. "The University's commitment to knowledge discovery through computational methods is longstanding. Virtually all science and engineering domains are facing data tsunamis. The latest microarray and sensor technologies, along with computational simulations, are creating data sets at an unprecedented scale. Exploiting the data-intensive computing capabilities of the Netezza platform with our partners will help shorten the time to discovery in the science and engineering domains."

"During our testing we performed analytics on microarray data similar to the real discovery problems the biological sciences are facing. We obtained more than two orders of magnitude performance speedup compared with traditional high-performance computing (HPC) clusters," said Vipin Chaudhary, associate professor of computer science and engineering at Buffalo. "Mapping our algorithm onto Netezza took one day to achieve that performance, while the same effort took weeks on the HPC cluster."

"Data-intensive science and engineering requires multiple HPC architectures and platforms," Chaudhary added. "The Netezza system is a massively parallel device that combines a storage device, a [field-programmable gate array], memory, and a CPU on each of more than 100 blades in a standard cabinet. The architecture delivers the query to the data and executes it in parallel across all of the blades simultaneously. The result is incredibly fast response times against data sets containing many terabytes of data."
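The "deliver the query to the data" pattern Chaudhary describes can be illustrated with a small sketch. This is a toy model only: each simulated blade scans its own partition and returns a small partial aggregate, so raw rows never cross the interconnect. Netezza's actual implementation (FPGA-accelerated filtering on dedicated blade hardware) is far more sophisticated, and the function names here are invented for illustration; the loop is written sequentially for clarity, with the parallelism being conceptual.

```python
def local_scan(partition, predicate):
    # A simulated "blade": scan only the local partition and return a
    # tiny partial aggregate instead of shipping raw rows upstream.
    return sum(1 for row in partition if predicate(row))

def parallel_count(partitions, predicate):
    # The coordinator broadcasts the same query to every partition and
    # merges the small partial results -- query-to-data, not data-to-query.
    return sum(local_scan(p, predicate) for p in partitions)

# Example: count even values across 4 simulated blades holding 0..99.
blades = [range(0, 25), range(25, 50), range(50, 75), range(75, 100)]
is_even = lambda row: row % 2 == 0
print(parallel_count(blades, is_even))  # 50
```

The key design point is that only the partial counts (one integer per blade) travel back to the coordinator, which is why the approach scales to terabyte-sized tables.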

"The Netezza system will enhance our disease and drug discovery research efforts," said Murali Ramanathan, an associate professor of pharmaceutical sciences and neurology at Buffalo. "The cause of many diseases, such as multiple sclerosis and cancer, is a complex problem with many genes and environmental factors involved. These are significant combinatorial problems that need to be solved and require the analysis of data sets of many terabytes. We developed algorithms to help understand some of these gene and environment phenomena. Collaborating with Dr. Chaudhary and Netezza engineers, we are now mapping these algorithms onto the Netezza platform. Once we understand the causes of the diseases, we can identify better drug targets that may lead to better treatments for these devastating diseases."

"Our simulations of combustors in gas turbine engines are run on some of the largest supercomputing assets available at Department of Defense and NASA facilities," said Suresh Menon, professor of aerospace engineering and director of the Computational Combustion Laboratory at the Georgia Institute of Technology. "A single simulation can run for days or weeks, and produce output files (in three-dimensional space plus time, hence often called 4D datasets) in the many tens of terabytes. Having the ability to find, extract, and analyze features of importance in these massive (and multiple) 4D datasets will allow us to understand the complex interactions that are taking place. With this kind of knowledge we can introduce computational predictions and analysis into the design cycle and thereby cut the development time and cost for next-generation power and transportation systems."

About the Author

Dian Schaffhauser is a former senior contributing editor for 1105 Media's education publications THE Journal, Campus Technology and Spaces4Learning.
