Universities Access IBM/Google Cloud Compute Cluster for NSF-Funded Research
The National Science Foundation recently announced approximately $5 million in new grants to universities to access and continue research using the IBM/Google cloud computing cluster, bringing the number of universities with NSF Cluster Exploratory (CLuE) program grants to 14. IBM and Google began collaborating on an IBM/Google Cloud Computing University Initiative in 2007 to help computer science students gain the skills they need to build cloud applications. Now, the NSF is tapping the same cloud infrastructure to support university research in data-intensive computing in a range of scientific and engineering areas. For example, the University of Utah and the University of Washington are working to expand the capabilities of VisTrails, a system developed at the University of Utah to create high-quality visualizations from very large datasets (pictured).
The NSF-funded university research projects, which use software and services on the IBM/Google cloud infrastructure, include:
Carnegie Mellon University received an award in 2008 to develop more effective processing of Web searches; its 2009 award focuses on machine translation using the Integrated Cluster Computing Architecture (INCA).
With funding that began in 2008, Florida International University is using the Hadoop framework to provide a distributed file system that supports analysis of aerial images and related objects, opening up new potential for high-performance geospatial querying.
The Massachusetts Institute of Technology, the University of Wisconsin-Madison, and Yale University are collaborating on a study of cluster-based, large-scale data analysis, comparing Google's MapReduce with other parallel database approaches.
Purdue University is investigating extensions to MapReduce for programming large-scale, distributed systems and applications that manipulate large, unstructured graphs.
The University of California, Irvine is conducting research to support fuzzy queries on large text repositories.
The University of California, San Diego and the San Diego Supercomputer Center are investigating the management and processing of massive spatial data sets on large-scale compute clusters.
At the University of California, Santa Barbara, the Massive Graphs in Clusters (MAGIC) project is developing software to query very large graph datasets efficiently, with implications for highly connected data (such as social networks).
The University of Maryland-College Park received an award in 2008 for machine translation; its 2009 award is focused on the development of parallel algorithms for analyzing new generation sequencing data.
At the University of Massachusetts-Amherst, researchers at the Center for Intelligent Information Retrieval (CIIR) are applying CLuE infrastructure to explore word relationships and how they can be used in pre-processing and at search time to improve results from Web searches.
The University of Virginia is exploring super resolution derived by "data-driven image zoom," a process that intelligently enlarges a digital image by computing a new image based in part on patches taken from a 50-million-image database.
The University of Washington is using MapReduce to index, access, and analyze astronomical images derived from petascale datasets. The university also received funding in 2008 for its work on preparing students and instructors for large-scale cluster computing.
The University of Washington and the University of Utah are collaborating on new infrastructure for computational oceanography that leverages the CLuE platform and extends two existing systems: GridFields, a library for manipulation of simulation results, and VisTrails, a comprehensive platform for scientific workflow.
IBM Cloud Labs Vice President Willy Chiu commented on his company's support of the research projects. "IBM is intensely focused on applying technology and science to make the world work better."
Jeff Walz, director of University Relations at Google, reflected on the impact for both research and education, saying, "The movement of the cloud computing model into research could have a tremendous transformative [effect] both on the education side and on the research side." Walz explained that Google has provided a dedicated data center (located in the United States and not part of Google's regular cloud, which is distributed throughout the world), and IBM has provided software. Regarding the future of the collaboration, Walz noted, "We have a three-year commitment to keep it going and work with the NSF and IBM, so we hope to have more grants in the future."
[Photo by Juliana Freire and Claudio Silva. Courtesy University of Utah and PRNewsFoto/IBM Corporation.]