Hadoop Summit: Yahoo Gathers the Stuffed Elephant Crowd

3/28/2008

Yahoo hosted the first-ever Apache Hadoop Summit this week in Santa Clara, CA. The day-long event presented a program of speakers from the Hadoop developer and user communities, including representatives from Yahoo, IBM, Microsoft, Facebook, Google, and the University of California, Berkeley.

The event drew around 500 attendees, though organizers were unsure of the exact number. They were, in fact, caught off guard by the turnout and had to change venues to accommodate a standing-room-only crowd.

"We organized the summit because we've been investing a lot in Hadoop ourselves, and we knew there was a large community of Hadoop users out there that mostly haven't met each other," said Yahoo Technical Evangelist Jeremy Zawodny. "I guess it was larger than we thought."

The Hadoop framework is an open source, Java-based distributed computing platform that implements MapReduce on large clusters of commodity hardware. MapReduce, introduced by Google, is a programming model for processing and generating large data sets. It supports parallel computations over large data sets on clusters of unreliable machines.
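For readers unfamiliar with the model, the following is a minimal single-machine sketch of the MapReduce idea, using the common word-count example. It is not Hadoop's API: the function names (map_phase, shuffle, reduce_phase, word_count) are hypothetical and chosen only for illustration, and a real cluster would spread the map and reduce steps across many machines.

```python
from collections import defaultdict


def map_phase(document):
    """Map step: emit a (word, 1) pair for every word in one input record."""
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs):
    """Group emitted values by key, standing in for the framework's shuffle step."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce step: combine all values seen for one key into a single result."""
    return key, sum(values)


def word_count(documents):
    """Run map, shuffle, and reduce in sequence on an in-memory list of strings."""
    pairs = (pair for doc in documents for pair in map_phase(doc))
    grouped = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(word_count(["Hadoop runs MapReduce", "MapReduce runs on clusters"]))
```

Because each map call sees only one record and each reduce call sees only one key, the work can in principle be split across machines and rerun on another node if one fails, which is the property that makes the model suited to unreliable commodity hardware.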

Yahoo hired Hadoop's creator, Doug Cutting, early last year to work full-time on the framework. Cutting created the Lucene open source information retrieval library and, with Mike Cafarella, the Nutch open source search engine based on it. Both projects are now managed through the Apache Software Foundation.

"The momentum around Hadoop is growing every day," Cutting said. "It's really exciting to watch."

Cutting called Yahoo's resource commitment to the Hadoop framework "considerable," but offered no details. Yahoo has made a very public commitment to Hadoop. In February, it launched what company representatives claimed to be the world's largest Hadoop production application. Called the Yahoo Webmap, the application runs on a 10,000-plus-core Linux cluster and produces data used in every Yahoo Web search query, according to company literature.

The initial intended use of Hadoop within Yahoo was to support Web search, Cutting said, by building the Web search index and maintaining that massive collection of data. But although it is making the Yahoo search engine more easily scalable and reliable, he said, the majority of in-house users are actually employing Hadoop for data exploration.

"It turns out that there are all these other people within the company who want to be able to access and analyze these massive data sets -- access logs, event logs, Web and geographic data -- and use them to improve the Web search software itself," Cutting said. "So they're using Hadoop for analysis to improve the software, as opposed to actually implementing the Web search. That's where we're seeing the big payoff."


