News

Hadoop Summit: Yahoo Gathers the Stuffed Elephant Crowd

3/28/2008

Yahoo hosted the first-ever Apache Hadoop Summit this week in Santa Clara, CA. The day-long event presented a program of speakers from the Hadoop developer and user communities, including representatives from Yahoo, IBM, Microsoft, Facebook, Google, and the University of California, Berkeley, among others.

The event drew around 500 attendees, though organizers were unsure of the exact number. They were, in fact, caught off guard by the turnout and had to change venues to accommodate a standing-room-only crowd.

"We organized the summit because we've been investing a lot in Hadoop ourselves, and we knew there was a large community of Hadoop users out there that mostly haven't met each other," said Yahoo Technical Evangelist Jeremy Zawodny. "I guess it was larger than we thought."

The Hadoop Framework is an open source, Java-based distributed computing platform designed to allow implementations of MapReduce to run on large clusters of commodity hardware. Google's MapReduce is a programming model for processing and generating large data sets. It supports parallel computations over large data sets on unreliable computer clusters.
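To make the programming model concrete, here is a minimal single-process sketch of the map, shuffle, and reduce steps, using the customary word-count example. It is an illustration only: it does not use Hadoop's Java API, it does no distribution or fault handling, and the function names (map_phase, shuffle, reduce_phase, word_count) are hypothetical.

```python
from collections import defaultdict


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one input record."""
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle: group every emitted value under its key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: collapse all values for one key into a single result."""
    return key, sum(values)


def word_count(documents):
    """Run the three steps in order over an iterable of text records."""
    pairs = (pair for doc in documents for pair in map_phase(doc))
    grouped = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(word_count(["the elephant in the room", "the stuffed elephant"]))
```

In a real cluster the map and reduce calls would run in parallel on many machines, with the framework handling the grouping step and rerunning work lost to failed nodes.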

Yahoo hired Hadoop's creator, Doug Cutting, early last year to work full-time on the framework. Cutting created the Lucene open source information retrieval library and, with Mike Cafarella, the Nutch open source search engine based on it. Both projects are now managed through the Apache Software Foundation.

"The momentum around Hadoop is growing every day," Cutting said. "It's really exciting to watch."

Cutting called Yahoo's resource commitment to the Hadoop framework "considerable," but offered no details. Yahoo has made a very public commitment to Hadoop. In February, it launched what company representatives described as the world's largest Hadoop production application. Called the Yahoo Webmap, the application runs on a 10,000-plus-core Linux cluster and produces data used in every Yahoo Web search query, according to company literature.

The initial intended use of Hadoop within Yahoo was to support Web search, Cutting said, by building the Web search index and maintaining that massive collection of data. But although it is making the Yahoo search engine more easily scalable and reliable, he said, the majority of in-house users are actually employing Hadoop for data exploration.

"It turns out that there are all these other people within the company who want to be able to access and analyze these massive data sets -- access logs, event logs, Web and geographic data -- and use them to improve the Web search software itself," Cutting said. "So they're using Hadoop for analysis to improve the software, as opposed to actually implementing the Web search. That's where we're seeing the big payoff."


