Yahoo Develops Interface Classification System for Hadoop

Developers at Yahoo have been working on a new interface classification system in Hadoop to distinguish two facets of an interface from the perspective of backward compatibility: the audience of the interface and the stability of the interface.

The "audience" of an interface refers to its scope or visibility, that is, who its potential consumers are. The classifications in the new system include "public," "limited private" (for hooks exposed to peer frameworks or systems), and "private." The "stability" of an interface refers to how changes might or might not break compatibility. The classifications include "stable," "evolving," and "unstable."

Hadoop is a popular Java-based open-source framework for data-intensive distributed computing, designed to support parallel computations over large data sets on so-called unreliable computer clusters. It's based on Google's MapReduce, a programming model for processing and generating large data sets, which divides an application into multiple units of work, each of which can be executed on any node in a server cluster. Hadoop includes the HDFS distributed file system, which is designed to scale to petabytes of storage and to run on top of the file systems of the underlying OS.

In his Yahoo Developer Network Blog, Hadoop team member Sanjay Radia wrote: "Hadoop is increasingly being used to run large, long-lived, enterprise-class applications. Porting these applications to non-compatible upgrades of Hadoop is an arduous, expensive task that distracts teams from finding new and better ways of using Hadoop to bring value to their companies. Today, Hadoop users are demanding backwards compatibility and interface stability; these features are necessary for the next growth phase of Hadoop, as it gains wider enterprise adoption."

According to Radia, an interface can be a Java API, a configuration variable, the parameters or output of a command, or a metrics variable. The system tags Java APIs using Java annotations, while other types of interfaces (configuration options and output formats, for example) are tagged using informal documentation conventions. The upcoming release 0.21 of Hadoop will be the first to expose this classification, Radia said.
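To make the annotation-based tagging concrete, here is a minimal sketch of how a Java API might carry an audience tag and a stability tag. The annotation names, the example classes, and the `describe` helper are all hypothetical stand-ins chosen for illustration; the article does not give the names Hadoop uses, so treat this as one possible shape rather than Hadoop's actual API.

```java
import java.lang.annotation.Documented;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;

public class ClassificationSketch {

    // Hypothetical audience annotations (names are assumptions, not from the article).
    @Documented @Retention(RetentionPolicy.RUNTIME)
    public @interface Public {}

    @Documented @Retention(RetentionPolicy.RUNTIME)
    public @interface LimitedPrivate {
        String[] value(); // the peer frameworks or systems the hook is exposed to
    }

    @Documented @Retention(RetentionPolicy.RUNTIME)
    public @interface Private {}

    // Hypothetical stability annotations.
    @Documented @Retention(RetentionPolicy.RUNTIME)
    public @interface Stable {}

    @Documented @Retention(RetentionPolicy.RUNTIME)
    public @interface Evolving {}

    @Documented @Retention(RetentionPolicy.RUNTIME)
    public @interface Unstable {}

    // Hypothetical API meant for application developers.
    @Public @Stable
    public static class ExampleClientApi {}

    // Hypothetical hook exposed only to named peer frameworks.
    @LimitedPrivate({"HDFS", "MapReduce"}) @Evolving
    public static class ExampleRpcHook {}

    // Reads the tags back through reflection and joins them as "audience-stability".
    public static String describe(Class<?> type) {
        String audience =
            type.isAnnotationPresent(Public.class) ? "public"
            : type.isAnnotationPresent(LimitedPrivate.class) ? "limited-private"
            : type.isAnnotationPresent(Private.class) ? "private"
            : "unclassified";
        String stability =
            type.isAnnotationPresent(Stable.class) ? "stable"
            : type.isAnnotationPresent(Evolving.class) ? "evolving"
            : type.isAnnotationPresent(Unstable.class) ? "unstable"
            : "unclassified";
        return audience + "-" + stability;
    }

    public static void main(String[] args) {
        System.out.println(describe(ExampleClientApi.class)); // public-stable
        System.out.println(describe(ExampleRpcHook.class));   // limited-private-evolving
    }
}
```

Runtime retention is used here only so the sketch can read the tags back; a real system could just as well keep them for documentation tooling alone.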

Yahoo's recommendation to app developers: stick to "public-stable" interfaces. "If you are early adopter, you may use a public-evolving interface," Radia wrote, "but be aware that the interface may change slightly in the future, forcing a change to your application." If you're a framework developer on Hadoop: "You can of course safely use any of the public interfaces, but can also use limited-private interfaces targeted to your framework. For example the Hadoop RPC layer provides limited-private interfaces for HDFS and MapReduce."
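Radia's guidance amounts to a small decision rule, which might be sketched as follows. The enum names, the `recommended` method, and the `targetsThisFramework` flag are hypothetical and exist only for this illustration; the rule for framework developers reflects one reading of "any of the public interfaces" and is an assumption, not something Hadoop enforces.

```java
public class UsagePolicySketch {

    public enum Audience { PUBLIC, LIMITED_PRIVATE, PRIVATE }
    public enum Stability { STABLE, EVOLVING, UNSTABLE }
    public enum Consumer { APPLICATION, EARLY_ADOPTER, FRAMEWORK }

    // Returns true when the quoted guidance would allow this consumer to depend
    // on an interface with the given classification.
    public static boolean recommended(Consumer who, Audience audience,
                                      Stability stability, boolean targetsThisFramework) {
        switch (who) {
            case APPLICATION:
                // App developers: stick to public-stable.
                return audience == Audience.PUBLIC && stability == Stability.STABLE;
            case EARLY_ADOPTER:
                // Early adopters may also accept public-evolving interfaces.
                return audience == Audience.PUBLIC && stability != Stability.UNSTABLE;
            case FRAMEWORK:
                // Framework developers: any public interface, plus limited-private
                // interfaces that are targeted at their own framework.
                return audience == Audience.PUBLIC
                    || (audience == Audience.LIMITED_PRIVATE && targetsThisFramework);
            default:
                return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(recommended(Consumer.APPLICATION, Audience.PUBLIC, Stability.EVOLVING, false));
        System.out.println(recommended(Consumer.EARLY_ADOPTER, Audience.PUBLIC, Stability.EVOLVING, false));
        System.out.println(recommended(Consumer.FRAMEWORK, Audience.LIMITED_PRIVATE, Stability.EVOLVING, true));
    }
}
```

Under these assumptions the three calls in `main` print `false`, `true`, and `true`: an ordinary application should avoid a public-evolving interface, an early adopter may use it, and a framework may use a limited-private interface that targets it.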

The new classification system, which is derived from OpenSolaris and Yahoo's own internal system, has been in the works for the last year. It's part of Yahoo's plan to provide stronger backward compatibility, Radia said.

Details of the interface classification system can be found here.

About the Author

John K. Waters is the editor in chief of a number of Converge360.com sites, with a focus on high-end development, AI and future tech. He's been writing about cutting-edge technologies and the culture of Silicon Valley for more than two decades, and he's written more than a dozen books. He also co-scripted the documentary film Silicon Valley: A 100 Year Renaissance, which aired on PBS.
