Research Project Figures Out How to Crowdsource Predictive Models


An MIT research project has come up with a way to crowdsource development of features for use in machine learning. Groups of data scientists contribute their ideas for this "feature engineering" into a collaboration tool named "FeatureHub." The idea, according to lead researcher Micah Smith, is to enable contributors to spend a few hours reviewing a modeling problem and proposing various features. Then the software builds the models with those features and determines which ones are the most useful for a particular predictive task.

As described in "FeatureHub: Towards collaborative data science," the platform lets multiple users write scripts for feature extraction and then request an evaluation of their proposed features. The platform aggregates features from multiple users and automatically builds a machine learning model for the problem at hand. The name was inspired by GitHub, a repository for programming projects, some of which have drawn numerous contributors.
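The workflow described above can be sketched in a few lines of Python. This is a hypothetical illustration, not FeatureHub's actual API: contributors submit feature functions that map a raw record to a numeric value, and the platform collects them, computes each feature over the dataset, and scores each one by how well it predicts the target (here, simple correlation stands in for the model-based evaluation the paper describes).

```python
# Hypothetical sketch of a FeatureHub-style workflow. Function and variable
# names are illustrative assumptions, not the real platform's interface.

def feature_num_sessions(user):
    """Contributor A's idea: number of site sessions for the user."""
    return len(user["sessions"])

def feature_signup_lag(user):
    """Contributor B's idea: days between signup and first activity."""
    return user["first_activity_day"] - user["signup_day"]

def evaluate_features(features, records, targets):
    """Score each candidate feature by absolute correlation with the target."""
    n = len(records)
    scores = {}
    my = sum(targets) / n
    for name, fn in features.items():
        xs = [fn(r) for r in records]
        mx = sum(xs) / n
        cov = sum((x - mx) * (y - my) for x, y in zip(xs, targets))
        sx = sum((x - mx) ** 2 for x in xs) ** 0.5
        sy = sum((y - my) ** 2 for y in targets) ** 0.5
        scores[name] = abs(cov / (sx * sy)) if sx and sy else 0.0
    return scores

# Toy dataset in the spirit of the Airbnb task: predict a binary outcome
# (e.g., whether the user books in a given country) from user activity.
records = [
    {"sessions": [1, 2, 3], "signup_day": 0, "first_activity_day": 1},
    {"sessions": [1], "signup_day": 0, "first_activity_day": 5},
    {"sessions": [1, 2, 3, 4], "signup_day": 2, "first_activity_day": 3},
    {"sessions": [1, 2], "signup_day": 1, "first_activity_day": 4},
]
targets = [1, 0, 1, 0]

features = {
    "num_sessions": feature_num_sessions,
    "signup_lag": feature_signup_lag,
}
scores = evaluate_features(features, records, targets)
best = max(scores, key=scores.get)
```

In the real system, the aggregation step trains a machine learning model on the pooled feature matrix rather than ranking features by correlation, but the division of labor is the same: many contributors propose features independently, and the platform does the evaluation automatically.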

To test the platform, the researchers recruited 41 freelance analysts with data science experience, who spent five hours each with the system, familiarizing themselves with it and using it to propose candidate features for each of two data science problems. In one problem, the participants were given data about users of the home rental site Airbnb and their activity on the site, and were asked to predict, for a given user, the country in which the user would book his or her first rental. In the other problem, the participants were given data provided by Sberbank, a Russian bank, on apartment sale transactions and economic conditions in Russia, and were asked to predict, for a given transaction, the apartment's final selling price. Of the 41 participants who logged into the platform, 32 successfully submitted at least one feature. In total, the project collected 1,952 features.

The predictive models produced with FeatureHub were then compared against entries submitted to Kaggle, a data science competition platform whose entries are produced through manual effort. The Kaggle entries had been scored on a 100-point scale, and the FeatureHub models came within three and five points of the winning entries for the two problems, respectively. Importantly, however, while the Kaggle entries took weeks or months of work, the FeatureHub models were produced in days.

Smith is hopeful for the use of the platform. "I do hope that we can facilitate having thousands of people working on a single solution for predicting where traffic accidents are most likely to strike in New York City or predicting which patients in a hospital are most likely to require some medical intervention," he said, in an MIT article about the project. "I think that the concept of massive and open data science can be really leveraged for areas where there's a strong social impact but not necessarily a single profit-making or government organization that is coordinating responses."

A paper on the project was recently presented at the IEEE International Conference on Data Science and Advanced Analytics in Tokyo. Co-authors included Smith's thesis advisor, Kalyan Veeramachaneni, a principal research scientist at MIT's Laboratory for Information and Decision Systems, and Roy Wedge, a former MIT undergraduate who is now a software engineer at Feature Labs, a data science company based on the group's work.

The project was partially funded through a National Science Foundation grant focused on creating a community software infrastructure, called LearnSphere, that supports sharing, analysis and collaboration across the wide variety of educational data.

About the Author

Dian Schaffhauser is a former senior contributing editor for 1105 Media's education publications THE Journal, Campus Technology and Spaces4Learning.
