MIT Rethinks Big Data Processing

Research by a small team at the Massachusetts Institute of Technology may turn out to help streamline the processing of big data--the terabytes of streaming data generated by GPS units in smartphones and a multitude of other sensors. The basic idea is to create "succinct representations" of huge data sets so that existing algorithms can handle them more efficiently.

As described in "The Single Pixel GPS: Learning Big Data Signals from Tiny Coresets," a paper presented at the Association for Computing Machinery's International Conference on Advances in Geographic Information Systems, three MIT researchers have figured out how to represent data so that it takes up less space in memory while still being processed in conventional ways. That's useful because it means the technique can be used with existing algorithms rather than having to replace them with new ones.

The researchers applied the technique to the processing of two-dimensional location data generated by GPS receivers. According to Daniela Rus, a professor of computer science and engineering and director of MIT's Computer Science and Artificial Intelligence Laboratory (CSAIL), these receivers take position readings every 10 seconds. That adds up to about a gigabyte of data each day. Systems that attempt to analyze traffic patterns from readings sent by a massive number of cars can easily be bogged down by the volume of data generated.

What the scientists have figured out is that the analysis doesn't need to encompass every data point generated by a given car--only some of them, such as the moments when the car is turning. The path between one turn and the next can be approximated by a straight line. The collection of those sets of data forms a new "coreset" that can be compressed on the fly, as it were.

The researchers' algorithm has to find the series of line segments that most accurately fits the data points. The algorithm also stores the exact coordinates of a random sample of the points; these stand in for the variability of the unsampled points in later calculations.
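To make the idea concrete, here is a minimal sketch of segment fitting plus random sampling. It uses a standard recursive simplification (Ramer-Douglas-Peucker style) as a stand-in for the paper's segment-fitting step--the actual algorithm, function names, and tolerance parameter here are illustrative assumptions, not the authors' method:

```python
import random

def simplify(points, tol):
    """Approximate a 2-D trajectory with line segments, keeping points
    where the path bends more than `tol` away from a straight chord.
    A stand-in for the paper's segment-fitting step."""
    if len(points) < 3:
        return list(points)
    (x1, y1), (x2, y2) = points[0], points[-1]

    def dist(p):
        # Perpendicular distance from p to the chord between the endpoints.
        px, py = p
        dx, dy = x2 - x1, y2 - y1
        norm = (dx * dx + dy * dy) ** 0.5 or 1.0
        return abs(dy * (px - x1) - dx * (py - y1)) / norm

    i, d = max(((i, dist(p)) for i, p in enumerate(points[1:-1], 1)),
               key=lambda t: t[1])
    if d <= tol:  # path is nearly straight: endpoints suffice
        return [points[0], points[-1]]
    left = simplify(points[:i + 1], tol)
    return left[:-1] + simplify(points[i:], tol)

def coreset(points, tol, n_samples, seed=0):
    """Segment endpoints plus a random sample of exact readings --
    a rough sketch of the coreset idea described in the article."""
    random.seed(seed)
    sample = random.sample(points, min(n_samples, len(points)))
    return simplify(points, tol), sample
```

For a path that runs straight, turns, and runs straight again, the sketch keeps only the two endpoints and the turn--a handful of points standing in for an arbitrarily long trace.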

The technique, which encompasses a great deal of mathematics, is a tradeoff between "accuracy and complexity," said Dan Feldman, a postdoctoral researcher in Rus' group and lead author on the new paper. It's the combination of linear estimates and random sampling that allows the algorithm to compress data in chunks; as new data arrives, the algorithm recalculates.
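The chunk-by-chunk recalculation can be sketched as follows. This is an illustrative assumption about the streaming structure, not the paper's algorithm: each incoming chunk is appended to the running summary, and the combined points are re-compressed (here with a simple "keep the turns" heuristic, echoing the article's car-turning example) so the summary stays small:

```python
import math

def turn_points(points, angle_tol):
    """Keep the first and last readings plus any reading where the
    heading changes by more than `angle_tol` radians -- the 'car
    turning' idea from the article, in its simplest form."""
    if len(points) < 3:
        return list(points)
    kept = [points[0]]
    for prev, cur, nxt in zip(points, points[1:], points[2:]):
        a1 = math.atan2(cur[1] - prev[1], cur[0] - prev[0])
        a2 = math.atan2(nxt[1] - cur[1], nxt[0] - cur[0])
        diff = abs(a2 - a1)
        diff = min(diff, 2 * math.pi - diff)  # handle angle wraparound
        if diff > angle_tol:
            kept.append(cur)
    kept.append(points[-1])
    return kept

class StreamingCoreset:
    """Hypothetical streaming wrapper: buffer readings, and whenever a
    chunk fills up, re-compress the old summary together with the new
    chunk. Names and thresholds are illustrative, not from the paper."""

    def __init__(self, angle_tol, chunk_size):
        self.angle_tol = angle_tol
        self.chunk_size = chunk_size
        self.buffer = []
        self.summary = []

    def add(self, point):
        self.buffer.append(point)
        if len(self.buffer) >= self.chunk_size:
            self._flush()

    def _flush(self):
        # Compress the new chunk together with the existing summary,
        # so memory use stays bounded as data keeps arriving.
        self.summary = turn_points(self.summary + self.buffer,
                                   self.angle_tol)
        self.buffer = []
```

The design point is that the summary never grows with the raw stream: each flush folds new readings into the compressed representation, which is what lets the approach keep pace with data arriving every few seconds.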

What's the point? For all practical purposes, many potential uses for big data are impractical because of the sheer processing they would require. The MIT team's approach suggests that a slightly imprecise approximation is better than a calculation that never gets performed at all. The next step for the scientists is to identify other applications whose data has characteristics similar to GPS receiver readings.

One application under consideration by Feldman is the analysis of video data. Each scene might be considered comparable to a line segment; the shift from one scene to another is like the car turning. And sample frames from a scene could provide that random sampling.

This isn't the only research being done on campus in the area of big data. In May 2012 MIT was selected to host "bigdata@CSAIL," a new Intel-sponsored research center focused on developing techniques for working with big data.

About the Author

Dian Schaffhauser is a former senior contributing editor for 1105 Media's education publications THE Journal, Campus Technology and Spaces4Learning.
