MIT Rethinks Big Data Processing

Research by a small team at the Massachusetts Institute of Technology may turn out to help streamline the processing of big data--the terabytes of streaming data generated by GPS units in smartphones and a multitude of other sensors. The basic idea is to create "succinct representations" of huge data sets so that existing algorithms can handle them more efficiently.

As described in "The Single Pixel GPS: Learning Big Data Signals from Tiny Coresets," a paper presented at the Association for Computing Machinery's International Conference on Advances in Geographic Information Systems, three MIT researchers have figured out how to represent data so that it takes up less space in memory while still being processed in conventional ways. That's useful because it means the technique can be used with existing algorithms rather than having to replace them with new ones.

The researchers applied the technique to the processing of two-dimensional location data generated by GPS receivers. According to Daniela Rus, a professor of computer science and engineering and director of MIT's Computer Science and Artificial Intelligence Laboratory (CSAIL), these receivers take position readings every 10 seconds. That adds up to about a gigabyte of data each day. Systems that attempt to analyze traffic patterns from readings sent by a massive number of cars can easily be bogged down by the volume of data generated.

What the scientists have figured out is that the analysis doesn't need to encompass every data point generated by a given car--only some of them, such as the moments when the car is turning. The path between one turn and the next can be approximated by a straight line. The collection of those sets of data forms a new "coreset" that can be compressed on the fly, as it were.

The researchers' algorithm has to find the series of line segments that most accurately fits the data points. The algorithm also stores the exact coordinates of a random sample of the points; these stand in for the variability of the unsampled points in later calculations.
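To make the idea concrete, here is a minimal sketch of segment fitting plus random sampling. It uses a standard recursive simplification (Ramer-Douglas-Peucker style) as a stand-in for the paper's segment-fitting step--the actual algorithm, function names, and tolerance parameter here are illustrative assumptions, not the authors' method:

```python
import random

def simplify(points, tol):
    """Approximate a 2-D trajectory with line segments, keeping points
    where the path bends more than `tol` away from a straight chord.
    A stand-in for the paper's segment-fitting step."""
    if len(points) < 3:
        return list(points)
    (x1, y1), (x2, y2) = points[0], points[-1]

    def dist(p):
        # Perpendicular distance from p to the chord between the endpoints.
        px, py = p
        dx, dy = x2 - x1, y2 - y1
        norm = (dx * dx + dy * dy) ** 0.5 or 1.0
        return abs(dy * (px - x1) - dx * (py - y1)) / norm

    i, d = max(((i, dist(p)) for i, p in enumerate(points[1:-1], 1)),
               key=lambda t: t[1])
    if d <= tol:  # path is nearly straight: endpoints suffice
        return [points[0], points[-1]]
    left = simplify(points[:i + 1], tol)
    return left[:-1] + simplify(points[i:], tol)

def coreset(points, tol, n_samples, seed=0):
    """Segment endpoints plus a random sample of exact readings --
    a rough sketch of the coreset idea described in the article."""
    random.seed(seed)
    sample = random.sample(points, min(n_samples, len(points)))
    return simplify(points, tol), sample
```

For a path that runs straight, turns, and runs straight again, the sketch keeps only the two endpoints and the turn--a handful of points standing in for an arbitrarily long trace.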

The technique, which encompasses a great deal of mathematics, is a tradeoff between "accuracy and complexity," said Dan Feldman, a postdoctoral researcher in Rus' group and lead author on the new paper. It's the combination of linear estimates and random sampling that allows the algorithm to compress data in chunks; as new data arrives, the algorithm recalculates.
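The chunk-by-chunk recalculation can be sketched as follows. This is an illustrative assumption about the streaming structure, not the paper's algorithm: each incoming chunk is appended to the running summary, and the combined points are re-compressed (here with a simple "keep the turns" heuristic, echoing the article's car-turning example) so the summary stays small:

```python
import math

def turn_points(points, angle_tol):
    """Keep the first and last readings plus any reading where the
    heading changes by more than `angle_tol` radians -- the 'car
    turning' idea from the article, in its simplest form."""
    if len(points) < 3:
        return list(points)
    kept = [points[0]]
    for prev, cur, nxt in zip(points, points[1:], points[2:]):
        a1 = math.atan2(cur[1] - prev[1], cur[0] - prev[0])
        a2 = math.atan2(nxt[1] - cur[1], nxt[0] - cur[0])
        diff = abs(a2 - a1)
        diff = min(diff, 2 * math.pi - diff)  # handle angle wraparound
        if diff > angle_tol:
            kept.append(cur)
    kept.append(points[-1])
    return kept

class StreamingCoreset:
    """Hypothetical streaming wrapper: buffer readings, and whenever a
    chunk fills up, re-compress the old summary together with the new
    chunk. Names and thresholds are illustrative, not from the paper."""

    def __init__(self, angle_tol, chunk_size):
        self.angle_tol = angle_tol
        self.chunk_size = chunk_size
        self.buffer = []
        self.summary = []

    def add(self, point):
        self.buffer.append(point)
        if len(self.buffer) >= self.chunk_size:
            self._flush()

    def _flush(self):
        # Compress the new chunk together with the existing summary,
        # so memory use stays bounded as data keeps arriving.
        self.summary = turn_points(self.summary + self.buffer,
                                   self.angle_tol)
        self.buffer = []
```

The design point is that the summary never grows with the raw stream: each flush folds new readings into the compressed representation, which is what lets the approach keep pace with data arriving every few seconds.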

What's the point? For all practical purposes, many potential uses for big data are impractical because of the sheer processing they would require. The MIT team's approach suggests that a slightly imprecise approximation is better than a calculation that never gets performed at all. The next step for the scientists is to identify other applications whose data has characteristics similar to GPS receiver readings.

One application under consideration by Feldman is the analysis of video data. Each scene might be considered comparable to a line segment; the shift from one scene to another is like the car turning. And sample frames from a scene could provide that random sampling.

This isn't the only research being done on campus in the area of big data. In May 2012 MIT was selected to host "bigdata@CSAIL," a new Intel-sponsored research center focused on developing techniques for working with big data.

About the Author

Dian Schaffhauser is a former senior contributing editor for 1105 Media's education publications THE Journal, Campus Technology and Spaces4Learning.
