U Michigan Researchers Win $1.6 Million to Contribute to DARPA Machine-Learning Technique Repository
The University of Michigan has won a $1.6-million grant to develop tools that will help people who are not data scientists better use the data they have access to.
The Michigan project is led by Jason Corso, professor of engineering and computer science, and is dubbed SPIDER, for Subspace Primitives that are Interpretable and DivERse. It aims to develop new techniques for finding meaning in different kinds of datasets by focusing on features within a dataset that vary much less than the dataset's overall capacity for variation.
"For instance," according to a U-M news release, "a 128-by-128 image of a face contains 16,384 pixels, but the pixels don't vary independently from one another. In fact, the expected variations can be described by about 10 or 20 dimensions, said Corso — down from 16,384 assuming that each pixel is independent of the others. By looking at 'subspaces' like this, he and [Laura] Balzano simplify the problem of interpreting images and other arrays of data."
By looking at how the pixels relate to each other instead of comparing them to all possible variations, software can be trained to identify faces in light or shadow, or even when part of the face is missing. The team plans to develop new techniques that also use clustering to meaningfully segment large datasets.
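To make the subspace idea concrete, here is a minimal sketch in Python with NumPy. It is not the SPIDER team's method: it uses ordinary principal component analysis on synthetic data, and the function name `effective_dimension`, the 99 percent variance threshold, the noise level and the choice of 15 underlying dimensions are all assumptions made for illustration. The sketch builds 200 fake "images" of 16,384 pixels each that secretly vary along only 15 directions, then counts how many directions are needed to explain nearly all of the variation.

```python
import numpy as np


def effective_dimension(data, variance_kept=0.99):
    """Return the smallest number of principal directions whose combined
    share of the total variance reaches `variance_kept`.

    `data` has one sample per row and one feature (pixel) per column.
    """
    centered = data - data.mean(axis=0)
    # Singular values of the centered data give the variance along each
    # principal direction (up to a constant factor).
    singular_values = np.linalg.svd(centered, compute_uv=False)
    explained = singular_values**2 / np.sum(singular_values**2)
    cumulative = np.cumsum(explained)
    # First position where the cumulative share reaches the threshold.
    return int(np.searchsorted(cumulative, variance_kept) + 1)


# Synthetic stand-in for face images (an assumption, not real data):
# every "image" is a mix of 15 hidden patterns plus a little pixel noise.
rng = np.random.default_rng(0)
n_images, n_pixels, hidden_dim = 200, 128 * 128, 15

patterns = rng.standard_normal((hidden_dim, n_pixels))
weights = rng.standard_normal((n_images, hidden_dim))
noise = 0.01 * rng.standard_normal((n_images, n_pixels))
images = weights @ patterns + noise

print("pixels per image:", n_pixels)
print("directions needed:", effective_dimension(images))
```

Under these assumptions, the count of directions needed should come out far below the 16,384 pixels per image, which is the kind of gap the release describes. Real face images would need a real dataset and likely a more robust method than plain PCA.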
The team has already developed tools that break video down into a text summary of what's visible on screen. Examples include analyzing YouTube videos of car crashes to determine information such as how fast the vehicles were traveling and their rates of deceleration, and analyzing body camera videos of police interactions to determine which characteristics suggest escalating tension.
The SPIDER team is one of 24 selected by the Defense Advanced Research Projects Agency (DARPA) to participate in its Data-Driven Discovery of Models (D3M) program. As the teams develop new techniques, their approaches will go into a repository where they will be made available to researchers.
"This repository will assemble these algorithms into models that use vastly different types of data sets to make predictions and draw conclusions," according to a news release. "Already, a few thousand methods from currently available software systems are being added to the repository."
"You always need new algorithms and new ways of modeling data," Balzano said in a prepared statement. "This project puts them into a system."
About the Author
Joshua Bolkan is contributing editor for Campus Technology, THE Journal and STEAM Universe. He can be reached at [email protected].