Carnegie Mellon Figures Out How To Match Images Across Media

Carnegie Mellon University researchers have developed what they're calling a "surprisingly simple method" for identifying visually similar images that don't match at the pixel level. This method would enable a computer to identify similar images even when factors within the images vary, such as lighting, season, or medium. For example, the technique can find photographic matches to a sketch of a bicycle, a problem that would typically be beyond the capabilities of a computer.

The major obstacle still to conquer: The operation requires so much processing power that the functionality won't be added to a search site in the short term.

The image matching challenge is relevant in a number of activities, such as automatic colorization, scene and video completion, photo restoration, and even making computer graphics imagery more realistic, the researchers explained in their paper, "Data-driven Visual Similarity for Cross-domain Image Matching."

The research team, part of the university's School of Computer Science, is led by Alexei Efros, an associate professor of computer science and robotics, and Abhinav Gupta, an assistant research professor of robotics. The first author is Abhinav Shrivastava, a master's degree student in robotics. They'll present their findings at the SIGGRAPH Asia conference in mid-December.

The researchers said that image matching is currently done by pixel matching. For example, Google Goggles, an app created by Google developers, can make matches by examining shapes, colors, and compositions. But when the images differ in kind, such as a painting versus a photograph, "pixel-wise matching fares quite poorly," the researchers reported. "Small perceptual differences can result in arbitrarily large pixel-wise differences."
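To make the quoted point concrete, here is a minimal Python sketch, not taken from the paper, that uses a hypothetical 8x8 striped test image. The helper names `pixel_distance` and `shift_right` are illustrative assumptions. Shifting the stripes by a single pixel, a change a viewer would barely notice, yields a larger pixel-wise distance than comparing the stripes against a featureless gray square.

```python
def pixel_distance(a, b):
    """Sum of squared per-pixel differences between two equal-size grayscale images."""
    return sum(
        (pa - pb) ** 2
        for row_a, row_b in zip(a, b)
        for pa, pb in zip(row_a, row_b)
    )


def shift_right(image, fill=0):
    """Shift every row one pixel to the right, padding the left edge with `fill`."""
    return [[fill] + row[:-1] for row in image]


# Hypothetical test image: one-pixel-wide vertical stripes (255 = white, 0 = black).
stripes = [[255 if x % 2 == 0 else 0 for x in range(8)] for _ in range(8)]
shifted = shift_right(stripes)             # looks almost identical to a person
flat_gray = [[128] * 8 for _ in range(8)]  # looks nothing like stripes

d_shift = pixel_distance(stripes, shifted)
d_gray = pixel_distance(stripes, flat_gray)

print("stripes vs. shifted stripes:", d_shift)
print("stripes vs. flat gray:      ", d_gray)
```

In this toy case the one-pixel shift flips every pixel, so a purely pixel-wise measure ranks the gray square as the closer match.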

What's needed, they said, is a way to capture the important visual structures that make two images appear similar while still accounting for "small, unimportant visual details." In other words, a "visual similarity algorithm" needs to be able to figure out which parts of an image are important to the human observer and which aren't.

In an image of somebody in front of the Arc de Triomphe in Paris, for example, the person usually resembles people in other photos and would thus be given little weight in calculating uniqueness. The Arc itself, however, would be given greater weight because few photos include anything like it.
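The weighting idea in the Arc de Triomphe example can be sketched in a few lines of Python. This is a deliberately simplified illustration, not the researchers' algorithm: it assumes each image has already been reduced to a short list of numeric features (here a hypothetical "person-like" score and an "arch-like" score), and it weights each feature by how far the query departs from a background collection of ordinary photos. The function names and all the numbers are made up for illustration.

```python
from statistics import mean, pstdev


def uniqueness_weights(query, background, eps=1e-6):
    """Weight each feature by how unusual the query's value is among background images."""
    weights = []
    for i, value in enumerate(query):
        column = [image[i] for image in background]
        weights.append(abs(value - mean(column)) / (pstdev(column) + eps))
    return weights


def weighted_similarity(query, candidate, weights):
    """Negative weighted squared distance, so a higher score means more similar."""
    return -sum(w * (q - c) ** 2 for w, q, c in zip(weights, query, candidate))


# Hypothetical features: [person-like score, arch-like score].
background = [[0.8, 0.0], [0.9, 0.1], [1.0, 0.0], [0.9, 0.1]]  # ordinary photos of people
query = [0.9, 0.6]         # a person standing in front of the arch
person_only = [0.9, 0.0]   # same kind of person, no arch
arch_only = [0.0, 0.6]     # the arch, no person

weights = uniqueness_weights(query, background)
unweighted = [1.0, 1.0]

print("weights:", weights)
print("unweighted:", weighted_similarity(query, person_only, unweighted),
      weighted_similarity(query, arch_only, unweighted))
print("weighted:  ", weighted_similarity(query, person_only, weights),
      weighted_similarity(query, arch_only, weights))
```

With equal weights the person-only photo scores as the better match. Once the common person feature is down-weighted, the arch-only photo wins, which is the behavior the example in the text describes.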

The technique can also be combined with GPS-tagged photo collections to determine the location of a particular landmark, and it can be used to assemble a "visual memex," a data set that explores the connections among a set of photos. The researchers have posted a video on YouTube showing the technique, which can build a path through image data to uncover additional information about any individual image.

"We didn't expect this approach to work as well as it did," Efros said. "We don't know if this is anything like how humans compare images, but it's the best approximation we've been able to achieve."

Speed remains the "central limitation of the proposed approach," the researchers wrote. One implementation they developed took three minutes per query, and that was on a 200-node cluster. "This is still too slow for many practical applications at this time," they noted.
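As a rough back-of-the-envelope reading of those figures, the Python snippet below converts them into node-hours. It assumes, which the article does not state, that all 200 nodes are busy for the full three minutes and that queries run one at a time. The function name `node_hours` is illustrative.

```python
def node_hours(minutes_per_query, nodes):
    """Total machine time for one query, assuming every node is busy throughout."""
    return minutes_per_query * nodes / 60


per_query = node_hours(3, 200)     # machine time consumed by a single query
queries_per_day = 24 * 60 // 3     # sequential queries the whole cluster could serve

print("node-hours per query:", per_query)
print("queries per day on the full cluster:", queries_per_day)
```

Under those assumptions a single query consumes about 10 node-hours, and the entire cluster could answer only 480 queries a day.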

The research received financial support from the Computer Science Department's Center for Computational Thinking, the Office of Naval Research, and Google.
