Brown, Johns Hopkins U Partner on 'Visual Turing Test' -- Campus Technology

Image Recognition

By Joshua Bolkan
03/13/15

A team of researchers from Brown University and Johns Hopkins University has come up with a system, dubbed a "visual Turing test," to evaluate how well computers can wring meaning from images.

Rather than simply evaluating whether the computer can identify objects within an image, the team's new test aims to determine whether a computer can recognize more complex relationships, such as two people walking and talking or a person entering a building.

"We think it's time to think about how to do something deeper — something more at the level of human understanding of an image," said Stuart Geman, the James Manning Professor of Applied Mathematics at Brown, in a prepared statement.

The test uses a string of computer-generated yes-or-no questions to determine whether the computer can build a storyline for the image.

"For example," according to a news release, "an initial question might ask a computer if there's a person in a given region of a photo. If the computer says yes, then the test might ask if there's anything else in that region — perhaps another person. If there are two people, the test might ask: 'Are person1 and person2 talking?'"

The questions are computer-generated to make them more objective, but a human is still required to tell the system when a question is unanswerable, such as a question about what a person facing away from the camera is carrying.
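
The question sequence described above can be illustrated with a short Python sketch. It is not the researchers' implementation: the function name, the dictionary of human-annotated facts and the two callbacks (one standing in for the vision system, one for the human operator who flags unanswerable questions) are all assumptions made for illustration. The sketch also assumes that each follow-up is asked only when the system answered yes to the previous question.

```python
from typing import Callable, List, Tuple

# Hypothetical sketch: names and data layout are illustrative assumptions,
# not the format used by the Brown / Johns Hopkins team.
Transcript = List[Tuple[str, bool, bool]]  # (question, system answer, was it correct?)


def run_visual_turing_test(
    truth: dict,
    system_answer: Callable[[str], bool],
    is_answerable: Callable[[str], bool],
) -> Transcript:
    """Ask a chain of yes/no questions about one image region."""
    transcript: Transcript = []

    def ask(question: str, correct_answer: bool) -> bool:
        # The human operator can veto a question that cannot be answered
        # from the image; vetoed questions are skipped and end the chain.
        if not is_answerable(question):
            return False
        answer = system_answer(question)
        transcript.append((question, answer, answer == correct_answer))
        return answer

    region = truth["region"]
    # Each follow-up is only asked if the system said yes to the previous one.
    if ask(f"Is there a person in {region}?", truth["people"] >= 1):
        if ask(f"Is there another person in {region}?", truth["people"] >= 2):
            ask("Are person1 and person2 talking?", truth["talking"])
    return transcript


# Usage: a toy "vision system" that answers yes to everything.
truth = {"region": "region A", "people": 2, "talking": False}
log = run_visual_turing_test(truth, lambda q: True, lambda q: True)
for question, answer, correct in log:
    print(question, "->", "yes" if answer else "no", "(correct)" if correct else "(wrong)")
```

In this toy run the always-yes system gets the first two questions right and the third wrong, which shows why a chain of related questions is harder to pass than a single object-detection check.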

"Geman and his colleagues hope that this new test might spur computer vision researchers to explore new ways of teaching computers how to look at images," according to a news release. "Most current computer vision algorithms are taught how to look at images using training sets in which objects are annotated by humans. By looking at millions of annotated images, the algorithms eventually learn how to identify objects. But it would be very difficult to develop a training set with all the possible contextual attributes of a photo annotated. So true context understanding may require a new machine learning technique."

"As researchers, we tend to 'teach to the test,'" Geman said in a prepared statement. "If there are certain contests that everybody's entering and those are the measures of success, then that's what we focus on. So it might be wise to change the test, to put it just out of reach of current vision systems."

About the Author

Joshua Bolkan is contributing editor for Campus Technology, THE Journal and STEAM Universe.