Stanford Researchers Create Computer Vision Algorithm for Describing Visual Scenes
Researchers at Stanford University have created a computer vision algorithm that can analyse an unknown image and
describe it using words and phrases.
While previous computer vision algorithms have been able to identify
individual objects in pictures, this new algorithm takes the next step of
telling a basic story about the image, such as "cat sits on keyboard" or "girl
rides on horse in field." Since the majority of Internet traffic is visual data,
this new computer vision algorithm could improve online search tools, according
to a news release from Stanford.
The algorithm works by identifying objects in an image and putting them in
context, something that humans learn to do as children but that has been
difficult to achieve using computers. Fei-Fei Li, a professor of computer
science and director of the Stanford Artificial Intelligence Lab, was the lead
researcher on this project. She was also a lead researcher on ImageNet, a precursor
to this latest project. ImageNet uses a large visual database to describe objects
in mathematical terms that machines can understand and links them to words that
humans can understand.
The researchers developed a second visual dictionary that describes scenes,
rather than just objects, in both mathematical terms and human phrases. The
computer vision algorithm uses both the visual object dictionary and the visual
scene dictionary as training material. It can analyse the patterns in those
dictionaries and learn to identify individual objects and put them in a simple
context to describe new scenes.
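The two-dictionary idea can be illustrated with a toy sketch. This is not the researchers' method, which learns from large training sets. It is a hand-built lookup, assuming that each image region has already been reduced to a small feature vector. Every name and number in it (OBJECT_DICTIONARY, SCENE_DICTIONARY, name_object, describe, the vectors themselves) is hypothetical and chosen only for illustration.

```python
import math

# Hypothetical "object dictionary": word -> feature vector (made-up numbers).
OBJECT_DICTIONARY = {
    "cat": [0.9, 0.1, 0.0],
    "keyboard": [0.1, 0.9, 0.1],
    "horse": [0.8, 0.0, 0.6],
    "girl": [0.2, 0.3, 0.9],
}

# Hypothetical "scene dictionary": (subject, object) -> phrase template.
SCENE_DICTIONARY = {
    ("cat", "keyboard"): "{0} sits on {1}",
    ("girl", "horse"): "{0} rides on {1}",
}


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def name_object(features):
    """Return the dictionary word whose vector is most similar to `features`."""
    return max(
        OBJECT_DICTIONARY,
        key=lambda word: cosine(features, OBJECT_DICTIONARY[word]),
    )


def describe(region_features):
    """Name each region, then look for a known relation between any two names."""
    words = [name_object(f) for f in region_features]
    for subject in words:
        for obj in words:
            template = SCENE_DICTIONARY.get((subject, obj))
            if template:
                return template.format(subject, obj)
    # No known relation: fall back to listing the recognised objects.
    return ", ".join(words)


if __name__ == "__main__":
    # Two made-up regions: one cat-like, one keyboard-like.
    print(describe([[0.85, 0.2, 0.05], [0.05, 0.95, 0.1]]))
```

A trained system would replace both lookup tables with models fitted to data. The sketch only shows the division of labour the article describes: first name the objects, then place them in a simple context.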
In the short term, this new computer vision algorithm could help people
search photo and video archives to find specific images. In the long term, it
could lead to the development of robotic systems that can navigate unknown
situations, according to the news release from Stanford.
The researchers have written a paper describing their approach and will
present it at the CVPR 2015 computer vision conference, taking place in Boston in
June 2015.
About the Author
Leila Meyer is a technology writer based in British Columbia.