Stanford Researchers Create Computer Vision Algorithm for Describing Visual Scenes

Researchers at Stanford University have created a computer vision algorithm that can analyse an unknown image and describe it using words and phrases.

While previous computer vision algorithms have been able to identify individual objects in pictures, this new algorithm takes the next step of telling a basic story about the image, such as "cat sits on keyboard" or "girl rides on horse in field." Since the majority of Internet traffic is visual data, this new computer vision algorithm could improve online search tools, according to a news release from Stanford.

The algorithm works by identifying objects in an image and putting them in context, something that humans learn to do as children but that has been difficult to achieve using computers. Fei-Fei Li, a professor of computer science and director of the Stanford Artificial Intelligence Lab, was the lead researcher on this project. She was also a lead researcher on ImageNet, a precursor to this latest project. ImageNet uses a large visual database to describe objects in mathematical terms that machines can understand and to link those descriptions to words that humans can understand.

The researchers developed a second visual dictionary that describes scenes, rather than just objects, in both mathematical terms and human phrases. The computer vision algorithm uses both the visual object dictionary and the visual scene dictionary as training material. It can analyse the patterns in those dictionaries and learn to identify individual objects and put them in a simple context to describe new scenes.
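The two-dictionary idea described above can be illustrated with a toy sketch in Python. This is not the Stanford researchers' method, which the article does not detail. Everything here is an assumption made for illustration: the feature vectors are invented numbers, the names OBJECT_DICTIONARY, SCENE_DICTIONARY, nearest_word and describe are hypothetical, and simple nearest-neighbor lookup stands in for whatever learning the real system performs.

```python
import math

# Hypothetical "object dictionary": each word paired with a made-up
# feature vector standing in for a mathematical description of the object.
OBJECT_DICTIONARY = {
    "cat": (0.9, 0.1, 0.0),
    "keyboard": (0.1, 0.9, 0.1),
    "girl": (0.8, 0.0, 0.6),
    "horse": (0.2, 0.3, 0.9),
}

# Hypothetical "scene dictionary": a pair of object words mapped to a
# phrase template that puts the two objects in context.
SCENE_DICTIONARY = {
    ("cat", "keyboard"): "{0} sits on {1}",
    ("girl", "horse"): "{0} rides on {1}",
}


def nearest_word(features):
    """Return the dictionary word whose vector is closest to `features`."""
    return min(
        OBJECT_DICTIONARY,
        key=lambda word: math.dist(features, OBJECT_DICTIONARY[word]),
    )


def describe(region_features):
    """Name each image region, then look for a known scene phrase."""
    words = [nearest_word(f) for f in region_features]
    for (subject, obj), template in SCENE_DICTIONARY.items():
        if subject in words and obj in words:
            return template.format(subject, obj)
    # No known scene matches: fall back to listing the objects found.
    return " and ".join(words)


if __name__ == "__main__":
    # Two invented region vectors, close to "cat" and "keyboard".
    print(describe([(0.85, 0.15, 0.05), (0.15, 0.8, 0.1)]))
```

The sketch separates the two steps the article names: labeling individual objects, then placing them in a simple context. A real system would learn both mappings from training data rather than rely on hand-written tables.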

In the short term, this new computer vision algorithm could help people search photo and video archives to find specific images. In the long term, it could lead to the development of robotic systems that can navigate unknown situations, according to the news release from Stanford.

The researchers have written a paper describing their approach and will present it at CVPR 2015, the Conference on Computer Vision and Pattern Recognition, taking place in Boston in June 2015.

About the Author

Leila Meyer is a technology writer based in British Columbia. She can be reached at [email protected].
