MIT Machine Learning Model Learns from Audio Descriptions -- Campus Technology


By Becky Nagel
09/18/18


Computer scientists at the Massachusetts Institute of Technology have developed a new machine learning model for object recognition that learns from spoken audio descriptions of images, rather than from transcripts of that audio.

"The model doesn't require manual transcriptions and annotations of the example [speech] it's trained on," the official announcement explained of the new method. "Instead, it learns words directly from recorded speech clips and objects in raw images, and associates them with one another."

Most machine learning models that incorporate audio require transcriptions of that audio rather than working from the audio itself. While the current system recognizes only "several hundred words and object types," the researchers who developed it have high hopes for its future.

"We wanted to do speech recognition in a way that's more natural, leveraging additional signals and information that humans have the benefit of using, but that machine learning algorithms don't typically have access to," commented David Harwath, a researcher in MIT's Computer Science and Artificial Intelligence Laboratory (CSAIL) and the Spoken Language Systems Group.

"There's potential there for a Babel Fish-type of mechanism," he continued.

This experiment builds on a 2016 project, adding more images and data and taking a new approach to training. Details on how the model was trained can be found in the official announcement of the new project.

About the Author

Becky Nagel serves as vice president of AI for 1105 Media specializing in developing media, events and training for companies around AI and generative AI technology. She also regularly writes and reports on AI news for PureAI.com, a site she founded, among others. She's the author of "ChatGPT Prompt 101 Guide for Business Users" and other popular AI resources with a real-world business perspective. She regularly speaks, writes and develops content around AI, generative AI and other business tech. Find her on X/Twitter @beckynagel.
