MIT CSAIL Creates Wearable AI System That Detects Conversation Tones

Mohammad Ghassemi and Tuka Alhanai (pictured above) have analyzed audio and vital-sign data to develop a deep-learning system that has the potential to serve as a "social coach" for individuals who need help navigating social situations. (Image Credit: Jason Dorfman, MIT CSAIL)

A single conversation can be interpreted in many different ways, which can make social encounters difficult for some individuals. But what if there were a way to measure social cues, like tone of voice or body language, to help us understand our interactions with other people?

Researchers from the Massachusetts Institute of Technology’s Computer Science and Artificial Intelligence Laboratory (MIT CSAIL) have come up with a potential solution: a wearable device that utilizes artificial intelligence (AI) to detect the tone of a conversation.

The research team, comprising graduate student Tuka Alhanai and PhD candidate Mohammad Ghassemi, developed a wearable AI system capable of predicting whether a conversation’s tone is happy, sad or neutral based on an individual’s speech patterns and vitals. It works by using deep-learning techniques to analyze audio, text transcriptions and physiological signals as it listens to an individual tell a story.

The team says their system could serve as a "social coach" for individuals with anxiety or other conditions, such as Asperger's syndrome or autism.

“Imagine if, at the end of a conversation, you could rewind it and see the moments when the people around you felt the most anxious,” said Alhanai. “Our work is a step in this direction, suggesting that we may not be that far away from a world where people can have an AI social coach right in their pocket.”

To develop the system, the researchers had individuals wear a Samsung Simband wristband, which captures high-resolution physiological waveforms to measure features like movement, heart rate and blood pressure. It also captures audio data and text transcripts to analyze tone, pitch, energy and vocabulary. Subjects were then asked to tell a happy or sad story of their choosing. A total of 31 conversations of several minutes each were collected. The team extracted 386 audio and 222 physiological features and trained two algorithms on the data. The first algorithm determined the overall tone of a conversation as either happy or sad, while the second classified each five-second block in every conversation as positive, negative or neutral.
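The two-classifier setup described above can be sketched roughly as follows. This is a hypothetical illustration on synthetic data, not the authors' actual deep-learning pipeline: the feature values, label encodings, segment counts, and the use of logistic regression are all assumptions made for the sake of a runnable example; only the feature dimensions (386 audio, 222 physiological), the 31 conversations, and the two classification tasks come from the article.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Synthetic stand-ins for the features described in the article:
# 386 audio features + 222 physiological features per conversation.
n_conversations = 31
X_audio = rng.normal(size=(n_conversations, 386))
X_physio = rng.normal(size=(n_conversations, 222))
X = np.hstack([X_audio, X_physio])  # shape (31, 608)

# Classifier 1: overall tone of each conversation
# (hypothetical encoding: 0 = sad, 1 = happy).
y_overall = rng.integers(0, 2, size=n_conversations)
overall_clf = LogisticRegression(max_iter=1000).fit(X, y_overall)

# Classifier 2: mood of each five-second block
# (hypothetical encoding: 0 = negative, 1 = neutral, 2 = positive).
# Assume ~60 blocks per conversation for a few minutes of speech.
n_segments = n_conversations * 60
X_seg = rng.normal(size=(n_segments, 608))
y_seg = rng.integers(0, 3, size=n_segments)
segment_clf = LogisticRegression(max_iter=1000).fit(X_seg, y_seg)

# Predict an overall tone and a few per-block moods.
print(overall_clf.predict(X[:1]))
print(segment_clf.predict(X_seg[:3]))
```

The key design point is that the same combined feature vector feeds both tasks, but the second classifier operates at a much finer temporal granularity (five-second blocks rather than whole conversations).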

The findings align closely with what people might expect in real life: long pauses and monotonous vocal tones were associated with sadder stories, while happier, more energetic stories featured varied speech patterns. On average, the system classified the overall tone of a story with 83 percent accuracy, and the mood of individual five-second intervals with an accuracy about 18 percent above chance.
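To put the segment-level figure in context, a quick back-of-envelope calculation helps, assuming "chance" for the three-way segment labels means uniform random guessing (1/3); the article does not spell out this baseline, so the interpretation is an assumption.

```python
# Assumption: chance level for three classes (positive/negative/neutral)
# is uniform guessing, i.e. 1/3.
chance = 1 / 3
reported_gain = 0.18  # "18 percent above chance", per the article

segment_accuracy = chance + reported_gain
print(f"{segment_accuracy:.0%}")  # roughly 51%
```

Under that reading, the five-second mood classifier lands at roughly 51 percent accuracy on a three-way task.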

The researchers published their findings in the paper, “Predicting Latent Narrative Mood Using Audio and Physiological Data,” which they are presenting this week at the Association for the Advancement of Artificial Intelligence (AAAI) conference in San Francisco, CA.   

“Our next step is to improve the algorithm’s emotional granularity so it can call out boring, tense and excited moments with greater accuracy instead of just labeling interactions as ‘positive’ or ‘negative,’” said Alhanai. “Developing technology that can take the pulse of human emotions has the potential to dramatically improve how we communicate with each other.”

To learn more about how the wearable AI system works, read the paper or watch the video below.

About the Author

Sri Ravipati is Web producer for THE Journal and Campus Technology. She can be reached at [email protected].
