MIT CSAIL Creates Wearable AI System That Detects Conversation Tones

Mohammad Ghassemi and Tuka Alhanai (pictured above) have analyzed audio and vital-sign data to develop a deep-learning system that has the potential to serve as a "social coach" for individuals who need help navigating social situations. (Image Credit: Jason Dorfman, MIT CSAIL)

A single conversation can be interpreted in many different ways, which can make social encounters difficult for some individuals. But what if there were a way to measure social cues, like tone of voice or body language, to help us understand our interactions with other people?

Researchers from the Massachusetts Institute of Technology’s Computer Science and Artificial Intelligence Laboratory (MIT CSAIL) have come up with a potential solution: a wearable device that utilizes artificial intelligence (AI) to detect the tone of a conversation.

The research team, comprising graduate student Tuka Alhanai and PhD candidate Mohammad Ghassemi, developed a wearable AI system capable of predicting whether a conversation’s tone is happy, sad or neutral based on an individual’s speech patterns and vitals. It works by using deep-learning techniques to analyze audio, text transcriptions and physiological signals as it listens to an individual tell a story.
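To give a sense of what "analyzing audio" for tone, pitch and energy can look like in practice, here is a minimal, illustrative sketch using the open-source librosa library. This is not the researchers' code, and "story.wav" is a hypothetical recording; it simply shows the kind of low-level audio features a system like this could draw on.

```python
# Illustrative only: extract simple pitch/energy/pause features from a recording.
# Not the MIT CSAIL system; "story.wav" is a hypothetical input file.
import librosa
import numpy as np

y, sr = librosa.load("story.wav", sr=None)

# Energy: root-mean-square amplitude per frame.
energy = librosa.feature.rms(y=y)[0]

# Pitch: per-frame fundamental-frequency estimate (NaN where unvoiced).
f0, voiced_flag, _ = librosa.pyin(
    y, fmin=librosa.note_to_hz("C2"), fmax=librosa.note_to_hz("C7"), sr=sr
)

# Summary statistics of the sort a tone classifier could be trained on.
print("mean energy:", float(np.mean(energy)))
print("mean pitch (Hz):", float(np.nanmean(f0)))
print("fraction of unvoiced frames (pauses):", float(np.mean(~voiced_flag)))
```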

The team says the system could serve as a "social coach" for individuals with anxiety or other conditions, such as Asperger's or autism.

“Imagine if, at the end of a conversation, you could rewind it and see the moments when the people around you felt the most anxious,” said Alhanai. “Our work is a step in this direction, suggesting that we may not be that far away from a world where people can have an AI social coach right in their pocket.”

To develop the system, the researchers had individuals wear a Samsung Simband wristband, which captures high-resolution physiological waveforms to measure features like movement, heart rate and blood pressure. It also captures audio data and text transcripts to analyze tone, pitch, energy and vocabulary. Subjects were then asked to tell a happy or sad story of their choosing. A total of 31 conversations of several minutes each were collected. The team extracted 386 audio and 222 physiological features and trained two algorithms on the data. The first algorithm determined the overall tone of a conversation as either happy or sad, while the second classified each five-second block in every conversation as positive, negative or neutral.
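As a rough illustration of that two-classifier setup (not the researchers' actual deep-learning pipeline), the sketch below trains a story-level and a segment-level classifier on pre-extracted feature matrices. The feature count (386 audio + 222 physiological = 608) comes from the article; the random placeholder data, model choice and hidden-layer size are stand-ins for illustration only.

```python
# Minimal sketch of the two classifiers described above, on placeholder data.
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
n_features = 386 + 222  # audio + physiological features, per the article

# Classifier 1: overall tone of each conversation (happy vs. sad).
X_story = rng.normal(size=(31, n_features))   # 31 conversations (placeholder)
y_story = rng.integers(0, 2, size=31)         # 0 = sad, 1 = happy
story_clf = MLPClassifier(hidden_layer_sizes=(64,), max_iter=1000)
print("story-level CV accuracy:",
      cross_val_score(story_clf, X_story, y_story, cv=5).mean())

# Classifier 2: mood of each five-second block (negative/neutral/positive).
X_seg = rng.normal(size=(2000, n_features))   # placeholder segment features
y_seg = rng.integers(0, 3, size=2000)
segment_clf = MLPClassifier(hidden_layer_sizes=(64,), max_iter=1000)
print("segment-level CV accuracy:",
      cross_val_score(segment_clf, X_seg, y_seg, cv=5).mean())
```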

The findings align closely with everyday intuition: long pauses and monotonous vocal tones were associated with sadder stories, while happier stories featured more energetic, varied speech patterns. On average, the system classified the overall tone of a story with 83 percent accuracy, and it labeled the mood of five-second intervals about 18 percent more accurately than chance.

The researchers published their findings in the paper, “Predicting Latent Narrative Mood Using Audio and Physiological Data,” which they are presenting this week at the Association for the Advancement of Artificial Intelligence (AAAI) conference in San Francisco, CA.   

“Our next step is to improve the algorithm’s emotional granularity so it can call out boring, tense and excited moments with greater accuracy instead of just labeling interactions as ‘positive’ or ‘negative,’” said Alhanai. “Developing technology that can take the pulse of human emotions has the potential to dramatically improve how we communicate with each other.”

To learn more about how the wearable AI system works, read the paper or watch the video below.

About the Author

Sri Ravipati is a Web producer for THE Journal and Campus Technology. She can be reached at [email protected].
