Open Menu Close Menu


MIT CSAIL Creates Wearable AI System That Detects Conversation Tones

Mohammad Ghassemi and Tuka Alhanai (pictured above) have analyzed audio and vital-sign data to develop a deep-learning system that has the potential to serve as a "social coach" for individuals that need help navigating social situations. (Image Credit: Jason Dorfman, MIT CSAIL)

A single conversation can be interpreted in many different ways, which can make social encounters difficult for some individuals. But what if there were a way to measure social cues, like tone of voice or body language, to help us understand our interactions with other people?

Researchers from the Massachusetts Institute of Technology’s Computer Science and Artificial Intelligence Laboratory (MIT CSAIL) have come up with a potential solution: a wearable device that utilizes artificial intelligence (AI) to detect the tone of a conversation.

The research team, comprising graduate student Tuka Alhanai and PhD candidate Mohammad Ghassemi, developed a wearable AI system capable of predicting whether a conversation’s tone is happy, sad or neutral based on an individual’s speech patterns and vitals. It works by using deep-learning techniques to analyze audio, text transcriptions and physiological signals as it listens to an individual tell a story.

The team says their system could serve as a “social coach” for individuals with anxiety or other conditions, such as Asperger’s or Autism.

“Imagine if, at the end of a conversation, you could rewind it and see the moments when the people around you felt the most anxious,” said Alhanai. “Our work is a step in this direction, suggesting that we may not be that far away from a world where people can have an AI social coach right in their pocket.”

To develop the system, the researchers had individuals wear a Samsung Simband wristband, which captures high-resolution physiological waveforms to measure features like movement, heart rate and blood pressure. It also captures audio data and text transcripts to analyze tone, pitch, energy and vocabulary. Subjects were then asked to tell a happy or sad story of their choosing. A total of 31 conversations of several minutes each were collected. The team extracted 386 audio and 222 physiological features and trained two algorithms on the data. The first algorithm determined the overall tone of a conversation as either happy or sad, while the second classified each five-second block in every conversation as positive, negative or neutral.

The findings align closely with what people might expect to find in real life: long pauses and monotonous vocal tones indicated stories were judged as more sad than happy, while energetic stories had varied speech patterns. The system on average could classify the overall tone of the story with 83 percent accuracy. The mood of five-second intervals could be classified with an accuracy of about 18 percent above chance.

The researchers published their findings in the paper, “Predicting Latent Narrative Mood Using Audio and Physiological Data,” which they are presenting this week at the Association for the Advancement of Artificial Intelligence (AAAI) conference in San Francisco, CA.   

“Our next step is to improve the algorithm’s emotional granularity so it can call out boring, tense and excited moments with greater accuracy instead of just labeling interactions as ‘positive’ or ‘negative,’” said Alhani. “Developing technology that can take the pulse of human emotions has the potential to dramatically improve how we communicate with each other.”

To learn more about how the wearable AI device system works, read the paper or watch the video below.

About the Author

Sri Ravipati is Web producer for THE Journal and Campus Technology. She can be reached at [email protected].

comments powered by Disqus