Researchers Add Sound-Based Gesture Recognition to Commodity Computer

A small team of Microsoft and University of Washington researchers is developing a technology that will allow users to control ordinary computers--and eventually mobile devices--with gestures and motions. SoundWave, as it's called, uses the speaker and microphone already built into most computers to sense in-air actions, such as a wave of the hand to specify an action like "scroll the screen up" or "scroll it down."

In a short report on their work, the researchers explained that the approach uses an "inaudible tone" that gets "frequency-shifted" when it bounces off moving objects, such as a waving hand. The shift is measured with the microphone.
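The size of that frequency shift can be estimated with the standard Doppler formula for a reflection off a moving object. The sketch below is an illustration only, not the researchers' implementation: the 18 kHz tone is the low end of the range cited in the report, while the hand speeds, the speed of sound, and the function names are assumptions chosen for the example.

```python
# Rough estimate of the Doppler shift for a tone reflected off a moving hand.
# Assumptions (not from the report): speaker and microphone sit at the same
# stationary point, and sound travels at about 343 m/s in room-temperature air.

SPEED_OF_SOUND_M_S = 343.0


def reflected_frequency(tone_hz: float, hand_speed_m_s: float) -> float:
    """Frequency arriving at the microphone after reflecting off the hand.

    A positive hand_speed_m_s means the hand moves toward the device,
    a negative value means it moves away.
    """
    c = SPEED_OF_SOUND_M_S
    return tone_hz * (c + hand_speed_m_s) / (c - hand_speed_m_s)


def doppler_shift(tone_hz: float, hand_speed_m_s: float) -> float:
    """Difference between the reflected frequency and the emitted tone."""
    return reflected_frequency(tone_hz, hand_speed_m_s) - tone_hz


if __name__ == "__main__":
    tone_hz = 18_000.0  # low end of the 18-22 kHz range cited in the report
    for speed in (-1.0, -0.25, 0.0, 0.25, 1.0):  # hypothetical hand speeds
        shift = doppler_shift(tone_hz, speed)
        print(f"hand speed {speed:+.2f} m/s -> shift {shift:+.1f} Hz")
```

Under these assumptions, a hand moving toward the device raises the reflected frequency and a hand moving away lowers it, so the sign of the shift indicates direction and its size indicates speed.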

Other recognition mechanisms, such as that found in the Xbox Kinect, require ample processing power and are sometimes sensitive to environmental conditions or show a lag in recognizing a movement. In a video demonstration, however, SoundWave worked in real time in a fairly dark environment with the ambient sounds of a coffee house. Low lighting doesn't create a barrier because SoundWave doesn't require line of sight. It also continued working while the computer was playing music in a separate window.

Besides flapping a hand to generate scrolling, other gestures the scientists have programmed into SoundWave include:

  • Two hands moving in opposite directions to have the program rotate an object in the application;
  • A single tap or a double tap to mimic mouse or touchpad activities;
  • A hand flick to specify browsing, such as with photos;
  • Walking toward or away from the device--a "sustained motion"--to wake the system up or put it to sleep.

The approach does have several limitations, the researchers noted. The primary one is that the tone used to recognize gestures may distress children and animals, whose hearing is more sensitive to higher-frequency sounds. In addition, some devices filter out tones over 18 kilohertz (kHz), and this technique generates tones between 18 and 22 kHz. That obstacle could be mitigated by "piggy-backing" a tone onto a user's digital music, the report suggested. Finally, the software can't detect the lack of motion, which means it would need to integrate complementary techniques to cover static gestures.

The project is a result of work done by Microsoft Research and the ubicomp lab, the ubiquitous computing research lab at the University of Washington in Seattle. That lab does research in a number of areas, including user interface technology, energy sensing, and activity recognition.

About the Author

Dian Schaffhauser is a former senior contributing editor for 1105 Media's education publications THE Journal, Campus Technology and Spaces4Learning.
