Beyond Voice Recognition, to a Computer That Reads Lips
From: New York Times - September 11, 2003
By: Anne Eisenberg

Enabling a computer to read lip movements could significantly improve the accuracy of automatic speech recognition, even in noisy environments, and researchers at IBM, Intel, and elsewhere are working on such a capability. IBM's Chalapathy Neti says a computer can be taught to integrate audio and visual input to determine what is being said with the help of cameras, statistical models, and vision algorithms. The camera picks up skin-tone pixels, the statistical models look for face-like objects, and the algorithms concentrate on the mouth area to locate specific physical features, such as the center and corners of the lips; statistical models are also employed to combine the visual and audio features and predict the speaker's words.

Neti and colleagues are working on systems designed to handle variables that may affect the accuracy of the camera-based system, such as inconsistent lighting. Currently in the prototype stage is an audiovisual headset with a small camera attached to a boom, so that the mouth region remains visible even when the subject is walking or moving his head. Neti says the research group has also developed a feedback system that monitors the recognizer's confidence levels.

Meanwhile, Intel researcher Ara V. Nefian says his company has created audiovisual analysis software and made it available to the public through the Open Source Computer Vision Library. The system, which recognizes four out of five words in noisy environments, can "extract visual features and then acoustic features, and combine them using a model that analyzes them jointly," Nefian explains. An audiovisual speech recognition system being developed by Northwestern University's Aggelos Katsaggelos could be used to boost security.
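The audio-visual combination the article describes can be sketched, very loosely, as a weighted "late fusion" of per-word scores from the two modalities. The sketch below is a hypothetical toy illustration, not the actual IBM or Intel model (which jointly models the two feature streams with far richer statistics); the function name, candidate words, probabilities, and fusion weight are all invented for illustration.

```python
import math

def fuse_scores(audio_probs, visual_probs, visual_weight=0.4):
    """Combine per-word probabilities from two recognizers (toy sketch).

    audio_probs / visual_probs: dicts mapping candidate words to
    probabilities from the audio and lip-reading streams (invented values).
    visual_weight: 0..1, how much trust to place in the visual stream;
    a confidence-monitoring system like the one the article mentions
    could raise this in noisy audio and lower it in poor lighting.
    """
    fused = {}
    for word in audio_probs:
        a = math.log(audio_probs[word])   # audio log-score
        v = math.log(visual_probs[word])  # visual log-score
        fused[word] = (1 - visual_weight) * a + visual_weight * v
    # pick the word with the highest combined log-score
    return max(fused, key=fused.get)

# "bat" and "pat" sound alike in noise, but the lips distinguish them:
audio = {"bat": 0.5, "pat": 0.5}    # audio alone is ambiguous
visual = {"bat": 0.8, "pat": 0.2}   # lip shape favors "bat"
print(fuse_scores(audio, visual))   # -> bat
```

The design point the quote from Nefian makes is that real systems go beyond this kind of after-the-fact score averaging and analyze the two feature streams jointly, so that the timing of lip movements can disambiguate the audio frame by frame.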
http://www.nytimes.com/2003/09/11/technology/circuits/11next.html (Access to this site is free; however, first-time visitors must register.)