Lip reading involves two main steps: localizing the speaker's mouth, and tracking the movement of the lips. For the first step, we are investigating a novel technique known as Orientation Template Correlation, which searches for facial features such as the nose and eyes based on their characteristic orientation maps. This technique yields a rectangle containing the speaker's lips. For the second step, a number of methods have been explored in the literature. We are currently considering an optical flow technique (similar to that of Mase and Pentland) applied within this rectangle to measure the mouth elongation E(t) and the mouth separation O(t) as functions of time. Other characteristics, such as the presence of the tongue in forming the "th" sound, may also be measured as functions of time. These temporal waveforms will then be passed to a classification stage, which we have not yet explored, for word or short-sentence recognition.
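As a rough illustration of how E(t) and O(t) might be read off an optical flow field, the sketch below assumes the flow inside the mouth rectangle has already been computed by some other routine and is supplied as an array. The function name `mouth_waveforms`, the array layout, and the choice to split the rectangle into halves are all assumptions made for this example. They are not taken from Mase and Pentland's method or from our own design, which is still undecided.

```python
import numpy as np

def mouth_waveforms(flow):
    """Estimate E(t) and O(t) from flow inside the mouth rectangle.

    flow: array of shape (T, H, W, 2) (hypothetical layout), where
          flow[..., 0] is horizontal motion (positive = rightward) and
          flow[..., 1] is vertical motion (positive = downward),
          in pixels per frame.
    Returns (E, O), each of length T, relative to the first frame.
    """
    flow = np.asarray(flow, dtype=float)
    T, H, W, _ = flow.shape
    u = flow[..., 0]
    v = flow[..., 1]

    # Rate of separation: the lower half moving down relative to the upper half.
    dO = v[:, H // 2:, :].mean(axis=(1, 2)) - v[:, :H // 2, :].mean(axis=(1, 2))

    # Rate of elongation: the right half moving right relative to the left half.
    dE = u[:, :, W // 2:].mean(axis=(1, 2)) - u[:, :, :W // 2].mean(axis=(1, 2))

    # Integrate the per-frame rates to obtain waveforms over time.
    return np.cumsum(dE), np.cumsum(dO)

# Synthetic check: the lips part vertically at a constant rate, with no stretching.
flow = np.zeros((3, 4, 6, 2))
flow[:, 2:, :, 1] = 1.0    # lower half moves down
flow[:, :2, :, 1] = -1.0   # upper half moves up
E, O = mouth_waveforms(flow)
print("E(t):", E)
print("O(t):", O)
```

Averaging over half-rectangles is only one possible reduction of the flow field. Smaller windows placed around the lips would be less sensitive to motion of the chin or cheeks, at the cost of relying more heavily on accurate localization from the first step.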