
Voice stream-based lip animation for audio-video communication.


Publisher

University of Ottawa (Canada)

Abstract

This thesis describes a system that uses the voice track to determine the shape of a speaker's lips for use in a model-based audio-video communication system. A parametric model (deformable template) measures the shape of the speaker's lips, and the system uses Linear Predictive Coding (LPC) analysis and the LPC cepstral coefficients for audio recognition. Because each individual has his or her own typical lip positions while uttering various speech sounds, the system is speaker dependent. A vector quantization algorithm creates a compact, speaker-dependent mapping from cepstral coefficients to lip-shape parameters. This mapping is used together with the audio analysis to determine the shape of the speaker's lips from the voice stream. Animating a parametric lip contour from the voice track provides a convenient solution to the non-trivial problem of voice-image synchronization in audio-video communication. Because strong correlations exist between the lip-contour shape and the properties of the voice track, the voice can directly drive the lip-contour shape in the model-based video rendering/animation process, further reducing the bit rate for audio-video storage or transmission. Parametric animation of the lip contour from the voice-track signal has obvious applications in animating talking avatars (virtual humans), and the visual information added by animated lip contours can also increase the intelligibility of audio messages for persons with impaired hearing.
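The per-frame pipeline the abstract describes (LPC analysis → cepstral coefficients → vector-quantization lookup of lip-shape parameters) can be sketched as follows. This is a minimal illustration, not the thesis's actual implementation: the LPC order, number of cepstral coefficients, function names, and the tiny codebook are all assumptions for the example.

```python
import numpy as np

def lpc(frame, order=10):
    """LPC coefficients via the Levinson-Durbin recursion on the
    frame's autocorrelation; returns a with a[0] == 1."""
    n = len(frame)
    r = np.array([np.dot(frame[:n - k], frame[k:]) for k in range(order + 1)])
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + np.dot(a[1:i], r[i - 1:0:-1])
        k = -acc / err
        prev = a.copy()
        for j in range(1, i):
            a[j] = prev[j] + k * prev[i - j]
        a[i] = k
        err *= 1.0 - k * k
    return a

def lpc_cepstrum(a, n_ceps=12):
    """LPC cepstral coefficients from the standard recursion
    c[n] = -a[n] - (1/n) * sum_{k=1}^{n-1} k * c[k] * a[n-k]."""
    p = len(a) - 1
    c = np.zeros(n_ceps + 1)
    for n in range(1, n_ceps + 1):
        acc = -a[n] if n <= p else 0.0
        for k in range(1, n):
            if n - k <= p:
                acc -= (k / n) * c[k] * a[n - k]
        c[n] = acc
    return c[1:]

def lips_from_cepstrum(ceps, codebook, lip_params):
    """Vector-quantization lookup: the nearest codeword (Euclidean
    distance) selects the stored lip-shape parameters for this frame."""
    idx = int(np.argmin(np.linalg.norm(codebook - ceps, axis=1)))
    return lip_params[idx]
```

In a real system the codebook and its associated lip-shape entries would be trained per speaker, as the abstract notes, from simultaneously recorded audio and lip-tracked video.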

Citation

Source: Masters Abstracts International, Volume: 41-02, page: 0588.
