Translingual visual speech synthesis
Abstract
A computer implemented method in a language independent system generates audio-driven facial animation given a speech recognition system for just one language. The method is based on the recognition that, once the alignment is generated, the mapping and the animation steps have hardly any language dependency. Translingual visual speech synthesis can therefore be achieved if the first step, alignment generation, is made language independent. Given a speech recognition system for a base language, the method synthesizes video from input speech in any novel language.
Claims (9)
1. A method of translingual synthesis of visual speech from a given audio signal in a first language, comprising the steps of:
- receiving input audio and text of the first language;
- generating a phonetic alignment based on best phone boundaries, using the speech recognition system of a second language and its own set of phones, and mapping the phones of the second language to the phones of the first language so as to obtain an effective alignment in the phone set of the first language;
- performing a phone-to-viseme mapping to obtain a corresponding visemic alignment, which yields the sequence of visemes to be animated; and
- animating the sequence of viseme images to produce the desired synthesized video output, aligned with the input audio signal of the first language. (Dependent claims: 2, 3)
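The steps of claim 1 can be sketched as follows. This is a minimal illustration, assuming the base-language recognizer returns a phonetic alignment as (phone, start, end) triples; the phone and viseme identifiers and both mapping tables are hypothetical, since the patent does not specify particular phone or viseme sets:

```python
# Hypothetical mapping from base-language phones to phones of the
# first (novel) language -- illustrative only, not from the patent.
BASE_TO_NOVEL = {"AA": "a", "IY": "i", "M": "m"}

# Hypothetical phone-to-viseme mapping for the novel language.
PHONE_TO_VISEME = {"a": "V_open", "i": "V_spread", "m": "V_closed"}

def translingual_viseme_alignment(base_alignment):
    """Convert a base-language phonetic alignment into a visemic
    alignment in the novel language, preserving phone boundaries."""
    visemic = []
    for phone, start, end in base_alignment:
        novel_phone = BASE_TO_NOVEL[phone]      # cross-language phone map
        viseme = PHONE_TO_VISEME[novel_phone]   # phone-to-viseme map
        visemic.append((viseme, start, end))
    return visemic

# Example: an alignment as the base-language recognizer might emit it
# (times in milliseconds), converted to the viseme sequence to animate.
alignment = [("M", 0, 80), ("AA", 80, 240), ("IY", 240, 400)]
print(translingual_viseme_alignment(alignment))
# [('V_closed', 0, 80), ('V_open', 80, 240), ('V_spread', 240, 400)]
```

Because the phone boundaries pass through both maps unchanged, the resulting viseme sequence stays time-aligned with the input audio, which is what the final animation step relies on.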
4. A computer implemented method of implementing an audio-driven facial animation system in a first language, referred to as the novel language, using a speech recognition system of a second language, referred to as the base language, the method comprising the steps of:
- determining whether a correspondence exists between an audio speech signal of the novel language and a phone of the base language and, if there is no correspondence, identifying the closest phone of the base language that best matches that of the novel language;
- writing a word of the novel language into a base language database and adding it to a new vocabulary of the speech recognition system of the base language; and
- using the new vocabulary to generate a line alignment of the audio speech signal with the corresponding word of the base language vocabulary. (Dependent claims: 5, 6, 7, 8, 9)
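The closest-phone step of claim 4 can be sketched as a nearest-neighbour search over phone descriptions. The articulatory feature encodings below are an illustrative assumption, not taken from the patent, which does not specify how "closest" is measured:

```python
# Hypothetical articulatory feature vectors (place, manner, voicing)
# for a few base-language phones -- illustrative only.
BASE_PHONE_FEATURES = {
    "T": (4, 1, 0),  # alveolar stop, unvoiced
    "D": (4, 1, 1),  # alveolar stop, voiced
    "S": (4, 3, 0),  # alveolar fricative, unvoiced
}

def closest_base_phone(novel_features):
    """Return the base-language phone whose feature vector has the
    smallest squared distance to the novel-language phone's features."""
    def dist(feats):
        return sum((a - b) ** 2 for a, b in zip(novel_features, feats))
    return min(BASE_PHONE_FEATURES, key=lambda p: dist(BASE_PHONE_FEATURES[p]))

# A hypothetical voiced retroflex stop (5, 1, 1) has no exact base
# counterpart; the nearest base phone by this metric is "D".
print(closest_base_phone((5, 1, 1)))  # D
```

Once every novel-language phone resolves to some base-language phone, each novel word can be written into the base language database and added to the recognizer's vocabulary, as the remaining steps of claim 4 describe.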
Specification