Translingual visual speech synthesis
Abstract
A computer implemented method in a language independent system generates audio-driven facial animation given a speech recognition system for just one language. The method is based on the recognition that, once the alignment is generated, the mapping and the animation steps have hardly any language dependency. Translingual visual speech synthesis can therefore be achieved if the first step, alignment generation, is made language independent. Given a speech recognition system for a base language, the method synthesizes video from input speech in any novel language.
Claims (9)
1. A method of translingual synthesis of visual speech from a given audio signal in a first language, comprising the steps of:
- receiving input audio and text of the first language;
- generating a phonetic alignment based on best phone boundaries, using the speech recognition system of a second language and its own set of phones, and mapping the phones of the second language to the phones of the first language so as to obtain an effective alignment in the phone set of the first language;
- performing a phone-to-viseme mapping to obtain a corresponding visemic alignment, which yields the sequence of visemes to be animated; and
- animating the sequence of viseme images to produce the desired synthesized video output, aligned with the input audio signal of the first language. (Dependent claims: 2, 3)
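The steps of claim 1 can be sketched as follows. This is a minimal illustration, assuming the base-language recognizer returns a phonetic alignment as (phone, start, end) triples; the phone and viseme identifiers and both mapping tables are hypothetical, since the patent does not specify particular phone or viseme sets:

```python
# Hypothetical mapping from base-language phones to phones of the
# first (novel) language -- illustrative only, not from the patent.
BASE_TO_NOVEL = {"AA": "a", "IY": "i", "M": "m"}

# Hypothetical phone-to-viseme mapping for the novel language.
PHONE_TO_VISEME = {"a": "V_open", "i": "V_spread", "m": "V_closed"}

def translingual_viseme_alignment(base_alignment):
    """Convert a base-language phonetic alignment into a visemic
    alignment in the novel language, preserving phone boundaries."""
    visemic = []
    for phone, start, end in base_alignment:
        novel_phone = BASE_TO_NOVEL[phone]      # cross-language phone map
        viseme = PHONE_TO_VISEME[novel_phone]   # phone-to-viseme map
        visemic.append((viseme, start, end))
    return visemic

# Example: an alignment as the base-language recognizer might emit it
# (times in milliseconds), converted to the viseme sequence to animate.
alignment = [("M", 0, 80), ("AA", 80, 240), ("IY", 240, 400)]
print(translingual_viseme_alignment(alignment))
# [('V_closed', 0, 80), ('V_open', 80, 240), ('V_spread', 240, 400)]
```

Because the phone boundaries pass through both maps unchanged, the resulting viseme sequence stays time-aligned with the input audio, which is what the final animation step relies on.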
4. A computer implemented method of implementing an audio-driven facial animation system in a first language, referred to as the novel language, using a speech recognition system of a second language, referred to as the base language, the method comprising the steps of:
- determining whether a correspondence exists between an audio speech signal of the novel language and a phone of the base language and, if there is no correspondence, identifying the closest phone of the base language that best matches that of the novel language;
- writing a word of the novel language into a base language database and adding it to a new vocabulary of the speech recognition system of the base language; and
- using the new vocabulary to generate a line alignment of the audio speech signal with the corresponding word of the base language vocabulary. (Dependent claims: 5, 6, 7, 8, 9)
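The closest-phone step of claim 4 can be sketched as a nearest-neighbour search over phone descriptions. The articulatory feature encodings below are an illustrative assumption, not taken from the patent, which does not specify how "closest" is measured:

```python
# Hypothetical articulatory feature vectors (place, manner, voicing)
# for a few base-language phones -- illustrative only.
BASE_PHONE_FEATURES = {
    "T": (4, 1, 0),  # alveolar stop, unvoiced
    "D": (4, 1, 1),  # alveolar stop, voiced
    "S": (4, 3, 0),  # alveolar fricative, unvoiced
}

def closest_base_phone(novel_features):
    """Return the base-language phone whose feature vector has the
    smallest squared distance to the novel-language phone's features."""
    def dist(feats):
        return sum((a - b) ** 2 for a, b in zip(novel_features, feats))
    return min(BASE_PHONE_FEATURES, key=lambda p: dist(BASE_PHONE_FEATURES[p]))

# A hypothetical voiced retroflex stop (5, 1, 1) has no exact base
# counterpart; the nearest base phone by this metric is "D".
print(closest_base_phone((5, 1, 1)))  # D
```

Once every novel-language phone resolves to some base-language phone, each novel word can be written into the base language database and added to the recognizer's vocabulary, as the remaining steps of claim 4 describe.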
Specification