Coarticulation method for audio-visual text-to-speech synthesis
Abstract
A method for generating animated sequences of talking heads in text-to-speech applications wherein a processor samples a plurality of frames comprising image samples. Representative parameters are extracted from the image samples and stored in an animation library. The processor also samples a plurality of multiphones comprising images together with their associated sounds. The processor extracts parameters from these images comprising data characterizing mouth shapes, maps, rules, or equations, and stores the resulting parameters and sound information in a coarticulation library. The animated sequence begins with the processor considering an input phoneme sequence, recalling from the coarticulation library parameters associated with that sequence, and selecting appropriate image samples from the animation library based on that sequence. The image samples are concatenated together, and the corresponding sound is output, to form the animated synthesis.
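Read as an implementation outline, the abstract describes two offline steps (building an animation library of image samples with representative parameters, and building a coarticulation library of parameters and sound keyed by phoneme sequence) followed by runtime synthesis. Below is a minimal sketch of the two library-building steps; every name and the parameter representation are hypothetical, since the patent leaves these open (maps, rules, or equations).

```python
# Minimal sketch of the two library-building steps described in the
# abstract. All names and the parameter layout are hypothetical.
from dataclasses import dataclass, field

@dataclass
class Sample:
    image: bytes   # one sampled frame of the talking subject
    params: tuple  # representative mouth-shape parameters for that frame

@dataclass
class Libraries:
    animation: list = field(default_factory=list)       # image samples + parameters
    coarticulation: dict = field(default_factory=dict)  # phoneme sequence -> (parameters, sound)

def extract_mouth_parameters(image: bytes) -> tuple:
    # Stand-in for the extraction step; a real system might measure
    # lip opening, lip width, jaw rotation, and similar quantities.
    return (0.0,)

def build_libraries(frames, multiphones) -> Libraries:
    libs = Libraries()
    # Sample frames and store representative parameters alongside the
    # images in the animation library.
    for image in frames:
        libs.animation.append(Sample(image, extract_mouth_parameters(image)))
    # Sample multiphones (images plus their associated sound) and store
    # the extracted parameters and the sound in the coarticulation
    # library, keyed by the phoneme sequence.
    for phonemes, images, sound in multiphones:
        params = [extract_mouth_parameters(img) for img in images]
        libs.coarticulation[tuple(phonemes)] = (params, sound)
    return libs
```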
15 Claims
1. A method for generating a photorealistic talking head, comprising:
receiving an input stimulus;
reading data from a first library comprising one or more parameters associated with mouth shape images of sequences of at least three concatenated phonemes which correspond to the input stimulus;
reading, based on the data read from the first library, corresponding data from a second library comprising images of a talking subject; and
generating, using the data read from the second library, an animated sequence of a talking head tracking the input stimulus.

2. The method of claim 1, further comprising:
reading acoustic data from the second library associated with the corresponding image data read from the second library;
converting the acoustic data into sound; and
outputting the sound in synchrony with the animated sequence of the talking head.
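Claims 1 and 2 together describe the runtime path: parameters for the input phoneme sequence are read from the first (coarticulation) library, matching images are read from the second (animation) library, and the associated sound is output in synchrony with the animation. A hedged sketch of that path, reusing the hypothetical Libraries structure above; the nearest-parameter matching rule is an illustrative assumption, not a claim limitation.

```python
# Sketch of the synthesis path in claims 1 and 2, using the hypothetical
# Libraries structure from the sketch above.
def synthesize(libs, phoneme_sequence):
    # Claim 1: read parameters for the phoneme sequence from the first
    # (coarticulation) library.
    params_seq, acoustic_data = libs.coarticulation[tuple(phoneme_sequence)]
    # Claim 1: read the corresponding images from the second (animation)
    # library -- here, the sample whose stored parameters are closest.
    def closest(p):
        return min(libs.animation,
                   key=lambda s: sum((a - b) ** 2 for a, b in zip(s.params, p)))
    frames = [closest(p).image for p in params_seq]
    # Claim 2: the acoustic data is converted to sound (claim 4 names a
    # data-to-voice converter) and output in synchrony with the frames.
    return frames, acoustic_data
```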
3. The method of claim 2, wherein the data read from the first library comprises one or more equations characterizing mouth shapes.
4. The method of claim 2, wherein said converting step is performed using a data-to-voice converter.
5. The method of claim 2, wherein the data read from the second library comprises segments of sampled images of a talking subject.
6. The method of claim 5, wherein said first library comprises a coarticulation library, and wherein said second library comprises an animation library.
7. The method of claim 5, wherein said generating step is performed by overlaying the segments onto a common interface to create frames comprising the animated sequence.
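Claim 7's overlaying step can be pictured as compositing a sampled mouth segment onto a shared base image to form each output frame. A brief sketch under that reading; the use of Pillow and the fixed paste position are illustrative assumptions.

```python
# One way to read claim 7: paste a sampled mouth segment onto a shared
# base-face image (the "common interface") to form each output frame.
from PIL import Image

def compose_frame(base_face: Image.Image, segment: Image.Image,
                  box: tuple) -> Image.Image:
    frame = base_face.copy()   # start from the common base image
    frame.paste(segment, box)  # overlay the sampled segment at its position
    return frame
```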
8. The method of claim 2, wherein the data read from the first library comprises mouth parameters characterizing degree of lip opening.
9. The method of claim 2, wherein said receiving, said generating, said converting, and all said reading steps are performed on a personal computer.
10. The method of claim 2, wherein said first and second libraries reside in a memory device on a computer.
11. The method of claim 1, wherein the data read from the first library comprises one or more equations characterizing mouth shapes.
12. A method for generating a photorealistic talking entity, comprising:
receiving an input stimulus;
reading first data from a library comprising one or more parameters associated with mouth shape images of sequences of two concatenated phonemes and images of commonly-used sequences of at least three concatenated phonemes which correspond to the input stimulus;
reading, based on the first data, corresponding second data comprising stored images; and
generating, using the second data, an animated sequence of a talking entity tracking the input stimulus.
13. A method for generating a photorealistic talking entity, comprising:
receiving an input stimulus;
reading, based on at least one diphone, first data comprising one or more parameters associated with mouth shape images of sequences of concatenated phonemes which correspond to the input stimulus, the first data stored in a library comprising images of sequences associated with diphones and the most common images associated with triphones;
reading, based on the first data, corresponding second data comprising stored images; and
generating, using the second data, an animated sequence of a talking entity tracking the input stimulus.
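Claims 12 and 13 describe a library that stores entries for all diphones but only for the most common triphones, which suggests a triphone-first lookup that falls back to overlapping diphones. A short sketch of that lookup order; the dictionary keying and the fallback rule are assumptions made for illustration.

```python
# Sketch of the lookup order suggested by claims 12 and 13: a commonly-used
# triphone is stored directly; otherwise the same three phonemes are covered
# by the two overlapping diphones.
def lookup(library: dict, phonemes):
    tri = tuple(phonemes[:3])
    if tri in library:   # common triphone stored directly
        return [library[tri]]
    # Fall back to the two overlapping diphones spanning the triphone.
    return [library[tuple(phonemes[0:2])], library[tuple(phonemes[1:3])]]
```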
Specification