Coarticulation method for audio-visual text-to-speech synthesis
First Claim
1. A method for generating a noise-producing entity, comprising:
reading first data comprising one or more parameters associated with noise-producing orifice images of sequences of at least three concatenated phonemes which correspond to an input stimulus;
reading, based on the first data, corresponding second data comprising images of a noise-producing entity; and
generating, using the second data, an animated sequence of the noise-producing entity tracking the input stimulus.
4 Assignments
0 Petitions
Abstract
A method for generating animated sequences of talking heads in text-to-speech applications wherein a processor samples a plurality of frames comprising image samples. Representative parameters are extracted from the image samples and stored in an animation library. The processor also samples a plurality of multiphones comprising images together with their associated sounds. From these images the processor extracts parameters comprising data characterizing mouth shapes, maps, rules, or equations, and stores the resulting parameters and sound information in a coarticulation library. The animated sequence begins with the processor considering an input phoneme sequence, recalling from the coarticulation library the parameters associated with that sequence, and selecting appropriate image samples from the animation library based on those parameters. The image samples are concatenated, and the corresponding sound is output, to form the synthesized animation.
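The lookup-and-concatenate pipeline the abstract describes can be sketched as follows. This is a minimal illustration only, assuming simple dictionary-backed libraries; all names here (`CoarticulationLibrary`, `AnimationLibrary`, `synthesize`) are hypothetical and do not appear in the patent.

```python
def triphones(phonemes):
    """Slide a window of three concatenated phonemes over the input
    sequence, matching the claim's 'at least three concatenated phonemes'."""
    return [tuple(phonemes[i:i + 3]) for i in range(len(phonemes) - 2)]

class CoarticulationLibrary:
    """Hypothetical store mapping a triphone to mouth-shape parameters
    (the 'first data' of claim 1)."""
    def __init__(self, entries):
        self.entries = entries  # {(p1, p2, p3): parameter_key}

    def parameters_for(self, triphone):
        return self.entries[triphone]

class AnimationLibrary:
    """Hypothetical store mapping mouth-shape parameters to stored image
    samples (the 'second data' of claim 1)."""
    def __init__(self, samples):
        self.samples = samples  # {parameter_key: image_id}

    def image_for(self, params):
        return self.samples[params]

def synthesize(phonemes, coart_lib, anim_lib):
    """Concatenate one image sample per triphone of the input,
    yielding the frames of the animated sequence."""
    frames = []
    for tri in triphones(phonemes):
        params = coart_lib.parameters_for(tri)     # recall parameters
        frames.append(anim_lib.image_for(params))  # select image sample
    return frames
```

For example, with two triphone entries and two image samples, `synthesize(["h", "e", "l", "o"], coart, anim)` walks the windows `("h","e","l")` and `("e","l","o")` and returns the two corresponding frames in order. Audio output and parameter extraction are omitted; the sketch shows only the library lookup and concatenation steps.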
22 Claims
1. A method for generating a noise-producing entity, comprising:
reading first data comprising one or more parameters associated with noise-producing orifice images of sequences of at least three concatenated phonemes which correspond to an input stimulus;
reading, based on the first data, corresponding second data comprising images of a noise-producing entity; and
generating, using the second data, an animated sequence of the noise-producing entity tracking the input stimulus.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11
12. A noise-producing animated entity generated by a method comprising:
reading first data comprising one or more parameters associated with noise-producing orifice images of sequences of at least three concatenated phonemes which correspond to an input stimulus;
reading, based on the first data, corresponding second data comprising images of a noise-producing entity; and
generating, using the second data, an animated sequence of the noise-producing entity tracking the input stimulus.
Dependent claims: 13, 14, 15, 16, 17, 18, 19, 20, 21, 22
Specification