Text-to-speech conversion system for interlocking with multimedia and a method for organizing input data of the same

US 6,088,673 A
Filed: 02/09/1998
Issued: 07/11/2000
Est. Priority Date: 05/08/1997
Status: Expired


Abstract

The present invention provides a text-to-speech conversion system (TTS) for interlocking with multimedia and a method for organizing input data of the TTS. The system and method enhance the naturalness of synthesized speech and synchronize multimedia with the TTS by defining additional prosody information, the information required to interlock the TTS with multimedia, and the interface between this information and the TTS for use in producing the synthesized speech.

13 Claims

1. A text-to-speech conversion system for interlocking with multimedia comprising:
- a multimedia information input unit for organizing text, prosody information, information on synchronization with a moving picture, lip-shape information, picture information, and individual property information including a gender, age, accent, pronunciation and speech rate of synthesized speech;
  
  a data distributor for distributing the information from said multimedia information input unit into information for each media;
  
  a language processor for converting the text distributed by said data distributor into a phoneme stream, presuming prosody information and symbolizing the presumed prosody information;
  
  a prosody processor for calculating a prosody control parameter value from the symbolized prosody information from the language processor;
  
  a synchronization adjuster for adjusting a duration of each phoneme using the synchronization information distributed by said data distributor;
  
  a synthesis unit database for receiving the individual property information from said data distributor, selecting synthesis units adaptable to gender and age and outputting data required for synthesis;
  
  a signal processor for producing a synthesized speech using the prosody control parameter and the data output from said synthesis unit database; and
  
  a picture output apparatus for outputting the picture information distributed by said data distributor onto a screen.
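
As an illustration only, the input record organized by the multimedia information input unit and the routing performed by the data distributor might be sketched as follows. The class names, field types, and the `distribute` function are hypothetical and do not come from the patent; the routing of prosody and individual property information follows claims 12 and 13, while sending lip-shape information to the synchronization adjuster is an assumption.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class IndividualProperty:
    # Properties listed in claim 1; the types and defaults are assumptions.
    gender: str = "unspecified"
    age: Optional[int] = None
    accent: Optional[str] = None
    pronunciation: Optional[str] = None
    speech_rate: float = 1.0


@dataclass
class MultimediaInput:
    # One organized input record (hypothetical layout).
    text: str
    prosody: Optional[dict] = None
    synchronization: Optional[dict] = None
    lip_shape: Optional[list] = None
    picture: Optional[bytes] = None
    individual: IndividualProperty = field(default_factory=IndividualProperty)


def distribute(record: MultimediaInput) -> dict:
    """Split one input record into per-component payloads."""
    return {
        "language_processor": {"text": record.text},
        "prosody_processor": {
            "prosody": record.prosody,
            "individual": record.individual,
        },
        "synchronization_adjuster": {
            "prosody": record.prosody,
            "synchronization": record.synchronization,
            "lip_shape": record.lip_shape,  # assumed destination
        },
        "synthesis_unit_database": {"individual": record.individual},
        "picture_output": {"picture": record.picture},
    }
```

A caller would build one `MultimediaInput` per utterance and pass each payload of `distribute(...)` to the matching component; fields left as `None` stand for information that the input did not supply and that the system would have to presume.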

2. A method for organizing input data of a text-to-speech conversion system for interlocking with multimedia, said method comprising the steps of:
- (a) classifying multimedia input information organized for enhancing natural synthesized speech and implementing synchronization of multimedia with text-to-speech into text, prosody information, information on synchronization with a moving picture, lip-shaped information, picture information, and individual property information using a multimedia information input unit;
  
  (b) distributing using a data distributor the multimedia input information classified in the multimedia information input unit based on respective information;
  
  (c) converting the text distributed by the data distributor into a phoneme stream, presuming prosody information and symbolizing the presumed prosody information using a language processor;
  
  (d) calculating a prosody control parameter value which is not included in the multimedia input information using a prosody processor;
  
  (e) adjusting a duration of each phoneme using a synchronization adjuster so as to synchronize a processing result of the prosody processor with a picture signal according to the synchronization information distributed by the data distributor;
  
  (f) selecting synthesis units adaptable to gender and age based on the individual property information from the data distributor using a synthesis unit database and outputting data required for synthesis;
  
  (g) producing synthesized speech using a signal processor based on the prosody information distributed by the data distributor, a processing result of the synchronization adjuster, and the data from the synthesis unit database; and
  
  (h) outputting the picture information distributed by the data distributor onto a screen using a picture output unit.
- - 3. The method in accordance with claim 2, wherein the organized multimedia information comprises text information, prosody information, information on synchronization with a moving picture, lip-shaped information, and individual property information.
  - 4. The method in accordance with claim 3, wherein the prosody information comprises a number of phonemes, phoneme stream information, duration of each phoneme, pitch pattern of the phoneme, and energy pattern of the phoneme.
  - 5. The method in accordance with claim 4, wherein the duration time of the phoneme is indicative of a value of pitch at a beginning point, a mid point, and an end point within the phoneme.
  - 6. The method in accordance with claim 5, wherein the energy pattern of the phoneme is indicative of a value of energy in decibels at the beginning point, the mid point, and the end point within the phoneme.
  - 7. The method in accordance with claim 3, wherein the synchronization information comprises text, lip-shape, location information with a moving picture, and duration information.
  - 8. The method in accordance with claim 3, wherein the synchronization information comprises a beginning point, duration and delay time information of a starting point, and duration of each phoneme is controlled by the synchronization information.
  - 9. The method in accordance with claim 3, wherein the synchronization information is composed of a duration of a beginning point of a sentence, a duration information of a starting point, and duration of each phoneme is controlled by forecast lip-shape considered an articulation manner of the phoneme and articulation control of lip-shape within the synchronization and duration information of the synchronization information.
  - 10. The method in accordance with claim 3, wherein the synthesized speech is produced based on beginning point information, end point information, and phoneme information for each phoneme within an interval associated with a speech signal.
  - 11. The method in accordance with claim 3, wherein the synthesized speech is produced based on a distance of an opening between an upper lip and a lower lip, a distance between end points of the lips, and an extent of projection of a lip, and a lip-shape quantized and normalized pattern is defined depending on articulation location and articulation manner of the phoneme on a basis of pattern with discriminative property.
  - 12. The method in accordance with claim 3, wherein if the multimedia input information comprises prosody information, further comprising the steps of:
    - (i) converting the prosody information into a data structure recognizable by the signal processor; and
      
      (j) transmitting the converted prosody information to the prosody processor and the synchronization adjuster.
  - 13. The method in accordance with claim 3, wherein if the multimedia input information includes individual property information, further comprising the steps of:
    - (k) converting the individual property information into a data structure recognizable by the synthesis unit database and the prosody processor within the text-to-speech;
      
      (l) transmitting the converted individual property information to the synthesis unit database and the prosody processor.
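
To make step (e) and claims 4 to 8 more concrete, the sketch below shows one possible per-phoneme prosody record and a duration adjustment driven by synchronization information. The claims do not state how durations are adjusted; the proportional scaling used here, the treatment of the delay as an offset added to the beginning point, and all names (`PhonemeProsody`, `SyncInfo`, `adjust_durations`) are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class PhonemeProsody:
    phoneme: str
    duration_ms: float
    # Values at the beginning, mid, and end points of the phoneme.
    pitch_hz: Tuple[float, float, float]
    energy_db: Tuple[float, float, float]


@dataclass
class SyncInfo:
    begin_ms: float        # beginning point within the moving picture
    duration_ms: float     # time available for the utterance
    delay_ms: float = 0.0  # delay applied to the starting point


def adjust_durations(
    phonemes: List[PhonemeProsody], sync: SyncInfo
) -> List[Tuple[str, float, float]]:
    """Return (phoneme, start_ms, duration_ms) fitted to the sync window.

    Assumption: every phoneme is stretched or compressed by the same factor.
    """
    total = sum(p.duration_ms for p in phonemes)
    if total <= 0:
        raise ValueError("phoneme durations must sum to a positive value")
    scale = sync.duration_ms / total
    start = sync.begin_ms + sync.delay_ms
    timeline = []
    for p in phonemes:
        new_duration = p.duration_ms * scale
        timeline.append((p.phoneme, start, new_duration))
        start += new_duration
    return timeline
```

Uniform scaling is the simplest choice; a system following claim 9 would presumably weight the adjustment by lip shape and articulation manner instead, which this sketch does not attempt.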

Current Assignee
Electronics and Telecommunications Research Institute
Original Assignee
Electronics and Telecommunications Research Institute
Inventors
Hahn, Min Soo; Lee, Hang Seop; Lee, Jung Chul
Primary Examiner(s)
Hudspeth, David R.
Assistant Examiner(s)
Chawan, Vijay B

Application Number

US09/020,712
Time in Patent Office

883 Days
Field of Search

704/260, 704/275, 704/278, 704/276, 704/220, 704/266, 704/257, 704/267, 345/302, 379/93.17, 707/515
US Class Current

704/260
CPC Class Codes

G10L 13/00 Speech synthesis; Text to s...

G10L 2021/105 Synthesis of the lips movem...
