Front-end architecture for a multi-lingual text-to-speech system

US 7,496,498 B2
Filed: 03/24/2003
Issued: 02/24/2009
Est. Priority Date: 03/24/2003
Status: Expired due to Fees

Abstract

A text processing system for processing multi-lingual text for a speech synthesizer includes a first language dependent module for performing at least one of text and prosody analysis on a first portion of input text comprising a first language. A second language dependent module performs at least one of text and prosody analysis on a second portion of input text comprising a second language. A third module is adapted to receive outputs from the first and second language dependent modules and to perform prosodic and phonetic context abstraction over the outputs based on the multi-lingual text.

415 Citations

23 Claims

1. A text processing system for processing a sentence of multi-lingual text for a speech synthesizer, the text processing system comprising:
- a database having sampled speech units of a first language and of a second language;
  
  a first language dependent module for performing at least one of text and prosody analysis on a first portion of the sentence comprising the first language;
  
  a second language dependent module for performing at least one of text and prosody analysis on a second portion of the sentence comprising the second language;
  
  a third module adapted to receive outputs from the first and second language dependent modules and perform prosodic and phonetic context modification over the outputs based on an intonation for the entire sentence, the third module generating an output sentence; and
  
  a speech unit concatenation module for receiving the output sentence, selecting speech units from the database corresponding to the output sentence, and concatenating the speech units to form an utterance of the output sentence.
- Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9)
  - 2. The text processing system of claim 1 and further comprising a text normalization module for normalizing text for processing by the first language dependent module and the second language dependent module.
  - 3. The text processing system of claim 1 and further comprising a language identifier module adapted to receive multi-lingual text and associate identifiers for portions comprising the first language and for portions comprising the second language.
  - 4. The text processing system of claim 3 and further comprising an integrator module adapted to receive outputs from each module and forward said outputs for processing to another module as appropriate.
  - 5. The text processing system of claim 4 wherein the integrator forwards said outputs to the first language dependent module and the second language dependent module as a function of associated identifiers.
  - 6. The text processing system of claim 5 wherein the first language dependent module and the second language dependent module are adapted to perform morphological analysis.
  - 7. The text processing system of claim 5 wherein the first language dependent module and the second language dependent module are adapted to perform breaking analysis.
  - 8. The text processing system of claim 5 wherein the first language dependent module and the second language dependent module are adapted to perform stress analysis.
  - 9. The text processing system of claim 5 wherein the first language dependent module and the second language dependent module are adapted to perform grapheme-to-phoneme conversion.

10. A method for text processing of multi-lingual text for a speech synthesizer, the method comprising:
- storing in a database sampled speech units of a first language and of a second language;
  
  receiving input text forming a sentence and identifying portions comprising the first language and portions comprising the second language;
  
  performing at least one of text and prosody analysis on the portions comprising the first language with a first language dependent module and performing at least one of text and prosody analysis on the portions comprising the second language with a second language dependent module;
  
  receiving outputs from the first and second language dependent modules;
  
  performing prosodic and phonetic context analysis over the outputs together based on a position in the sentence of each portion relative to the other portions and generating an output sentence;
  
  selecting speech units from the database corresponding to the output sentence; and
  
  concatenating the selected speech units to form an utterance of the output sentence.
- Dependent Claims (11, 12, 13, 14, 15, 16)
  - 11. The method of claim 10 and further comprising normalizing the input text.
  - 12. The method of claim 10 wherein identifying portions comprises associating identifiers to each of the portions.
  - 13. The method of claim 12 and further comprising forwarding portions to the first language dependent module and the second language dependent module as a function of identifiers associated with the portions.
  - 14. The method of claim 10 and further comprising identifying portions of the text as a function of order in the text.
  - 15. The method of claim 10 wherein performing prosodic and phonetic context analysis comprises outputting a symbolic description of prosody for the multi-lingual text.
  - 16. The method of claim 10 wherein performing prosodic and phonetic context analysis comprises outputting a numerical description of prosody for the multi-lingual text.
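
The later steps of claim 10 (a sentence-level prosodic pass over the combined outputs, then unit selection and concatenation) can be sketched as below. This is a toy illustration under stated assumptions: the `Unit` record, the `minor`/`final` boundary tags, the database keyed by `(lang, phone, boundary)`, and the integer "samples" are all hypothetical, and the position rule is far simpler than any real prosody model.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Unit:
    phone: str               # phone label from a language dependent module
    lang: str                # identifier of the portion the unit came from
    boundary: str = "none"   # symbolic prosody tag set by the sentence-level pass


# Hypothetical database: (lang, phone, boundary) -> sampled speech unit.
Database = Dict[Tuple[str, str, str], List[int]]


def apply_sentence_context(portions: List[List[Unit]]) -> List[Unit]:
    """Tag the last unit of each portion by the portion's position in the sentence.

    Note: this mutates the Unit objects passed in.
    """
    for i, portion in enumerate(portions):
        if portion:
            is_last = i == len(portions) - 1
            portion[-1].boundary = "final" if is_last else "minor"
    return [unit for portion in portions for unit in portion]


def select_and_concatenate(sentence: List[Unit], db: Database) -> List[int]:
    """Pick one stored unit per phone and join the samples into one utterance."""
    waveform: List[int] = []
    for u in sentence:
        # Prefer a unit recorded in the matching context, else the neutral one.
        samples = db.get((u.lang, u.phone, u.boundary)) or db[(u.lang, u.phone, "none")]
        waveform.extend(samples)
    return waveform


if __name__ == "__main__":
    db: Database = {
        ("zh", "wo", "none"): [1],
        ("en", "ok", "none"): [2],
        ("en", "ok", "final"): [3],
    }
    sentence = apply_sentence_context([[Unit("wo", "zh")], [Unit("ok", "en")]])
    print(select_and_concatenate(sentence, db))
```

The point of the sketch is that the boundary tags are assigned after the per-language outputs are merged, so a portion in one language is tagged according to where it sits relative to portions in the other language.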

17. A computer readable storage media having instructions stored thereon, that when executed by a processor, perform speech synthesis, the instructions comprising:
- a database having sampled speech units of a first language and of a second language;
  
  a text processing module including:
  
  a first language dependent module for performing at least one of text and prosody analysis on a first portion of input text from a sentence comprising the first language;
  
  a second language dependent module for performing at least one of text and prosody analysis on a second portion of input text from the sentence comprising a second language;
  
  a third module adapted to receive outputs from the first and second language dependent modules and perform prosodic and phonetic context modification over the outputs based on an intonation for the sentence using a combination of the first portion and the second portion of input text; and
  
  a speech unit concatenation and synthesis module adapted to receive an output from the third module, select speech units from the database corresponding to the output from the third module, concatenate the selected speech units to form an utterance of the output from the third module, and generate synthesized speech waveforms of the utterance.
- Dependent Claims (18, 19, 20, 21, 22, 23)
  - 18. The computer readable media of claim 17 wherein the third module provides a symbolic description of prosody for the output and wherein the synthesis module comprises a concatenation module.
  - 19. The computer readable media of claim 17 wherein the third module provides a numeric description of prosody for the output and wherein the synthesis module comprises a generation module.
  - 20. The computer readable media of claim 17 and further comprising a text normalization module for normalizing text for processing by the first language dependent module and the second language dependent module.
  - 21. The computer readable media of claim 17 and further comprising a language identifier module adapted to receive multi-lingual text and associate identifiers for portions comprising the first language and for portions comprising the second language.
  - 22. The computer readable media of claim 21 and further comprising an integrator module adapted to receive outputs from each module and forward said outputs for processing to another module as appropriate.
  - 23. The computer readable media of claim 22 wherein the integrator forwards said outputs to the first language dependent module and the second language dependent module as a function of associated identifiers.
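
Claims 18 and 19 pair a symbolic description of prosody with a concatenation module and a numeric description with a generation module. One way to picture that pairing is a small dispatch on the description type, sketched below. The field names (`break_levels`, `f0_hz`, `durations_ms`) and the function `choose_backend` are assumptions for illustration only; the claims do not define the contents of either description.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class SymbolicProsody:
    break_levels: List[int]      # hypothetical: one break index per word


@dataclass
class NumericProsody:
    f0_hz: List[float]           # hypothetical: target pitch per unit
    durations_ms: List[float]    # hypothetical: target duration per unit


def choose_backend(prosody: Union[SymbolicProsody, NumericProsody]) -> str:
    """Name the synthesis back end that claims 18 and 19 associate with each description."""
    if isinstance(prosody, SymbolicProsody):
        return "concatenation"
    if isinstance(prosody, NumericProsody):
        return "generation"
    raise TypeError(f"unsupported prosody description: {type(prosody).__name__}")


if __name__ == "__main__":
    print(choose_backend(SymbolicProsody(break_levels=[1, 3])))
    print(choose_backend(NumericProsody(f0_hz=[120.0], durations_ms=[85.0])))
```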

Current Assignee: Microsoft Technology Licensing LLC (Microsoft Corporation)
Original Assignee: Microsoft Corporation
Inventors: Zhao, Yong; Peng, Hu; Chu, Min
Primary Examiner(s): Hudspeth; David R
Assistant Examiner(s): Rider; Justin W
Application Number: US10/396,944
Publication Number: US 20040193398A1
Time in Patent Office: 2,164 Days
Field of Search: 704/260, 704/4
US Class Current: 704/4
CPC Class Codes: G10L 13/08 Text analysis or generation...
