Text-to-speech method and system, computer program product therefor

US 8,121,841 B2
Filed: 12/16/2003
Issued: 02/21/2012
Est. Priority Date: 12/16/2003
Status: Active Grant

Abstract

A text-to-speech system, adapted to operate on text in a first language including sections in a second language, includes a grapheme/phoneme transcriptor for converting the sections in the second language into phonemes of the second language; a mapping module configured for mapping at least part of the phonemes of the second language onto sets of phonemes of the first language; and a speech-synthesis module adapted to be fed with a resulting stream of phonemes, which includes the sets of phonemes of the first language resulting from the mapping together with the stream of phonemes of the first language representative of the text, and to generate a speech signal from the resulting stream of phonemes.
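The data flow described in the abstract can be illustrated with a minimal Python sketch. Everything concrete in it is an assumption made for illustration: the `(language tag, text)` segment format, the function names, the toy lexicon, and the toy phoneme mapping are hypothetical and are not taken from the patent. The speech-synthesis stage is left out; the sketch only shows how mapped second-language phonemes might be merged into the first-language phoneme stream.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Segment = Tuple[str, str]  # (language tag, text); an assumed input format


def build_phoneme_stream(
    segments: Sequence[Segment],
    first_lang: str,
    transcribe: Callable[[str, str], List[str]],
    map_phoneme: Callable[[str], List[str]],
) -> List[str]:
    """Return a single stream of first-language phonemes for mixed-language text."""
    stream: List[str] = []
    for lang, text in segments:
        phonemes = transcribe(lang, text)  # grapheme/phoneme transcription
        if lang == first_lang:
            stream.extend(phonemes)
        else:
            # Each second-language phoneme is replaced by a (possibly empty)
            # set of first-language phonemes.
            for phoneme in phonemes:
                stream.extend(map_phoneme(phoneme))
    return stream


# Hypothetical toy data: SAMPA-like symbols chosen only for the example.
TOY_LEXICON: Dict[Tuple[str, str], List[str]] = {
    ("it", "ciao"): ["tS", "a", "o"],
    ("en", "the"): ["D", "@"],
}
TOY_MAPPING: Dict[str, List[str]] = {"D": ["d"], "@": ["e"]}


def toy_transcribe(lang: str, text: str) -> List[str]:
    return TOY_LEXICON[(lang, text)]


def toy_map(phoneme: str) -> List[str]:
    # Unknown phonemes map onto the empty set in this toy version.
    return TOY_MAPPING.get(phoneme, [])


stream = build_phoneme_stream(
    [("it", "ciao"), ("en", "the")], "it", toy_transcribe, toy_map
)
print(stream)
```

In a fuller system the resulting stream would be handed to a synthesizer using a first-language speaker voice; that stage is outside the scope of this sketch.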

17 Claims

1. A method for text-to-speech conversion of a text in a first language comprising sections in at least one second language, comprising the steps of:
- converting said sections in said second language into phonemes of said second language;
  
  mapping at least part of said phonemes of said second language onto sets of phonemes of said first language;
  
  including said sets of phonemes of said first language resulting from said mapping in the stream of phonemes of said first language representative of said text to produce a resulting stream of phonemes; and
  
  generating a speech signal from said resulting stream of phonemes, wherein said step of mapping comprises:
  
  carrying out non-acoustic similarity tests between each phoneme of said phonemes of said second language being mapped and a set of candidate mapping phonemes of said first language, said similarity tests performing a category-to-category comparison between a vector representative of phonetic categories of each of said phonemes of said second language and a vector representative of phonetic categories of each of said set of candidate mapping phonemes, said similarity test being independent of said first language and said second language;
  
  assigning respective scores to the results of said tests; and
  
  mapping each said phoneme of said second language onto a set of mapping phonemes of said first language selected from said candidate mapping phonemes as a function of said scores.
- Dependent claims (2, 3, 4, 5, 6, 7, 8, 9, 17):
  - 2. The method of claim 1, comprising the step of mapping said phoneme of said second language into a set of mapping phonemes of said first language selected from:
    - a set of phonemes of said first language including three, two or one phonemes of said first language, or an empty set, whereby no phoneme is included in said resulting stream for said phoneme in said second language.
  - 3. The method of claim 2, wherein said step of mapping comprises:
    - defining a threshold value for the results of said tests; and
      
      mapping onto said empty set of phonemes of said first language any phoneme of said second language for which any of said scores fails to reach said threshold value.
  - 4. The method of claim 1, comprising the step of representing said phonemes of said second language and said candidate mapping phonemes of said first language as phonetic category vectors.
  - 5. The method of claim 4, comprising selecting said phonetic categories from the group of:
    - (a) two basic categories of vowel and consonant;
      
      (b) a category diphthong;
      
      (c) vowel characteristics unstressed/stressed, non-syllabic, long, nasalized, rhoticized, or rounded;
      
      (d) vowel categories front, central, or back;
      
      (e) vowel categories close, close-close-mid, close-mid, mid, open-mid, open-open-mid, or open;
      
      (f) consonant mode categories plosive, nasal, trill, tapflap, fricative, lateral-fricative, approximant, lateral, or affricate;
      
      (g) consonant place categories bilabial, labiodental, dental, alveolar, postalveolar, retroflex, palatal, velar, uvular, pharyngeal, or glottal; and
      
      (h) other consonant categories voiced, long, syllabic, aspirated, unreleased, voiceless, or semiconsonant.
  - 6. The method of claim 1, wherein said comparison is carried out on a category-to-category basis by allotting respective score values to said category-by-category comparisons, said respective score values being aggregated to generate said scores.
  - 7. The method of claim 6, comprising the step of allotting differentiated weights to said score values in aggregating said respective score values to generate said scores.
  - 8. The method of claim 1, comprising the step of pronouncing said resulting stream of phonemes by means of a speaker voice of said first language.
  - 9. The system of claim 8, wherein said speech-synthesis module is configured for pronouncing said resulting stream of phonemes by means of a speaker voice of said first language.
  - 17. A non-transitory computer readable medium encoded with a computer program product loadable in a memory of at least one computer, the computer program product comprising software portions for performing the steps of the method of claim 1.
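As an illustration of the category-to-category comparison with weighted score aggregation recited in claims 1 and 4 to 7, the following Python sketch scores a second-language phoneme against candidate first-language phonemes. It is a sketch under stated assumptions, not the patented implementation: the reduced category inventory, the weight values, the agreement-based scoring rule, and the example phonemes are all hypothetical.

```python
from typing import Dict, FrozenSet, List, Tuple

# Assumed, heavily reduced category inventory with assumed weights
# (basic class > consonant mode > place and voicing).
CATEGORY_WEIGHTS: Dict[str, int] = {
    "vowel": 3, "consonant": 3,
    "plosive": 2, "fricative": 2, "nasal": 2,
    "bilabial": 1, "dental": 1, "alveolar": 1,
    "voiced": 1,
}
CATEGORIES: List[str] = list(CATEGORY_WEIGHTS)


def to_vector(categories: FrozenSet[str]) -> List[int]:
    """Encode a phoneme's categories as a 0/1 vector over CATEGORIES."""
    return [1 if name in categories else 0 for name in CATEGORIES]


def similarity(source: List[int], candidate: List[int]) -> int:
    """Sum the weights of all categories on which the two vectors agree.

    The rule uses only the category vectors, so it does not depend on which
    languages the phonemes come from.
    """
    return sum(
        CATEGORY_WEIGHTS[name]
        for name, s, c in zip(CATEGORIES, source, candidate)
        if s == c
    )


def rank_candidates(
    source: FrozenSet[str], candidates: Dict[str, FrozenSet[str]]
) -> List[Tuple[str, int]]:
    """Return (candidate symbol, score) pairs, best score first."""
    src = to_vector(source)
    scored = [(sym, similarity(src, to_vector(cats))) for sym, cats in candidates.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Toy example: a voiced dental fricative scored against four candidates.
SOURCE_DH = frozenset({"consonant", "fricative", "dental", "voiced"})
CANDIDATES = {
    "d": frozenset({"consonant", "plosive", "dental", "voiced"}),
    "z": frozenset({"consonant", "fricative", "alveolar", "voiced"}),
    "t": frozenset({"consonant", "plosive", "dental"}),
    "a": frozenset({"vowel"}),
}
ranking = rank_candidates(SOURCE_DH, CANDIDATES)
print(ranking)
```

With uniform weights the candidates "d" and "z" would tie in this toy inventory; giving consonant mode a higher weight than place is what separates them, which is one way differentiated weights can influence the selection.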

10. A system for text-to-speech conversion of a text in a first language comprising sections in at least one second language, comprising:
- a grapheme/phoneme transcriptor for converting said sections in said second language into phonemes of said second language;
  
  a mapping module configured for mapping at least part of said phonemes of said second language onto sets of phonemes of said first language;
  
  a speech-synthesis module adapted to be fed with a resulting stream of phonemes including said sets of phonemes of said first language resulting from said mapping and the stream of phonemes of said first language representative of said text, and to generate a speech signal from said resulting stream of phonemes, wherein said mapping module is configured for:
  
  carrying out non-acoustic similarity tests between each phoneme of said phonemes of said second language being mapped and a set of candidate mapping phonemes of said first language, said similarity tests performing a category-to-category comparison between a vector representative of phonetic categories of each of said phonemes of said second language and a vector representative of phonetic categories of each of said set of candidate mapping phonemes, said similarity test being independent of said first language and said second language;
  
  assigning respective scores to the results of said tests; and
  
  mapping each said phoneme of said second language onto a set of mapping phonemes of said first language selected from said candidate mapping phonemes as a function of said scores.
- Dependent claims (11, 12, 13, 14, 15, 16):
  - 11. The system of claim 10, wherein said mapping module is configured for mapping said phoneme of said second language into a set of mapping phonemes of said first language selected from:
    - a set of phonemes of said first language including three, two or one phonemes of said first language, or an empty set, whereby no phoneme is included in said resulting stream for said phoneme in said second language.
  - 12. The system of claim 11, wherein said mapping module is configured for:
    - defining a threshold value for the results of said tests; and
      
      mapping onto said empty set of phonemes of said first language any phoneme of said second language for which any of said scores fails to reach said threshold value.
  - 13. The system of claim 10, wherein said phonemes of said second language and said candidate mapping phonemes of said first language are represented as phonetic category vectors.
  - 14. The system of claim 13, wherein said mapping module is configured for operating based on phonetic categories from the group of:
    - (a) two basic categories of vowel and consonant;
      
      (b) the category diphthong;
      
      (c) vowel characteristics unstressed/stressed, non-syllabic, long, nasalized, rhoticized, or rounded;
      
      (d) vowel categories front, central, or back;
      
      (e) vowel categories close, close-close-mid, close-mid, mid, open-mid, open-open-mid, or open;
      
      (f) consonant mode categories plosive, nasal, trill, tapflap, fricative, lateral-fricative, approximant, lateral, or affricate;
      
      (g) consonant place categories bilabial, labiodental, dental, alveolar, postalveolar, retroflex, palatal, velar, uvular, pharyngeal, or glottal; and
      
      (h) other consonant categories voiced, long, syllabic, aspirated, unreleased, voiceless, or semiconsonant.
  - 15. The system of claim 10, wherein said mapping module is configured for carrying out said comparison on a category-to-category basis by allotting respective score values to said category-by-category comparisons, said respective score values being aggregated to generate said scores.
  - 16. The system of claim 15, wherein said mapping module is configured for allotting differentiated weights to said score values in aggregating said respective score values to generate said scores.
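The selection step of claims 11 and 12 (and the parallel method claims 2 and 3) can be sketched in Python as follows. This is a hedged illustration only: the function name `select_mapping`, the representation of candidate sets as tuples, the score values, and the reading of the threshold rule as "the best score fails to reach the threshold" are assumptions, and the scores are taken as given rather than computed.

```python
from typing import Dict, Tuple

PhonemeSet = Tuple[str, ...]  # zero to three first-language phonemes


def select_mapping(scores: Dict[PhonemeSet, float], threshold: float) -> PhonemeSet:
    """Pick the best-scoring candidate set, or the empty set below threshold.

    Assumption: the threshold is applied to the best score only.
    """
    if not scores:
        return ()
    for candidate in scores:
        if not 1 <= len(candidate) <= 3:
            raise ValueError("candidate sets must hold one to three phonemes")
    best = max(scores, key=lambda candidate: scores[candidate])
    if scores[best] < threshold:
        # No candidate is similar enough: the phoneme is dropped from the stream.
        return ()
    return best


# Hypothetical scores for two second-language phonemes.
affricate_scores = {("p", "f"): 12.0, ("f",): 10.0, ("p",): 9.0}
glottal_stop_scores = {("k",): 3.0, ("t",): 2.5}

kept = select_mapping(affricate_scores, threshold=8.0)
dropped = select_mapping(glottal_stop_scores, threshold=8.0)
print(kept, dropped)
```

Returning an empty tuple keeps the caller simple: extending the output stream with the result adds nothing for a dropped phoneme.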

Current Assignee: Cerence Operating Company (Cerence Inc.)
Original Assignee: Loquendo SpA (Microsoft Corporation)
Inventors: Badino, Leonardo; Barolo, Claudia; Quazza, Silvia
Primary Examiner(s): Smits, Talivaldis Ivars
Assistant Examiner(s): Roberts, Shaun A
Application Number: US10/582,849
Publication Number: US 20070118377A1
Time in Patent Office: 2,989 Days
Field of Search: 704/260, 704/258, 704/269
US Class Current: 704/260
CPC Class Codes: G10L 13/08 Text analysis or generation...
