Pronunciation correction of text-to-speech systems between different spoken languages

US 8,290,775 B2
Filed: 06/29/2007
Issued: 10/16/2012
Est. Priority Date: 06/29/2007
Status: Active Grant

2 Assignments

0 Petitions

Abstract

Pronunciation correction for text-to-speech (TTS) systems and speech recognition (SR) systems between different languages is provided. If a word requiring pronunciation by a target-language TTS or SR system is from the same language as the target language but is not found in a lexicon of target-language words, a letter-to-speech (LTS) rules set of the target language is used to generate a letter-to-speech output for the word for use by the TTS or SR system configured for the target language. If the word is from a different language than the target language, the phonemes comprising the word in its native language are mapped to phonemes of the target language. The TTS or SR system configured for the target language then uses this phoneme mapping to generate or recognize an audible form of the word according to the target language.
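The routing described in the abstract can be sketched as a small front end. This is a minimal illustration, not the patented implementation: the class `PronunciationFrontEnd` and its fields (`locale_of`, `native_pronounce`, `phoneme_map`) are hypothetical names, the phoneme symbols are toy values, and falling back to the target LTS rules when a word's locale is unknown is an assumption the abstract does not state.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple

Phonemes = List[str]  # a pronunciation as a sequence of phoneme symbols


@dataclass
class PronunciationFrontEnd:
    """Hypothetical front end that routes a word the way the abstract describes."""
    target_locale: str
    target_lexicon: Dict[str, Phonemes]                     # word -> target-language pronunciation
    target_lts: Callable[[str], Phonemes]                   # target-language letter-to-speech rules
    locale_of: Callable[[str], Optional[str]]               # e.g. metadata or database lookup
    native_pronounce: Dict[str, Callable[[str], Phonemes]]  # locale -> that language's pronouncer
    phoneme_map: Dict[Tuple[str, str], Dict[str, str]]      # (source, target) -> phoneme table

    def pronounce(self, word: str) -> Phonemes:
        # Known target-language word: use the lexicon entry as-is.
        if word in self.target_lexicon:
            return self.target_lexicon[word]
        locale = self.locale_of(word)
        # Same language as the target system (or locale unknown, an assumption):
        # generate a pronunciation with the target-language LTS rules.
        if locale is None or locale == self.target_locale:
            return self.target_lts(word)
        # Foreign word: pronounce it in its own language, then map each phoneme
        # into the target inventory; unmapped phonemes pass through unchanged.
        source = self.native_pronounce[locale](word)
        table = self.phoneme_map[(locale, self.target_locale)]
        return [table.get(p, p) for p in source]
```

For example, with an `en-US` target, a German pronouncer that yields `["b", "a", "x"]` for "Bach", and a `("de-DE", "en-US")` table mapping `"x"` to `"k"`, `pronounce("Bach")` returns `["b", "a", "k"]`, while an out-of-lexicon English word goes through the toy LTS function instead.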

19 Claims

1. A method of correcting pronunciation generation of a language pronunciation system, comprising:
- receiving a word according to an incoming language requiring electronic pronunciation according to a target language;
  
  determining whether the word requiring electronic pronunciation is a word of the target language;
  
  if the word requiring electronic pronunciation is not a word of the target language, retrieving a language locale for the word;
  
  determining whether the language locale for the word matches a language locale for a pronunciation system responsible for converting the word to speech or recognizing a spoken form of the word;
  
  generating a number of phoneme mapping tables, the number of phoneme mapping tables being governed by $N = L^2 - L$, wherein N comprises the number of phoneme mapping tables and L comprises a number of the language locales between which translation is accomplished, each of the language locales comprising a country known to speak a foreign language;
  
  if the language locale for the word does not match the language locale for a pronunciation system responsible for converting the word to speech or for recognizing an audible form of the word, mapping phonemes comprising the word according to the incoming language to corresponding phonemes associated with the target language, wherein mapping the phonemes comprises mapping at least one diphone from the incoming language to at least one diphone in the target language, the at least one diphone comprising two adjacent speech segments, the two adjacent speech segments comprising two adjacent letters in an actual spelling of the word according to the incoming language, wherein mapping the phonemes further comprises utilizing contextual data, the contextual data comprising at least one of:
  
  at least one of a starting phoneme and a next phoneme before a subject phoneme in the incoming language word, wherein the at least one of the starting phoneme and the next phoneme contributes to the determination of a phoneme in the target language selected for mapping to the subject phoneme in the incoming language word; and
  
  at least one of a starting phoneme and a next phoneme after a subject phoneme in the starting language word, wherein the at least one of the starting phoneme and the next phoneme contributes to the determination of a phoneme in the target language selected for mapping to the subject phoneme in the incoming language word; and
  
  passing an output of the mapping of phonemes comprising the word according to the incoming language to corresponding phonemes associated with the target language to the pronunciation system for converting the word to speech or for recognizing an audible form of the word.
- Dependent Claims (2, 3, 4, 5, 6, 7, 8)
- - 2. The method of claim 1, wherein determining whether the word requiring electronic pronunciation is a word of the target language includes passing the word to a word lexicon associated with the target language to determine whether the word is contained in the word lexicon of the target language.
  - 3. The method of claim 1, wherein retrieving language locale for the word includes parsing metadata associated with a word to determine a language locale and corresponding language associated with the word.
  - 4. The method of claim 1, wherein retrieving language locale for the word includes comparing the word to one or more databases including language locale information about the word.
  - 5. The method of claim 1, wherein retrieving language locale for the word includes passing the word to a database of information about words for finding a language locale for the word.
  - 6. The method of claim 1, wherein prior to mapping phonemes comprising the word according to the incoming language to corresponding phonemes associated with the target language, further comprising:
    - retrieving a word lexicon associated with the incoming language and a language-to-speech (LTS) rules set associated with the incoming language, and retrieving a word lexicon associated with the target language and an LTS rules set associated with the target language; and
      
      determining from the word lexicon and LTS rules sets associated with each of the incoming language and the target language how to map phonemes from the incoming language to the target language.
  - 7. The method of claim 1, wherein passing an output of the mapping of phonemes comprising the word according to the incoming language to corresponding phonemes associated with the target language to the pronunciation system for converting the word to speech or for recognizing an audible form of the word, includes passing the mapping to a text-to-speech system operative to convert text to speech for generating an audible output from the mapping.
  - 8. The method of claim 1, wherein passing an output of the mapping of phonemes comprising the word according to the incoming language to corresponding phonemes associated with the target language to the pronunciation system for converting the word to speech or for recognizing an audible form of the word, includes passing the mapping to a speech recognition system operative to recognize audible input corresponding to the mapping.
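Claim 1's contextual data lets the phonemes before and after a subject phoneme influence which target phoneme is selected. A minimal sketch, assuming a rule table keyed by `(previous, subject, next)` triples with `None` as a wildcard, a `"#"` word-boundary marker, and a context-free fallback table; these conventions and the name `map_with_context` are hypothetical, and the sketch does not cover the diphone-to-diphone mapping the claim also recites.

```python
from typing import Dict, List, Optional, Tuple

BOUNDARY = "#"  # hypothetical marker for the start/end of the word

# (previous, subject, next) -> target phoneme; None means "any phoneme".
ContextRule = Tuple[Optional[str], str, Optional[str]]


def map_with_context(source: List[str],
                     rules: Dict[ContextRule, str],
                     default: Dict[str, str]) -> List[str]:
    """Map each source phoneme, preferring the most specific matching context rule."""
    padded = [BOUNDARY] + source + [BOUNDARY]
    out: List[str] = []
    for i in range(1, len(padded) - 1):
        prev, subj, nxt = padded[i - 1], padded[i], padded[i + 1]
        # Try both neighbours, then the left neighbour only, then the right only.
        for key in ((prev, subj, nxt), (prev, subj, None), (None, subj, nxt)):
            if key in rules:
                out.append(rules[key])
                break
        else:
            # No context rule applies: use the context-free table, else keep as-is.
            out.append(default.get(subj, subj))
    return out
```

With the toy rule `{("#", "x", None): "h"}` and default `{"x": "k"}`, the source sequence `["x", "a", "x"]` maps to `["h", "a", "k"]`: the word-initial `x` matches the left-context rule, while the final `x` falls back to the default table.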

9. A tangible computer readable storage medium containing computer executable instructions which when executed by a computer perform a method of correcting pronunciation generation of a language pronunciation system, comprising:
- receiving a word according to an incoming language requiring electronic pronunciation according to a target language;
  
  determining whether the word requiring electronic pronunciation is a word of the target language;
  
  if the word requiring electronic pronunciation is not a word of the target language, retrieving language locale for the word;
  
  determining whether a language locale for the word matches a language locale for a pronunciation system responsible for converting the word to speech or recognizing a spoken form of the word;
  
  if a language locale for the word matches a language locale for a pronunciation system responsible for converting the word to speech or for recognizing an audible form of the word, applying a letter-to-speech (LTS) rules system associated with the target language to the word for generating an audible form of the word according to the LTS rules system;
  
  passing an output of the application of the LTS rules associated with the target language to the word to the pronunciation system for converting the word to speech or for recognizing an audible form of the word;
  
  generating a number of phoneme mapping tables, the phoneme mapping tables having dimensions m by n, where m is a number of phonemes in a source language and n is a number of phonemes in the target language;
  
  if a language locale for the word does not match a language locale for a pronunciation system responsible for converting the word to speech or for recognizing an audible form of the word, mapping phonemes comprising the word according to the incoming language to corresponding phonemes associated with the target language; and
  
  passing an output of the mapping of phonemes comprising the word according to the incoming language to corresponding phonemes associated with the target language to the pronunciation system for converting the word to speech or for recognizing an audible form of the word.
- Dependent Claims (10, 11)
- - 10. The tangible computer readable storage medium of claim 9, wherein passing an output of the application of the LTS rules associated with the target language to the word to the pronunciation system for converting the word to speech or for recognizing an audible form of the word, includes passing the output to a speech recognition system operative to recognize audible input corresponding to the application of the LTS rules.
  - 11. The tangible computer readable storage medium of claim 9, wherein passing an output of the application of the LTS rules associated with the target language to the word to the pronunciation system for converting the word to speech or for recognizing an audible form of the word, includes passing the output to a text-to-speech system operative to convert text to speech for generating an audible output from the application of the LTS rules.
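Claim 9 recites phoneme mapping tables of dimensions m by n (source phonemes by target phonemes) but does not say what each cell holds. A minimal sketch, assuming each cell is a similarity score and that a source phoneme maps to the highest-scoring target phoneme in its row; the helper names `build_table` and `best_targets` and all score values are hypothetical and purely illustrative.

```python
from typing import Dict, List, Tuple


def build_table(source_inv: List[str], target_inv: List[str],
                score: Dict[Tuple[str, str], float]) -> List[List[float]]:
    """m x n table: one row per source phoneme, one column per target phoneme."""
    # Pairs with no listed score default to 0.0.
    return [[score.get((s, t), 0.0) for t in target_inv] for s in source_inv]


def best_targets(table: List[List[float]], source_inv: List[str],
                 target_inv: List[str]) -> Dict[str, str]:
    """For each row, pick the target phoneme with the highest score (first on ties)."""
    mapping: Dict[str, str] = {}
    for s, row in zip(source_inv, table):
        j = max(range(len(target_inv)), key=lambda k: row[k])
        mapping[s] = target_inv[j]
    return mapping


source_inv = ["y", "x", "a"]            # m = 3 toy source phonemes
target_inv = ["u", "i", "k", "h", "a"]  # n = 5 toy target phonemes
scores = {("y", "u"): 0.7, ("y", "i"): 0.6, ("x", "k"): 0.5,
          ("x", "h"): 0.8, ("a", "a"): 1.0}
table = build_table(source_inv, target_inv, scores)
mapping = best_targets(table, source_inv, target_inv)
```

Here `table` is a 3 × 5 grid, and `mapping` comes out as `{"y": "u", "x": "h", "a": "a"}`. A learned or hand-tuned score table is only one plausible way to fill the m × n structure.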

12. A tangible computer readable storage medium containing computer executable instructions which when executed by a computer perform a method of correcting pronunciation generation of a language pronunciation system, comprising:
- receiving a word according to an incoming language requiring electronic pronunciation according to a target language;
  
  determining whether the word requiring electronic pronunciation is a word of the target language;
  
  if the word requiring electronic pronunciation is not a word of the target language, retrieving language locale for the word;
  
  determining whether a language locale for the word matches a language locale for a pronunciation system responsible for converting the word to speech or recognizing a spoken form of the word;
  
  generating a number of phoneme mapping tables, the number of phoneme mapping tables being governed by $N = L^2 - L$, wherein N comprises the number of phoneme mapping tables and L comprises a number of the language locales between which translation is accomplished, each of the language locales comprising a country known to speak a foreign language;
  
  if a language locale for the word does not match a language locale for a pronunciation system responsible for converting the word to speech or for recognizing an audible form of the word, mapping phonemes comprising the word according to the incoming language to corresponding phonemes associated with the target language; and
  
  passing an output of the mapping of phonemes comprising the word according to the incoming language to corresponding phonemes associated with the target language to the pronunciation system for converting the word to speech or for recognizing an audible form of the word.
- Dependent Claims (13, 14, 15, 16, 17, 18, 19)
- - 13. The tangible computer readable storage medium of claim 12, wherein determining whether the word requiring electronic pronunciation is a word of the target language includes passing the word to a word lexicon associated with the target language to determine whether the word is contained in the word lexicon of the target language.
  - 14. The tangible computer readable storage medium of claim 12, wherein retrieving language locale for the word includes parsing metadata associated with a word to determine a language locale and corresponding language associated with the word.
  - 15. The tangible computer readable storage medium of claim 12, wherein retrieving language locale for the word includes comparing the word to one or more databases including language locale information about the word.
  - 16. The tangible computer readable storage medium of claim 12, wherein retrieving language locale for the word includes passing the word to a database of information about words for finding a language locale for the word.
  - 17. The tangible computer readable storage medium of claim 12, wherein prior to mapping phonemes comprising the word according to the incoming language to corresponding phonemes associated with the target language, further comprising:
    - retrieving a word lexicon associated with the incoming language and a language-to-speech (LTS) rules set associated with the incoming language, and retrieving a word lexicon associated with the target language and an LTS rules set associated with the target language; and
      
      determining from the word lexicon and LTS rules sets associated with each of the incoming language and the target language how to map phonemes from the incoming language to the target language.
  - 18. The tangible computer readable storage medium of claim 12, wherein passing an output of the mapping of phonemes comprising the word according to the incoming language to corresponding phonemes associated with the target language to the pronunciation system for converting the word to speech or for recognizing an audible form of the word, includes passing the mapping to a text-to-speech system operative to convert text to speech for generating an audible output from the mapping.
  - 19. The tangible computer readable storage medium of claim 12, wherein passing an output of the mapping of phonemes comprising the word according to the incoming language to corresponding phonemes associated with the target language to the pronunciation system for converting the word to speech or for recognizing an audible form of the word, includes passing the mapping to a speech recognition system operative to recognize audible input corresponding to the mapping.
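The table count $N = L^2 - L$ in claims 1 and 12 can be read as one table per ordered pair of distinct locales: L locales form $L \times L$ ordered pairs, and dropping the L pairs that map a locale to itself leaves $L^2 - L = L(L - 1)$. Ten locales, for instance, need $100 - 10 = 90$ tables. A minimal sketch of that reading, using `itertools.permutations`; the function names are hypothetical.

```python
from itertools import permutations
from typing import List, Tuple


def mapping_table_pairs(locales: List[str]) -> List[Tuple[str, str]]:
    """Every ordered (source, target) pair of distinct locales; one table each."""
    # permutations(..., 2) yields ordered pairs and never pairs an item with itself.
    return list(permutations(locales, 2))


def table_count(num_locales: int) -> int:
    """N = L^2 - L: all L x L ordered pairs minus the L self-pairs."""
    return num_locales ** 2 - num_locales


locales = ["en-US", "de-DE", "fr-FR"]
pairs = mapping_table_pairs(locales)
assert len(pairs) == table_count(len(locales))  # 6 tables for 3 locales
```

Ordered pairs matter because mapping German phonemes into English is a different table from mapping English phonemes into German.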

Current Assignee: Microsoft Technology Licensing LLC (Microsoft Corporation)
Original Assignee: Microsoft Corporation
Inventors: Etezadi, Cameron Ali; Sharpe, Timothy David
Primary Examiner(s): Armstrong, Angela A
Application Number: US11/824,491
Publication Number: US 20090006097A1
Time in Patent Office: 1,936 Days
Field of Search: 704/8, 704/260, 704/277, 704/231
US Class Current: 704/260
CPC Class Codes: G10L 13/08 Text analysis or generation...
