Annotating phonemes and accents for text-to-speech system

US 8,751,235 B2
Filed: 08/03/2009
Issued: 06/10/2014
Est. Priority Date: 07/12/2005
Status: Active Grant

Abstract

A system that outputs phonemes and accents of texts. The system has a storage section storing a first corpus in which spellings, phonemes, and accents of a text input beforehand are recorded separately for individual segmentations of the words that are contained in the text. A text for which phonemes and accents are to be output is acquired and the first corpus is searched to retrieve at least one set of spellings that match the spellings in the text from among sets of contiguous spellings. Then, the combination of a phoneme and an accent that has a higher probability of occurrence in the first corpus than a predetermined reference probability is selected as the phonemes and accent of the text.
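The selection step described in the abstract can be illustrated with a minimal sketch. It assumes a toy corpus in which every word records a spelling, a phoneme string, and an accent pattern; all entries, the names `CORPUS`, `reading_probabilities`, and `select_readings`, and the reference value of 0.5 are invented for illustration and are not taken from the patent. The sketch also looks up single spellings only, whereas the abstract refers to sets of contiguous spellings.

```python
from collections import Counter, defaultdict

# Toy stand-in for the "first corpus": each sentence is a list of words,
# and each word is (spelling, phonemes, accent). All entries are invented.
CORPUS = [
    [("ab", "a-bu", "HL"), ("c", "shi", "L")],
    [("ab", "a-bu", "HL"), ("d", "di", "H")],
    [("ab", "e-bi", "LH"), ("c", "shi", "L")],
    [("a", "ei", "H"), ("b", "bi", "L")],
]

def reading_probabilities(corpus):
    """Relative frequency of each (phonemes, accent) pair for every spelling."""
    counts = defaultdict(Counter)
    for sentence in corpus:
        for spelling, phonemes, accent in sentence:
            counts[spelling][(phonemes, accent)] += 1
    return {
        spelling: {reading: n / sum(c.values()) for reading, n in c.items()}
        for spelling, c in counts.items()
    }

def select_readings(spelling, probs, reference=0.5):
    """Keep the readings whose probability exceeds the reference probability."""
    return [r for r, p in probs.get(spelling, {}).items() if p > reference]

probs = reading_probabilities(CORPUS)
selected = select_readings("ab", probs)  # readings of "ab" above the threshold
```

In this toy data the spelling "ab" occurs three times, twice with one reading and once with another, so only the more frequent reading clears the assumed reference probability.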

30 Claims

1. A computer-implemented method for processing an input text, the input text comprising an input character string, the method comprising acts of:
- identifying a first segmentation of the input character string, the first segmentation forming a first candidate sequence of words corresponding to the input character string, wherein the first candidate sequence of words comprises at least one first word having at least one character and a first pronunciation;
  
  determining, based at least in part on statistical information regarding phonemes and/or accents for pronouncing character strings, a first occurrence probability for the first candidate sequence of words, wherein the statistical information comprises information indicative of a frequency at which the at least one character is associated with the first pronunciation;
  
  identifying a second segmentation of the input character string, the second segmentation being different from the first segmentation and forming a second candidate sequence of words corresponding to the input character string, wherein the second candidate sequence of words comprises at least one second word having the same at least one character as the first word but a second pronunciation that is different from the first pronunciation of the first word;
  
  determining, based at least in part on the statistical information regarding phonemes and/or accents for pronouncing character strings, a second occurrence probability for the second candidate sequence of words, wherein the statistical information further comprises information indicative of a frequency at which the at least one character is associated with the second pronunciation; and
  
  selecting, based at least in part on the first and second occurrence probabilities, a selected sequence of words from a plurality of candidate sequences of words comprising the first and second candidate sequences of words.
- Dependent claims (2, 3, 4, 5, 6, 7, 8, 9, 10):
  - 2. The computer-implemented method of claim 1, wherein the input text is in a language in which word boundaries are not explicitly indicated.
  - 3. The computer-implemented method of claim 1, wherein at least one word in the selected sequence of words comprises at least one character string for the at least one word and pronunciation information for the at least one character string.
  - 4. The computer-implemented method of claim 3, wherein the pronunciation information for the at least one character string comprises a combination of at least one phoneme and at least one accent for the at least one character string, and wherein the method further comprises:
    - using the pronunciation information to generate synthetic speech corresponding to the input character string.
  - 5. The computer-implemented method of claim 3, wherein the at least one word further comprises part of speech information for the at least one character string.
  - 6. The computer-implemented method of claim 1, wherein the statistical information regarding phonemes and/or accents for pronouncing character strings comprises an occurrence probability for a combination of at least one phoneme and at least one accent for at least one character string.
  - 7. The computer-implemented method of claim 6, wherein the occurrence probability for the combination of the at least one phoneme and the at least one accent for the at least one character string is conditioned upon the at least one character string occurring in a particular context, the particular context comprising one or more particular words preceding the at least one character string and/or one or more particular words following the at least one character string.
  - 8. The computer-implemented method of claim 1, wherein the selected sequence of words is the first candidate sequence of words, and wherein the first candidate sequence of words is selected at least in part because the first occurrence probability is higher than the second occurrence probability.
  - 9. The computer-implemented method of claim 1, wherein the selected sequence of words is the first candidate sequence of words, and wherein the first candidate sequence of words is selected at least in part because the first occurrence probability is higher than a reference probability.
  - 10. The computer-implemented method of claim 1, wherein the at least one first word is preceded in the first candidate sequence of words by at least one third word, and wherein the frequency at which the at least one character is associated with the first pronunciation comprises a frequency at which the at least one character is associated with the first pronunciation given that the at least one character is preceded by the at least one third word.
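One way to picture the acts of claim 1 is the following sketch, which enumerates candidate word sequences for an input string, scores each with frequency statistics, and selects the most probable one (as in claim 8). The frequency table `FREQ`, the function names, and the use of a simple product of per-word relative frequencies are all assumptions made for illustration; the claim does not specify a particular probability model.

```python
from math import prod

# Hypothetical statistics: how often a spelling was observed with a given
# pronunciation. The values are invented for illustration.
FREQ = {
    ("ab", "abu"): 6,
    ("a", "ei"): 3,
    ("a", "a"): 1,
    ("b", "bi"): 2,
    ("c", "shi"): 4,
}
TOTAL = sum(FREQ.values())

def candidates(text):
    """Yield each segmentation of text as a list of (spelling, pronunciation)."""
    if not text:
        yield []
        return
    for spelling, pronunciation in FREQ:
        if text.startswith(spelling):
            for rest in candidates(text[len(spelling):]):
                yield [(spelling, pronunciation)] + rest

def occurrence_probability(words):
    """Score a candidate as the product of per-word relative frequencies."""
    return prod(FREQ[word] / TOTAL for word in words)

def select(text):
    """Pick the candidate sequence with the highest occurrence probability."""
    return max(candidates(text), key=occurrence_probability)

best = select("abc")
```

Here the input "abc" has three candidates: one that treats "ab" as a single word and two that split it into "a" and "b" with different pronunciations of "a". The character "a" therefore appears with different pronunciations across segmentations, which is the situation the claim addresses. Note that `select` raises `ValueError` if no segmentation covers the input; a fuller implementation would need a fallback for unknown spellings.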

11. A computer system for processing an input text, the input text comprising an input character string, the computer system comprising at least one processor programmed to:
- identify a first segmentation of the input character string, the first segmentation forming a first candidate sequence of words corresponding to the input character string, wherein the first candidate sequence of words comprises at least one first word having at least one character and a first pronunciation;
  
  determine, based at least in part on statistical information regarding phonemes and/or accents for pronouncing character strings, a first occurrence probability for the first candidate sequence of words, wherein the statistical information comprises information indicative of a frequency at which the at least one character is associated with the first pronunciation;
  
  identify a second segmentation of the input character string, the second segmentation being different from the first segmentation and forming a second candidate sequence of words corresponding to the input character string, wherein the second candidate sequence of words comprises at least one second word having the same at least one character as the first word but a second pronunciation that is different from the first pronunciation of the first word;
  
  determine, based at least in part on the statistical information regarding phonemes and/or accents for pronouncing character strings, a second occurrence probability for the second candidate sequence of words, wherein the statistical information further comprises information indicative of a frequency at which the at least one character is associated with the second pronunciation; and
  
  select, based at least in part on the first and second occurrence probabilities, a selected sequence of words from a plurality of candidate sequences of words comprising the first and second candidate sequences of words.
- Dependent claims (12, 13, 14, 15, 16, 17, 18, 19, 20):
  - 12. The computer system of claim 11, wherein the input text is in a language in which word boundaries are not explicitly indicated.
  - 13. The computer system of claim 11, wherein at least one word in the selected sequence of words comprises at least one character string for the at least one word and pronunciation information for the at least one character string.
  - 14. The computer system of claim 13, wherein the pronunciation information for the at least one character string comprises a combination of at least one phoneme and at least one accent for the at least one character string, and wherein the at least one processor is further programmed to:
    - use the pronunciation information to generate synthetic speech corresponding to the input character string.
  - 15. The computer system of claim 13, wherein the at least one word further comprises part of speech information for the at least one character string.
  - 16. The computer system of claim 11, wherein the statistical information regarding phonemes and/or accents for pronouncing character strings comprises an occurrence probability for a combination of at least one phoneme and at least one accent for at least one character string.
  - 17. The computer system of claim 16, wherein the occurrence probability for the combination of the at least one phoneme and the at least one accent for the at least one character string is conditioned upon the at least one character string occurring in a particular context, the particular context comprising one or more particular words preceding the at least one character string and/or one or more particular words following the at least one character string.
  - 18. The computer system of claim 11, wherein the selected sequence of words is the first candidate sequence of words, and wherein the first candidate sequence of words is selected at least in part because the first occurrence probability is higher than the second occurrence probability.
  - 19. The computer system of claim 11, wherein the selected sequence of words is the first candidate sequence of words, and wherein the first candidate sequence of words is selected at least in part because the first occurrence probability is higher than a reference probability.
  - 20. The computer system of claim 11, wherein the at least one first word is preceded in the first candidate sequence of words by at least one third word, and wherein the frequency at which the at least one character is associated with the first pronunciation comprises a frequency at which the at least one character is associated with the first pronunciation given that the at least one character is preceded by the at least one third word.
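Claims 10 and 20 condition the pronunciation frequency on the word that precedes it. A minimal sketch of such a context-conditioned estimate, assuming a plain bigram count over annotated sentences, might look like the following. The data in `SENTENCES`, the sentence-start marker `BOS`, and the function names are hypothetical and serve only to show the arithmetic.

```python
from collections import Counter

# Invented annotated sentences; each word is (spelling, pronunciation).
SENTENCES = [
    [("x", "ekusu"), ("a", "ei")],
    [("x", "ekusu"), ("a", "ei")],
    [("x", "ekusu"), ("a", "a")],
    [("y", "wai"), ("a", "a")],
]

BOS = ("<s>", "")  # hypothetical marker for the start of a sentence

def bigram_counts(sentences):
    """Count (previous word, word) pairs and how often each context occurs."""
    pair_counts, context_counts = Counter(), Counter()
    for sentence in sentences:
        prev = BOS
        for word in sentence:
            pair_counts[(prev, word)] += 1
            context_counts[prev] += 1
            prev = word
    return pair_counts, context_counts

def conditional_probability(word, prev, pair_counts, context_counts):
    """Relative frequency of word given that it is preceded by prev."""
    seen = context_counts[prev]
    return pair_counts[(prev, word)] / seen if seen else 0.0

pairs, contexts = bigram_counts(SENTENCES)
p_ei_after_x = conditional_probability(("a", "ei"), ("x", "ekusu"), pairs, contexts)
p_a_after_y = conditional_probability(("a", "a"), ("y", "wai"), pairs, contexts)
```

In this toy data the character "a" is read one way in two of its three occurrences after "x" and the other way in its only occurrence after "y", so the preceding word shifts which pronunciation is more probable. Unseen contexts return 0.0 here; a practical model would normally apply smoothing instead.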

21. An article of manufacture comprising a computer-readable storage medium encoded with computer code for execution on at least one processor in a system, the computer code, when executed on the at least one processor, performing a method for processing an input text, the input text comprising an input character string, the method comprising acts of:
- identifying a first segmentation of the input character string, the first segmentation forming a first candidate sequence of words corresponding to the input character string, wherein the first candidate sequence of words comprises at least one first word having at least one character and a first pronunciation;
  
  determining, based at least in part on statistical information regarding phonemes and/or accents for pronouncing character strings, a first occurrence probability for the first candidate sequence of words, wherein the statistical information comprises information indicative of a frequency at which the at least one character is associated with the first pronunciation;
  
  identifying a second segmentation of the input character string, the second segmentation different from the first segmentation and forming a second candidate sequence of words corresponding to the input character string, wherein the second candidate sequence of words comprises at least one second word having the same at least one character as the first word but a second pronunciation that is different from the first pronunciation of the first word;
  
  determining, based at least in part on the statistical information regarding phonemes and/or accents for pronouncing character strings, a second occurrence probability for the second candidate sequence of words, wherein the statistical information further comprises information indicative of a frequency at which the at least one character is associated with the second pronunciation; and
  
  selecting, based at least in part on the first and second occurrence probabilities, a selected sequence of words from a plurality of candidate sequences of words comprising the first and second candidate sequences of words.
- Dependent claims (22, 23, 24, 25, 26, 27, 28, 29, 30):
  - 22. The article of manufacture of claim 21, wherein the input text is in a language in which word boundaries are not explicitly indicated.
  - 23. The article of manufacture of claim 21, wherein at least one word in the selected sequence of words comprises at least one character string for the at least one word and pronunciation information for the at least one character string.
  - 24. The article of manufacture of claim 23, wherein the pronunciation information for the at least one character string comprises a combination of at least one phoneme and at least one accent for the at least one character string, and wherein the method further comprises:
    - using the pronunciation information to generate synthetic speech corresponding to the input character string.
  - 25. The article of manufacture of claim 23, wherein the at least one word is further associated with part of speech information for the at least one character string.
  - 26. The article of manufacture of claim 21, wherein the statistical information regarding phonemes and/or accents for pronouncing character strings comprises an occurrence probability for a combination of at least one phoneme and at least one accent for at least one character string.
  - 27. The article of manufacture of claim 26, wherein the occurrence probability for the combination of the at least one phoneme and the at least one accent for the at least one character string is conditioned upon the at least one character string occurring in a particular context, the particular context comprising one or more particular words preceding the at least one character string and/or one or more particular words following the at least one character string.
  - 28. The article of manufacture of claim 21, wherein the selected sequence of words is the first candidate sequence of words, and wherein the first candidate sequence of words is selected at least in part because the first occurrence probability is higher than the second occurrence probability.
  - 29. The article of manufacture of claim 21, wherein the selected sequence of words is the first candidate sequence of words, and wherein the first candidate sequence of words is selected at least in part because the first occurrence probability is higher than a reference probability.
  - 30. The article of manufacture of claim 21, wherein the at least one first word is preceded in the first candidate sequence of words by at least one third word, and wherein the frequency at which the at least one character is associated with the first pronunciation comprises a frequency at which the at least one character is associated with the first pronunciation given that the at least one character is preceded by the at least one third word.

Current Assignee: Cerence Operating Company (Cerence Inc.)
Original Assignee: Nuance Communications, Inc. (Microsoft Corporation)
Inventors: Nishimura, Masafumi; Mori, Shinsuke; Nagano, Toru
Primary Examiner(s): He, Jialong
Application Number: US12/534,808
Publication Number: US 20100030561A1
Time in Patent Office: 1,772 Days
Field of Search: 704/258-269
US Class Current: 704/258
CPC Class Codes:
G10L 13/04   Details of speech synthesis...
G10L 13/08   Text analysis or generation...
G10L 13/086   Detection of language
G10L 13/10   Prosody rules derived from ...
