STOCHASTIC PHONEME AND ACCENT GENERATION USING ACCENT CLASS

US 20100125459A1
Filed: 07/01/2009
Published: 05/20/2010
Est. Priority Date: 11/18/2008
Status: Abandoned Application

Abstract

Exemplary embodiments provide for determining a sequence of words in a text-to-speech (TTS) system. An input text is analyzed using two models, a word n-gram model and an accent class n-gram model. For each model, a list of all possible words is generated for each word in the input. Each word in each list is given a score reflecting the probability, under that model, that it is the correct word in the sequence. The two lists are combined, and the two scores are combined for each word. A set of word sequences is generated, each comprising a unique combination of an attribute and associated word for each word in the input. The combined scores of the words in each sequence are then combined into a score for that sequence. The sequence of words having the highest score is selected and presented to a user.
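
As a rough sketch of the selection rule the abstract describes, the following notation may help. None of these symbols appear in the source: x_i stands for the candidate chosen for the i-th input word, a(x_i) for its score under the word n-gram model, b(x_i) for its score under the accent class n-gram model, and f for whatever function combines the two. The abstract says only that the per-word scores are "combined"; the summation over words is taken from the wording of claim 2 and is an assumption here.

```latex
\[
  \hat{S} \;=\; \operatorname*{arg\,max}_{S = (x_1, \dots, x_N)}
  \;\sum_{i=1}^{N} f\bigl(a(x_i),\, b(x_i)\bigr)
\]
```

Here N is the number of words in the input and \hat{S} is the sequence that is selected and presented to the user.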

33 Citations

13 Claims

1. (canceled)

2. A method for selecting a sequence of words for text-to-speech synthesis, the method comprising:
- receiving an input comprising a set of words;
  
  determining a first list of potential word types for each of the words in the set of words;
  
  assigning a first score to each potential word type in each list of potential word types based on the likelihood the corresponding word type is correct;
  
  determining a second list of potential word parameters for each of the words in the set of words;
  
  assigning a second score to each potential word parameter in each list of potential word parameters based on the likelihood the corresponding word parameter is correct;
  
  forming a plurality of pairs for each word in the set of words, each pair comprising a unique pair of word type and word parameter from the first list and the second list for the corresponding word;
  
  forming a plurality of word sequences, each word sequence comprising the set of words combined with unique combinations of pairs for each word in the word sequence;
  
  scoring each word sequence by combining the first score and the second score for each pair and summing the combined scores over each unique combination of pairs for each of the plurality of word sequences; and
  
  selecting the word sequence with the highest score as the correct word sequence.
- Dependent claims (3, 4, 5):
  - 3. The method of claim 2, wherein the potential word types are parts of speech.
  - 4. The method of claim 2, wherein the potential word parameters are accents.
  - 5. The method of claim 2, further comprising performing text-to-speech on the selected word sequence.
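
The steps recited in claim 2 can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the applicants' implementation: the function name `select_word_sequence`, the dictionary layout, the example labels (`accent_A`, `accent_B`), and the example scores are all hypothetical, the scores are assumed to be log-probabilities, and the "combining" of the first and second score is assumed to be simple addition, which the claim does not specify.

```python
from itertools import product


def select_word_sequence(words, type_scores, param_scores, combine=lambda a, b: a + b):
    """Pick the highest-scoring assignment of (word type, word parameter) pairs.

    words        -- list of input words
    type_scores  -- one dict per word: {word_type: first_score}
    param_scores -- one dict per word: {word_parameter: second_score}
    combine      -- how the two scores of a pair are merged (assumed: addition)
    """
    # For each word, form every (type, parameter) pair with its combined score.
    per_word_pairs = []
    for t_scores, p_scores in zip(type_scores, param_scores):
        pairs = [
            ((t, p), combine(ts, ps))
            for (t, ts), (p, ps) in product(t_scores.items(), p_scores.items())
        ]
        per_word_pairs.append(pairs)

    # Enumerate every word sequence (one pair per word) and sum the pair scores.
    best_seq, best_score = None, float("-inf")
    for combo in product(*per_word_pairs):
        score = sum(s for _, s in combo)
        if score > best_score:
            best_score = score
            best_seq = [(w, t, p) for w, ((t, p), _) in zip(words, combo)]
    return best_seq, best_score


# Hypothetical toy input: scores are made-up log-probabilities.
words = ["record", "it"]
type_scores = [{"noun": -1.2, "verb": -0.4}, {"pronoun": -0.1}]
param_scores = [{"accent_A": -0.9, "accent_B": -0.3}, {"accent_A": -0.2}]

best_seq, best_score = select_word_sequence(words, type_scores, param_scores)
```

Exhaustive enumeration is used here only because it mirrors the claim language step by step; the number of sequences grows exponentially with the number of words, so it is practical only for short inputs.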

6. At least one computer readable storage medium storing instructions that, when executed on at least one processor, perform a method for selecting a sequence of words for text-to-speech synthesis, the method comprising:
- receiving an input comprising a set of words;
  
  determining a first list of potential word types for each of the words in the set of words;
  
  assigning a first score to each potential word type in each list of potential word types based on the likelihood the corresponding word type is correct;
  
  determining a second list of potential word parameters for each of the words in the set of words;
  
  assigning a second score to each potential word parameter in each list of potential word parameters based on the likelihood the corresponding word parameter is correct;
  
  forming a plurality of pairs for each word in the set of words, each pair comprising a unique pair of word type and word parameter from the first list and the second list for the corresponding word;
  
  forming a plurality of word sequences, each word sequence comprising the set of words combined with unique combinations of pairs for each word in the word sequence;
  
  scoring each word sequence by combining the first score and the second score for each pair and summing the combined scores over each unique combination of pairs for each of the plurality of word sequences; and
  
  selecting the word sequence with the highest score as the correct word sequence.
- Dependent claims (7, 8, 9):
  - 7. The at least one computer readable storage medium of claim 6, wherein the potential word types are parts of speech.
  - 8. The at least one computer readable storage medium of claim 6, wherein the potential word parameters are accents.
  - 9. The at least one computer readable storage medium of claim 6, further comprising performing text-to-speech on the selected word sequence.

10. A system for selecting a sequence of words for text-to-speech synthesis, the system comprising:
- at least one input for receiving an input comprising a set of words; and
  
  at least one computer configured to determine a first list of potential word types for each of the words in the set of words, assign a first score to each potential word type in each list of potential word types based on the likelihood the corresponding word type is correct, determine a second list of potential word parameters for each of the words in the set of words, assign a second score to each potential word parameter in each list of potential word parameters based on the likelihood the corresponding word parameter is correct, form a plurality of pairs for each word in the set of words, each pair comprising a unique pair of word type and word parameter from the first list and the second list for the corresponding word, form a plurality of word sequences, each word sequence comprising the set of words combined with unique combinations of pairs for each word in the word sequence, score each word sequence by combining the first score and the second score for each pair and summing the combined scores over each unique combination of pairs for each of the plurality of word sequences, and select the word sequence with the highest score as the correct word sequence.
- Dependent claims (11, 12, 13):
  - 11. The system of claim 10, wherein the potential word types are parts of speech.
  - 12. The system of claim 10, wherein the potential word parameters are accents.
  - 13. The system of claim 10, further comprising performing text-to-speech on the selected word sequence.

Current Assignee: Nuance Communications, Inc. (Microsoft Corporation)
Original Assignee: Nuance Communications, Inc. (Microsoft Corporation)
Inventors: Nagano, Tohru; Itoh, Nobuyasu; Tachibana, Ryuki; Nishimura, Masafumi
Application Number: US12/496,366
Publication Number: US 20100125459A1
Field of Search
US Class Current: 704/260
CPC Class Codes: G10L 13/08 Text analysis or generation...
