Statistical pronunciation model for text to speech
Abstract
An arrangement is provided for speech synthesis using statistical pronunciation models established based on annotated training data. When input text is received, pronunciations of words in the input text are determined based on the use of relevant statistical pronunciation models. The speech signal corresponding to the input text is then synthesized using the determined pronunciations.
174 Citations
29 Claims
1. A method, comprising:
establishing at least one statistical pronunciation model based on annotated training data;
receiving input text;
determining a pronunciation of a word in the input text based on at least some of the statistical pronunciation models; and
synthesizing a speech signal corresponding to the input text by synthesizing the acoustic signal of each word in the input text using the pronunciation determined in said determining.
(Dependent claims: 2, 3, 4, 5)
6. A method to establish a statistical pronunciation model, comprising:
retrieving annotated training data wherein words are annotated in terms of their pronunciations, taking into account the context of the words;
performing statistical analysis of the annotated training data with respect to the context; and
building a statistical pronunciation model for each pronunciation of the annotated words in the annotated training data based on the statistical analysis.
(Dependent claims: 7, 8)
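The three steps of this claim (retrieve annotated data, analyze it statistically with respect to context, build one model per pronunciation) can be illustrated with a minimal Python sketch. The claim does not specify the statistical technique or the form of the context, so the add-one-smoothed feature counts, the `prev=` feature strings, the ARPAbet-style pronunciations, and every function name below are assumptions made purely for illustration.

```python
from collections import Counter, defaultdict

# Hypothetical annotated training data: (word, context features, pronunciation).
# The feature format and the ARPAbet-style strings are illustrative assumptions.
ANNOTATED = [
    ("read", ("prev=will",), "R IY D"),
    ("read", ("prev=to",), "R IY D"),
    ("read", ("prev=have",), "R EH D"),
    ("read", ("prev=had",), "R EH D"),
    ("read", ("prev=had",), "R EH D"),
]


def build_models(annotated):
    """Build one count-based model per (word, pronunciation) pair."""
    models = defaultdict(lambda: {"total": 0, "features": Counter()})
    for word, features, pron in annotated:
        model = models[(word, pron)]
        model["total"] += 1                 # how often this pronunciation occurs
        model["features"].update(features)  # how often each context feature co-occurs
    return dict(models)


def feature_probability(model, feature, vocab_size):
    """Add-one smoothed estimate of P(feature | word, pronunciation)."""
    return (model["features"][feature] + 1) / (model["total"] + vocab_size)


MODELS = build_models(ANNOTATED)
```

Here each pronunciation of "read" gets its own model, which is one possible reading of "a statistical pronunciation model for each pronunciation"; other statistical forms (decision trees, for example) would fit the claim language equally well.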
9. A method of synthesizing speech data, comprising:
receiving input text;
analyzing the input text to identify contextual features of words in the input text;
determining a pronunciation of each word according to a statistical pronunciation model of the word relevant to the contextual features of the word; and
synthesizing an acoustic signal of the word based on the pronunciation.
(Dependent claims: 10, 11)
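As a rough illustration of the determining step, the Python sketch below extracts a single contextual feature (the previous word) and picks the candidate pronunciation with the highest score. The model table, its probabilities, the `UNSEEN` constant, the fallback dictionary, and all function names are hypothetical; the claim does not prescribe a scoring rule, and acoustic synthesis is left out of scope.

```python
import math

# Hypothetical model table: for each word, each candidate pronunciation has a
# prior and per-feature conditional probabilities. All numbers are made up.
MODELS = {
    "read": {
        "R IY D": {"prior": 0.5, "prev=will": 0.6, "prev=have": 0.1},
        "R EH D": {"prior": 0.5, "prev=will": 0.1, "prev=have": 0.6},
    },
}
UNSEEN = 0.05  # assumed probability for a feature the model has not seen
DEFAULT_PRON = {"i": "AY", "will": "W IH L", "have": "HH AE V", "it": "IH T"}


def contextual_features(words, i):
    """Identify contextual features of words[i]; here only the previous word."""
    prev = words[i - 1] if i > 0 else "<s>"
    return [f"prev={prev}"]


def determine_pronunciation(word, features):
    """Pick the candidate pronunciation with the highest log score."""
    candidates = MODELS.get(word)
    if candidates is None:
        # Fallback for words without a statistical model (an assumption).
        return DEFAULT_PRON.get(word, word.upper())

    def score(pron):
        model = candidates[pron]
        return math.log(model["prior"]) + sum(
            math.log(model.get(f, UNSEEN)) for f in features)

    return max(candidates, key=score)


def pronounce(text):
    """Return (word, pronunciation) pairs; a real system would pass each
    pronunciation on to an acoustic synthesizer, which is out of scope here."""
    words = text.lower().split()
    return [(w, determine_pronunciation(w, contextual_features(words, i)))
            for i, w in enumerate(words)]
```

With this table, `pronounce("I will read it")` selects "R IY D" for "read", while `pronounce("I have read it")` selects "R EH D", because the preceding word changes which model scores higher.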
12. A system, comprising:
a statistical pronunciation modeling mechanism for establishing at least one statistical pronunciation model based on annotated training data; and
a speech synthesis mechanism for synthesizing speech from input text based on the statistical pronunciation models.
(Dependent claims: 13, 14, 15)
16. A statistical pronunciation modeling mechanism, comprising:
a context-sensitive pronunciation annotation mechanism for generating annotated training data in which words are annotated with their pronunciations; and
a statistical pronunciation model generation mechanism for creating statistical pronunciation models based on the annotated training data.
(Dependent claims: 17, 18)
19. A speech synthesis mechanism, comprising:
a text processing mechanism for processing the input text to identify contextual features;
a pronunciation determiner for determining a pronunciation of each word in the input text according to a statistical pronunciation model and the pronunciation rules relevant to the contextual features; and
a text-to-speech engine for producing an acoustic signal for each word in the input text using the pronunciation of each word, retrieved from a dictionary, to generate the speech of the input text.
(Dependent claims: 20)
21. A machine-accessible medium encoded with data, the data, when accessed, causing:
establishing at least one statistical pronunciation model based on annotated training data;
receiving input text;
determining a pronunciation of a word in the input text based on at least some of the statistical pronunciation models; and
synthesizing a speech signal corresponding to the input text by synthesizing the acoustic signal of each word in the input text using the pronunciation determined in said determining.
(Dependent claims: 22, 23)
24. A machine-accessible medium encoded with data for establishing a statistical pronunciation model, the data, when accessed, causing:
retrieving annotated training data wherein words are annotated in terms of their pronunciations, taking into account the context of the words;
performing statistical analysis of the annotated training data with respect to the context; and
building a statistical pronunciation model for each pronunciation of the annotated words in the annotated training data based on the statistical analysis.
(Dependent claims: 25, 26)
27. A machine-accessible medium encoded with data for synthesizing speech data, the data, when accessed, causing:
receiving input text;
analyzing the input text to identify contextual features of words in the input text;
determining a pronunciation of each word according to a statistical pronunciation model of the word relevant to the contextual features of the word; and
synthesizing an acoustic signal of the word based on the pronunciation.
(Dependent claims: 28, 29)
Specification