New-word pronunciation learning using a pronunciation graph
First Claim
1. A computer-readable storage medium having computer-executable instructions stored thereon that when executed by a computer cause the computer to perform steps comprising:
generating a set of syllable-like units using mutual information before decoding a speech signal to identify a sequence of syllable-like units;
generating a speech-based phonetic description of a word without reference to the text of the word by decoding a speech signal representing the user's pronunciation of the word to generate the speech-based phonetic description of the word, wherein decoding a speech signal comprises identifying a sequence of syllable-like units from the speech signal;
generating a text-based phonetic description of the word based on the text of the word;
aligning the speech-based phonetic description and the text-based phonetic description on a phone-by-phone basis to form a single graph; and
selecting a phonetic description from the single graph.
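The claim does not say how the phone-by-phone alignment is computed. As a minimal sketch, assuming a plain minimum-edit-distance alignment with unit costs (an illustrative choice, not necessarily the patented method), two phone sequences can be paired position by position as follows. The function name `align_phones`, the gap symbol, and the phone labels are hypothetical.

```python
def align_phones(speech, text, gap="-"):
    """Pair up two phone sequences using minimum edit distance (unit costs).

    Returns a list of (speech_phone, text_phone) tuples; `gap` marks a
    position where one sequence has no counterpart in the other.
    """
    n, m = len(speech), len(text)
    # cost[i][j] = edit distance between speech[:i] and text[:j]
    cost = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = i
    for j in range(1, m + 1):
        cost[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = cost[i - 1][j - 1] + (speech[i - 1] != text[j - 1])
            cost[i][j] = min(sub, cost[i - 1][j] + 1, cost[i][j - 1] + 1)

    # Trace back from the bottom-right corner to recover the pairing.
    pairs = []
    i, j = n, m
    while i > 0 or j > 0:
        if (i > 0 and j > 0 and
                cost[i][j] == cost[i - 1][j - 1] + (speech[i - 1] != text[j - 1])):
            pairs.append((speech[i - 1], text[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and cost[i][j] == cost[i - 1][j] + 1:
            pairs.append((speech[i - 1], gap))   # phone only in the speech-based sequence
            i -= 1
        else:
            pairs.append((gap, text[j - 1]))     # phone only in the text-based sequence
            j -= 1
    pairs.reverse()
    return pairs


# Hypothetical phone sequences for one word.
speech_based = ["t", "ax", "m", "ey", "t", "ow"]
text_based = ["t", "ow", "m", "aa", "t", "ow"]
aligned = align_phones(speech_based, text_based)
```

Each tuple in `aligned` corresponds to one position in the combined graph: where the two phones agree there is a single unit, and where they differ the graph carries both alternatives.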
Abstract
A method and computer-readable medium convert the text of a word and a user's pronunciation of the word into a phonetic description to be added to a speech recognition lexicon. Initially, a plurality of at least two possible phonetic descriptions are generated. One phonetic description is formed by decoding a speech signal representing a user's pronunciation of the word. At least one other phonetic description is generated from the text of the word. The plurality of possible sequences comprising speech-based and text-based phonetic descriptions are aligned and scored in a single graph based on their correspondence to the user's pronunciation. The phonetic description with the highest score is then selected for entry in the speech recognition lexicon.
23 Claims
1. A computer-readable storage medium having computer-executable instructions stored thereon that when executed by a computer cause the computer to perform steps comprising:
generating a set of syllable-like units using mutual information before decoding a speech signal to identify a sequence of syllable-like units;
generating a speech-based phonetic description of a word without reference to the text of the word by decoding a speech signal representing the user's pronunciation of the word to generate the speech-based phonetic description of the word, wherein decoding a speech signal comprises identifying a sequence of syllable-like units from the speech signal;
generating a text-based phonetic description of the word based on the text of the word;
aligning the speech-based phonetic description and the text-based phonetic description on a phone-by-phone basis to form a single graph; and
selecting a phonetic description from the single graph.
Dependent claims: 2, 3, 4
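Claim 1 states that the syllable-like units are generated "using mutual information" but gives no procedure. One plausible reading, offered here only as a sketch under that assumption, is a greedy loop that repeatedly merges the adjacent pair of units contributing most to the mutual information of a training corpus. The function names, the `_` joiner, and the toy corpus are all hypothetical.

```python
import math
from collections import Counter


def best_pair_by_mi(sequences):
    """Return the adjacent unit pair with the largest mutual-information term."""
    unit_counts = Counter(u for seq in sequences for u in seq)
    pair_counts = Counter(p for seq in sequences for p in zip(seq, seq[1:]))
    total_units = sum(unit_counts.values())
    total_pairs = sum(pair_counts.values())
    if total_pairs == 0:
        return None

    def mi_term(pair):
        a, b = pair
        p_ab = pair_counts[pair] / total_pairs
        p_a = unit_counts[a] / total_units
        p_b = unit_counts[b] / total_units
        # Contribution of this pair to the mutual information sum.
        return p_ab * math.log(p_ab / (p_a * p_b))

    return max(pair_counts, key=mi_term)


def merge_pair(seq, pair):
    """Replace every non-overlapping occurrence of `pair` in `seq` with one merged unit."""
    merged, i = [], 0
    while i < len(seq):
        if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
            merged.append(seq[i] + "_" + seq[i + 1])
            i += 2
        else:
            merged.append(seq[i])
            i += 1
    return merged


def build_units(sequences, merges):
    """Apply up to `merges` greedy merge steps and return the rewritten corpus."""
    seqs = [list(s) for s in sequences]
    for _ in range(merges):
        pair = best_pair_by_mi(seqs)
        if pair is None:
            break
        seqs = [merge_pair(s, pair) for s in seqs]
    return seqs


# Toy corpus of phone sequences (illustrative only).
corpus = [["b", "ae", "t"], ["k", "ae", "t"], ["m", "ae", "t"]]
units_after_one_merge = build_units(corpus, 1)
```

In this toy corpus the pair `ae`, `t` co-occurs far more often than chance would predict, so it is merged first into a single larger unit. A real inventory would be built from a large dictionary or corpus and stopped at a chosen inventory size.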
5. A computer-readable storage medium having computer-executable instructions stored thereon that when executed by a computer cause the computer to perform steps comprising:
receiving text of a word for which a phonetic pronunciation is to be added to a speech recognition lexicon;
receiving a representation of a speech signal produced by a person pronouncing the word;
converting the text of the word into at least one text-based phonetic sequence of phonetic units;
generating a speech-based phonetic sequence of phonetic units from the representation of the speech signal;
placing the phonetic units of the at least one text-based phonetic sequence and the speech-based phonetic sequence in a search structure that allows for transitions between phonetic units in the text-based phonetic sequence and phonetic units in the speech-based phonetic description; and
selecting a phonetic pronunciation from the search structure, wherein the selected phonetic pronunciation comprises phonetic units of the speech-based phonetic sequence that differ from phonetic units of the at least one text-based phonetic sequence and phonetic units other than phonetic units of the speech-based phonetic sequence.
Dependent claims: 6, 7, 8, 9, 10, 11, 12, 13, 14
15. A method for adding an acoustic description of a word to a speech recognition lexicon, the method comprising:
generating a text-based phonetic description based on the text of a word;
generating a speech-based phonetic description without reference to the text of the word;
aligning the text-based phonetic description and the speech based phonetic description in a structure, the structure comprising paths representing phonetic units, at least one path for a phonetic unit from the text-based phonetic description being connected to a path for a phonetic unit from the speech-based phonetic description;
selecting a sequence of paths through the structure; and
generating the acoustic description of the word based on the selected sequence of paths wherein the acoustic description comprises a phonetic unit found in the speech-based phonetic description but not in the text-based phonetic description and a second phonetic unit found in the text-based phonetic description but not in the speech-based phonetic description.
Dependent claims: 16, 17, 18, 19, 20, 21, 22, 23
Specification