SPEECH RECOGNITION USING VARIABLE-LENGTH CONTEXT
First Claim
1. A system comprising:
- one or more computers and one or more storage devices storing instructions that are operable, when executed by the one or more computers, to cause the one or more computers to perform operations comprising:
receiving speech data and data indicating a candidate transcription for the speech data;
accessing a phonetic representation for the candidate transcription;
extracting, from the phonetic representation, multiple test sequences for a particular phone in the phonetic representation, each of the multiple test sequences including a different set of contextual phones surrounding the particular phone;
receiving data indicating that an acoustic model includes data corresponding to one or more of the multiple test sequences;
selecting, from among the one or more test sequences for which the acoustic model includes data, the test sequence that includes the highest number of contextual phones;
accessing data from the acoustic model corresponding to the selected test sequence; and
generating a score for the candidate transcription based on the accessed data from the acoustic model that corresponds to the selected test sequence.
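The extraction step in the claim can be illustrated with a minimal Python sketch. It is not taken from the patent: the function name `extract_test_sequences`, the `max_context` parameter, the `"<pad>"` boundary symbol, and the choice to grow the context symmetrically (the same number of phones on each side) are all assumptions made for illustration. The claim only requires that each test sequence include a different set of contextual phones around the particular phone.

```python
from typing import List, Tuple

# A test sequence is (left context, central phone, right context).
TestSequence = Tuple[Tuple[str, ...], str, Tuple[str, ...]]

PAD = "<pad>"  # hypothetical filler for positions beyond the utterance boundary


def extract_test_sequences(
    phones: List[str], index: int, max_context: int = 3
) -> List[TestSequence]:
    """Return test sequences for phones[index], largest context first.

    Assumption: context grows symmetrically, k phones on each side,
    for k = max_context down to 1.
    """
    sequences: List[TestSequence] = []
    for k in range(max_context, 0, -1):
        left = tuple(
            phones[i] if i >= 0 else PAD for i in range(index - k, index)
        )
        right = tuple(
            phones[i] if i < len(phones) else PAD
            for i in range(index + 1, index + 1 + k)
        )
        sequences.append((left, phones[index], right))
    return sequences


# Phones for the word "action" with silence at both ends (illustrative only).
phones = ["sil", "ae", "k", "sh", "ih", "n", "sil"]
for seq in extract_test_sequences(phones, index=3, max_context=2):
    print(seq)
```

Ordering the sequences from largest to smallest context makes a later "pick the longest one the model knows" step a simple first-match search.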
Abstract
Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for recognizing speech using a variable length of context. Speech data and data identifying a candidate transcription for the speech data are received. A phonetic representation for the candidate transcription is accessed. Multiple test sequences are extracted for a particular phone in the phonetic representation. Each of the multiple test sequences includes a different set of contextual phones surrounding the particular phone. Data indicating that an acoustic model includes data corresponding to one or more of the multiple test sequences is received. From among the one or more test sequences, the test sequence that includes the highest number of contextual phones is selected. A score for the candidate transcription is generated based on the data from the acoustic model that corresponds to the selected test sequence.
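The selection and scoring steps summarized in the abstract might be sketched as follows. This is a toy illustration under stated assumptions, not the patented implementation: the acoustic model is stood in for by a plain dictionary mapping test sequences to a log-likelihood value, the score is the sum of those values over phones, and `floor` is a hypothetical fallback for a phone with no matching sequence. A real acoustic model would instead hold parameters that are evaluated against the speech data.

```python
from typing import Dict, List, Optional, Tuple

# A test sequence is (left context, central phone, right context).
TestSequence = Tuple[Tuple[str, ...], str, Tuple[str, ...]]


def context_size(seq: TestSequence) -> int:
    """Number of contextual phones surrounding the central phone."""
    return len(seq[0]) + len(seq[2])


def select_longest_available(
    candidates: List[TestSequence], model: Dict[TestSequence, float]
) -> Optional[TestSequence]:
    """Among candidates present in the model, pick the one with the most context."""
    available = [seq for seq in candidates if seq in model]
    return max(available, key=context_size) if available else None


def score_transcription(
    per_phone_candidates: List[List[TestSequence]],
    model: Dict[TestSequence, float],
    floor: float = -10.0,  # assumption: penalty when no sequence is in the model
) -> float:
    """Sum the model values of the selected test sequence for each phone."""
    total = 0.0
    for candidates in per_phone_candidates:
        chosen = select_longest_available(candidates, model)
        total += model[chosen] if chosen is not None else floor
    return total


# Toy model: it has data for 1-phone and 2-phone contexts of "sh", not 3-phone.
toy_model: Dict[TestSequence, float] = {
    (("k",), "sh", ("ih",)): -1.5,
    (("ae", "k"), "sh", ("ih", "n")): -0.5,
}
sh_candidates: List[TestSequence] = [
    (("sil", "ae", "k"), "sh", ("ih", "n", "sil")),
    (("ae", "k"), "sh", ("ih", "n")),
    (("k",), "sh", ("ih",)),
]
print(select_longest_available(sh_candidates, toy_model))
print(score_transcription([sh_candidates], toy_model))
```

In this example the 3-phone context is skipped because the toy model has no entry for it, so the 2-phone context is selected over the 1-phone context.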
20 Claims
1. A system comprising:
- one or more computers and one or more storage devices storing instructions that are operable, when executed by the one or more computers, to cause the one or more computers to perform operations comprising:
receiving speech data and data indicating a candidate transcription for the speech data;
accessing a phonetic representation for the candidate transcription;
extracting, from the phonetic representation, multiple test sequences for a particular phone in the phonetic representation, each of the multiple test sequences including a different set of contextual phones surrounding the particular phone;
receiving data indicating that an acoustic model includes data corresponding to one or more of the multiple test sequences;
selecting, from among the one or more test sequences for which the acoustic model includes data, the test sequence that includes the highest number of contextual phones;
accessing data from the acoustic model corresponding to the selected test sequence; and
generating a score for the candidate transcription based on the accessed data from the acoustic model that corresponds to the selected test sequence.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14
15. A computer-implemented method, comprising:
receiving speech data and data identifying a candidate transcription for the speech data;
accessing a phonetic representation for the candidate transcription;
extracting, from the phonetic representation, multiple test sequences for a particular phone in the phonetic representation, each of the multiple test sequences including a different set of contextual phones surrounding the particular phone;
determining that an acoustic model includes data corresponding to one or more of the multiple test sequences;
selecting, from among the one or more test sequences for which the acoustic model includes data, the test sequence that includes the highest number of contextual phones;
accessing data from the acoustic model corresponding to the selected test sequence; and
generating a score for the candidate transcription based on the accessed data from the acoustic model that corresponds to the selected test sequence.
Dependent claims: 16, 17, 18, 19
20. A computer storage medium encoded with a computer program, the program comprising instructions that when executed by one or more computers cause the one or more computers to perform operations comprising:
receiving speech data and data identifying a candidate transcription for the speech data;
accessing a phonetic representation for the candidate transcription;
extracting, from the phonetic representation, multiple test sequences for a particular phone in the phonetic representation, each of the multiple test sequences including a different set of contextual phones surrounding the particular phone;
determining that an acoustic model includes data corresponding to one or more of the multiple test sequences;
selecting, from among the one or more test sequences for which the acoustic model includes data, the test sequence that includes the highest number of contextual phones;
accessing data from the acoustic model corresponding to the selected test sequence; and
generating a score for the candidate transcription based on the accessed data from the acoustic model that corresponds to the selected test sequence.
Specification