Training acoustic models using distributed computing techniques
First Claim
1. A system comprising:
one or more computers and one or more storage devices storing instructions that are operable, when executed by the one or more computers, to cause the one or more computers to perform operations comprising:
receiving speech data and data identifying a transcription for the speech data;
accessing a phonetic representation for the transcription;
extracting training sequences from the phonetic representation for a particular phone in the phonetic representation, the training sequences comprising two or more training sequences that include (i) a particular sequence of multiple phones and (ii) a different number of contextual phones surrounding the particular phone;
identifying a partitioning key for the training sequences based on the particular sequence of multiple phones that occurs in the two or more training sequences;
selecting, from among a plurality of processing modules, a processing module to which the identified partitioning key is assigned, the processing module being designated to train a portion of an acoustic model that corresponds to the identified partitioning key; and
transmitting, to the selected processing module, (i) data identifying the training sequences and (ii) a portion of the speech data that corresponds to the training sequence that includes the most contextual phones.
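The extraction and key-identification steps of the claim can be illustrated with a minimal sketch. It assumes, purely for illustration, that context is symmetric, that the shared "particular sequence of multiple phones" is the central triphone, and that utterance edges are padded with a hypothetical boundary symbol `~`. The function names and the phone string are likewise hypothetical and not taken from the patent.

```python
from typing import List, Tuple

PAD = "~"  # hypothetical boundary symbol used to pad utterance edges


def extract_training_sequences(
    phones: List[str], index: int, max_context: int = 3
) -> List[Tuple[str, ...]]:
    """Return sequences centered on phones[index] with 1..max_context
    contextual phones on each side (an assumed, symmetric scheme)."""
    padded = [PAD] * max_context + list(phones) + [PAD] * max_context
    center = index + max_context
    return [
        tuple(padded[center - n : center + n + 1])
        for n in range(1, max_context + 1)
    ]


def partitioning_key(sequences: List[Tuple[str, ...]]) -> Tuple[str, ...]:
    """Return the central triphone shared by every sequence.

    Using the central triphone as the key is an assumption of this sketch.
    """
    keys = set()
    for seq in sequences:
        mid = len(seq) // 2
        keys.add(seq[mid - 1 : mid + 2])
    if len(keys) != 1:
        raise ValueError("sequences do not share a central triphone")
    return keys.pop()


if __name__ == "__main__":
    phones = ["ae", "k", "sh", "ih", "n"]  # hypothetical phonetic representation
    seqs = extract_training_sequences(phones, index=2, max_context=2)
    print(seqs)                    # sequences with 1 and 2 phones of context
    print(partitioning_key(seqs))  # the key shared by both sequences
```

Because every sequence for a given phone contains the same central phones, all of them map to one key and can therefore be sent to one processing module.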
Abstract
Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for training acoustic models. Speech data and data identifying a transcription for the speech data are received. A phonetic representation for the transcription is accessed. Training sequences are identified for a particular phone in the phonetic representation. Each of the training sequences includes a different set of contextual phones surrounding the particular phone. A partitioning key is identified based on a sequence of phones that occurs in each of the training sequences. A processing module to which the identified partitioning key is assigned is selected. Data identifying the training sequences and a portion of the speech data are transmitted to the selected processing module.
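One way to picture the routing step in the abstract is a stable hash of the partitioning key, taken modulo the number of processing modules. This is only a sketch under stated assumptions: the abstract does not say how keys are assigned to modules, and `select_module`, `route`, and the record layout below are hypothetical names chosen for illustration.

```python
import hashlib
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Key = Tuple[str, ...]
Sequences = List[Tuple[str, ...]]
FrameSpan = Tuple[int, int]  # assumed [start, end) frame range of speech data


def select_module(key: Key, num_modules: int) -> int:
    """Map a partitioning key to a module index with a stable hash.

    Hash-based assignment is an assumption of this sketch; it only
    guarantees that equal keys always reach the same module.
    """
    digest = hashlib.sha256(" ".join(key).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_modules


def route(
    records: Iterable[Tuple[Key, Sequences, FrameSpan]], num_modules: int
) -> Dict[int, List[Tuple[Sequences, FrameSpan]]]:
    """Group (sequences, speech span) payloads by their selected module."""
    outboxes: Dict[int, List[Tuple[Sequences, FrameSpan]]] = defaultdict(list)
    for key, sequences, frames in records:
        outboxes[select_module(key, num_modules)].append((sequences, frames))
    return dict(outboxes)
```

A built-in `hash()` would not be suitable here, since Python randomizes string hashes per process and different machines could disagree on the module for the same key.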
21 Claims
1. A system comprising:
one or more computers and one or more storage devices storing instructions that are operable, when executed by the one or more computers, to cause the one or more computers to perform operations comprising:
receiving speech data and data identifying a transcription for the speech data;
accessing a phonetic representation for the transcription;
extracting training sequences from the phonetic representation for a particular phone in the phonetic representation, the training sequences comprising two or more training sequences that include (i) a particular sequence of multiple phones and (ii) a different number of contextual phones surrounding the particular phone;
identifying a partitioning key for the training sequences based on the particular sequence of multiple phones that occurs in the two or more training sequences;
selecting, from among a plurality of processing modules, a processing module to which the identified partitioning key is assigned, the processing module being designated to train a portion of an acoustic model that corresponds to the identified partitioning key; and
transmitting, to the selected processing module, (i) data identifying the training sequences and (ii) a portion of the speech data that corresponds to the training sequence that includes the most contextual phones.
Dependent claims: 2-17.
18. A computer-implemented method, comprising:
receiving speech data and data identifying a transcription for the speech data;
accessing a phonetic representation for the transcription;
extracting training sequences from the phonetic representation for a particular phone in the phonetic representation, the training sequences comprising two or more training sequences that include (i) a particular sequence of multiple phones and (ii) a different number of contextual phones surrounding the particular phone;
identifying a partitioning key for the training sequences based on the particular sequence of multiple phones that occurs in the two or more training sequences;
selecting, from among a plurality of processing modules, a processing module to which the identified partitioning key is assigned, the processing module being designated to train a portion of an acoustic model that corresponds to the identified partitioning key; and
transmitting, to the selected processing module, (i) data identifying the training sequences and (ii) a portion of the speech data that corresponds to the training sequence that includes the most contextual phones.
Dependent claim: 19.
20. A non-transitory computer storage medium encoded with a computer program, the program comprising instructions that when executed by one or more computers cause the one or more computers to perform operations comprising:
receiving speech data and data identifying a transcription for the speech data;
accessing a phonetic representation for the transcription;
extracting training sequences from the phonetic representation for a particular phone in the phonetic representation, the training sequences comprising two or more training sequences that include (i) a particular sequence of multiple phones and (ii) a different number of contextual phones surrounding the particular phone;
identifying a partitioning key for the training sequences based on the particular sequence of multiple phones that occurs in the two or more training sequences;
selecting, from among a plurality of processing modules, a processing module to which the identified partitioning key is assigned, the processing module being designated to train a portion of an acoustic model that corresponds to the identified partitioning key; and
transmitting, to the selected processing module, (i) data identifying the training sequences and (ii) a portion of the speech data that corresponds to the training sequence that includes the most contextual phones.
21. A system comprising:
one or more computers and one or more storage devices storing instructions that are operable, when executed by the one or more computers, to cause the one or more computers to perform operations comprising:
assigning a plurality of partitioning keys to a plurality of processing modules, each partitioning key being assigned to only one of the plurality of processing modules, the partitioning keys corresponding to non-overlapping sets of phonetic sequences;
receiving speech data for an utterance and data identifying a transcription for the utterance;
accessing a phonetic representation for the transcription;
extracting, for a particular phone in the phonetic representation, multiple training sequences from the phonetic representation, each of the multiple training sequences including (i) a particular sequence of multiple phones and (ii) a different number of contextual phones surrounding the particular phone, wherein the particular phone corresponds to a central position in each of the multiple training sequences;
selecting, from among the plurality of assigned partitioning keys, a partitioning key that corresponds to each of the multiple training sequences based on a sequence of multiple phones that occurs in each of the multiple training sequences;
selecting a processing module from among the plurality of processing modules based on the identified partitioning key, the selected processing module being designated to train a portion of an acoustic model corresponding to the identified partitioning key;
identifying a portion of the speech data that corresponds to a training sequence of the multiple training sequences that includes the most contextual phones; and
transmitting, to the selected processing module, (i) data identifying the training sequences and (ii) data indicating the portion of the speech data that corresponds to the training sequence that includes the most contextual phones.
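Two steps specific to claim 21 lend themselves to a short sketch: assigning each partitioning key to exactly one processing module, and identifying the portion of speech data that matches the training sequence with the most contextual phones. The sketch assumes an explicit round-robin assignment table and a per-phone alignment given as (start frame, end frame) pairs; neither detail comes from the claim, and `assign_keys` and `widest_span` are hypothetical names.

```python
from typing import Dict, List, Sequence, Tuple

Key = Tuple[str, ...]
FrameSpan = Tuple[int, int]  # assumed (start_frame, end_frame) per phone


def assign_keys(keys: Sequence[Key], num_modules: int) -> Dict[Key, int]:
    """Assign every distinct key to exactly one module.

    Round-robin over sorted keys is an assumed policy; any mapping in
    which each key has a single owner would satisfy the same property.
    """
    return {key: i % num_modules for i, key in enumerate(sorted(set(keys)))}


def widest_span(
    alignment: List[FrameSpan], index: int, max_context: int
) -> FrameSpan:
    """Frame span covering the phone at `index` plus up to `max_context`
    phones on each side, clipped at the utterance boundaries."""
    lo = max(0, index - max_context)
    hi = min(len(alignment) - 1, index + max_context)
    return alignment[lo][0], alignment[hi][1]


if __name__ == "__main__":
    table = assign_keys([("k", "sh", "ih"), ("ae", "k", "sh")], num_modules=2)
    print(table)
    # Hypothetical alignment for five phones, in frames.
    alignment = [(0, 10), (10, 18), (18, 30), (30, 41), (41, 55)]
    print(widest_span(alignment, index=2, max_context=1))
```

Sending only the span for the widest sequence is sufficient in this sketch because the speech for every narrower sequence centered on the same phone lies inside that span.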
Specification