Acoustic model training
Abstract
Features are disclosed for generating acoustic models from an existing corpus of data. Methods for generating the acoustic models can include receiving at least one characteristic of a desired acoustic model, selecting training utterances corresponding to the characteristic from a corpus comprising audio data and corresponding transcription data, and generating an acoustic model based on the selected training utterances.
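The selection step summarized in the abstract can be illustrated with a minimal sketch. It assumes each corpus entry carries audio data, a transcription, and a set of characteristic labels. The `Utterance` class, the `select_training_utterances` function, and the sample labels are hypothetical names chosen for illustration and do not come from the patent. Generating the acoustic model from the selected utterances is out of scope here.

```python
from dataclasses import dataclass, field


@dataclass
class Utterance:
    """Hypothetical corpus entry: audio data plus its transcription."""
    audio: list[float]                      # placeholder audio samples
    transcript: str                         # corresponding transcription data
    characteristics: set[str] = field(default_factory=set)


def select_training_utterances(corpus, characteristic):
    """Return the corpus utterances labeled with the requested characteristic."""
    return [u for u in corpus if characteristic in u.characteristics]


# Toy corpus; labels such as "female" or "noisy" are illustrative assumptions.
corpus = [
    Utterance([0.1, 0.2], "play music", {"female", "noisy"}),
    Utterance([0.3], "stop", {"male"}),
    Utterance([0.4, 0.5], "next song", {"female"}),
]

selected = select_training_utterances(corpus, "female")
```

The selected list would then be passed to whatever training procedure builds the acoustic model.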
40 Citations
26 Claims
1. An acoustic modeling system, comprising:
under control of one or more computing devices configured with specific computer-executable instructions,
receiving a plurality of characteristics of utterances to be used to create an acoustic model;
for each characteristic in the plurality of characteristics:
identifying an utterance within a corpus of utterances having the characteristic; and
associating at least a portion of the utterance with a tag indicative of the characteristic;
receiving an identification of a desired training utterance, wherein the desired training utterance comprises a first portion associated with a first desired characteristic and a second portion associated with a second desired characteristic, and wherein the desired training utterance is not included in the corpus;
selecting, from the corpus, a first utterance, wherein a portion of the first utterance comprises at least the first portion of the desired training utterance, and wherein the portion of the first utterance is associated with a tag corresponding to the first desired characteristic;
extracting the portion of the first utterance from the first utterance;
selecting, from the corpus, a second utterance, wherein a portion of the second utterance comprises at least the second portion of the desired training utterance, and wherein the portion of the second utterance is associated with a tag corresponding to the second desired characteristic;
extracting the portion of the second utterance from the second utterance;
concatenating the portion of the first utterance with the portion of the second utterance to generate the desired training utterance; and
training an acoustic model, wherein:
the acoustic model comprises statistical representations of possible sounds of subword units; and
the statistical representations are generated based on a comparison between audio data associated with the desired training utterance that is generated and a textual transcription of the desired training utterance that is generated.
Dependent claims: 2, 3, 4, 5
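The final element of claim 1 refers to statistical representations of the sounds of subword units, derived by relating audio data to a transcription. The following is a deliberately simplified sketch of that idea, not the patented training procedure. It assumes one scalar feature per audio frame and an already-available frame-to-subword alignment, and it models each subword unit with a single mean and variance. Practical acoustic models use multidimensional features and richer models, and the function name `estimate_subword_statistics` is hypothetical.

```python
from collections import defaultdict
from statistics import fmean, pvariance


def estimate_subword_statistics(frames, alignment):
    """Estimate a (mean, variance) pair per subword unit.

    frames:    one scalar feature value per audio frame (an assumption;
               real systems use feature vectors).
    alignment: the subword label assigned to each frame, assumed to have
               been derived from the transcription beforehand.
    """
    if len(frames) != len(alignment):
        raise ValueError("frames and alignment must have the same length")

    # Group the frame features by the subword unit they are aligned to.
    grouped = defaultdict(list)
    for value, unit in zip(frames, alignment):
        grouped[unit].append(value)

    # Summarize each unit with its population mean and variance.
    return {unit: (fmean(vals), pvariance(vals)) for unit, vals in grouped.items()}


frames = [1.0, 3.0, 10.0, 12.0, 14.0]
alignment = ["AH", "AH", "T", "T", "T"]
stats = estimate_subword_statistics(frames, alignment)
```

Here `stats` maps each subword label to its estimated mean and variance, which stands in for the "statistical representation" of that unit.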
6. A computer-implemented method, comprising:
under control of one or more computing devices configured with specific computer-executable instructions,
receiving an identification of a desired training utterance having a first portion and a second portion, wherein the desired training utterance is not included in a corpus;
selecting, from the corpus, a first utterance comprising at least a first portion of the desired training utterance;
extracting, from the first utterance, a portion of the first utterance comprising the first portion of the desired training utterance, wherein the first portion of the desired training utterance and at least the portion of the first utterance are associated with a first desired characteristic;
selecting, from the corpus, a second utterance comprising at least a second portion of the desired training utterance;
extracting, from the second utterance, a portion of the second utterance comprising the second portion of the desired training utterance, wherein the second portion of the desired training utterance and at least the portion of the second utterance are associated with a second desired characteristic;
concatenating at least the portion of the first utterance and the portion of the second utterance to generate the desired training utterance; and
training an acoustic model using the desired training utterance that is generated.
Dependent claims: 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17
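The select, extract, and concatenate steps of claim 6 can be sketched as follows. This is an illustration under stated assumptions rather than the claimed implementation: portions are matched by exact text and by a single characteristic tag, audio is a flat list of samples, and portion boundaries are known sample indices. The names `TaggedPortion`, `CorpusUtterance`, `find_and_extract`, and `build_training_utterance` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class TaggedPortion:
    """Hypothetical tagged portion of a corpus utterance."""
    text: str    # words spoken in this portion
    start: int   # first sample index (inclusive)
    end: int     # last sample index (exclusive)
    tag: str     # characteristic associated with this portion


@dataclass
class CorpusUtterance:
    audio: list[float]
    portions: list[TaggedPortion]


def find_and_extract(corpus, text, tag):
    """Select an utterance containing the wanted portion and extract its audio."""
    for utterance in corpus:
        for portion in utterance.portions:
            if portion.text == text and portion.tag == tag:
                return utterance.audio[portion.start:portion.end]
    raise LookupError(f"no portion {text!r} with tag {tag!r} in corpus")


def build_training_utterance(corpus, parts):
    """Concatenate extracted portions into one new training utterance.

    parts: ordered (text, desired characteristic) pairs describing the
           desired training utterance.
    """
    audio, words = [], []
    for text, tag in parts:
        audio.extend(find_and_extract(corpus, text, tag))
        words.append(text)
    return audio, " ".join(words)


corpus = [
    CorpusUtterance([1.0, 2.0, 3.0, 4.0],
                    [TaggedPortion("play", 0, 2, "female"),
                     TaggedPortion("music", 2, 4, "female")]),
    CorpusUtterance([5.0, 6.0, 7.0],
                    [TaggedPortion("the", 0, 1, "male"),
                     TaggedPortion("radio", 1, 3, "male")]),
]

# "play radio" is not in the corpus, but both of its portions are.
audio, transcript = build_training_utterance(
    corpus, [("play", "female"), ("radio", "male")]
)
```

The resulting audio and transcript pair could then serve as a training example for an acoustic model.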
18. A system comprising:
an electronic data store configured to store a corpus of audio data and corresponding transcription data; and
at least one computing device in communication with the electronic data store and configured to:
receive an identification of a desired training utterance having a first portion and a second portion, wherein the desired training utterance is not included in the corpus;
select, from the corpus, a first utterance comprising at least a first portion of the desired training utterance;
extract, from the first utterance, a portion of the first utterance comprising the first portion of the desired training utterance, wherein the first portion of the desired training utterance and at least the portion of the first utterance are associated with a first desired characteristic;
select, from the corpus, a second utterance comprising a second portion of the desired training utterance;
extract, from the second utterance, a portion of the second utterance comprising the second portion of the desired training utterance, wherein the second portion of the desired training utterance and at least the portion of the second utterance are associated with a second desired characteristic;
concatenate at least the portion of the first utterance and the portion of the second utterance to generate the desired training utterance; and
train an acoustic model using the desired training utterance that is generated.
Dependent claims: 19, 20, 21
22. A non-transitory computer-readable medium comprising one or more computer-executable modules, the one or more computer-executable modules configured to:
receive an identification of a desired training utterance having a first portion and a second portion, wherein the desired training utterance is not included in a corpus;
select, from the corpus, a first utterance comprising at least a first portion of the desired training utterance;
extract, from the first utterance, a portion of the first utterance comprising the first portion of the desired training utterance, wherein the first portion of the desired training utterance and at least the portion of the first utterance are associated with a first desired characteristic;
select, from the corpus, a second utterance comprising at least a second portion of the desired training utterance;
extract, from the second utterance, a portion of the second utterance comprising the second portion of the desired training utterance, wherein the second portion of the desired training utterance and at least the portion of the second utterance are associated with a second desired characteristic;
concatenate at least the portion of the first utterance and the portion of the second utterance to generate the desired training utterance; and
train an acoustic model using the desired training utterance that is generated.
Dependent claims: 23, 24, 25, 26
Specification