Identifying mismatches between assumed and actual pronunciations of words
Abstract
A method of identifying mismatches between acoustic data and a corresponding transcription, the transcription being expressed in terms of basic units, comprises the steps of: aligning the acoustic data with the corresponding transcription; computing a probability score for each instance of a basic unit in the acoustic data with respect to the transcription; generating a distribution for each basic unit; tagging, as mismatches, instances of a basic unit corresponding to a particular range of scores in the distribution for each basic unit based on a threshold value; and correcting the mismatches.
39 Citations
19 Claims
1. A method of identifying mismatches between acoustic data and a corresponding transcription, the transcription being expressed in terms of basic units, the method comprising the steps of:
(a) aligning the acoustic data with the corresponding transcription;
(b) computing a probability score for each instance of a basic unit in the acoustic data based upon an alignment of the acoustic data with the corresponding transcription;
(c) generating a distribution function on the probability score for each instance of a basic unit;
(d) tagging, as mismatches, instances of a basic unit corresponding to a particular range of scores in the distribution for each basic unit based on a threshold value; and
(e) correcting the mismatches, wherein the acoustic data corresponds to training data provided during a training phase of a speech recognition system or to recognition data provided during a recognition phase of the speech recognition system, but not to adaptation data. - View Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14)
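Steps (b)–(d) of claim 1 can be sketched in a few lines. This is an illustrative outline only, not the patented implementation: the alignment of step (a) is assumed to have already produced one score per instance, and the function name, data layout, and default threshold are all hypothetical.

```python
from collections import defaultdict

def tag_mismatches(instances, threshold_pct=5.0):
    """Steps (c)-(d) of claim 1: build a per-unit score distribution and
    tag instances whose score falls in the bottom `threshold_pct` percent
    of scores for the same basic unit.

    `instances` is a list of (unit, score) pairs, one per aligned instance
    of a basic unit; scores would come from step (b), e.g. average
    per-frame log likelihoods under the aligned acoustic model.
    """
    # Step (c): collect the score distribution for each basic unit.
    by_unit = defaultdict(list)
    for unit, score in instances:
        by_unit[unit].append(score)

    # Step (d): an instance is a suspected mismatch when the fraction of
    # same-unit scores below it is under the percentage threshold.
    tags = []
    for unit, score in instances:
        scores = by_unit[unit]
        percentile = 100.0 * sum(s < score for s in scores) / len(scores)
        tags.append(percentile < threshold_pct)
    return tags
```

Step (e), the correction itself, is elaborated in dependent claims 8–14 (transcription fixes, compound words, baseform repair, noise removal).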
(a) computing log likelihoods of each instance of a basic unit; and
(b) normalizing the log likelihoods.
4. The method of claim 3, wherein the step of normalizing the log likelihoods includes computing average per-frame log likelihoods of each instance of a basic unit.
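The scoring of claims 2–4 — log likelihoods normalized to an average per-frame value so that instances of different durations are comparable — can be sketched as follows; the function name and the per-frame-likelihood input layout are assumptions, not from the patent:

```python
import math

def average_per_frame_log_likelihood(frame_likelihoods):
    """Normalize an instance's total log likelihood by its frame count
    (claim 4), so long and short instances of a unit score comparably.

    `frame_likelihoods` holds the per-frame likelihoods p(x_t | unit)
    produced by the aligner for one instance of a basic unit.
    """
    if not frame_likelihoods:
        raise ValueError("instance has no frames")
    total_log_lik = sum(math.log(p) for p in frame_likelihoods)
    return total_log_lik / len(frame_likelihoods)
```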
5. The method of claim 1, wherein the step of generating a distribution function includes forming a histogram of probability scores for each basic unit of the acoustic data.
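The histogram of claim 5 can be formed with simple fixed-width binning over one unit's scores. A minimal sketch; the bin count and equal-width choice are arbitrary illustrative decisions, not specified by the claim:

```python
def score_histogram(scores, num_bins=10):
    """Claim 5 sketch: bin the probability scores observed for one basic
    unit into equal-width bins, returning (bin_edges, counts)."""
    lo, hi = min(scores), max(scores)
    width = (hi - lo) / num_bins or 1.0  # guard: all scores identical
    counts = [0] * num_bins
    for s in scores:
        # Clamp so that s == hi lands in the last bin.
        idx = min(int((s - lo) / width), num_bins - 1)
        counts[idx] += 1
    edges = [lo + i * width for i in range(num_bins + 1)]
    return edges, counts
```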
6. The method of claim 1, wherein the tagging step includes:
(a) for each basic unit, determining whether the probability score of the basic unit is below the threshold value as compared to other instances of the same basic unit in the acoustic data; and
(b) if so, tagging the basic unit as a mismatch.
7. The method of claim 1, wherein the threshold value is a percentage.
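Claims 6–7 tag an instance when its score is low relative to other instances of the same basic unit, with the threshold expressed as a percentage. One way to realize that (a sketch under assumed names; the patent does not prescribe this computation) is to convert the percentage into an absolute score cutoff per unit:

```python
def score_cutoff(unit_scores, threshold_percent):
    """Claim 7 sketch: convert the percentage threshold into an absolute
    score cutoff for one basic unit. Instances scoring below the cutoff
    fall in the bottom `threshold_percent` of that unit's distribution
    and are tagged as mismatches (claim 6)."""
    ranked = sorted(unit_scores)
    k = int(len(ranked) * threshold_percent / 100.0)
    return ranked[min(k, len(ranked) - 1)]

# Usage: tag every instance of a unit scoring below its cutoff.
scores = list(range(100))                  # stand-in probability scores
tagged = [s < score_cutoff(scores, 5.0) for s in scores]
```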
8. The method of claim 1, further including the step of tagging a word containing an instance of a basic unit corresponding to a lowest score in the distribution for each basic phonetic unit.
9. The method of claim 8, wherein, for each tagged word, the correcting step includes:
(a) determining whether the transcription pertaining to the word is correct; and
(b) if the word is incorrect, correcting the word in the transcription to correspond to the acoustic data.
10. The method of claim 8, wherein, for each tagged word, the correcting step includes determining if there is a co-articulation between the tagged word and surrounding words in the transcription.
11. The method of claim 10, wherein, if a co-articulation is detected, the correcting step includes:
(a) constructing a compound word which models the co-articulated words;
(b) constructing a baseform for the compound word; and
(c) replacing at least the tagged word with the compound word in the transcription.
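Claims 10–11 handle co-articulation, e.g. "going to" pronounced as "gonna": the correction builds a compound word modelling the merged pronunciation, gives it a baseform, and substitutes it into the transcription. A toy sketch, with a hypothetical pronunciation lexicon and token format (neither is specified by the patent):

```python
def make_compound(word_a, word_b, baseform, lexicon):
    """Claim 11 sketch: build a compound word modelling two co-articulated
    words, register its observed baseform (phone sequence), and return
    the token to substitute into the transcription.

    `lexicon` maps spellings to lists of baseforms; `baseform` is the
    phone sequence actually observed in the acoustic data.
    """
    compound = f"{word_a}-{word_b}"        # step (a): e.g. "GOING-TO"
    lexicon.setdefault(compound, [])
    if baseform not in lexicon[compound]:
        lexicon[compound].append(baseform)  # step (b): add its baseform
    return compound                         # step (c): substitute in text

# Usage: replace the tagged word pair in the transcription.
lex = {}
token = make_compound("GOING", "TO", ["G", "AH", "N", "AH"], lex)
```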
12. The method of claim 8, wherein, for each tagged word, the correcting step includes:
(a) determining whether a baseform associated with the tagged word is correct; and
(b) if not, correcting the baseform.
13. The method of claim 8, wherein, for each tagged word, the correcting step includes determining whether a portion of the acoustic data corresponding to the tagged word includes noise.
14. The method of claim 13, wherein, if noise is included in the portion of the acoustic data corresponding to the tagged word, discarding the portion of the acoustic data.
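Claims 13–14 do not specify how noise is detected, only that a noisy portion is discarded. Purely as an illustration, a naive energy-based heuristic could stand in: estimate the segment's signal-to-noise ratio and discard it when the estimate is low. Every name and threshold below is an assumption for the sketch.

```python
import math

def looks_like_noise(frames, snr_threshold_db=10.0):
    """Toy stand-in for the noise test of claims 13-14: estimate the
    segment's signal-to-noise ratio from per-frame energies and flag it
    when the estimate falls below `snr_threshold_db`. The noise floor is
    taken as the mean energy of the quietest 10% of frames."""
    energies = sorted(sum(x * x for x in f) / len(f) for f in frames)
    floor = energies[: max(1, len(energies) // 10)]
    noise = sum(floor) / len(floor)
    signal = sum(energies) / len(energies)
    snr_db = 10.0 * math.log10(signal / noise) if noise > 0 else float("inf")
    return snr_db < snr_threshold_db
```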
15. Computer-based apparatus for identifying mismatches between acoustic data and a corresponding transcription associated with a speech recognition engine, the transcription being expressed in terms of basic units, the apparatus comprising:
a processor, operatively coupled to the speech recognition engine, for:
(a) aligning the acoustic data with the corresponding transcription;
(b) computing a probability score for each instance of a basic unit in the acoustic data based upon an alignment of the acoustic data with the corresponding transcription;
(c) generating a distribution function on the probability score for each instance of a basic unit;
(d) tagging, as mismatches, instances of a basic unit corresponding to a particular range of scores in the distribution for each basic unit based on a threshold value; and
(e) correcting the mismatches, wherein the acoustic data corresponds to training data provided during a training phase of the speech recognition engine or to recognition data provided during a recognition phase of the speech recognition engine, but not to adaptation data. - View Dependent Claims (16, 17, 18)
19. Computer-based apparatus for identifying mismatches between acoustic data and a corresponding transcription associated with a speech recognition engine, the transcription being expressed in terms of basic units, the apparatus comprising:
(a) means for aligning the acoustic data with the corresponding transcription;
(b) means for computing a probability score for each instance of a basic unit in the acoustic data based upon an alignment of the acoustic data with the corresponding transcription;
(c) means for generating a distribution function on the probability score for each instance of a basic unit;
(d) means for tagging, as mismatches, instances of a basic unit corresponding to a particular range of scores in the distribution for each basic unit based on a threshold value; and
(e) means for correcting the mismatches, wherein the acoustic data corresponds to training data provided during a training phase of a speech recognition system or to recognition data provided during a recognition phase of the speech recognition system, but not to adaptation data.
Specification