Identifying mismatches between assumed and actual pronunciations of words
Abstract
A method of identifying mismatches between acoustic data and a corresponding transcription, the transcription being expressed in terms of basic units, comprises the steps of: aligning the acoustic data with the corresponding transcription; computing a probability score for each instance of a basic unit in the acoustic data with respect to the transcription; generating a distribution for each basic unit; tagging, as mismatches, instances of a basic unit corresponding to a particular range of scores in the distribution for each basic unit based on a threshold value; and correcting the mismatches.
39 Citations
19 Claims
1. A method of identifying mismatches between acoustic data and a corresponding transcription, the transcription being expressed in terms of basic units, the method comprising the steps of:
(a) aligning the acoustic data with the corresponding transcription;
(b) computing a probability score for each instance of a basic unit in the acoustic data based upon an alignment of the acoustic data with the corresponding transcription;
(c) generating a distribution function on the probability score for each instance of a basic unit;
(d) tagging, as mismatches, instances of a basic unit corresponding to a particular range of scores in the distribution for each basic unit based on a threshold value; and
(e) correcting the mismatches, wherein the acoustic data corresponds to training data provided during a training phase of a speech recognition system or to recognition data provided during a recognition phase of the speech recognition system, but not to adaptation data. - View Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14)
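Steps (b)–(d) of claim 1 can be sketched in a few lines. This is an illustrative outline only, not the patented implementation: the alignment of step (a) is assumed to have already produced one score per instance, and the function name, data layout, and default threshold are all hypothetical.

```python
from collections import defaultdict

def tag_mismatches(instances, threshold_pct=5.0):
    """Steps (c)-(d) of claim 1: build a per-unit score distribution and
    tag instances whose score falls in the bottom `threshold_pct` percent
    of scores for the same basic unit.

    `instances` is a list of (unit, score) pairs, one per aligned instance
    of a basic unit; scores would come from step (b), e.g. average
    per-frame log likelihoods under the aligned acoustic model.
    """
    # Step (c): collect the score distribution for each basic unit.
    by_unit = defaultdict(list)
    for unit, score in instances:
        by_unit[unit].append(score)

    # Step (d): an instance is a suspected mismatch when the fraction of
    # same-unit scores below it is under the percentage threshold.
    tags = []
    for unit, score in instances:
        scores = by_unit[unit]
        percentile = 100.0 * sum(s < score for s in scores) / len(scores)
        tags.append(percentile < threshold_pct)
    return tags
```

Step (e), the correction itself, is elaborated in dependent claims 8–14 (transcription fixes, compound words, baseform repair, noise removal).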
(a) computing log likelihoods of each instance of a basic unit; and
(b) normalizing the log likelihoods.
4. The method of claim 3, wherein the step of normalizing the log likelihoods includes computing average per-frame log likelihoods of each instance of a basic unit.
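The scoring of claims 2–4 — log likelihoods normalized to an average per-frame value so that instances of different durations are comparable — can be sketched as follows; the function name and the per-frame-likelihood input layout are assumptions, not from the patent:

```python
import math

def average_per_frame_log_likelihood(frame_likelihoods):
    """Normalize an instance's total log likelihood by its frame count
    (claim 4), so long and short instances of a unit score comparably.

    `frame_likelihoods` holds the per-frame likelihoods p(x_t | unit)
    produced by the aligner for one instance of a basic unit.
    """
    if not frame_likelihoods:
        raise ValueError("instance has no frames")
    total_log_lik = sum(math.log(p) for p in frame_likelihoods)
    return total_log_lik / len(frame_likelihoods)
```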
5. The method of claim 1, wherein the step of generating a distribution function includes forming a histogram of probability scores for each basic unit of the acoustic data.
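The histogram of claim 5 can be formed with simple fixed-width binning over one unit's scores. A minimal sketch; the bin count and equal-width choice are arbitrary illustrative decisions, not specified by the claim:

```python
def score_histogram(scores, num_bins=10):
    """Claim 5 sketch: bin the probability scores observed for one basic
    unit into equal-width bins, returning (bin_edges, counts)."""
    lo, hi = min(scores), max(scores)
    width = (hi - lo) / num_bins or 1.0  # guard: all scores identical
    counts = [0] * num_bins
    for s in scores:
        # Clamp so that s == hi lands in the last bin.
        idx = min(int((s - lo) / width), num_bins - 1)
        counts[idx] += 1
    edges = [lo + i * width for i in range(num_bins + 1)]
    return edges, counts
```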
6. The method of claim 1, wherein the tagging step includes:
(a) for each basic unit, determining whether the probability score of the basic unit is below the threshold value as compared to other instances of the same basic unit in the acoustic data; and
(b) if so, tagging the basic unit as a mismatch.
7. The method of claim 1, wherein the threshold value is a percentage.
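Claims 6–7 tag an instance when its score is low relative to other instances of the same basic unit, with the threshold expressed as a percentage. One way to realize that (a sketch under assumed names; the patent does not prescribe this computation) is to convert the percentage into an absolute score cutoff per unit:

```python
def score_cutoff(unit_scores, threshold_percent):
    """Claim 7 sketch: convert the percentage threshold into an absolute
    score cutoff for one basic unit. Instances scoring below the cutoff
    fall in the bottom `threshold_percent` of that unit's distribution
    and are tagged as mismatches (claim 6)."""
    ranked = sorted(unit_scores)
    k = int(len(ranked) * threshold_percent / 100.0)
    return ranked[min(k, len(ranked) - 1)]

# Usage: tag every instance of a unit scoring below its cutoff.
scores = list(range(100))                  # stand-in probability scores
tagged = [s < score_cutoff(scores, 5.0) for s in scores]
```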
8. The method of claim 1, further including the step of tagging a word containing an instance of a basic unit corresponding to a lowest score in the distribution for each basic phonetic unit.
9. The method of claim 8, wherein, for each tagged word, the correcting step includes:
(a) determining whether the transcription pertaining to the word is correct; and
(b) if the word is incorrect, correcting the word in the transcription to correspond to the acoustic data.
10. The method of claim 8, wherein, for each tagged word, the correcting step includes determining if there is a co-articulation between the tagged word and surrounding words in the transcription.
11. The method of claim 10, wherein, if a co-articulation is detected, the correcting step includes:
(a) constructing a compound word which models the co-articulated words;
(b) constructing a baseform for the compound word; and
(c) replacing at least the tagged word with the compound word in the transcription.
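Claims 10–11 handle co-articulation, e.g. "going to" pronounced as "gonna": the correction builds a compound word modelling the merged pronunciation, gives it a baseform, and substitutes it into the transcription. A toy sketch, with a hypothetical pronunciation lexicon and token format (neither is specified by the patent):

```python
def make_compound(word_a, word_b, baseform, lexicon):
    """Claim 11 sketch: build a compound word modelling two co-articulated
    words, register its observed baseform (phone sequence), and return
    the token to substitute into the transcription.

    `lexicon` maps spellings to lists of baseforms; `baseform` is the
    phone sequence actually observed in the acoustic data.
    """
    compound = f"{word_a}-{word_b}"        # step (a): e.g. "GOING-TO"
    lexicon.setdefault(compound, [])
    if baseform not in lexicon[compound]:
        lexicon[compound].append(baseform)  # step (b): add its baseform
    return compound                         # step (c): substitute in text

# Usage: replace the tagged word pair in the transcription.
lex = {}
token = make_compound("GOING", "TO", ["G", "AH", "N", "AH"], lex)
```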
12. The method of claim 8, wherein, for each tagged word, the correcting step includes:
(a) determining whether a baseform associated with the tagged word is correct; and
(b) if not, correcting the baseform.
13. The method of claim 8, wherein, for each tagged word, the correcting step includes determining whether a portion of the acoustic data corresponding to the tagged word includes noise.
14. The method of claim 13, wherein, if noise is included in the portion of the acoustic data corresponding to the tagged word, discarding the portion of the acoustic data.
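Claims 13–14 do not specify how noise is detected, only that a noisy portion is discarded. Purely as an illustration, a naive energy-based heuristic could stand in: estimate the segment's signal-to-noise ratio and discard it when the estimate is low. Every name and threshold below is an assumption for the sketch.

```python
import math

def looks_like_noise(frames, snr_threshold_db=10.0):
    """Toy stand-in for the noise test of claims 13-14: estimate the
    segment's signal-to-noise ratio from per-frame energies and flag it
    when the estimate falls below `snr_threshold_db`. The noise floor is
    taken as the mean energy of the quietest 10% of frames."""
    energies = sorted(sum(x * x for x in f) / len(f) for f in frames)
    floor = energies[: max(1, len(energies) // 10)]
    noise = sum(floor) / len(floor)
    signal = sum(energies) / len(energies)
    snr_db = 10.0 * math.log10(signal / noise) if noise > 0 else float("inf")
    return snr_db < snr_threshold_db
```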
15. Computer-based apparatus for identifying mismatches between acoustic data and a corresponding transcription associated with a speech recognition engine, the transcription being expressed in terms of basic units, the apparatus comprising:
a processor, operatively coupled to the speech recognition engine, for:
(a) aligning the acoustic data with the corresponding transcription;
(b) computing a probability score for each instance of a basic unit in the acoustic data based upon an alignment of the acoustic data with the corresponding transcription;
(c) generating a distribution function on the probability score for each instance of a basic unit;
(d) tagging, as mismatches, instances of a basic unit corresponding to a particular range of scores in the distribution for each basic unit based on a threshold value; and
(e) correcting the mismatches, wherein the acoustic data corresponds to training data provided during a training phase of the speech recognition engine or to recognition data provided during a recognition phase of the speech recognition engine, but not to adaptation data. - View Dependent Claims (16, 17, 18)
19. Computer-based apparatus for identifying mismatches between acoustic data and a corresponding transcription associated with a speech recognition engine, the transcription being expressed in terms of basic units, the apparatus comprising:
(a) means for aligning the acoustic data with the corresponding transcription;
(b) means for computing a probability score for each instance of a basic unit in the acoustic data based upon an alignment of the acoustic data with the corresponding transcription;
(c) means for generating a distribution function on the probability score for each instance of a basic unit;
(d) means for tagging, as mismatches, instances of a basic unit corresponding to a particular range of scores in the distribution for each basic unit based on a threshold value; and
(e) means for correcting the mismatches, wherein the acoustic data corresponds to training data provided during a training phase of a speech recognition system or to recognition data provided during a recognition phase of the speech recognition system, but not to adaptation data.
Specification