Method and apparatus for discriminative training of acoustic models of a speech recognition system

US 7,216,079 B1
Filed: 11/02/1999
Issued: 05/08/2007
Est. Priority Date: 11/02/1999
Status: Expired due to Fees

First Claim

1. A method of unsupervised training of acoustic models of a segmentation-based automatic speech recognition system, comprising:

receiving correct segment-based alignment data that represents a correct alignment of a first sequence of utterance features received by the speech recognition system;

receiving incorrect segment-based alignment data that represents an incorrect alignment of a second sequence of utterance features received by the speech recognition system;

identifying a first phoneme in the correct alignment data that corresponds to a second phoneme in the incorrect alignment data; and

modifying a first acoustic model of the first phoneme by moving at least one mean value thereof closer to corresponding feature values in the first sequence of utterance features.

5 Assignments

0 Petitions

Abstract

A method and apparatus are provided for automatically training or modifying one or more models of acoustic units in a speech recognition system. Acoustic models are modified based on information about a particular application with which the speech recognizer is used, including speech segment alignment data for at least one correct alignment and at least one wrong alignment. The correct alignment correctly represents a phrase that the speaker uttered. The wrong alignment represents a phrase that the speech recognition system recognized that is incorrect. The segment alignment data is compared by segment to identify competing segments and those that induced the recognition error. When an erroneous segment is identified, acoustic models of the phoneme in the correct alignment are modified by moving their mean values closer to the segment's acoustic features. Concurrently, acoustic models of the phoneme in the wrong alignment are modified by moving their mean values further from the acoustic features of the segment of the wrong alignment. As a result, the acoustic models will converge to more optimal values based on empirical utterance data representing recognition errors.
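The mean adjustment described in the abstract can be illustrated with a minimal sketch. The abstract does not give a formula, so the linear interpolation rule, the function name `move_mean`, the sample vectors, and the step size `STEP` (chosen to echo the "approximately 2%" of claim 9) are all assumptions made for illustration only.

```python
from typing import List

def move_mean(mean: List[float], features: List[float], rate: float) -> List[float]:
    """Return a copy of `mean` shifted toward `features` when rate > 0,
    or away from `features` when rate < 0 (hypothetical update rule)."""
    return [m + rate * (f - m) for m, f in zip(mean, features)]

STEP = 0.02  # assumed step size, loosely based on the 2% figure in claim 9

# Hypothetical mean vectors of the competing phoneme models.
correct_mean = [1.0, 2.0]
wrong_mean = [1.5, 2.5]

# Hypothetical acoustic features of the segment in each alignment.
correct_segment_features = [2.0, 4.0]
wrong_segment_features = [2.0, 3.0]

# Pull the correct phoneme's mean toward its segment features...
correct_mean = move_mean(correct_mean, correct_segment_features, +STEP)
# ...and push the wrong phoneme's mean away from its segment features.
wrong_mean = move_mean(wrong_mean, wrong_segment_features, -STEP)
```

A small step keeps any single recognition error from shifting a model very far, so a model changes substantially only after many errors point in the same direction.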

71 Citations

17 Claims

1. A method of unsupervised training of acoustic models of a segmentation-based automatic speech recognition system, comprising:
- receiving correct segment-based alignment data that represents a correct alignment of a first sequence of utterance features received by the speech recognition system;
  
  receiving incorrect segment-based alignment data that represents an incorrect alignment of a second sequence of utterance features received by the speech recognition system;
  
  identifying a first phoneme in the correct alignment data that corresponds to a second phoneme in the incorrect alignment data; and
  
  modifying a first acoustic model of the first phoneme by moving at least one mean value thereof closer to corresponding feature values in the first sequence of utterance features.
- Dependent claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11)
- - 2. A method as recited in claim 1, further comprising:
    - modifying a second acoustic model of the second phoneme by moving at least one mean value thereof farther from corresponding feature values in the second sequence of utterance features.
  - 3. A method as recited in claim 2, further comprising:
    - iteratively repeating the identifying and modifying steps for all phonemes in the wrong alignment data that correspond to one or more phonemes in the correct alignment data.
  - 4. A method as recited in claim 2, wherein modifying the second acoustic model includes modifying a set of model components associated with the second phoneme by moving each mean value thereof farther from the corresponding feature values in the second sequence of utterance features.
  - 5. A method as recited in claim 1, wherein the correct alignment data is known to be correct based on user confirmation information received from the speech recognition system.
  - 6. A method as recited in claim 1, further comprising:
    - iteratively repeating the identifying and modifying steps for all phonemes in the correct alignment data that correspond to one or more phonemes in the wrong alignment data.
  - 7. A method as recited in claim 1, wherein the correct alignment data includes data that represents a segment alignment of a less than highest scoring hypothesis from among n-best hypotheses of an utterance received by the speech recognition system.
  - 8. A method as recited in claim 1, wherein moving at least one mean value closer to corresponding feature values in the first sequence of utterance features includes subtracting a multiple of the corresponding feature values from the at least one mean value.
  - 9. A method as recited in claim 1, wherein the moving at least one mean value closer to corresponding feature values in the first sequence of utterance features includes modifying the at least one mean value by approximately 2%.
  - 10. A method as recited in claim 1, wherein modifying a first acoustic model includes modifying a set of model components associated with the first phoneme by moving each mean value thereof closer to the corresponding feature values in the first sequence of utterance features.
  - 11. A method as recited in claim 1, wherein the incorrect alignment data is known to be incorrect based on user confirmation information received from the speech recognition system.
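Claims 1 through 3 and 6 can be read together as a loop over corresponding phonemes in the two alignments. The sketch below is one possible reading, not the patented implementation: the `Segment` type, the time-overlap test used to decide that two phonemes "correspond", the averaging of frames within a segment, and the use of a single frame sequence for both alignments are all assumptions introduced here.

```python
from typing import Dict, List, NamedTuple, Tuple

class Segment(NamedTuple):
    phoneme: str
    start: int  # first frame index (inclusive)
    end: int    # last frame index (exclusive)

def competing_pairs(correct: List[Segment],
                    wrong: List[Segment]) -> List[Tuple[Segment, Segment]]:
    """Pair segments that overlap in time but carry different phonemes."""
    pairs = []
    for c in correct:
        for w in wrong:
            overlaps = c.start < w.end and w.start < c.end
            if overlaps and c.phoneme != w.phoneme:
                pairs.append((c, w))
    return pairs

def segment_average(frames: List[List[float]], seg: Segment) -> List[float]:
    """Average the feature vectors of the frames covered by `seg`."""
    span = frames[seg.start:seg.end]
    return [sum(f[d] for f in span) / len(span) for d in range(len(span[0]))]

def train_on_error(models: Dict[str, List[float]], frames: List[List[float]],
                   correct: List[Segment], wrong: List[Segment],
                   rate: float = 0.02) -> None:
    """Update `models` in place for every competing phoneme pair."""
    for c, w in competing_pairs(correct, wrong):
        c_feat = segment_average(frames, c)
        w_feat = segment_average(frames, w)
        # Correct phoneme: move the mean closer to its segment features.
        models[c.phoneme] = [m + rate * (f - m)
                             for m, f in zip(models[c.phoneme], c_feat)]
        # Wrong phoneme: move the mean farther from its segment features.
        models[w.phoneme] = [m - rate * (f - m)
                             for m, f in zip(models[w.phoneme], w_feat)]

# Hypothetical one-dimensional example: "b ae" was spoken, "p ae" was recognized.
frames = [[1.0], [3.0], [5.0], [5.0]]
correct = [Segment("b", 0, 2), Segment("ae", 2, 4)]
wrong = [Segment("p", 0, 2), Segment("ae", 2, 4)]
models = {"b": [0.0], "p": [4.0], "ae": [5.0]}
train_on_error(models, frames, correct, wrong)
```

In this example only the "b" and "p" models change, because the "ae" segments agree in both alignments and therefore do not compete.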

12. A method of improving performance of a segmentation-based automatic speech recognition system (ASR) comprising:
- receiving a correct segment-based alignment of a first sequence of utterance features received by the ASR;
  
  receiving an incorrect segment-based alignment of a second sequence of utterance features received by the ASR in the context of a particular application using the ASR;
  
  identifying a first phoneme in the correct segment-based alignment that corresponds to a second phoneme in the incorrect segment-based alignment;
  
  modifying a first acoustic model of the first phoneme by moving at least one mean value thereof closer to corresponding feature values in the first sequence of utterance features.
- Dependent claims (13)
- - 13. A method as recited in claim 12, further comprising:
    - modifying a second acoustic model of the second phoneme by moving at least one mean value thereof farther from corresponding feature values in the second sequence of utterance features.

14. A computer-readable medium carrying one or more sequences of instructions for training acoustic models of a segmentation-based automatic speech recognition system, wherein execution of the one or more sequences of instructions by one or more processors causes the one or more processors to perform the steps of:
- receiving correct segment-based alignment data that represents a correct alignment of a first sequence of utterance features received by the speech recognition system;
  
  receiving incorrect segment-based alignment data that represents an incorrect alignment of a second sequence of utterance features received by the speech recognition system;
  
  identifying a first phoneme in the correct alignment data that corresponds to a second phoneme in the incorrect alignment data; and
  
  modifying a first acoustic model of the first phoneme by moving at least one mean value thereof closer to corresponding feature values in the first sequence of utterance features.
- Dependent claims (15)
- - 15. A computer-readable medium as recited in claim 14, wherein the instructions further comprise instructions for carrying out the steps of:
    - modifying a second acoustic model of the second phoneme by moving at least one mean value thereof farther from corresponding feature values in the second sequence of utterance features.

16. A segmentation-based automatic speech recognition system comprising:
- a speech recognizer that includes one or more processors;
  
  non-volatile storage coupled to the speech recognizer and comprising a plurality of segmentation alignment data and a plurality of acoustic models;
  
  a computer-readable medium coupled to the speech recognizer and carrying one or more sequences of instructions for training acoustic models, wherein execution of the one or more sequences of instructions by the one or more processors causes the one or more processors to perform the steps of:
  
  receiving correct segment-based alignment data that represents a correct segment alignment of a first sequence of utterance features received by the speech recognition system;
  
  receiving incorrect segment-based alignment data that represents an incorrect alignment of a second sequence of utterance features received by the speech recognition system;
  
  identifying a first phoneme in the correct alignment data that corresponds to a second phoneme in the incorrect alignment data; and
  
  modifying a first acoustic model of the first phoneme by moving at least one mean value thereof closer to corresponding feature values in the first sequence of utterance features.
- Dependent claims (17)
- - 17. A speech recognition system as recited in claim 16, wherein the instructions further comprise instructions for carrying out the steps of:
    - modifying a second acoustic model of the second phoneme by moving at least one mean value thereof farther from corresponding feature values in the second sequence of utterance features.

Current Assignee: SpeechWorks International, Inc. (Microsoft Corporation)
Original Assignee: SpeechWorks International, Inc. (Microsoft Corporation)
Inventors: Dahan, Jean-Guy; Barnard, Etienne
Primary Examiner(s): ARMSTRONG, ANGELA A
Application Number: US09/433,609
Time in Patent Office: 2,744 Days
Field of Search: 704/236, 704/243, 704/244, 704/254
US Class Current: 704/244
CPC Class Codes:
G10L 15/063   Training
G10L 2015/025   Phonemes, fenemes or fenone...
G10L 2015/0631   Creating reference template...
G10L 2015/0635   updating or merging of old ...