Method and apparatus for discriminative training of acoustic models of a speech recognition system
First Claim
1. A method of unsupervised training of acoustic models of a segmentation-based automatic speech recognition system, comprising:
receiving correct segment-based alignment data that represents a correct alignment of a first sequence of utterance features received by the speech recognition system;
receiving incorrect segment-based alignment data that represents an incorrect alignment of a second sequence of utterance features received by the speech recognition system;
identifying a first phoneme in the correct alignment data that corresponds to a second phoneme in the incorrect alignment data; and
modifying a first acoustic model of the first phoneme by moving at least one mean value thereof closer to corresponding feature values in the first sequence of utterance features.
Abstract
A method and apparatus are provided for automatically training or modifying one or more models of acoustic units in a speech recognition system. Acoustic models are modified based on information about a particular application with which the speech recognizer is used, including speech segment alignment data for at least one correct alignment and at least one wrong alignment. The correct alignment correctly represents a phrase that the speaker uttered. The wrong alignment represents a phrase that the speech recognition system recognized that is incorrect. The segment alignment data is compared by segment to identify competing segments and those that induced the recognition error. When an erroneous segment is identified, acoustic models of the phoneme in the correct alignment are modified by moving their mean values closer to the segment's acoustic features. Concurrently, acoustic models of the phoneme in the wrong alignment are modified by moving their mean values further from the acoustic features of the segment of the wrong alignment. As a result, the acoustic models will converge to more optimal values based on empirical utterance data representing recognition errors.
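The update described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation. It assumes one mean vector per phoneme (a real recognizer would more likely hold several mixture components per model), treats two segments as competing when they overlap in time and carry different phoneme labels, and uses a fixed step size. The names `Segment`, `competing_pairs`, `shift_mean`, `train_step`, and the `rate` parameter are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Segment:
    phoneme: str           # phoneme label assigned by the alignment
    start: int             # first frame of the segment (inclusive)
    end: int               # last frame of the segment (exclusive)
    features: List[float]  # acoustic feature vector measured on the segment


def competing_pairs(correct: List[Segment],
                    wrong: List[Segment]) -> List[Tuple[Segment, Segment]]:
    """Pair segments that overlap in time but disagree on the phoneme.

    Assumption: time overlap plus a label mismatch marks a competing pair.
    """
    pairs = []
    for c in correct:
        for w in wrong:
            overlap = min(c.end, w.end) - max(c.start, w.start)
            if overlap > 0 and c.phoneme != w.phoneme:
                pairs.append((c, w))
    return pairs


def shift_mean(mean: List[float], features: List[float],
               rate: float) -> List[float]:
    """Move the mean toward the features if rate > 0, away if rate < 0."""
    return [m + rate * (x - m) for m, x in zip(mean, features)]


def train_step(means: Dict[str, List[float]], correct: List[Segment],
               wrong: List[Segment], rate: float = 0.1) -> Dict[str, List[float]]:
    """Return updated means; the input dictionary is left unchanged."""
    updated = dict(means)
    for c, w in competing_pairs(correct, wrong):
        # Correct phoneme: pull its mean toward the correct segment's features.
        updated[c.phoneme] = shift_mean(updated[c.phoneme], c.features, rate)
        # Wrong phoneme: push its mean away from the wrong segment's features.
        updated[w.phoneme] = shift_mean(updated[w.phoneme], w.features, -rate)
    return updated
```

With a small positive `rate`, repeated calls over many logged recognition errors would nudge the means gradually, which is one plausible reading of the convergence the abstract describes.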
17 Claims
1. A method of unsupervised training of acoustic models of a segmentation-based automatic speech recognition system, comprising:
receiving correct segment-based alignment data that represents a correct alignment of a first sequence of utterance features received by the speech recognition system;
receiving incorrect segment-based alignment data that represents an incorrect alignment of a second sequence of utterance features received by the speech recognition system;
identifying a first phoneme in the correct alignment data that corresponds to a second phoneme in the incorrect alignment data; and
modifying a first acoustic model of the first phoneme by moving at least one mean value thereof closer to corresponding feature values in the first sequence of utterance features.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11
12. A method of improving performance of a segmentation-based automatic speech recognition system (ASR) comprising:
receiving a correct segment-based alignment of a first sequence of utterance features received by the ASR;
receiving an incorrect segment-based alignment of a second sequence of utterance features received by the ASR in the context of a particular application using the ASR;
identifying a first phoneme in the correct segment-based alignment that corresponds to a second phoneme in the incorrect segment-based alignment;
modifying a first acoustic model of the first phoneme by moving at least one mean value thereof closer to corresponding feature values in the first sequence of utterance features.
Dependent claims: 13
14. A computer-readable medium carrying one or more sequences of instructions for training acoustic models of a segmentation-based automatic speech recognition system, wherein execution of the one or more sequences of instructions by one or more processors causes the one or more processors to perform the steps of:
receiving correct segment-based alignment data that represents a correct alignment of a first sequence of utterance features received by the speech recognition system;
receiving incorrect segment-based alignment data that represents an incorrect alignment of a second sequence of utterance features received by the speech recognition system;
identifying a first phoneme in the correct alignment data that corresponds to a second phoneme in the incorrect alignment data; and
modifying a first acoustic model of the first phoneme by moving at least one mean value thereof closer to corresponding feature values in the first sequence of utterance features.
Dependent claims: 15
16. A segmentation-based automatic speech recognition system comprising:
a speech recognizer that includes one or more processors;
non-volatile storage coupled to the speech recognizer and comprising a plurality of segmentation alignment data and a plurality of acoustic models;
a computer-readable medium coupled to the speech recognizer and carrying one or more sequences of instructions for training acoustic models, wherein execution of the one or more sequences of instructions by the one or more processors causes the one or more processors to perform the steps of:
receiving correct segment-based alignment data that represents a correct segment alignment of a first sequence of utterance features received by the speech recognition system;
receiving incorrect segment-based alignment data that represents an incorrect alignment of a second sequence of utterance features received by the speech recognition system;
identifying a first phoneme in the correct alignment data that corresponds to a second phoneme in the incorrect alignment data; and
modifying a first acoustic model of the first phoneme by moving at least one mean value thereof closer to corresponding feature values in the first sequence of utterance features.
Dependent claims: 17
Specification