Speaker adaptation based on lateral tying for large-vocabulary continuous speech recognition
Abstract
A system and method for performing speaker adaptation in a speech recognition system which includes a set of reference models corresponding to speech data from a plurality of speakers. The speech data is represented by a plurality of acoustic models and corresponding sub-events, and each sub-event includes one or more observations of speech data. A degree of lateral tying is computed between each pair of sub-events, wherein the degree of tying indicates the degree to which a first observation in a first sub-event contributes to the remaining sub-events. When adaptation data from a new speaker becomes available, a new observation from adaptation data is assigned to one of the sub-events. Each of the sub-events is then populated with the observations contained in the assigned sub-event based on the degree of lateral tying that was computed between each pair of sub-events. The reference models corresponding to the populated sub-events are then adapted to account for speech pattern idiosyncrasies of the new speaker, thereby reducing the error rate of the speech recognition system.
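The four steps summarized in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the text above does not say how the degree of tying, the transformation, or the model update are computed. The sketch assumes each sub-event is summarized by a mean vector, measures tying as `exp(-distance)` between means, transforms an observation by shifting it by the difference of the two means, and adapts each mean by a weighted interpolation. All function names and the `prior_weight` parameter are hypothetical.

```python
import numpy as np

def compute_tying(means):
    """Step (a): pairwise degree of tying.
    Assumption: exp(-Euclidean distance) between sub-event means, in (0, 1]."""
    names = list(means)
    return {(i, j): float(np.exp(-np.linalg.norm(means[i] - means[j])))
            for i in names for j in names}

def assign(observation, means):
    """Step (b): assign the new observation to the sub-event with the nearest mean."""
    return min(means, key=lambda name: np.linalg.norm(observation - means[name]))

def populate(observation, source, means, tying):
    """Step (c): give every sub-event a transformed copy of the observation.
    Assumption: the transformation is a shift by the difference of the means,
    and each copy carries the tying degree as its weight."""
    return {name: (observation + (means[name] - means[source]), tying[(source, name)])
            for name in means}

def adapt(means, populated, prior_weight=1.0):
    """Step (d): move each mean toward its populated observation.
    Assumption: a simple weighted interpolation with a hypothetical prior weight."""
    return {name: (prior_weight * means[name] + w * x) / (prior_weight + w)
            for name, (x, w) in populated.items()}

def adapt_to_speaker(observation, means, prior_weight=1.0):
    """Run steps (a) through (d) for a single adaptation observation."""
    tying = compute_tying(means)
    source = assign(observation, means)
    populated = populate(observation, source, means, tying)
    return adapt(means, populated, prior_weight)

if __name__ == "__main__":
    means = {"a": np.array([0.0, 0.0]), "b": np.array([2.0, 0.0])}
    print(adapt_to_speaker(np.array([0.5, 0.0]), means))
```

In this sketch a sub-event that is only weakly tied to the assigned one receives a small weight, so its model barely moves, while the assigned sub-event itself (tying of 1) moves the most.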
190 Citations
30 Claims
1. A method of performing speaker adaptation in a speech recognition system which includes a set of reference models corresponding to speech data from a plurality of speakers, the speech data represented by a plurality of acoustic models and corresponding sub-events, wherein each sub-event includes one or more observations of speech data, the method comprising the steps of:
(a) computing a degree of lateral tying between each pair of sub-events, wherein the degree of tying indicates the degree to which a first observation in a first sub-event contributes to the remaining sub-events;
(b) assigning a new observation from adaptation data of a new speaker to one of the sub-events;
(c) populating each of the sub-events with a transformed version of the observation contained in the assigned sub-event based on the degree of lateral tying computed between each pair of sub-events;
(d) adapting the reference models that correspond to the populated sub-events to account for speech pattern idiosyncrasies of the new speaker, thereby reducing the error rate of the speech recognition system.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 20, 21, 22, 23, 24, 25, 26, 27

10. A speech recognition system that performs speaker adaptation, the system including a set of reference models corresponding to speech data from a plurality of speakers, the speech data represented by a plurality of acoustic models and corresponding sub-events, wherein each sub-event includes one or more observations of speech data, the speech recognition system comprising:
(a) means for computing a degree of lateral tying between each pair of sub-events, wherein the degree of tying indicates the degree to which a first observation in a first sub-event contributes to the remaining sub-events;
(b) means for assigning a new observation from adaptation data of a new speaker to one of the sub-events;
(c) means for populating each of the sub-events with a transformed version of the observation contained in the assigned sub-event based on the degree of lateral tying computed between each pair of sub-events;
(d) means for adapting the reference models that correspond to the populated sub-events to account for speech pattern idiosyncrasies of the new speaker, thereby reducing the error rate of the speech recognition system.
Dependent claims: 11, 12, 13, 14, 15, 16, 17, 18

19. A computer-readable medium containing program instructions for performing speaker adaptation in a speech recognition system which includes a set of reference models corresponding to speech data from a plurality of speakers, the speech data represented by a plurality of acoustic models and corresponding sub-events, wherein each sub-event includes one or more observations of speech data, the program instructions for:
(a) computing a degree of lateral tying between each pair of sub-events, wherein the degree of tying indicates the degree to which a first observation in a first sub-event contributes to the remaining sub-events;
(b) assigning a new observation from adaptation data of a new speaker to one of the sub-events;
(c) populating each of the sub-events with a transformed version of the observation contained in the assigned sub-event based on the degree of lateral tying computed between each pair of sub-events;
(d) adapting the reference models that correspond to the populated sub-events to account for speech pattern idiosyncrasies of the new speaker, thereby reducing the error rate of the speech recognition system.

28. A method of performing speaker adaptation in a speech recognition system which includes a set of reference models corresponding to speech data, the speech data represented by a first and second reference model, the first reference model comprising a first sub-event and second reference model comprising a second sub-event, wherein the first sub-event is well populated with a plurality of feature vectors and the second sub-event is sparsely populated with feature vectors, the method comprising the steps of:
(a) computing a transformation between the first and second sub-event to indicate a degree of lateral tying between the first and second sub-event;
(b) assigning a new feature vector extracted from adaptation data to the first sub-event;
(c) if the computed transformation indicates a degree of lateral tying that surpasses a desired threshold, applying the computed transformation to the feature vectors in the first sub-event to transform the feature vectors into the space of the second sub-event to thereby populate the second sub-event with feature vectors from the first sub-event; and
(d) adapting the reference models that correspond to the populated sub-events to account for speech pattern idiosyncrasies of the new speaker, thereby reducing the error rate of the speech recognition system.
Dependent claims: 29, 30
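Steps (a) and (c) of claim 28 can be sketched as follows, under stated assumptions. The claim does not specify the form of the transformation or how it indicates the degree of tying. This sketch assumes an affine map fitted by least squares on paired reference vectors from the two sub-events, and uses the fit's coefficient of determination as a stand-in for the degree of lateral tying. The names `fit_transform` and `populate_second` and the default threshold of 0.8 are hypothetical.

```python
import numpy as np

def fit_transform(first_ref, second_ref):
    """Step (a), as assumed here: fit second ≈ first @ A + b by least squares
    on paired reference vectors. Returns (A, b, r_squared), where r_squared is
    used as a stand-in for the degree of lateral tying."""
    X = np.hstack([first_ref, np.ones((len(first_ref), 1))])  # append bias column
    coef, *_ = np.linalg.lstsq(X, second_ref, rcond=None)
    residual = second_ref - X @ coef
    total = second_ref - second_ref.mean(axis=0)
    r_squared = 1.0 - (residual ** 2).sum() / (total ** 2).sum()
    return coef[:-1], coef[-1], float(r_squared)

def populate_second(first_vectors, second_vectors, transform, threshold=0.8):
    """Step (c): if the tying degree surpasses the threshold, map the first
    sub-event's feature vectors into the second sub-event's space and append
    them; otherwise leave the second sub-event unchanged."""
    A, b, tying = transform
    if tying <= threshold:
        return second_vectors
    return np.vstack([second_vectors, first_vectors @ A + b])

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    first_ref = rng.normal(size=(20, 2))
    second_ref = first_ref @ np.array([[1.0, 0.5], [0.0, 2.0]]) + np.array([1.0, -1.0])
    transform = fit_transform(first_ref, second_ref)
    adaptation_vectors = rng.normal(size=(3, 2))  # assigned to the first sub-event
    sparse_second = np.zeros((1, 2))              # sparsely populated second sub-event
    print(populate_second(adaptation_vectors, sparse_second, transform).shape)
```

The threshold keeps a poorly fitting transformation from filling the sparse sub-event with misleading vectors; the adaptation of step (d) would then run on the enlarged second sub-event.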
Specification