Systems and methods for providing online fast speaker adaptation in speech recognition
Abstract
A system (230) performs speaker adaptation when performing speech recognition. The system (230) receives an audio segment and identifies the audio segment as a first audio segment or a subsequent audio segment associated with a speaker turn. The system (230) then decodes the audio segment to generate a transcription associated with the first audio segment when the audio segment is the first audio segment and estimates a transformation matrix based on the transcription associated with the first audio segment. The system (230) decodes the audio segment using the transformation matrix to generate a transcription associated with the subsequent audio segment when the audio segment is the subsequent audio segment.
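The per-segment control flow described in the abstract can be sketched as follows. This is a minimal illustration, not the patented implementation: the names `TurnState`, `apply_transform`, and `process_segment` are hypothetical, the `decode` and `estimate` callables stand in for a real recognizer and a real matrix estimator, and applying the matrix to feature vectors (rather than to model parameters) is an assumption, since the text above does not say how the matrix is used during decoding.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

Vector = List[float]
Matrix = List[List[float]]


@dataclass
class TurnState:
    """Adaptation state kept for the current speaker turn (hypothetical)."""
    transform: Optional[Matrix] = None


def apply_transform(features: List[Vector], transform: Matrix) -> List[Vector]:
    """Multiply every feature vector by the transformation matrix.

    Assumption: the matrix acts in feature space.
    """
    return [
        [sum(w * x for w, x in zip(row, vec)) for row in transform]
        for vec in features
    ]


def process_segment(
    features: List[Vector],
    is_first: bool,
    state: TurnState,
    decode: Callable[[List[Vector]], str],
    estimate: Callable[[List[Vector], str], Matrix],
) -> str:
    """Decode one audio segment of a speaker turn.

    First segment: decode without adaptation, then estimate the matrix
    from the resulting transcription and store it for the turn.
    Subsequent segment: decode using the stored matrix.
    """
    if is_first:
        transcription = decode(features)
        state.transform = estimate(features, transcription)
        return transcription
    if state.transform is None:
        raise ValueError("no transformation matrix estimated for this turn")
    return decode(apply_transform(features, state.transform))
```

A caller would create one `TurnState` per speaker turn, pass `is_first=True` for the first segment of the turn, and pass `is_first=False` for every later segment of the same turn.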
28 Claims
1. A method for performing speaker adaptation in a speech recognition system, comprising:
receiving an audio segment;
determining whether the audio segment is a first audio segment associated with a speaker turn;
decoding the audio segment to generate a transcription associated with the first audio segment when the audio segment is the first audio segment;
estimating a transformation matrix based on the transcription associated with the first audio segment; and
decoding the audio segment using the transformation matrix to generate a transcription associated with a subsequent audio segment when the audio segment is not the first audio segment.
6. The method of claim 5, further comprising:
receiving another audio segment associated with the speaker turn; and
decoding the other audio segment using the reestimated transformation matrix.
12. A system for performing speaker adaptation when performing speech recognition, comprising:
means for receiving an audio segment;
means for identifying the audio segment as a first audio segment or a subsequent audio segment associated with a speaker turn;
means for decoding the audio segment to generate a transcription associated with the first audio segment when the audio segment is the first audio segment;
means for estimating a transformation matrix based on the transcription associated with the first audio segment; and
means for decoding the audio segment using the transformation matrix to generate a transcription associated with the subsequent audio segment when the audio segment is the subsequent audio segment.
13. A decoder within a speech recognition system, comprising:
a forward decoding stage;
a backward decoding stage; and
a rescoring stage;
at least one of the forward decoding stage, the backward decoding stage, and the rescoring stage being configured to:
receive an audio segment,
identify the audio segment as a first audio segment or a subsequent audio segment associated with a speaker turn,
decode the audio segment to generate a transcription associated with the first audio segment when the audio segment is the first audio segment,
estimate a transformation matrix based on the transcription associated with the first audio segment, and
decode the audio segment using the transformation matrix to generate a transcription associated with the subsequent audio segment when the audio segment is the subsequent audio segment.
18. The decoder of claim 17, wherein the at least one of the forward decoding stage, the backward decoding stage, and the rescoring stage is further configured to:
receive another audio segment associated with the speaker turn, and
decode the other audio segment using the reestimated transformation matrix.
28. A speech recognition system, comprising:
speaker change detection logic configured to:
receive a plurality of audio segments, and
identify boundaries between speakers associated with the audio segments as speaker turns; and
a decoder configured to:
receive, from the speaker change detection logic, one of the audio segments as a received audio segment associated with one of the speaker turns,
identify the received audio segment as a first audio segment or a subsequent audio segment associated with the speaker turn,
decode the received audio segment to generate a transcription associated with the first audio segment when the received audio segment is the first audio segment,
construct a transformation matrix based on the transcription associated with the first audio segment, and
decode the received audio segment using the transformation matrix to generate a transcription associated with the subsequent audio segment when the received audio segment is the subsequent audio segment.
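The hand-off between the speaker change detection logic and the decoder in claim 28 can be illustrated with a small sketch. It assumes, purely for illustration, that the detection logic outputs one boolean per segment saying whether a speaker boundary precedes it; the function names `group_into_turns` and `first_segment_flags` are hypothetical and do not come from the patent.

```python
from typing import List, Sequence, Tuple, TypeVar

Segment = TypeVar("Segment")


def group_into_turns(
    segments: Sequence[Segment], boundary_before: Sequence[bool]
) -> List[List[Segment]]:
    """Group consecutive segments into speaker turns.

    boundary_before[i] is True when a speaker change was detected just
    before segments[i] (assumed output format of the detection logic).
    The very first segment always opens a turn.
    """
    if len(segments) != len(boundary_before):
        raise ValueError("one boundary flag is required per segment")
    turns: List[List[Segment]] = []
    for segment, boundary in zip(segments, boundary_before):
        if boundary or not turns:
            turns.append([])  # a new speaker turn starts here
        turns[-1].append(segment)
    return turns


def first_segment_flags(
    turns: Sequence[Sequence[Segment]],
) -> List[Tuple[Segment, bool]]:
    """Pair every segment with True if it is the first of its turn."""
    return [
        (segment, index == 0)
        for turn in turns
        for index, segment in enumerate(turn)
    ]
```

The resulting `(segment, is_first)` pairs are what a decoder would need in order to choose between unadapted decoding with matrix estimation (first segment) and decoding with the stored matrix (subsequent segments).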
Specification