MULTI-CHANNEL SPEECH RECOGNITION
First Claim
1. A method comprising:
combining, via a processor, a first audio signal of a first speaker in a communication session and a second audio signal from a second speaker in the communication session as a first audio channel and a second audio channel, to yield a recording of the communication session;
recognizing speech in the first audio channel of the recording using a first model associated with the first speaker; and
recognizing speech in the second audio channel of the recording using a second model associated with the second speaker, wherein the first model is different from the second model.
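The combining step of the first claim can be illustrated with a minimal sketch. It assumes both signals are 16-bit mono PCM at the same sample rate (8 kHz is an assumed default) and uses Python's standard `wave` and `struct` modules. The helper names `combine_to_stereo` and `split_channels` are hypothetical and do not come from the patent.

```python
import io
import struct
import wave
from typing import List, Sequence, Tuple


def combine_to_stereo(first: Sequence[int], second: Sequence[int],
                      sample_rate: int = 8000) -> bytes:
    """Interleave two mono 16-bit signals into one two-channel WAV recording.

    Channel 0 carries the first speaker, channel 1 the second speaker.
    The shorter signal is padded with silence (zeros) so both channels
    have the same number of frames.
    """
    n = max(len(first), len(second))
    left = list(first) + [0] * (n - len(first))
    right = list(second) + [0] * (n - len(second))
    # Frame-by-frame interleaving: L0, R0, L1, R1, ...
    interleaved = [s for pair in zip(left, right) for s in pair]
    frames = struct.pack("<%dh" % len(interleaved), *interleaved)
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(2)
        w.setsampwidth(2)  # 2 bytes = 16-bit samples
        w.setframerate(sample_rate)
        w.writeframes(frames)
    return buf.getvalue()


def split_channels(wav_bytes: bytes) -> Tuple[List[int], List[int]]:
    """Recover the per-speaker channels from a two-channel 16-bit WAV."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as r:
        if r.getnchannels() != 2 or r.getsampwidth() != 2:
            raise ValueError("expected a two-channel 16-bit recording")
        n = r.getnframes()
        data = r.readframes(n)
    samples = struct.unpack("<%dh" % (2 * n), data)
    return list(samples[0::2]), list(samples[1::2])
```

Keeping each speaker on a separate channel, rather than mixing both into one mono track, is what lets a later stage hand each channel to its own model.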
2 Assignments
Abstract
Disclosed herein are systems, methods, and computer-readable storage devices for performing per-channel automatic speech recognition. An example system configured to practice the method combines a first audio signal of a first speaker in a communication session and a second audio signal from a second speaker in the communication session as a first audio channel and a second audio channel, yielding a recording of the communication session. The system can recognize speech in the first audio channel of the recording using a first model associated with the first speaker, and recognize speech in the second audio channel of the recording using a second model associated with the second speaker, wherein the first model is different from the second model. The system can generate recognized speech as an output from the communication session. The system can identify the models based on identifiers of the speakers, such as a telephone number, an IP address, a customer number, or an account number.
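Identifying a model from a speaker identifier can be sketched as a lookup table keyed by identifier type and value, with a generic fallback when no speaker-specific model exists. The class names, the identifier kinds (`"phone"`, `"ip"`, `"customer"`, `"account"`), and the normalization rules below are all illustrative assumptions; the abstract only states that such identifiers can be used.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Tuple


@dataclass(frozen=True)
class SpeakerModel:
    """Placeholder for a speaker-specific recognition model (hypothetical)."""
    name: str


class ModelRegistry:
    """Maps speaker identifiers (phone, ip, customer, account) to models."""

    def __init__(self, default: SpeakerModel) -> None:
        self._default = default
        self._by_key: Dict[Tuple[str, str], SpeakerModel] = {}

    @staticmethod
    def _normalize(kind: str, value: str) -> str:
        # Phone numbers: keep digits only, so "+1 (555) 010-0000" matches "15550100000".
        if kind == "phone":
            return "".join(ch for ch in value if ch.isdigit())
        # Other identifiers: trim surrounding whitespace and ignore case.
        return value.strip().lower()

    def register(self, kind: str, value: str, model: SpeakerModel) -> None:
        self._by_key[(kind, self._normalize(kind, value))] = model

    def lookup(self, identifiers: Iterable[Tuple[str, str]]) -> SpeakerModel:
        """Try the (kind, value) identifiers in order; fall back to the default model."""
        for kind, value in identifiers:
            model = self._by_key.get((kind, self._normalize(kind, value)))
            if model is not None:
                return model
        return self._default
```

Trying identifiers in order lets a caller prefer a strong identifier (an account number) over a weaker one (an IP address) when both are available.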
14 Citations
20 Claims
1. A method comprising:
combining, via a processor, a first audio signal of a first speaker in a communication session and a second audio signal from a second speaker in the communication session as a first audio channel and a second audio channel, to yield a recording of the communication session;
recognizing speech in the first audio channel of the recording using a first model associated with the first speaker; and
recognizing speech in the second audio channel of the recording using a second model associated with the second speaker, wherein the first model is different from the second model.
Dependent claims: 2, 3, 4, 5, 6, 7.
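The recognition steps of claim 1 amount to dispatching each channel of the recording to its own model. In this minimal sketch a "model" is any callable that turns samples into text; the `Recognizer` type, `recognize_per_channel`, and the toy `agent_model`/`caller_model` stand-ins are hypothetical, not a real speech recognition API.

```python
from typing import Callable, List, Sequence

# Hypothetical recognizer interface: one channel's samples in, text out.
Recognizer = Callable[[Sequence[int]], str]


def recognize_per_channel(channels: List[Sequence[int]],
                          models: List[Recognizer]) -> List[str]:
    """Decode channel i with models[i]; the models must be distinct objects."""
    if len(channels) != len(models):
        raise ValueError("need exactly one model per channel")
    if len({id(m) for m in models}) != len(models):
        raise ValueError("each speaker's channel must use a different model")
    return [model(samples) for samples, model in zip(channels, models)]


# Toy stand-ins for speaker-specific models (hypothetical).
def agent_model(samples: Sequence[int]) -> str:
    return f"agent-model decoded {len(samples)} samples"


def caller_model(samples: Sequence[int]) -> str:
    return f"caller-model decoded {len(samples)} samples"
```

The distinctness check mirrors the claim's requirement that the first model differ from the second; a real system would compare model identities in whatever way its model store defines.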

8. A system comprising:
a processor; and
a computer-readable storage device storing instructions which, when executed by the processor, cause the processor to perform operations comprising:
receiving audio having a first audio channel of speech from a first speaker and a second audio channel of speech from a second speaker;
identifying a first speech recognition model for the first speaker;
identifying a second speech recognition model for the second speaker; and
based on voice activity detection, applying the first speech recognition model to the audio when the voice activity detection is positive in the first audio channel, and applying the second speech recognition model to the audio when the voice activity detection is positive in the second audio channel.
Dependent claims: 9, 10, 11, 12, 13, 14.
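The voice-activity gating in claim 8 can be sketched with a simple energy-threshold detector, which is a swapped-in technique chosen for brevity; the claim does not specify how voice activity is detected. The frame length (160 samples, i.e. 20 ms at an assumed 8 kHz) and the energy threshold are arbitrary assumptions, and all function names are hypothetical. Each model is applied only to the stretches of its own channel where the detector fires.

```python
from typing import Callable, List, Optional, Sequence, Tuple


def frame_energy(frame: Sequence[int]) -> float:
    """Mean squared amplitude of one frame (0.0 for an empty frame)."""
    return sum(s * s for s in frame) / len(frame) if frame else 0.0


def energy_vad(samples: Sequence[int], frame_len: int, threshold: float) -> List[bool]:
    """Energy-threshold VAD: one True/False flag per frame."""
    return [frame_energy(samples[i:i + frame_len]) > threshold
            for i in range(0, len(samples), frame_len)]


def active_runs(flags: List[bool]) -> List[Tuple[int, int]]:
    """Collapse per-frame flags into (first_frame, end_frame) runs, end exclusive."""
    runs: List[Tuple[int, int]] = []
    start: Optional[int] = None
    for i, on in enumerate(flags):
        if on and start is None:
            start = i
        elif not on and start is not None:
            runs.append((start, i))
            start = None
    if start is not None:
        runs.append((start, len(flags)))
    return runs


def vad_gated_recognition(ch1: Sequence[int], ch2: Sequence[int],
                          model1: Callable[[Sequence[int]], str],
                          model2: Callable[[Sequence[int]], str],
                          frame_len: int = 160, threshold: float = 1e4):
    """Apply model1 where channel 1 is active and model2 where channel 2 is active.

    Returns (start_sample, channel_index, text) tuples in time order.
    """
    results = []
    for ch_index, (samples, model) in enumerate(((ch1, model1), (ch2, model2))):
        for first, end in active_runs(energy_vad(samples, frame_len, threshold)):
            segment = samples[first * frame_len:end * frame_len]
            results.append((first * frame_len, ch_index, model(segment)))
    results.sort()  # interleave both speakers' segments by start time
    return results
```

Running detection per channel means silence on one side of the call never reaches that speaker's model, and sorting by start sample yields a turn-ordered transcript.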

15. A computer-readable storage device storing instructions which, when executed by a computing device, cause the computing device to perform operations comprising:
receiving audio having a first audio channel of speech from a first speaker and a second audio channel of speech from a second speaker;
identifying a first speech recognition model for the first speaker;
priming a second speech recognition model for the second speaker with data from the first speech recognition model; and
based on per-channel voice activity detection, applying the first speech recognition model to the audio when the voice activity detection is positive in the first audio channel, and applying the second speech recognition model to the audio when the voice activity detection is positive in the second audio channel.
Dependent claims: 16, 17, 18, 19, 20.
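Claim 15 does not say what data is used for priming. One possible reading, offered purely as an assumption, is to bias the second speaker's language model toward vocabulary the first model already captures, such as topic words in a support call. The sketch below uses a toy add-one-smoothed unigram model as a stand-in for a real model's language component; `UnigramLM`, `prime`, and the interpolation `weight` are hypothetical.

```python
from collections import Counter
from typing import Mapping, Optional


class UnigramLM:
    """Toy unigram language model standing in for a recognizer's LM component."""

    def __init__(self, counts: Optional[Mapping[str, float]] = None) -> None:
        self.counts = Counter(counts or {})

    def prob(self, word: str) -> float:
        total = sum(self.counts.values())
        vocab = len(self.counts) + 1  # +1 reserves one bucket for unseen words
        return (self.counts[word] + 1) / (total + vocab)  # add-one smoothing


def prime(target: UnigramLM, source: UnigramLM, weight: float = 0.5) -> UnigramLM:
    """Return a copy of target with source's counts added, scaled by weight."""
    primed = Counter(target.counts)  # copy; target itself is left unchanged
    for word, count in source.counts.items():
        primed[word] += weight * count
    return UnigramLM(primed)
```

For example, priming a caller model that has only seen "hello" with an agent model that has seen "refund" raises the caller model's probability for "refund", so the second speaker's recognizer is more likely to accept words already in play in the session.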
Specification