MULTI-CHANNEL SPEECH RECOGNITION

US 20150149162A1
Filed: 11/22/2013
Published: 05/28/2015
Est. Priority Date: 11/22/2013
Status: Active Grant

2 Assignments

0 Petitions

Abstract

Disclosed herein are systems, methods, and computer-readable storage devices for performing per-channel automatic speech recognition. An example system configured to practice the method combines a first audio signal of a first speaker in a communication session and a second audio signal from a second speaker in the communication session as a first audio channel and a second audio channel, to yield a recording of the communication session. The system can recognize speech in the first audio channel of the recording using a first model associated with the first speaker, and recognize speech in the second audio channel of the recording using a second model associated with the second speaker, wherein the first model is different from the second model. The system can generate recognized speech as an output from the communication session. The system can identify the models based on identifiers of the speakers, such as a telephone number, an IP address, a customer number, or an account number.
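The combine-then-recognize flow in the abstract can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the patentee's implementation: audio is a plain list of float samples, the two signals are interleaved into a list of (first, second) sample pairs, and each "model" is a hypothetical callable that turns samples into text, looked up in a dictionary keyed by a speaker identifier such as a telephone number or account number. The names `combine_channels` and `recognize_per_channel` and the sample identifiers are invented for illustration.

```python
from itertools import zip_longest
from typing import Callable, Dict, List, Tuple

# Hypothetical stand-in for a speaker-specific model: any callable from samples to text.
Model = Callable[[List[float]], str]


def combine_channels(first: List[float], second: List[float]) -> List[Tuple[float, float]]:
    """Interleave two mono signals into one two-channel recording.

    The shorter signal is padded with silence (0.0) so both channels stay time-aligned.
    """
    return list(zip_longest(first, second, fillvalue=0.0))


def recognize_per_channel(
    recording: List[Tuple[float, float]],
    speaker_ids: Tuple[str, str],
    models: Dict[str, Model],
) -> Dict[str, str]:
    """Recognize each channel with the model looked up by that channel's speaker ID."""
    results: Dict[str, str] = {}
    for channel, speaker_id in enumerate(speaker_ids):
        samples = [frame[channel] for frame in recording]  # de-interleave one channel
        results[speaker_id] = models[speaker_id](samples)
    return results


# Usage with made-up identifiers and dummy "models" that only report what they received.
models: Dict[str, Model] = {
    "+1-555-0100": lambda s: f"model A saw {len(s)} samples",
    "acct-42": lambda s: f"model B saw {len(s)} samples",
}
recording = combine_channels([0.1, 0.2, 0.3], [0.5])
print(recognize_per_channel(recording, ("+1-555-0100", "acct-42"), models))
# {'+1-555-0100': 'model A saw 3 samples', 'acct-42': 'model B saw 3 samples'}
```

Padding the shorter signal with silence keeps both channels time-aligned, so results from either channel can be mapped back to the same point in the session.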

14 Citations

20 Claims

1. A method comprising:
- combining, via a processor, a first audio signal of a first speaker in a communication session and a second audio signal from a second speaker in the communication session as a first audio channel and a second audio channel, to yield a recording of the communication session;
- recognizing speech in the first audio channel of the recording using a first model associated with the first speaker; and
- recognizing speech in the second audio channel of the recording using a second model associated with the second speaker, wherein the first model is different from the second model.

2. The method of claim 1, further comprising:
- generating recognized speech as an output from the communication session.

3. The method of claim 1, further comprising:
- identifying the first model based on an identifier of the first speaker.

4. The method of claim 3, wherein the identifier comprises at least one of a telephone number, an IP address, a customer number, and an account number.

5. The method of claim 1, wherein each of the first model and the second model comprises at least one of an acoustic model and a language model.

6. The method of claim 1, further comprising:
- identifying when the first speaker is speaking based on voice activity detection performed on the first audio channel; and
- identifying when the second speaker is speaking based on voice activity detection performed on the second audio channel.

7. The method of claim 1, wherein the first audio signal is received from a first telecommunications terminal of the first speaker, and the second audio signal is received from a second telecommunications terminal of the second speaker.
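Claim 6 identifies when each speaker is speaking by running voice activity detection separately on each channel. The claims do not specify a detector; the sketch below swaps in the simplest common choice, a fixed threshold on mean frame energy, purely as an assumption. `frame_len` and `threshold` are hypothetical parameters, and segments are reported as frame-index ranges with an exclusive end.

```python
from typing import List, Optional, Tuple


def frame_energies(samples: List[float], frame_len: int) -> List[float]:
    """Mean squared amplitude of each non-overlapping frame; a short final frame is kept."""
    energies = []
    for start in range(0, len(samples), frame_len):
        frame = samples[start:start + frame_len]
        energies.append(sum(x * x for x in frame) / len(frame))
    return energies


def speech_segments(samples: List[float], frame_len: int, threshold: float) -> List[Tuple[int, int]]:
    """Return (start_frame, end_frame) spans, end exclusive, whose energy exceeds threshold."""
    energies = frame_energies(samples, frame_len)
    segments: List[Tuple[int, int]] = []
    start: Optional[int] = None
    for i, energy in enumerate(energies):
        if energy > threshold and start is None:
            start = i                      # speech begins
        elif energy <= threshold and start is not None:
            segments.append((start, i))    # speech ends
            start = None
    if start is not None:                  # speech runs to the end of the channel
        segments.append((start, len(energies)))
    return segments


# Each channel is analyzed independently, so each speaker gets their own timeline.
first_channel = [0.0, 0.0, 0.9, 0.8, 0.0, 0.0]
second_channel = [0.7, 0.6, 0.0, 0.0, 0.0, 0.0]
print(speech_segments(first_channel, frame_len=2, threshold=0.1))   # [(1, 2)]
print(speech_segments(second_channel, frame_len=2, threshold=0.1))  # [(0, 1)]
```

Because each speaker has a dedicated channel, no speaker diarization is needed here: activity on a channel is attributed to that channel's speaker by construction.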

8. A system comprising:
- a processor; and
- a computer-readable storage device storing instructions which, when executed by the processor, cause the processor to perform operations comprising:
  - receiving audio having a first audio channel of speech from a first speaker and a second audio channel of speech from a second speaker;
  - identifying a first speech recognition model for the first speaker;
  - identifying a second speech recognition model for the second speaker; and
  - based on voice activity detection, applying the first speech recognition model to the audio when the voice activity detection is positive in the first audio channel, and applying the second speech recognition model to the audio when the voice activity detection is positive in the second audio channel.

9. The system of claim 8, wherein the operations further comprise:
- caching the first speech recognition model and the second speech recognition model.

10. The system of claim 9, wherein the operations further comprise:
- switching between a cached first speech recognition model and a cached second speech recognition model to recognize speech in the audio.

11. The system of claim 8, wherein each of the first speech recognition model and the second speech recognition model comprises at least one of an acoustic model and a language model.

12. The system of claim 8, wherein identifying a first speech recognition model for the first speaker further comprises:
- retrieving, from a database of speech recognition models, the first speech recognition model based on a unique identifier associated with the first audio channel.

13. The system of claim 8, wherein the operations further comprise:
- generating speech recognition output from applying the first speech recognition model and applying the second speech recognition model to the audio.

14. The system of claim 8, wherein each of the first speech recognition model and the second speech recognition model is an acoustic model, and wherein the operations further comprise:
- applying a common language model to the first audio channel and the second audio channel.
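Claims 8 to 10 pair per-channel voice activity detection with cached, switchable models. A minimal Python sketch, assuming frame-level VAD flags are already available for each channel and that `loader` is a hypothetical stand-in for the database lookup of claim 12; models are represented by plain strings. Each model is loaded once, and each frame is assigned the model(s) of whichever channel is active. Overlapping speech yields both models for that frame, since the claim applies each model whenever its own channel is positive. `ModelCache`, `schedule_models`, and the speaker IDs are invented names.

```python
from typing import Callable, Dict, List, Tuple


class ModelCache:
    """Loads each speaker's model once, then serves it from memory."""

    def __init__(self, loader: Callable[[str], str]) -> None:
        self._loader = loader          # hypothetical, e.g. a query against a model database
        self._cache: Dict[str, str] = {}
        self.loads = 0                 # number of times the loader was actually called

    def get(self, speaker_id: str) -> str:
        if speaker_id not in self._cache:
            self._cache[speaker_id] = self._loader(speaker_id)
            self.loads += 1
        return self._cache[speaker_id]


def schedule_models(
    vad_first: List[bool],
    vad_second: List[bool],
    speaker_ids: Tuple[str, str],
    cache: ModelCache,
) -> List[List[str]]:
    """For each frame, list the cached model(s) whose channel has positive VAD."""
    schedule: List[List[str]] = []
    for first_active, second_active in zip(vad_first, vad_second):
        active: List[str] = []
        if first_active:
            active.append(cache.get(speaker_ids[0]))
        if second_active:
            active.append(cache.get(speaker_ids[1]))
        schedule.append(active)        # empty list: nobody is speaking in this frame
    return schedule


cache = ModelCache(lambda speaker_id: f"model[{speaker_id}]")
plan = schedule_models(
    [True, True, False, False], [False, False, True, True], ("alice", "bob"), cache
)
print(plan)         # [['model[alice]'], ['model[alice]'], ['model[bob]'], ['model[bob]']]
print(cache.loads)  # 2
```

The cache is what makes frequent switching cheap: turn-taking may flip the active channel many times per call, but each model is fetched only once.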

15. A computer-readable storage device storing instructions which, when executed by a computing device, cause the computing device to perform operations comprising:
- receiving audio having a first audio channel of speech from a first speaker and a second audio channel of speech from a second speaker;
- identifying a first speech recognition model for the first speaker;
- priming a second speech recognition model for the second speaker with data from the first speech recognition model; and
- based on per-channel voice activity detection, applying the first speech recognition model to the audio when the voice activity detection is positive in the first audio channel, and applying the second speech recognition model to the audio when the voice activity detection is positive in the second audio channel.

16. The computer-readable storage device of claim 15, wherein the operations further comprise:
- determining, prior to priming the second speech recognition model, that the first speech recognition model is more developed than the second speech recognition model.

17. The computer-readable storage device of claim 16, wherein determining that the first speech recognition model is more developed than the second speech recognition model is based on at least one of vocabulary size, accuracy of recognition results, subjective ratings, and a completeness score.

18. The computer-readable storage device of claim 15, wherein priming the second speech recognition model for the second speaker with data from the first speech recognition model comprises copying at least part of the first speech recognition model into the second speech recognition model.

19. The computer-readable storage device of claim 15, wherein identifying the first speech recognition model for the first speaker is based on a unique identifier for the first speaker.

20. The computer-readable storage device of claim 19, wherein the unique identifier comprises at least one of a telephone number, an IP address, a customer number, and an account number.
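Claims 15 to 18 prime a less developed model with data from a more developed one. The sketch below is a toy under explicit assumptions: a "model" is reduced to a vocabulary of word counts plus a measured accuracy, "more developed" is decided by vocabulary size with accuracy as a tie-breaker (two of the criteria listed in claim 17, combined by a rule invented here), and priming copies only the vocabulary entries the target lacks. `SpeakerModel`, `more_developed`, `prime`, and the example speakers are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class SpeakerModel:
    """Toy stand-in for a speaker model: unigram counts plus a measured accuracy."""
    vocabulary: Dict[str, int] = field(default_factory=dict)
    accuracy: float = 0.0  # fraction of words recognized correctly on held-out data


def more_developed(a: SpeakerModel, b: SpeakerModel) -> bool:
    """Hypothetical rule: larger vocabulary wins; ties are broken by accuracy."""
    if len(a.vocabulary) != len(b.vocabulary):
        return len(a.vocabulary) > len(b.vocabulary)
    return a.accuracy > b.accuracy


def prime(target: SpeakerModel, source: SpeakerModel) -> bool:
    """Copy source vocabulary entries missing from target, if source is more developed.

    Returns True if priming happened. Existing target entries are never overwritten.
    """
    if not more_developed(source, target):
        return False
    for word, count in source.vocabulary.items():
        target.vocabulary.setdefault(word, count)
    return True


agent = SpeakerModel({"account": 12, "balance": 9, "transfer": 4}, accuracy=0.91)
caller = SpeakerModel({"balance": 1}, accuracy=0.60)
print(prime(caller, agent))  # True
print(caller.vocabulary)     # {'balance': 1, 'account': 12, 'transfer': 4}
```

Using `setdefault` means priming only fills gaps: statistics already learned for the second speaker are left untouched, which matches the claim's "at least part of" wording rather than a wholesale overwrite.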

Current Assignee: Nuance Communications, Inc. (Microsoft Corporation)
Original Assignee: AT&T Intellectual Property I LP (AT&T, Inc.)
Inventors: MELAMED, Ilya Dan; LJOLJE, Andrej

Granted Patent: US 10,199,035 B2
US Class Current: 704/231

CPC Class Codes

G10L 15/07   to the speaker

G10L 15/20   Speech recognition techniqu...

G10L 15/28   Constructional details of s...

G10L 2015/227   of the speaker; Human-fact...
