Speaker-cluster dependent speaker recognition (speaker-type automated speech recognition)
First Claim
1. An apparatus, comprising:
an audio corpus;
transcription data corresponding to the audio corpus;
automatic speech recognition logic coupled with an associated interface for receiving the audio corpus and the associated interface for receiving transcription data;
wherein the automatic speech recognition logic is operable to analyze the audio corpus and the transcription data corresponding to the audio corpus to determine a plurality of speaker clusters corresponding to a plurality of speaker types from the audio corpus and the transcription data corresponding to the audio corpus;
wherein the automatic speech recognition logic is trained for each speaker cluster belonging to the plurality of speaker clusters;
wherein the automatic speech recognition logic is operable to determine a selected speaker cluster selected from the plurality of speaker clusters for an associated source responsive to receiving audio data and a transcription of the audio data from the associated source; and
wherein the automatic speech recognition logic employs the selected speaker cluster for transcribing the audio from the associated source.
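The routing the claim describes (one recognizer trained per speaker cluster, with the selected cluster deciding which recognizer transcribes a source's audio) could be sketched as below. `ClusterDependentASR` and the stand-in lambda "recognizer" are illustrative names, not from the patent:

```python
class ClusterDependentASR:
    """Minimal sketch of cluster-dependent ASR routing: one recognizer
    per speaker cluster, selected per audio source."""

    def __init__(self):
        self.models = {}  # cluster id -> recognizer trained on that cluster

    def train_cluster(self, cluster_id, recognizer):
        # Stand-in for training the ASR logic on one speaker cluster.
        self.models[cluster_id] = recognizer

    def transcribe(self, cluster_id, audio):
        # Route the audio to the recognizer for the selected cluster.
        return self.models[cluster_id](audio)


asr = ClusterDependentASR()
# A real system would train acoustic/language models per cluster; a
# lambda stands in for a trained recognizer here.
asr.train_cluster("dialect_a", lambda audio: audio.upper())
result = asr.transcribe("dialect_a", "hello")  # -> "HELLO"
```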
Abstract
In an example embodiment, there is disclosed herein an automatic speech recognition (ASR) system that employs speaker clustering (or speaker type) for transcribing audio. A large corpus of audio with corresponding transcripts is analyzed to determine a plurality of speaker types (e.g., dialects). The ASR system is trained for each speaker type. Upon encountering a new user, the ASR system attempts to map the user to a speaker type. After the new user is mapped to a speaker type, the ASR employs the speaker type for transcribing audio from the new user.
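The clustering step in the abstract could look like the following sketch, assuming each speaker in the corpus is summarized by a fixed-length acoustic feature vector and using plain k-means; the patent does not prescribe a particular clustering algorithm or feature representation:

```python
import numpy as np


def cluster_speakers(speaker_features, k, iters=20, seed=0):
    """Group per-speaker feature vectors into k speaker-type clusters
    via plain k-means (one possible realization, not mandated by the patent)."""
    rng = np.random.default_rng(seed)
    X = np.asarray(speaker_features, dtype=float)
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        # Assign each speaker to the nearest centroid.
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = np.argmin(dists, axis=1)
        # Recompute each centroid from its assigned speakers.
        for j in range(k):
            if np.any(labels == j):
                centroids[j] = X[labels == j].mean(axis=0)
    return labels, centroids


# Two well-separated synthetic "speaker types".
feats = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 4.9]]
labels, centroids = cluster_speakers(feats, k=2)
```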
Claims (20)
1. An apparatus, comprising: (independent claim, set out in full under "First Claim" above)
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9
10. A method, comprising:
analyzing, by an automatic speech recognition system, a corpus of audio data and transcription data corresponding to the corpus of audio data to determine a plurality of speaker types from the corpus of audio data and the corresponding transcription data;
training the automatic speech recognition system for each of the plurality of speaker types;
receiving audio data from an associated new user;
determining a selected one of the plurality of speaker types based on the audio data received from the associated new user and a transcription of the audio received from the associated new user by the automatic speech recognition system; and
transcribing, by the automatic speech recognition system, the audio data received from the associated new user based on the selected one of the plurality of speaker types.
Dependent claims: 11, 12, 13, 14, 15, 16, 17
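The "determining a selected one of the plurality of speaker types" step of the method claim admits a simple nearest-centroid realization, assuming the new user's audio (and its transcription) have been reduced to the same feature space used for clustering; all names here are illustrative:

```python
import numpy as np


def select_speaker_type(user_features, centroids):
    """Pick the speaker-type cluster whose centroid is closest to the
    new user's feature vector (one simple way to map a new user to a
    previously determined speaker type)."""
    u = np.asarray(user_features, dtype=float)
    dists = np.linalg.norm(np.asarray(centroids, dtype=float) - u, axis=1)
    return int(np.argmin(dists))


centroids = [[0.0, 0.0], [5.0, 5.0]]
selected = select_speaker_type([4.8, 5.2], centroids)  # -> 1
```

The returned index would then select the recognizer trained for that speaker type in the transcribing step.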
18. Logic encoded on tangible non-transitory media for execution by a processor, and when executed operable to:
analyze a corpus of audio data and transcription data corresponding to the audio data to determine a plurality of speaker types from the corpus of audio data and the corresponding transcription data;
train for automatic speech recognition of each of the plurality of speaker types;
receive audio data from an associated source;
determine a selected one of the plurality of speaker types based on the audio data received from the associated source and a transcription of the audio received from the associated source; and
selectively transcribe the audio data received from the associated source based on the selected one of the plurality of speaker types.
Dependent claims: 19, 20
Specification