SPEAKER-CLUSTER DEPENDENT SPEAKER RECOGNITION (SPEAKER-TYPE AUTOMATED SPEECH RECOGNITION)

US 20110301949A1
Filed: 06/08/2010
Published: 12/08/2011
Est. Priority Date: 06/08/2010
Status: Active Grant

Abstract

In an example embodiment, there is disclosed herein an automatic speech recognition (ASR) system that employs speaker clustering (or speaker type) for transcribing audio. A large corpus of audio with corresponding transcripts is analyzed to determine a plurality of speaker types (e.g., dialects). The ASR system is trained for each speaker type. Upon encountering a new user, the ASR system attempts to map the user to a speaker type. After the new user is mapped to a speaker type, the ASR employs the speaker type for transcribing audio from the new user.
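The clustering and mapping steps in the abstract can be illustrated with a minimal sketch. It assumes each speaker in the corpus has already been reduced to a fixed-length numeric feature vector, and it uses plain k-means with a nearest-centroid lookup. Both choices are assumptions made for illustration; the abstract does not name a clustering algorithm or a feature representation, and the names `cluster_speakers` and `assign_speaker_type` are hypothetical.

```python
import math


def distance(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def mean(points):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(len(points[0])))


def assign_speaker_type(vector, centroids):
    """Map one speaker's feature vector to the index of the nearest centroid."""
    return min(range(len(centroids)), key=lambda i: distance(vector, centroids[i]))


def cluster_speakers(vectors, k, iterations=20):
    """Plain k-means over per-speaker vectors (an assumed stand-in for
    'determining speaker types'); returns k centroids."""
    # Deterministic farthest-point initialisation so runs are repeatable.
    centroids = [vectors[0]]
    while len(centroids) < k:
        centroids.append(
            max(vectors, key=lambda v: min(distance(v, c) for c in centroids))
        )
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for v in vectors:
            groups[assign_speaker_type(v, centroids)].append(v)
        # Keep the old centroid if a group ends up empty.
        centroids = [mean(g) if g else c for g, c in zip(groups, centroids)]
    return centroids


# Hypothetical two-dimensional features for six corpus speakers.
corpus = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (5.0, 5.0), (5.1, 4.9), (4.9, 5.2)]
centroids = cluster_speakers(corpus, k=2)
new_user_type = assign_speaker_type((4.8, 5.1), centroids)
```

With this toy data the new user lands in the second cluster (index 1). A real system would then transcribe that user's audio with the recognizer trained for that cluster.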

10 Citations

20 Claims

1. An apparatus, comprising:
- an audio corpus;
- transcription data corresponding to the audio corpus;
- an automatic speech recognition logic coupled to the interface for receiving the audio corpus and interface for receiving transcription data;
- wherein the automatic speech recognition logic is operable to determine a plurality of speaker clusters corresponding to a plurality of speaker types from the audio corpus and transcription data corresponding to the audio corpus;
- wherein the automatic speech recognition logic is trained for each speaker cluster belonging to the plurality of speaker clusters;
- wherein the automatic speech recognition logic determines a selected speaker cluster selected from the plurality of speaker clusters for a source responsive to receiving audio data and a transcription of the audio data from a source; and
- wherein the automatic speech recognition logic employs the selected speaker cluster for transcribing audio from the source.

Dependent claims:
- 2. The apparatus of claim 1, wherein speaker independent automatic speech recognition is employed by the automatic speech recognition logic to transcribe audio for the source until the selected speaker type for the new user is determined.
- 3. The apparatus of claim 1, wherein predetermined words are employed to train the automatic speech recognition logic.
- 4. The apparatus of claim 1, wherein predetermined sub-words are employed to train the automatic speech recognition logic.
- 5. The apparatus of claim 4, wherein the predetermined sub-words comprise phonemes.
- 6. The apparatus of claim 4, wherein the predetermined sub-words comprise di-phones.
- 7. The apparatus of claim 4, wherein the predetermined sub-words comprise tri-phones.
- 8. The apparatus of claim 1, wherein the automatic speech recognition employs predetermined phrases for training.
- 9. The apparatus of claim 1, wherein the audio corpus and transcription data corresponding to the audio corpus are received from a voice mail system.

10. A method, comprising:
- determining a plurality of speaker types by an automatic speech recognition system from a corpus of audio data and corresponding transcription data;
- training the automatic speech recognition system for each speaker type;
- receiving audio data from a new user;
- determining a selected one of the plurality of speaker types based on audio data and a transcription of the audio received from the new user by the automatic speech recognition system; and
- transcribing audio data by the automatic speech recognition system from the new user based on the selected one of the plurality of speaker types.

Dependent claims:
- 11. The method of claim 10, further comprising transcribing audio data received from the new user employing speaker independent transcription until the selected one of the plurality of speaker types is determined.
- 12. The method of claim 10, wherein training the automatic speech recognition system is at least partially based on predetermined words.
- 13. The method of claim 10, wherein training the automatic speech recognition system is at least partially based on predetermined phrases.
- 14. The method of claim 10, wherein training the automatic speech recognition system is at least partially based on sub-words.
- 15. The method of claim 14, wherein the sub-words comprise phonemes.
- 16. The method of claim 14, wherein the sub-words comprise one of a group consisting of di-phones and tri-phones.
- 17. The method of claim 10, further comprising re-evaluating the plurality of speaker types;
  - wherein re-evaluating includes audio data from the new user and transcribed audio data from the new user.

18. Logic encoded on at least one tangible media for execution by a processor, and when executed operable to:
- determine a plurality of speaker types from a corpus of audio data and corresponding transcription data;
- train for automatic speech recognition of each speaker type;
- receive audio data from a source;
- determine a selected one of the plurality of speaker types based on audio data and a transcription of the audio received from the source; and
- transcribe audio data from the source based on the selected one of the plurality of speaker types.

Dependent claims:
- 19. The logic according to claim 18, wherein the logic is further operable to transcribe audio data from the source using speaker independent automatic speech recognition until the selected one of the plurality of speaker types is selected.
- 20. The logic of claim 18, wherein the logic is further operable to:
  - re-determine speaker types from the corpus of audio data and corresponding transcription data;
  - re-train the automatic speech recognition; and
  - re-determine the speaker type for the source.
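The flow recited across the independent claims and the fallback claims (2, 11, 19) can be sketched as follows: transcribe with a speaker-independent model until a speaker type has been selected, then switch to the model for that type. The selection rule used here, picking the per-type model with the lowest word error rate against the supplied transcription, is an assumption for illustration; the claims only state that selection is based on audio data and a transcription of it. The class `Transcriber`, its methods, and the callable-model interface are all hypothetical.

```python
def word_error_rate(reference, hypothesis):
    """Word-level Levenshtein distance divided by the reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1] / max(len(ref), 1)


def select_speaker_type(audio, reference, type_models):
    """Assumed rule: choose the speaker-type model whose output is closest
    to the reference transcription of the same audio."""
    return min(
        type_models,
        key=lambda name: word_error_rate(reference, type_models[name](audio)),
    )


class Transcriber:
    """Hypothetical wrapper; each model is a callable mapping audio to text."""

    def __init__(self, independent_model, type_models):
        self.independent_model = independent_model
        self.type_models = type_models
        self.selected = None  # no speaker type chosen yet

    def enroll(self, audio, reference):
        """Select a speaker type from audio plus its transcription."""
        self.selected = select_speaker_type(audio, reference, self.type_models)

    def transcribe(self, audio):
        """Fall back to the speaker-independent model until a type is selected."""
        if self.selected is None:
            return self.independent_model(audio)
        return self.type_models[self.selected](audio)
```

Re-determining the speaker type, as in claims 17 and 20, would amount to calling `enroll` again after the per-type models have been rebuilt from the updated corpus.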

Current Assignee: Cisco Technology, Inc. (Cisco Systems, Inc.)
Original Assignee: Cisco Technology, Inc. (Cisco Systems, Inc.)
Inventors: Ramalho, Michael A.; Tatum, Todd C.; Sarkar, Shantanu
Granted Patent: US 8,600,750 B2
US Class Current: 704/231
CPC Class Codes:
G10L 15/075 supervised, i.e. under mach...
G10L 2015/0631 Creating reference template...
