Unsupervised and active learning in automatic speech recognition for call classification

US 20060190253A1
Filed: 02/23/2005
Published: 08/24/2006
Est. Priority Date: 02/23/2005
Status: Active Grant

Abstract

Utterance data that includes at least a small amount of manually transcribed data is provided. Automatic speech recognition is performed on ones of the utterance data not having a corresponding manual transcription to produce automatically transcribed utterances. A model is trained using all of the manually transcribed data and the automatically transcribed utterances. A predetermined number of utterances not having a corresponding manual transcription are intelligently selected and manually transcribed. Ones of the automatically transcribed data as well as ones having a corresponding manual transcription are labeled. In another aspect of the invention, audio data is mined from at least one source, and a language model is trained for call classification from the mined audio data to produce a language model.

39 Claims

1. A method comprising:
- providing utterance data including at least a small amount of manually transcribed data;
  
  performing automatic speech recognition on ones of the utterance data not having a corresponding manual transcription to produce automatically transcribed utterances;
  
  training a model using all of the manually transcribed data and the automatically transcribed utterances;
  
  intelligently selecting a predetermined number of utterances not having a corresponding manual transcription;
  
  manually transcribing the selected number of utterances not having a corresponding manual transcription; and
  
  labeling ones of the automatically transcribed data as well as ones of the manually transcribed data.
- View Dependent Claims (2, 3, 4, 5, 6, 7, 8)
- - 2. The method of claim 1, further comprising:
    - performing automatic speech recognition of ones of the utterance data not having a corresponding manual transcription to produce a new set of automatically transcribed utterances;
      
      training a model using all of the manually transcribed data and the new set of automatically transcribed utterances;
      
      intelligently selecting another predetermined number of utterances not having a corresponding manual transcription; and
      
      manually transcribing the selected another predetermined number of utterances not having a corresponding manual transcription.
  - 3. The method of claim 2, further comprising:
    - determining confidence scores with respect to the new set of automatically transcribed utterances, wherein:
      
      the act of intelligently selecting a predetermined number of utterances not having a corresponding manual transcription is based on the confidence scores.
  - 4. The method of claim 3, wherein:
    - the act of intelligently selecting another predetermined number of utterances not having a corresponding manual transcription selects the predetermined number of utterances having lowest ones of the corresponding confidence scores.
  - 5. The method of claim 2, further comprising:
    - determining whether word accuracy of the new set of automatically transcribed utterances has converged, and repeating all of the acts of claim 2 when the determining has determined that the word accuracy has not converged.
  - 6. The method of claim 1, further comprising:
    - determining whether word accuracy of the set of automatically transcribed utterances has converged, wherein:
      
      the act of labeling ones of the automatically transcribed data as well as ones of the manually transcribed data is performed when the determining has determined that the word accuracy has converged.
  - 7. The method of claim 6, wherein:
    - the act of labeling ones of the automatically transcribed data as well as ones of the manually transcribed data is performed only when the determining has determined that the word accuracy has converged.
  - 8. The method of claim 1, wherein:
    - the model includes a spoken language model.
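
The loop recited in claims 1 to 4 can be sketched as follows. This is a minimal illustration, not the patented implementation: `run_rounds`, `select_lowest_confidence`, the callables `recognize`, `train`, and `ask_human`, and the toy data are all hypothetical names introduced here, and a fixed `max_rounds` stands in for the convergence test of claim 5.

```python
from typing import Callable, Dict, List, Tuple

Hypothesis = Tuple[str, float]  # (recognized text, confidence in [0, 1])


def select_lowest_confidence(confidences: Dict[str, float], n: int) -> List[str]:
    """Return ids of the n utterances whose hypotheses have the lowest confidence."""
    return sorted(confidences, key=lambda utt: (confidences[utt], utt))[:n]


def run_rounds(
    manual: Dict[str, str],
    pool: List[str],
    recognize: Callable[[object, str], Hypothesis],
    train: Callable[[Dict[str, str], Dict[str, str]], object],
    ask_human: Callable[[str], str],
    batch_size: int,
    max_rounds: int,
):
    """Alternate recognition, training, selection, and manual transcription."""
    manual = dict(manual)  # utterance id -> manual transcription
    pool = list(pool)      # utterance ids without a manual transcription
    model = train(manual, {})
    for _ in range(max_rounds):
        if not pool:
            break
        # Recognize every utterance that still lacks a manual transcription.
        hyps = {utt: recognize(model, utt) for utt in pool}
        auto = {utt: text for utt, (text, _) in hyps.items()}
        # Train on all manual data plus the automatic transcriptions.
        model = train(manual, auto)
        # Send the least confident utterances to a human transcriber.
        scores = {utt: conf for utt, (_, conf) in hyps.items()}
        for utt in select_lowest_confidence(scores, batch_size):
            manual[utt] = ask_human(utt)
            pool.remove(utt)
    return model, manual, pool


# Toy stand-ins (assumptions) so the sketch runs end to end.
FIXED_CONF = {"u1": 0.9, "u2": 0.2, "u3": 0.5, "u4": 0.1}
model, manual, pool = run_rounds(
    manual={"u0": "manual u0"},
    pool=["u1", "u2", "u3", "u4"],
    recognize=lambda model, utt: ("auto " + utt, FIXED_CONF[utt]),
    train=lambda manual, auto: {**auto, **manual},
    ask_human=lambda utt: "manual " + utt,
    batch_size=2,
    max_rounds=1,
)
print(sorted(manual), pool)
```

In this toy run the two least confident utterances (`u4` and `u2`) move from the untranscribed pool to the manually transcribed set, which is the selection behavior described in claim 4.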

9. A system comprising:
- an automatic speech recognizer configured to automatically transcribe utterance data not having a corresponding manual transcription and produce a set of automatically transcribed data;
  
  a learning module configured to intelligently select a predetermined number of utterances from the set of automatically transcribed data, the selected number of predetermined utterances to be manually transcribed, added to a set of manually transcribed data, and deleted from the set of automatically transcribed data;
  
  a training module configured to train a language model using the set of manually transcribed data and the set of automatically transcribed data; and
  
  a labeler to label at least some of the set of automatically transcribed data and the set of manually transcribed data.
- View Dependent Claims (10, 11, 12, 13, 14, 15, 16)
- - 10. The system of claim 9, wherein:
    - the learning module is configured to determine confidence scores with respect to the set of automatically transcribed data and intelligently select the predetermined number of utterances from the set of automatically transcribed data based on the confidence scores.
  - 11. The system of claim 10, wherein:
    - the learning module is further configured to select the predetermined number of utterances having lowest ones of the confidence scores.
  - 12. The system of claim 10, wherein:
    - the learning module is further configured to determine the confidence scores based on lattices produced by the automatic speech recognizer.
  - 13. The system of claim 9, wherein:
    - the automatic speech recognizer, the learning module, and the training module are configured to work together repeatedly, until word accuracy converges, to:
      
      automatically transcribe utterance data not having a corresponding manual transcription and produce a set of automatically transcribed data, intelligently select a predetermined number of utterances from the set of automatically transcribed data to be manually transcribed, added to the set of manually transcribed data, and deleted from the set of automatically transcribed data, and to train a language model using the set of manually transcribed data and the set of automatically transcribed data.
  - 14. The system of claim 13, wherein:
    - the labeler labels at least some of the set of automatically transcribed data as well as the manually transcribed data after the word accuracy converges.
  - 15. The system of claim 13, wherein:
    - the labeler labels at least some of the set of automatically transcribed data only after the word accuracy converges.
  - 16. The system of claim 9, wherein:
    - the training module is further configured to train a spoken language model.
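
Claim 12 states that confidence scores are based on recognizer lattices but the claim text does not say how. One common approach, assumed here purely for illustration, is to compute edge posteriors with a forward-backward pass and average them along the best path. The lattice encoding (integer nodes, edges as tuples) and the function name `best_path_confidence` are hypothetical.

```python
import math
from typing import Dict, List, Tuple

Edge = Tuple[int, int, str, float]  # (from node, to node, word, log score)


def _logsumexp(values: List[float]) -> float:
    top = max(values)
    return top + math.log(sum(math.exp(v - top) for v in values))


def best_path_confidence(edges: List[Edge], final: int) -> Tuple[List[str], float]:
    """Return the best word sequence and the mean posterior of its edges.

    Assumes node 0 is the start, `final` is the end, every edge goes from a
    lower to a higher node id, and every node lies on a start-to-end path.
    """
    alpha: Dict[int, float] = {0: 0.0}  # log sum over all paths into a node
    best: Dict[int, Tuple[float, List[Edge]]] = {0: (0.0, [])}  # Viterbi
    for node in range(1, final + 1):
        incoming = [e for e in edges if e[1] == node]
        alpha[node] = _logsumexp([alpha[e[0]] + e[3] for e in incoming])
        best[node] = max(
            ((best[e[0]][0] + e[3], best[e[0]][1] + [e]) for e in incoming),
            key=lambda item: item[0],
        )
    beta: Dict[int, float] = {final: 0.0}  # log sum over all paths out of a node
    for node in range(final - 1, -1, -1):
        outgoing = [e for e in edges if e[0] == node]
        beta[node] = _logsumexp([e[3] + beta[e[1]] for e in outgoing])
    path = best[final][1]
    # Posterior of an edge: share of total path mass that passes through it.
    posteriors = [
        math.exp(alpha[src] + score + beta[dst] - alpha[final])
        for src, dst, _, score in path
    ]
    return [word for _, _, word, _ in path], sum(posteriors) / len(posteriors)


# Toy two-word lattice with made-up scores.
lattice = [
    (0, 1, "pay", math.log(0.8)),
    (0, 1, "play", math.log(0.2)),
    (1, 2, "bill", math.log(0.6)),
    (1, 2, "bell", math.log(0.4)),
]
words, confidence = best_path_confidence(lattice, final=2)
print(words, round(confidence, 3))
```

An utterance-level score of this kind could feed the lowest-confidence selection of claim 11; other aggregations (minimum or product of word posteriors) are equally plausible.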

17. A system comprising:
- means for performing automatic speech recognition on ones of a plurality of utterance data not having a corresponding manual transcription to produce automatically transcribed utterances;
  
  means for training a language model using manually transcribed data and the automatically transcribed utterances;
  
  means for intelligently selecting, for manual transcription, a predetermined number of utterances not having a corresponding manual transcription from the utterance data; and
  
  a labeler to label ones of the automatically transcribed data as well as ones of the manually transcribed data.
- View Dependent Claims (18)
- - 18. The system of claim 17, further comprising:
    - means for coordinating activities such that the means for performing automatic speech recognition, the means for training a language model, and the means for intelligently selecting repeatedly perform corresponding activities until word accuracy of the means for performing automatic speech recognition converges, wherein the labeler is to label ones of the automatically transcribed data as well as ones having a corresponding manual transcription after the word accuracy converges.

19. A machine-readable medium having a plurality of instructions recorded thereon, the instructions comprising:
- instructions for performing automatic speech recognition on ones of a plurality of utterance data not having a corresponding manual transcription to produce automatically transcribed utterances;
  
  instructions for training a model using manually transcribed data and the automatically transcribed utterances;
  
  instructions for intelligently selecting, for manual transcription, a predetermined number of utterances, from the utterance data, not having a corresponding manual transcription;
  
  instructions for receiving new manually transcribed data; and
  
  instructions for permitting labeling of ones of the automatically transcribed data as well as ones of the manually transcribed data.
- View Dependent Claims (20, 21, 22, 23, 24, 25)
- - 20. The machine-readable medium of claim 19, further comprising instructions for iteratively performing, until word accuracy of the automatic speech recognition converges:
    - automatic speech recognition on ones of a plurality of utterance data not having a corresponding manual transcription to produce automatically transcribed utterances, training a model using manually transcribed data and the automatically transcribed utterances, intelligently selecting, for manual transcription, a predetermined number of utterances, from the utterance data, not having a corresponding manual transcription, and receiving new manually transcribed data.
  - 21. The machine-readable medium of claim 19, wherein the instructions for permitting labeling ones of the automatically transcribed data as well as ones of the manually transcribed data further comprise:
    - instructions for permitting labeling ones of the automatically transcribed data as well as ones of the manually transcribed data, after word accuracy of the automatic speech recognition converges.
  - 22. The machine-readable medium of claim 19, wherein:
    - the instructions for intelligently selecting, for manual transcription, a predetermined number of utterances, from the utterance data, not having a corresponding manual transcription further comprise:
      
      instructions for selecting the predetermined number of utterances based on confidence scores.
  - 23. The machine-readable medium of claim 22, wherein instructions for selecting the predetermined number of utterances based on confidence scores further comprises:
    - instructions for selecting the predetermined number of utterances having lowest ones of the confidence scores.
  - 24. The machine-readable medium of claim 22, wherein the confidence scores are based on lattices resulting from the performing of the automatic speech recognition.
  - 25. The machine-readable medium of claim 19, wherein the instructions for training a model using manually transcribed data and the automatically transcribed utterances further comprise instructions for training a language model.
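
Several claims (5, 6, 13, 20) repeat the loop "until word accuracy converges" without defining the test. A minimal sketch, assuming word accuracy is measured as one minus the word-level edit distance divided by the number of reference words on a held-out manually transcribed set, and assuming convergence means the change between successive rounds falls below a tolerance. The function names and the 0.005 tolerance are illustrative choices, not values from the patent.

```python
from typing import List, Sequence, Tuple


def word_errors(reference: Sequence[str], hypothesis: Sequence[str]) -> int:
    """Minimum substitutions + deletions + insertions (word-level edit distance)."""
    previous = list(range(len(hypothesis) + 1))
    for i, ref_word in enumerate(reference, start=1):
        current = [i]
        for j, hyp_word in enumerate(hypothesis, start=1):
            cost = 0 if ref_word == hyp_word else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # match or substitution
            ))
        previous = current
    return previous[-1]


def word_accuracy(pairs: List[Tuple[str, str]]) -> float:
    """Word accuracy over (reference, hypothesis) transcription pairs."""
    errors = sum(word_errors(ref.split(), hyp.split()) for ref, hyp in pairs)
    words = sum(len(ref.split()) for ref, _ in pairs)
    return 1.0 - errors / words


def has_converged(history: List[float], tolerance: float = 0.005) -> bool:
    """True when the last two accuracy measurements differ by less than tolerance."""
    return len(history) >= 2 and abs(history[-1] - history[-2]) < tolerance


accuracy = word_accuracy([("i want to pay my bill", "i want pay my bell")])
print(round(accuracy, 3), has_converged([0.60, 0.66, 0.662]))
```

Here one deletion ("to") and one substitution ("bill" to "bell") against six reference words give an accuracy of 4/6.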

26. A method comprising:
- mining audio data from at least one source; and
  
  training a language model for call classification from the mined audio data to produce a language model.
- View Dependent Claims (27, 28, 29, 30, 31)
- - 27. The method of claim 26, further comprising:
    - generating automatic speech recognition transcriptions using the produced language model;
      
      training a new language model using the generated automatic speech recognition transcriptions and any other available transcribed data; and
      
      generating new automatic speech recognition transcriptions using the new language model.
  - 28. The method of claim 27, further comprising:
    - repeating the acts of:
      
      training a new language model using the generated automatic speech recognition transcriptions and any other available transcribed data, and generating new automatic speech recognition transcriptions using the new language model.
  - 29. The method of claim 26, further comprising:
    - generating automatic speech recognition transcriptions using the produced language model;
      
      generating a new language model by applying an adaptation technique; and
      
      generating new automatic speech recognition transcriptions using the new language model.
  - 30. The method of claim 29, wherein the adaptation technique includes MAP adaptation and mixture modeling.
  - 31. The method of claim 29, further comprising:
    - repeating the acts of:
      
      generating a new language model by applying an adaptation technique; and
      
      generating new automatic speech recognition transcriptions using the new language model.
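
Claim 30 names MAP adaptation and mixture modeling without giving formulas. The sketch below shows textbook forms of both on unigram models: linear interpolation for the mixture, and a Dirichlet-prior count estimate for MAP. Treating the model built from mined data as the background and the ASR transcriptions as adaptation data is an assumption, as are the function names, the weights `lam` and `tau`, and the toy vocabulary.

```python
from collections import Counter
from typing import Dict


def mixture_model(
    p_adapt: Dict[str, float], p_background: Dict[str, float], lam: float
) -> Dict[str, float]:
    """Linear interpolation: lam * P_adapt(w) + (1 - lam) * P_background(w)."""
    vocab = set(p_adapt) | set(p_background)
    return {
        w: lam * p_adapt.get(w, 0.0) + (1.0 - lam) * p_background.get(w, 0.0)
        for w in vocab
    }


def map_adapt(
    adapt_counts: Counter, p_background: Dict[str, float], tau: float
) -> Dict[str, float]:
    """Count-based MAP estimate: (c(w) + tau * P_background(w)) / (N + tau)."""
    total = sum(adapt_counts.values())
    vocab = set(adapt_counts) | set(p_background)
    return {
        w: (adapt_counts[w] + tau * p_background.get(w, 0.0)) / (total + tau)
        for w in vocab
    }


# Toy data: a background unigram model and counts from ASR output.
background = {"bill": 0.5, "weather": 0.5}
asr_counts = Counter("bill bill bill refund".split())
n_words = sum(asr_counts.values())
p_asr = {w: c / n_words for w, c in asr_counts.items()}

mixed = mixture_model(p_asr, background, lam=0.5)
adapted = map_adapt(asr_counts, background, tau=1.0)
print(sorted(mixed.items()))
print(sorted(adapted.items()))
```

A larger `tau` keeps the adapted model closer to the background model; a larger `lam` weights the adaptation data more heavily. Both outputs remain proper distributions that sum to one.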

32. A machine-readable medium having recorded thereon a plurality of instructions for a processor, the machine-readable medium comprising:
- a set of instructions for mining audio data from at least one source; and
  
  a set of instructions for training a language model for call classification from the mined audio data to produce a language model.
- View Dependent Claims (33, 34, 35)
- - 33. The machine-readable medium of claim 32, further comprising:
    - a set of instructions for generating automatic speech recognition transcriptions using the produced language model;
      
      a set of instructions for training a new language model using the generated automatic speech recognition transcriptions and any other available transcribed data; and
      
      a set of instructions for generating new automatic speech recognition transcriptions using the new language model.
  - 34. The machine-readable medium of claim 32, further comprising:
    - a set of instructions for generating automatic speech recognition transcriptions using the produced language model;
      
      a set of instructions for generating a new language model by applying an adaptation technique; and
      
      a set of instructions for generating new automatic speech recognition transcriptions using the new language model.
  - 35. The machine-readable medium of claim 34, wherein the adaptation technique includes MAP adaptation.

36. An apparatus comprising:
- a processor; and
  
  storage to store instructions for the processor, wherein the processor is configured to:
  
  mine audio data from at least one source, and train a language model for call classification from the mined audio data to produce a language model.
- View Dependent Claims (37, 38, 39)
- - 37. The apparatus of claim 36, wherein the processor is further configured to:
    - generate automatic speech recognition transcriptions using the produced language model;
      
      train a new language model using the generated automatic speech recognition transcriptions and any other available transcribed data; and
      
      generate new automatic speech recognition transcriptions using the new language model.
  - 38. The apparatus of claim 36, wherein the processor is further configured to:
    - generate automatic speech recognition transcriptions using the produced language model;
      
      generate a new language model by applying an adaptation technique; and
      
      generate new automatic speech recognition transcriptions using the new language model.
  - 39. The apparatus of claim 38, wherein the adaptation technique includes MAP adaptation.

Current Assignee: Microsoft Technology Licensing LLC (Microsoft Corporation)
Original Assignee: AT&T Corporation (AT&T, Inc.)
Inventors: Riccardi, Giuseppe; Hakkani-Tur, Dilek Z.; Rahim, Mazin G.; Tur, Gokhan
Granted Patent: US 8,818,808 B2
US Class Current: 704/243
CPC Class Codes:
G10L 15/063   Training
G10L 15/07   to the speaker
G10L 15/18   using natural language mode...
G10L 15/26   Speech to text systems G10L...
G10L 2015/0638   Interactive procedures
