Unsupervised and active learning in automatic speech recognition for call classification

US 9,666,182 B2
Filed: 10/05/2015
Issued: 05/30/2017
Est. Priority Date: 02/23/2005
Status: Active Grant

Abstract

Utterance data that includes at least a small amount of manually transcribed data is provided. Automatic speech recognition is performed on ones of the utterance data not having a corresponding manual transcription to produce automatically transcribed utterances. A model is trained using all of the manually transcribed data and the automatically transcribed utterances. A predetermined number of utterances not having a corresponding manual transcription are intelligently selected and manually transcribed. Ones of the automatically transcribed data as well as ones having a corresponding manual transcription are labeled. In another aspect of the invention, audio data is mined from at least one source, and a language model is trained for call classification from the mined audio data to produce a language model.
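The data flow in the abstract can be sketched as a single round that pools human and machine transcripts. This is a minimal illustration, not the patented implementation: `train_unigram_lm` is a toy stand-in for language-model training, and `recognize`, `pick`, and `ask_human` are hypothetical callables for the recognizer, the selection policy, and the manual transcription step. Letting a human transcript override a machine transcript of the same utterance is also an assumption.

```python
from collections import Counter
from typing import Callable, Dict, List, Tuple


def train_unigram_lm(transcripts: List[str]) -> Dict[str, float]:
    """Toy stand-in for language-model training: relative word frequencies."""
    counts = Counter(w for t in transcripts for w in t.split())
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}


def one_round(
    manual: Dict[str, str],                  # utterance id -> human transcript
    untranscribed: List[str],                # utterance ids with no transcript
    recognize: Callable[[str], str],         # hypothetical ASR call
    pick: Callable[[List[str]], List[str]],  # hypothetical selection policy
    ask_human: Callable[[str], str],         # hypothetical manual transcription
) -> Tuple[Dict[str, float], Dict[str, str]]:
    # Unsupervised step: machine transcripts for every unlabeled utterance.
    auto = {u: recognize(u) for u in untranscribed}
    # Active step: a chosen subset is sent for manual transcription.
    newly_manual = {u: ask_human(u) for u in pick(untranscribed)}
    manual = {**manual, **newly_manual}
    # Assumption: a human transcript overrides the machine one when both exist.
    pooled = {**auto, **manual}
    return train_unigram_lm(list(pooled.values())), manual
```

The function returns both the retrained model and the enlarged set of manual transcripts, so a caller can feed the latter into a further round.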

20 Claims

1. A method comprising:
- performing, via a processor, automatic speech recognition using a bootstrap model on utterance data not having a corresponding manual transcription, to produce automatically transcribed utterances, wherein the bootstrap model is based on text data mined from a website relevant to a specific domain;
- selecting, via the processor, a predetermined number of utterances not having a corresponding manual transcription based on a geometrically computed n-tuple confidence score; and
- generating a language model based on the automatically transcribed utterances, the predetermined number of utterances, and transcriptions of the predetermined number of utterances.
  - 2. The method of claim 1, wherein the transcriptions of the predetermined number of utterances are made by a human being.
  - 3. The method of claim 1, further comprising:
    - performing additional automatic speech recognition using the language model.
  - 4. The method of claim 2, further comprising:
    - iteratively repeating the performing of the automatic speech recognition using the bootstrap model, the selecting, and the performing of additional speech recognition using the language model until a word accuracy converges.
  - 5. The method of claim 1, wherein the predetermined number of utterances correspond to a specific number of utterances having lowest confidence scores.
  - 6. The method of claim 1, wherein the predetermined number of utterances used in generating the language model are equal in number to the automatically transcribed utterances.
  - 7. The method of claim 1, wherein the predetermined number of utterances are randomly selected.
  - 8. The method of claim 1, wherein the language model is further based on the bootstrap model.
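The selection step in claims 1, 5, and 7 can be illustrated with a short sketch. The claim text does not define the "geometrically computed n-tuple confidence score", so the code below assumes one plausible reading: the geometric mean of per-word recognizer confidences. The function names, the `scores` mapping, and the tie-breaking by utterance id are all hypothetical.

```python
import math
import random
from typing import Dict, List, Optional


def geometric_mean(word_confidences: List[float]) -> float:
    """Geometric mean of per-word confidences; 0.0 for empty or non-positive input."""
    if not word_confidences or min(word_confidences) <= 0.0:
        return 0.0
    log_sum = sum(math.log(c) for c in word_confidences)
    return math.exp(log_sum / len(word_confidences))


def select_for_transcription(
    scores: Dict[str, float],            # utterance id -> confidence score
    k: int,                              # the "predetermined number"
    rng: Optional[random.Random] = None,
) -> List[str]:
    """Pick k utterance ids for manual transcription.

    Default: the k lowest-scoring utterances (as in claim 5).
    With rng: k utterances drawn uniformly at random (as in claim 7).
    """
    ids = sorted(scores)  # sort by id first so ties break deterministically
    if rng is not None:
        return rng.sample(ids, min(k, len(ids)))
    return sorted(ids, key=lambda u: scores[u])[:k]
```

A geometric mean penalizes a single badly recognized word more heavily than an arithmetic mean would, which is one reason it is a common choice for utterance-level confidence; whether the patent intends exactly this is not stated in the claims.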

9. A system comprising:
- a processor; and
- a computer-readable storage medium having instructions stored which, when executed by the processor, cause the processor to perform operations comprising;
- performing automatic speech recognition using a bootstrap model on utterance data not having a corresponding manual transcription, to produce automatically transcribed utterances, wherein the bootstrap model is based on text data mined from a web site relevant to a specific domain;
- selecting a predetermined number of utterances not having a corresponding manual transcription based on a geometrically computed n-tuple confidence score; and
- generating a language model based on the automatically transcribed utterances, the predetermined number of utterances, and transcriptions of the predetermined number of utterances.
  - 10. The system of claim 9, wherein the transcriptions of the predetermined number of utterances are made by a human being.
  - 11. The system of claim 9, the computer-readable storage medium having additional instructions stored which, when executed by the processor, cause the processor to perform operations comprising:
    - performing additional automatic speech recognition using the language model.
  - 12. The system of claim 11, the computer-readable storage medium having additional instructions stored which, when executed by the processor, cause the processor to perform operations comprising:
    - iteratively repeating the performing of the automatic speech recognition using the bootstrap model, the selecting, and the performing of additional speech recognition using the language model until a word accuracy converges.
  - 13. The system of claim 9, wherein the predetermined number of utterances correspond to a specific number of utterances having lowest confidence scores.
  - 14. The system of claim 9, wherein the predetermined number of utterances used in generating the language model are equal in number to the automatically transcribed utterances.
  - 15. The system of claim 9, wherein the predetermined number of utterances are randomly selected.
  - 16. The system of claim 9, wherein the language model is further based on the bootstrap model.

17. A computer-readable storage device having instructions stored which, when executed by a computing device, cause the computing device to perform operations comprising:
- performing automatic speech recognition using a bootstrap model on utterance data not having a corresponding manual transcription, to produce automatically transcribed utterances, wherein the bootstrap model is based on text data mined from a website relevant to a specific domain;
- selecting a predetermined number of utterances not having a corresponding manual transcription based on a geometrically computed n-tuple confidence score; and
- generating a language model based on the automatically transcribed utterances, the predetermined number of utterances, and transcriptions of the predetermined number of utterances.
  - 18. The computer-readable storage device of claim 17, wherein the transcriptions of the predetermined number of utterances are made by a human being.
  - 19. The computer-readable storage device of claim 17, having additional instructions stored which, when executed by the computing device, cause the computing device to perform operations comprising:
    - performing additional automatic speech recognition using the language model.
  - 20. The computer-readable storage device of claim 17, having additional instructions stored which, when executed by the computing device, cause the computing device to perform operations comprising:
    - iteratively repeating the performing of the automatic speech recognition using the bootstrap model, the selecting, and the performing of additional speech recognition using the language model until a word accuracy converges.
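Claims 4, 12, and 20 repeat the recognition and selection steps "until a word accuracy converges" without saying how accuracy is measured or when it counts as converged. The sketch below assumes word accuracy is one minus the word error rate (word-level edit distance divided by the reference length) and treats a change smaller than a tolerance `tol` between consecutive rounds as convergence. `run_round`, `tol`, and `max_rounds` are hypothetical names and values.

```python
from typing import Callable, List


def word_errors(ref: List[str], hyp: List[str]) -> int:
    """Levenshtein distance between two word sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        cur = [i]
        for j, h in enumerate(hyp, start=1):
            cur.append(min(
                prev[j] + 1,             # deletion
                cur[j - 1] + 1,          # insertion
                prev[j - 1] + (r != h),  # substitution or match
            ))
        prev = cur
    return prev[-1]


def word_accuracy(refs: List[str], hyps: List[str]) -> float:
    """1 - WER over a held-out set of reference/hypothesis transcript pairs."""
    errors = sum(word_errors(r.split(), h.split()) for r, h in zip(refs, hyps))
    words = sum(len(r.split()) for r in refs)
    return 1.0 - errors / words if words else 0.0


def iterate_until_converged(
    run_round: Callable[[int], float],  # hypothetical: runs round i, returns word accuracy
    tol: float = 0.005,                 # assumed convergence tolerance
    max_rounds: int = 20,               # assumed safety cap
) -> List[float]:
    """Repeat rounds until accuracy changes by less than tol; return the accuracy history."""
    history: List[float] = []
    for i in range(max_rounds):
        history.append(run_round(i))
        if len(history) > 1 and abs(history[-1] - history[-2]) < tol:
            break
    return history
```

In practice `run_round` would retrain the language model, re-recognize a held-out set with manual transcriptions, and return `word_accuracy` on it; the cap on rounds guards against an accuracy curve that oscillates instead of settling.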

Current Assignee: Microsoft Technology Licensing LLC (Microsoft Corporation)
Original Assignee: Nuance Communications, Inc. (Microsoft Corporation)
Inventors: Rahim, Mazin G.; Hakkani-Tur, Dilek Z.; Tur, Gokhan; Riccardi, Giuseppe
Primary Examiner(s): Armstrong, Angela A
Application Number: US14/874,843
Publication Number: US 20160027434A1
Time in Patent Office: 603 Days
CPC Class Codes:
- G10L 15/063   Training
- G10L 15/07   to the speaker
- G10L 15/18   using natural language mode...
- G10L 15/26   Speech to text systems G10L...
- G10L 2015/0638   Interactive procedures
