Unsupervised and active learning in automatic speech recognition for call classification

US 9,159,318 B2
Filed: 08/26/2014
Issued: 10/13/2015
Est. Priority Date: 02/23/2005
Status: Expired due to Fees

4 Assignments

0 Petitions

Abstract

Utterance data that includes at least a small amount of manually transcribed data is provided. Automatic speech recognition is performed on ones of the utterance data not having a corresponding manual transcription to produce automatically transcribed utterances. A model is trained using all of the manually transcribed data and the automatically transcribed utterances. A predetermined number of utterances not having a corresponding manual transcription are intelligently selected and manually transcribed. Ones of the automatically transcribed data as well as ones having a corresponding manual transcription are labeled. In another aspect of the invention, audio data is mined from at least one source, and a language model is trained for call classification from the mined audio data to produce a language model.
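The abstract's training step pools the manually transcribed data and the automatically transcribed utterances into a single corpus. The Python sketch below shows that pooling under loudly stated assumptions: the model is an unsmoothed maximum-likelihood bigram model, both sources are counted with equal weight, and the name `train_bigram_lm` is hypothetical. The patent text here does not specify model order, smoothing, or weighting.

```python
from collections import Counter
from itertools import chain
from typing import Dict, Iterable, Tuple

Bigram = Tuple[str, str]


def train_bigram_lm(manual: Iterable[str], automatic: Iterable[str]) -> Dict[Bigram, float]:
    """Unsmoothed maximum-likelihood estimates of P(w2 | w1) over the pooled corpus.

    Assumption: manual and automatic transcriptions are counted with equal weight.
    """
    history_counts: Counter = Counter()
    bigram_counts: Counter = Counter()
    for utterance in chain(manual, automatic):
        tokens = ["<s>"] + utterance.lower().split() + ["</s>"]
        history_counts.update(tokens[:-1])  # every token except </s> acts as a history
        bigram_counts.update(zip(tokens[:-1], tokens[1:]))
    return {(w1, w2): c / history_counts[w1] for (w1, w2), c in bigram_counts.items()}


manual = ["I want to pay my bill"]     # hypothetical human transcription
automatic = ["pay my bill please"]     # hypothetical ASR output
lm = train_bigram_lm(manual, automatic)
print(lm[("pay", "my")])  # 1.0
print(lm[("<s>", "i")])   # 0.5
```

Giving automatic transcriptions the same weight as human ones is the simplest choice; a real system might down-weight them because they contain recognition errors.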

20 Claims

1. A method comprising:
- performing, via a processor, automatic speech recognition using a bootstrap model on utterance data not having a corresponding manual transcription, to produce automatically transcribed utterances, wherein the bootstrap model is based on text data mined from a website relevant to a specific domain;
- selecting, via the processor, a predetermined number of utterances not having a corresponding manual transcription based on a geometrically computed n-tuple confidence score;
- receiving transcriptions of the predetermined number of utterances, wherein the transcriptions are made by a human being; and
- generating a language model based on the automatically transcribed utterances, the predetermined number of utterances, and the transcriptions.

2. The method of claim 1, further comprising: performing automatic speech recognition using the language model.

3. The method of claim 2, further comprising: iteratively repeating the performing of automatic speech recognition using the bootstrap model, the selecting, the receiving, the generating, and the performing of speech recognition using the language model until a word accuracy converges.

4. The method of claim 1, wherein the predetermined number of utterances correspond to a specific number of utterances having lowest confidence scores.

5. The method of claim 1, wherein the predetermined number of utterances used in generating the language model are equal in number to the automatically transcribed utterances.

6. The method of claim 1, wherein the predetermined number of utterances are randomly selected.

7. The method of claim 1, wherein the language model is further based on the bootstrap model.
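Claim 1 selects utterances by a "geometrically computed n-tuple confidence score", and claims 4 and 6 name two selection rules: the lowest-scoring utterances, or a random subset. The sketch below assumes the score is the geometric mean of the n word-level confidence scores the recognizer assigns to an utterance. That reading, the input shape, and the names `geometric_confidence` and `select_for_transcription` are illustrative assumptions, not the patent's definition.

```python
import math
import random
from typing import List, Optional, Sequence, Tuple

# Hypothetical input shape: (utterance id, word-level confidence scores from the recognizer)
Utterance = Tuple[str, Sequence[float]]


def geometric_confidence(word_confidences: Sequence[float]) -> float:
    """Geometric mean of the n word confidences (assumed reading of the claim).

    Computed in log space to avoid underflow on long utterances; an empty
    utterance or any zero confidence yields 0.0.
    """
    if not word_confidences or any(c <= 0.0 for c in word_confidences):
        return 0.0
    log_sum = sum(math.log(c) for c in word_confidences)
    return math.exp(log_sum / len(word_confidences))


def select_for_transcription(
    utterances: Sequence[Utterance],
    k: int,
    strategy: str = "lowest",
    seed: Optional[int] = None,
) -> List[str]:
    """Return the ids of k untranscribed utterances to send to a human transcriber."""
    if strategy == "lowest":
        # Claim 4: the k utterances with the lowest confidence scores.
        ranked = sorted(utterances, key=lambda u: geometric_confidence(u[1]))
        return [uid for uid, _ in ranked[:k]]
    if strategy == "random":
        # Claim 6: k utterances chosen uniformly at random.
        rng = random.Random(seed)
        return [uid for uid, _ in rng.sample(list(utterances), k)]
    raise ValueError(f"unknown strategy: {strategy}")


pool = [("u1", [0.9, 0.95]), ("u2", [0.2, 0.6]), ("u3", [0.5, 0.4])]
print(select_for_transcription(pool, k=2))  # ['u2', 'u3']
```

A geometric mean is pulled down sharply by a single low-confidence word, which is one reason it suits flagging utterances the bootstrap recognizer probably got wrong.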

8. A system comprising:
- a processor; and
- a computer-readable storage medium having instructions stored which, when executed by the processor, cause the processor to perform operations comprising:
  - performing automatic speech recognition using a bootstrap model on utterance data not having a corresponding manual transcription, to produce automatically transcribed utterances, wherein the bootstrap model is based on text data mined from a website relevant to a specific domain;
  - selecting a predetermined number of utterances not having a corresponding manual transcription based on a geometrically computed n-tuple confidence score;
  - receiving transcriptions of the predetermined number of utterances, wherein the transcriptions are made by a human being; and
  - generating a language model based on the automatically transcribed utterances, the predetermined number of utterances, and the transcriptions.

9. The system of claim 8, the computer-readable storage medium having additional instructions stored which, when executed by the processor, result in operations comprising: performing automatic speech recognition using the language model.

10. The system of claim 9, the computer-readable storage medium having additional instructions stored which, when executed by the processor, result in operations comprising: iteratively repeating the performing of automatic speech recognition using the bootstrap model, the selecting, the receiving, the generating, and the performing of speech recognition using the language model until a word accuracy converges.

11. The system of claim 8, wherein the predetermined number of utterances correspond to a specific number of utterances having lowest confidence scores.

12. The system of claim 8, wherein the predetermined number of utterances used in generating the language model are equal in number to the automatically transcribed utterances.

13. The system of claim 8, wherein the predetermined number of utterances are randomly selected.

14. The system of claim 8, wherein the language model is further based on the bootstrap model.
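Claims 7 and 14 state that the language model is "further based on the bootstrap model" without saying how the two are combined. One common technique, swapped in here purely as an illustration, is linear interpolation of the two models' probabilities. The bigram dictionary representation, the weight `lam`, and the name `interpolate_with_bootstrap` are assumptions.

```python
from typing import Dict, Tuple

Bigram = Tuple[str, str]


def interpolate_with_bootstrap(
    p_new: Dict[Bigram, float],
    p_bootstrap: Dict[Bigram, float],
    lam: float = 0.5,
) -> Dict[Bigram, float]:
    """Linear interpolation: lam * P_new + (1 - lam) * P_bootstrap.

    Missing entries count as probability 0. If a history appears in only one
    model, its probabilities are not renormalized here; a production system
    would interpolate smoothed models instead.
    """
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    keys = set(p_new) | set(p_bootstrap)
    return {k: lam * p_new.get(k, 0.0) + (1.0 - lam) * p_bootstrap.get(k, 0.0) for k in keys}


p_new = {("<s>", "pay"): 1.0}                       # hypothetical in-domain model
p_boot = {("<s>", "pay"): 0.5, ("<s>", "bill"): 0.5}  # hypothetical bootstrap model
mixed = interpolate_with_bootstrap(p_new, p_boot, lam=0.6)
print(round(mixed[("<s>", "pay")], 3), round(mixed[("<s>", "bill")], 3))  # 0.8 0.2
```

Keeping some weight on the bootstrap model lets word sequences mined from the domain website survive even when they never occur in the call data.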

15. A computer-readable storage device having instructions stored which, when executed by a computing device, cause the computing device to perform operations comprising:
- performing automatic speech recognition using a bootstrap model on utterance data not having a corresponding manual transcription, to produce automatically transcribed utterances, wherein the bootstrap model is based on text data mined from a website relevant to a specific domain;
- selecting a predetermined number of utterances not having a corresponding manual transcription based on a geometrically computed n-tuple confidence score;
- receiving transcriptions of the predetermined number of utterances, wherein the transcriptions are made by a human being; and
- generating a language model based on the automatically transcribed utterances, the predetermined number of utterances, and the transcriptions.

16. The computer-readable storage device of claim 15, having additional instructions stored which, when executed by the processor, result in operations comprising: performing automatic speech recognition using the language model.

17. The computer-readable storage device of claim 16, having additional instructions stored which, when executed by the processor, result in operations comprising: iteratively repeating the performing of automatic speech recognition using the bootstrap model, the selecting, the receiving, the generating, and the performing of speech recognition using the language model until a word accuracy converges.

18. The computer-readable storage device of claim 15, wherein the predetermined number of utterances correspond to a specific number of utterances having lowest confidence scores.

19. The computer-readable storage device of claim 15, wherein the predetermined number of utterances used in generating the language model are equal in number to the automatically transcribed utterances.

20. The computer-readable storage device of claim 15, wherein the predetermined number of utterances are randomly selected.
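Claims 3, 10, and 17 repeat the whole recognize, select, transcribe, and train cycle "until a word accuracy converges". The sketch below uses the standard definition of word accuracy, 1 - (S + D + I) / N from a word-level edit distance, and assumes convergence means the accuracy changes by less than a tolerance between rounds. The callback `run_iteration`, the tolerance, and the iteration cap are hypothetical stand-ins, not details from the claims.

```python
from typing import Callable, List, Sequence


def word_accuracy(reference: Sequence[str], hypothesis: Sequence[str]) -> float:
    """Word accuracy = 1 - (S + D + I) / N, via word-level Levenshtein distance.

    N is the number of reference words; the value can go negative when the
    hypothesis contains many insertions.
    """
    n, m = len(reference), len(hypothesis)
    if n == 0:
        raise ValueError("reference must contain at least one word")
    prev = list(range(m + 1))  # distances from an empty reference prefix
    for i in range(1, n + 1):
        curr = [i] + [0] * m
        for j in range(1, m + 1):
            cost = 0 if reference[i - 1] == hypothesis[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return 1.0 - prev[m] / n


def iterate_until_converged(
    run_iteration: Callable[[int], float],
    tol: float = 1e-3,
    max_iters: int = 20,
) -> List[float]:
    """Repeat one training round until word accuracy changes by less than tol.

    run_iteration is a hypothetical callback for one round of the claimed loop
    (recognize, select, collect transcriptions, build the LM, re-recognize);
    it returns word accuracy on held-out data.
    """
    history: List[float] = []
    for iteration in range(max_iters):
        history.append(run_iteration(iteration))
        if len(history) >= 2 and abs(history[-1] - history[-2]) < tol:
            break
    return history


print(round(word_accuracy("pay my bill".split(), "pay bill".split()), 3))  # 0.667
accuracies = [0.70, 0.75, 0.7505, 0.80]  # hypothetical per-round accuracies
print(iterate_until_converged(lambda i: accuracies[i]))  # [0.7, 0.75, 0.7505]
```

The iteration cap guards against a loop that never meets the tolerance, since the claims do not bound the number of rounds.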

Current Assignee
Nuance Communications, Inc. (Microsoft Corporation)
Original Assignee
AT&T Intellectual Property I LP (AT&T, Inc.)
Inventors
Hakkani-Tur, Dilek Z., Rahim, Mazin G., Riccardi, Giuseppe, Tur, Gokhan
Primary Examiner(s)
Armstrong, Angela A

Application Number
US14/468,375
Publication Number
US 20150046159A1
Time in Patent Office
413 Days
Field of Search
704/243, 704/235
US Class Current
1/1
CPC Class Codes

G10L 15/063   Training

G10L 15/07   to the speaker

G10L 15/18   using natural language mode...

G10L 15/26   Speech to text systems G10L...

G10L 2015/0638   Interactive procedures
