Unsupervised and active learning in automatic speech recognition for call classification
Abstract
Utterance data that includes at least a small amount of manually transcribed data is provided. Automatic speech recognition is performed on utterances that have no corresponding manual transcription, producing automatically transcribed utterances. A model is trained using all of the manually transcribed data and the automatically transcribed utterances. A predetermined number of utterances that have no corresponding manual transcription are intelligently selected and manually transcribed. Some of the automatically transcribed data, as well as some of the data having a corresponding manual transcription, are labeled. In another aspect of the invention, audio data is mined from at least one source, and a language model for call classification is trained from the mined audio data.
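The training step described in the abstract, pooling manual and automatic transcriptions into one model, can be illustrated with a toy bigram language model. This is only a minimal sketch under stated assumptions: the function names, the example utterances, and the `auto_weight` down-weighting factor are all hypothetical and do not come from the patent text, which does not specify a model type or weighting scheme here.

```python
from collections import Counter


def bigram_counts(transcripts, weight=1.0):
    """Accumulate weighted bigram counts, with sentence-boundary markers."""
    counts = Counter()
    for text in transcripts:
        tokens = ["<s>"] + text.lower().split() + ["</s>"]
        for left, right in zip(tokens, tokens[1:]):
            counts[(left, right)] += weight
    return counts


def train_language_model(manual, automatic, auto_weight=0.5):
    """Pool manual and ASR transcripts into a single bigram table.

    auto_weight is a hypothetical knob (not from the source) that
    down-weights ASR output, since it may contain recognition errors.
    """
    counts = bigram_counts(manual, 1.0)
    counts.update(bigram_counts(automatic, auto_weight))  # Counter.update adds counts
    return counts


def bigram_prob(counts, left, right):
    """Maximum-likelihood P(right | left); 0.0 if `left` was never seen."""
    total = sum(c for (l, _), c in counts.items() if l == left)
    return counts[(left, right)] / total if total else 0.0


# Hypothetical call-center utterances, invented for illustration.
manual = ["i want to check my bill"]
automatic = ["i want to pay my bill", "check my balance"]
model = train_language_model(manual, automatic)
```

Here `bigram_prob(model, "to", "check")` weighs the manually transcribed continuation more heavily than the automatically transcribed one ("to pay"), which is one simple way to let trusted data dominate while still using the unlabeled audio.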
20 Claims
1. A method comprising:
performing, via a processor, automatic speech recognition using a bootstrap model on utterance data not having a corresponding manual transcription, to produce automatically transcribed utterances, wherein the bootstrap model is based on text data mined from a website relevant to a specific domain;
selecting, via the processor, a predetermined number of utterances not having a corresponding manual transcription based on a geometrically computed n-tuple confidence score;
receiving transcriptions of the predetermined number of utterances, wherein the transcriptions are made by a human being; and
generating a language model based on the automatically transcribed utterances, the predetermined number of utterances, and the transcriptions.
(Dependent claims: 2, 3, 4, 5, 6, 7)
8. A system comprising:
a processor; and
a computer-readable storage medium having instructions stored which, when executed by the processor, cause the processor to perform operations comprising:
performing automatic speech recognition using a bootstrap model on utterance data not having a corresponding manual transcription, to produce automatically transcribed utterances, wherein the bootstrap model is based on text data mined from a website relevant to a specific domain;
selecting a predetermined number of utterances not having a corresponding manual transcription based on a geometrically computed n-tuple confidence score;
receiving transcriptions of the predetermined number of utterances, wherein the transcriptions are made by a human being; and
generating a language model based on the automatically transcribed utterances, the predetermined number of utterances, and the transcriptions.
(Dependent claims: 9, 10, 11, 12, 13, 14)
15. A computer-readable storage device having instructions stored which, when executed by a computing device, cause the computing device to perform operations comprising:
performing automatic speech recognition using a bootstrap model on utterance data not having a corresponding manual transcription, to produce automatically transcribed utterances, wherein the bootstrap model is based on text data mined from a website relevant to a specific domain;
selecting a predetermined number of utterances not having a corresponding manual transcription based on a geometrically computed n-tuple confidence score;
receiving transcriptions of the predetermined number of utterances, wherein the transcriptions are made by a human being; and
generating a language model based on the automatically transcribed utterances, the predetermined number of utterances, and the transcriptions.
(Dependent claims: 16, 17, 18, 19, 20)
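The selecting step recited in the independent claims can be illustrated with a small sketch. The claims refer to a "geometrically computed n-tuple confidence score" without defining it in this excerpt, so the code below makes two loudly labeled assumptions: the utterance score is taken to be the geometric mean of per-word recognizer confidences, and the utterances sent for human transcription are the ones with the lowest scores. All function names and the example data are hypothetical.

```python
import math


def utterance_confidence(word_confidences):
    """Geometric mean of per-word confidences, each assumed to lie in (0, 1].

    ASSUMPTION: this is one plausible reading of a "geometrically computed"
    score; the claim text in this excerpt does not give a formula.
    """
    if not word_confidences:
        return 0.0
    # Clamp to a tiny positive value so log() never sees zero.
    logs = [math.log(max(c, 1e-12)) for c in word_confidences]
    return math.exp(sum(logs) / len(logs))


def select_for_transcription(hypotheses, k):
    """Return the ids of the k utterances with the lowest confidence.

    ASSUMPTION: least-confident-first is the selection rule; `hypotheses`
    maps an utterance id to its list of per-word confidences.
    """
    ranked = sorted(hypotheses, key=lambda uid: utterance_confidence(hypotheses[uid]))
    return ranked[:k]


# Hypothetical recognizer output, invented for illustration.
hypotheses = {
    "utt1": [0.9, 0.9, 0.9],
    "utt2": [0.5, 0.2, 0.8],
    "utt3": [0.99, 0.4],
}
to_transcribe = select_for_transcription(hypotheses, 2)
```

A geometric mean penalizes a single very uncertain word more than an arithmetic mean would, so an utterance such as `utt3`, with one confident word and one doubtful word, ranks below a uniformly confident one.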
Specification