Unsupervised and active learning in automatic speech recognition for call classification
Abstract
Utterance data that includes at least a small amount of manually transcribed data is provided. Automatic speech recognition is performed on ones of the utterance data not having a corresponding manual transcription to produce automatically transcribed utterances. A model is trained using all of the manually transcribed data and the automatically transcribed utterances. A predetermined number of utterances not having a corresponding manual transcription are intelligently selected and manually transcribed. Ones of the automatically transcribed data as well as ones having a corresponding manual transcription are labeled. In another aspect of the invention, audio data is mined from at least one source, and a language model is trained for call classification from the mined audio data to produce a language model.
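One round of the procedure summarized above can be sketched as follows. This is a minimal illustration, not the patented implementation: the `Utterance` record, the `recognize`, `train`, and `transcribe` callables, and the rule of selecting the utterances with the lowest recognizer confidence are all assumptions introduced here. The abstract says only that utterances are "intelligently selected" and does not name a criterion.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Utterance:
    """Hypothetical record for one utterance in the pool."""
    audio_id: str
    manual: Optional[str] = None   # manual transcription, if one exists
    auto: Optional[str] = None     # ASR hypothesis, if one exists
    confidence: float = 0.0        # assumed ASR confidence score in [0, 1]


def select_for_transcription(pool: List[Utterance], k: int) -> List[Utterance]:
    """Assumed selection rule: the k untranscribed utterances with lowest confidence."""
    untranscribed = [u for u in pool if u.manual is None]
    return sorted(untranscribed, key=lambda u: u.confidence)[:k]


def active_unsupervised_round(
    pool: List[Utterance],
    recognize: Callable[[str], Tuple[str, float]],
    train: Callable[[List[str]], object],
    transcribe: Callable[[str], str],
    k: int,
) -> object:
    # Step 1: run ASR on every utterance that lacks a manual transcription.
    for u in pool:
        if u.manual is None:
            u.auto, u.confidence = recognize(u.audio_id)

    # Step 2: train a model on the manual and the automatic transcriptions together.
    texts = [u.manual if u.manual is not None else u.auto for u in pool]
    model = train(texts)

    # Step 3: select a predetermined number of utterances and have them transcribed.
    for u in select_for_transcription(pool, k):
        u.manual = transcribe(u.audio_id)

    return model
```

In this sketch the caller would invoke `active_unsupervised_round` repeatedly and pass the resulting pool to a labeling step afterwards. The stopping rule and the labeling itself are left outside the function.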
54 Citations
39 Claims
1. A method comprising:
providing utterance data including at least a small amount of manually transcribed data;
performing automatic speech recognition on ones of the utterance data not having a corresponding manual transcription to produce automatically transcribed utterances;
training a model using all of the manually transcribed data and the automatically transcribed utterances;
intelligently selecting a predetermined number of utterances not having a corresponding manual transcription;
manually transcribing the selected number of utterances not having a corresponding manual transcription; and
labeling ones of the automatically transcribed data as well as ones of the manually transcribed data.

9. A system comprising:
an automatic speech recognizer configured to automatically transcribe utterance data not having a corresponding manual transcription and produce a set of automatically transcribed data;
a learning module configured to intelligently select a predetermined number of utterances from the set of automatically transcribed data, the selected number of predetermined utterances to be manually transcribed, added to a set of manually transcribed data, and deleted from the set of automatically transcribed data;
a training module configured to train a language model using the set of manually transcribed data and the set of automatically transcribed data; and
a labeler to label at least some of the set of automatically transcribed data and the set of manually transcribed data.

17. A system comprising:
means for performing automatic speech recognition on ones of a plurality of utterance data not having a corresponding manual transcription to produce automatically transcribed utterances;
means for training a language model using manually transcribed data and the automatically transcribed utterances;
means for intelligently selecting, for manual transcription, a predetermined number of utterances not having a corresponding manual transcription from the utterance data; and
a labeler to label ones of the automatically transcribed data as well as ones of the manually transcribed data.
means for coordinating activities such that the means for performing automatic speech recognition, the means for training a language model, and the means for intelligently selecting repeatedly perform corresponding activities until word accuracy of the means for performing automatic speech recognition converges, wherein the labeler is to label ones of the automatically transcribed data as well as ones having a corresponding manual transcription after the word accuracy converges.

19. A machine-readable medium having a plurality of instructions recorded thereon, the instructions comprising:
instructions for performing automatic speech recognition on ones of a plurality of utterance data not having a corresponding manual transcription to produce automatically transcribed utterances;
instructions for training a model using manually transcribed data and the automatically transcribed utterances;
instructions for intelligently selecting, for manual transcription, a predetermined number of utterances, from the utterance data, not having a corresponding manual transcription;
instructions for receiving new manually transcribed data; and
instructions for permitting labeling of ones of the automatically transcribed data as well as ones of the manually transcribed data.

26. A method comprising:
mining audio data from at least one source; and
training a language model for call classification from the mined audio data to produce a language model.

32. A machine-readable medium having recorded thereon a plurality of instructions for a processor, the machine-readable medium comprising:
a set of instructions for mining audio data from at least one source; and
a set of instructions for training a language model for call classification from the mined audio data to produce a language model.

36. An apparatus comprising:
a processor; and
storage to store instructions for the processor, wherein the processor is configured to:
mine audio data from at least one source, and train a language model for call classification from the mined audio data to produce a language model.
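To make the "training a language model ... from the mined audio data" step of claims 26, 32, and 36 concrete, the following sketch trains a bigram model on text transcripts. It rests on assumptions not stated in the claims: the mined audio is taken to have already been converted to text (for example by a recognizer), and the bigram form with add-one smoothing is an illustrative choice, not the model the patent prescribes. The function names are hypothetical.

```python
import math
from collections import Counter
from typing import Iterable, Tuple

# (history counts, bigram counts, vocabulary size); a hypothetical model format.
BigramModel = Tuple[Counter, Counter, int]


def train_bigram_lm(transcripts: Iterable[str]) -> BigramModel:
    """Count bigrams over transcripts assumed to come from mined audio."""
    histories: Counter = Counter()
    bigrams: Counter = Counter()
    for text in transcripts:
        tokens = ["<s>"] + text.lower().split() + ["</s>"]
        histories.update(tokens[:-1])            # every token that precedes another
        bigrams.update(zip(tokens, tokens[1:]))  # adjacent token pairs
    # Vocabulary = every token that can be predicted, including the end marker.
    vocab_size = len({w for _, w in bigrams})
    return histories, bigrams, vocab_size


def log_prob(model: BigramModel, text: str) -> float:
    """Log-probability of a sentence under add-one (Laplace) smoothing."""
    histories, bigrams, vocab_size = model
    tokens = ["<s>"] + text.lower().split() + ["</s>"]
    return sum(
        math.log((bigrams[(h, w)] + 1) / (histories[h] + vocab_size))
        for h, w in zip(tokens, tokens[1:])
    )
```

A model trained this way assigns a higher score to word sequences that resemble the mined transcripts, so in-domain phrasings such as call-center requests score above scrambled versions of the same words.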
Specification