Training an automatic speech recognition system using compressed word frequencies
First Claim
1. A method comprising:
obtaining, at a computing system, respective word frequencies $f_i$ from a corpus of utterance-to-text-string mappings, wherein the corpus of utterance-to-text-string mappings contains associations between audio utterances and respective text string transcriptions of the audio utterances, and wherein the respective word frequencies $f_i$ are based on occurrences of words in the text string transcriptions;
determining respective compressed word frequencies $c_i$ by raising each of the respective word frequencies $f_i$ to a power $m$, wherein $m < 1$ and $c_i = f_i^m$;
selecting sample utterance-to-text-string mappings from the corpus of utterance-to-text-string mappings based on the respective compressed word frequencies $c_i$; and
training an automatic speech recognition (ASR) system with the sample utterance-to-text-string mappings.
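The first two steps of the claim (counting word occurrences and raising each count to a power below 1) can be sketched as follows. This is a minimal illustration, not the patented implementation: the function names `word_frequencies` and `compress`, the whitespace tokenization, the default `m=0.5`, and the toy transcriptions are all assumptions made for the example.

```python
from collections import Counter


def word_frequencies(transcriptions):
    """Count occurrences of each word across the text string transcriptions."""
    counts = Counter()
    for text in transcriptions:
        # Assumption: words are obtained by lowercasing and splitting on whitespace.
        counts.update(text.lower().split())
    return counts


def compress(frequencies, m=0.5):
    """Return c_i = f_i ** m for every word; the claim requires m < 1."""
    if not m < 1:
        raise ValueError("m must be less than 1")
    return {word: f ** m for word, f in frequencies.items()}


# Toy transcriptions, invented for illustration.
transcriptions = ["call home", "call mom", "call home now", "call dad", "play music"]
f = word_frequencies(transcriptions)  # "call" occurs 4 times, "music" once
c = compress(f, m=0.5)                # "call" becomes 2.0, "music" stays at 1.0
```

With `m=0.5` in this toy data, the ratio between the most and least frequent words drops from 4:1 to 2:1, which is the sense in which the frequencies are compressed.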
Abstract
Respective word frequencies may be determined from a corpus of utterance-to-text-string mappings that contain associations between audio utterances and a respective text string transcription of each audio utterance. Respective compressed word frequencies may be obtained based on the respective word frequencies such that the distribution of the respective compressed word frequencies has a lower variance than the distribution of the respective word frequencies. Sample utterance-to-text-string mappings may be selected from the corpus of utterance-to-text-string mappings based on the compressed word frequencies. An automatic speech recognition (ASR) system may be trained with the sample utterance-to-text-string mappings.
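The abstract says only that sample mappings are selected "based on" the compressed word frequencies; it does not state a selection rule. The sketch below shows one hypothetical rule, chosen purely for illustration: each word receives a quota equal to its compressed frequency rounded up, and a mapping is kept while at least one of its words still has unused quota. The function name `select_samples`, the quota rule, the power `m=0.5`, and the sample data are assumptions, not taken from the patent.

```python
import math
from collections import Counter


def select_samples(mappings, m=0.5):
    """Select (utterance_id, transcription) pairs using compressed frequencies.

    Hypothetical rule: a word with frequency f may justify keeping at most
    ceil(f ** m) mappings.
    """
    freqs = Counter(w for _, text in mappings for w in text.lower().split())
    quota = {w: math.ceil(f ** m) for w, f in freqs.items()}

    used = Counter()   # how many selected mappings contain each word so far
    selected = []
    for utterance_id, text in mappings:
        words = set(text.lower().split())
        # Keep the mapping if any of its words has not yet filled its quota.
        if any(used[w] < quota[w] for w in words):
            selected.append((utterance_id, text))
            used.update(words)
    return selected


# Invented corpus: one phrase dominates, one phrase is rare.
mappings = [
    ("u1", "call home"),
    ("u2", "call home"),
    ("u3", "call home"),
    ("u4", "call home"),
    ("u5", "play music"),
]
samples = select_samples(mappings)
```

Under this assumed rule, the dominant phrase contributes two of its four mappings while the rare phrase keeps its single mapping, so the selected training set is less skewed toward common words than the full corpus.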
20 Claims
1. A method comprising:
obtaining, at a computing system, respective word frequencies $f_i$ from a corpus of utterance-to-text-string mappings, wherein the corpus of utterance-to-text-string mappings contains associations between audio utterances and respective text string transcriptions of the audio utterances, and wherein the respective word frequencies $f_i$ are based on occurrences of words in the text string transcriptions;
determining respective compressed word frequencies $c_i$ by raising each of the respective word frequencies $f_i$ to a power $m$, wherein $m < 1$ and $c_i = f_i^m$;
selecting sample utterance-to-text-string mappings from the corpus of utterance-to-text-string mappings based on the respective compressed word frequencies $c_i$; and
training an automatic speech recognition (ASR) system with the sample utterance-to-text-string mappings.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9
10. An article of manufacture including a non-transitory computer-readable storage medium, having stored thereon program instructions that, upon execution by a computing device, cause the computing device to perform operations comprising:
obtaining respective word frequencies from a corpus of utterance-to-text-string mappings, wherein the corpus of utterance-to-text-string mappings contains associations between audio utterances and respective text string transcriptions of the audio utterances, and wherein the respective word frequencies are based on occurrences of words in the text string transcriptions;
determining respective compressed word frequencies based on the respective word frequencies, wherein a first distribution of the respective word frequencies has a higher variance than a second distribution of the respective compressed word frequencies;
selecting sample utterance-to-text-string mappings from the corpus of utterance-to-text-string mappings based on the compressed word frequencies; and
training an automatic speech recognition (ASR) system with the sample utterance-to-text-string mappings.
Dependent claims: 11, 12, 13, 14, 15, 16, 17, 18, 19
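Claim 10 characterizes compression by a variance condition rather than by a specific formula. The check below is a small sketch of that condition. It uses a square-root transform (a power of 0.5) as one example of compression; the helper name `has_lower_variance` and the frequency values are invented for illustration.

```python
from statistics import pvariance


def has_lower_variance(frequencies, compressed):
    """True if the compressed distribution has lower variance than the original."""
    return pvariance(compressed) < pvariance(frequencies)


# Invented word frequencies with a heavy skew toward one word.
freqs = [100, 25, 4, 1]
# Example compression: raise each frequency to the power 0.5.
compressed = [f ** 0.5 for f in freqs]

ok = has_lower_variance(freqs, compressed)
```

For these values the population variance falls from about 1604 to about 12, so the condition holds. A given transform is not guaranteed to satisfy it for every input, which is why the sketch tests the condition explicitly instead of assuming it.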
20. A computing system comprising:
at least one processor;
data storage; and
program instructions in the data storage that, upon execution by the at least one processor, cause the computing system to:
obtain respective word frequencies from a corpus of utterance-to-text-string mappings, wherein the corpus of utterance-to-text-string mappings contains associations between audio utterances and respective text string transcriptions of audio utterances, and wherein the respective word frequencies are based on occurrences of words in the text string transcriptions,
determine respective compressed word frequencies based on the respective word frequencies, wherein a first distribution of the respective word frequencies has a higher variance than a second distribution of the respective compressed word frequencies,
select sample utterance-to-text-string mappings from the corpus of utterance-to-text-string mappings based on the compressed word frequencies, and
train an automatic speech recognition (ASR) system with the sample utterance-to-text-string mappings.
Specification