Training an automatic speech recognition system using compressed word frequencies

US 8,543,398 B1
Filed: 11/01/2012
Issued: 09/24/2013
Est. Priority Date: 02/29/2012
Status: Active Grant

2 Assignments

0 Petitions

Abstract

Respective word frequencies may be determined from a corpus of utterance-to-text-string mappings that contain associations between audio utterances and a respective text string transcription of each audio utterance. Respective compressed word frequencies may be obtained based on the respective word frequencies such that the distribution of the respective compressed word frequencies has a lower variance than the distribution of the respective word frequencies. Sample utterance-to-text-string mappings may be selected from the corpus of utterance-to-text-string mappings based on the compressed word frequencies. An automatic speech recognition (ASR) system may be trained with the sample utterance-to-text-string mappings.
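
The first two steps of the abstract can be sketched in a few lines of Python. This is a minimal illustration, not the patented implementation. It assumes whitespace tokenization, treats the word frequencies $f_i$ as raw occurrence counts read from a histogram (the approach of claims 4 and 14), and uses $m = 0.5$ as a hypothetical exponent. The helper names `word_counts` and `compress` are invented for the example.

```python
from collections import Counter
from statistics import pvariance


def word_counts(transcriptions):
    """Histogram of word occurrences over all text string transcriptions.

    Assumption: lowercase whitespace tokenization stands in for a real tokenizer.
    """
    counts = Counter()
    for text in transcriptions:
        counts.update(text.lower().split())
    return counts


def compress(counts, m=0.5):
    """Return the compressed frequency c_i = f_i ** m for every word.

    Assumption: 0 < m < 1, the range recited in claims 1, 2, 11 and 12.
    """
    if not 0 < m < 1:
        raise ValueError("m must satisfy 0 < m < 1")
    return {word: f ** m for word, f in counts.items()}


if __name__ == "__main__":
    corpus = [
        "call mom",
        "call the office",
        "call home",
        "navigate to the office",
        "play jazz",
    ]
    f = word_counts(corpus)
    c = compress(f, m=0.5)
    print(f["call"], round(c["call"], 3))  # 3 1.732
    # Compression narrows the spread between frequent and rare words.
    print(pvariance(list(f.values())) > pvariance(list(c.values())))  # True
```

With counts $f_i \ge 1$ and $0 < m < 1$, the map $x \mapsto x^m$ shrinks the gap between any two counts, so the compressed values never have higher variance than the raw counts (and strictly lower unless all counts are equal). In the toy corpus, "call" drops from 3 to about 1.73 while the words seen once stay at 1.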

307 Citations

20 Claims

1. A method comprising:
- obtaining, at a computing system, respective word frequencies $f_i$ from a corpus of utterance-to-text-string mappings, wherein the corpus of utterance-to-text-string mappings contains associations between audio utterances and respective text string transcriptions of the audio utterances, and wherein the respective word frequencies $f_i$ are based on occurrences of words in the text string transcriptions;
  
  determining respective compressed word frequencies $c_i$ by raising each of the respective word frequencies $f_i$ to a power $m$, wherein $m < 1$ and $c_i = f_i^m$;
  
  selecting sample utterance-to-text-string mappings from the corpus of utterance-to-text-string mappings based on the respective compressed word frequencies $c_i$; and
  
  training an automatic speech recognition (ASR) system with the sample utterance-to-text-string mappings.
- Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9)
- - 2. The method of claim 1, wherein m is also greater than 0.
  - 3. The method of claim 2, wherein m is also greater than or equal to 0.5.
  - 4. The method of claim 1, wherein obtaining the respective word frequencies $f_i$ comprises:
    - developing a histogram that associates the words in the text string transcriptions with respective counts of the occurrences of the words; and
      
      obtaining the respective word frequencies $f_i$ from the respective counts.
  - 5. The method of claim 1, wherein selecting the sample utterance-to-text-string mappings from the corpus of utterance-to-text-string mappings based on the compressed word frequencies $c_i$ comprises:
    - determining a first word selection probability for a first word based on a first compressed word frequency of the first word divided by a first word frequency of the first word, wherein the sample utterance-to-text-string mappings include a particular utterance mapped to a particular text string, and wherein the particular text string contains the first word; and
      
      selecting the particular utterance based on the first word selection probability.
  - 6. The method of claim 5, wherein selecting the sample utterance-to-text-string mappings from the corpus of utterance-to-text-string mappings based on the compressed word frequencies $c_i$ further comprises:
    - determining a second word selection probability for a second word based on a second compressed word frequency of the second word divided by a second word frequency of the second word, wherein the particular text string also contains the second word; and
      
      selecting the particular utterance based on the first word selection probability and second word selection probability.
  - 7. The method of claim 6, wherein selecting the particular utterance based on the first word selection probability and second word selection probability comprises:
    - calculating an arithmetic mean of the first word selection probability and the second word selection probability; and
      
      selecting the particular utterance with a probability of the arithmetic mean.
  - 8. The method of claim 6, wherein selecting the particular utterance based on the first word selection probability and second word selection probability comprises:
    - calculating a geometric mean of the first word selection probability and the second word selection probability; and
      
      selecting the particular utterance with a probability of the geometric mean.
  - 9. The method of claim 1, further comprising:
    - before training the ASR system with the sample utterance-to-text-string mappings, training the ASR system with the entire corpus of utterance-to-text-string mappings.
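
Claims 5 through 8 spell out the selection step: a per-word probability equal to the compressed frequency divided by the raw frequency, combined over the words of a transcription by an arithmetic mean (claim 7) or a geometric mean (claim 8). The Python sketch below is a minimal illustration under stated assumptions: $f_i$ are raw occurrence counts, so $c_i / f_i = f_i^{m-1}$ falls in $(0, 1]$; transcriptions are whitespace-tokenized; and each mapping is kept by an independent random draw with a fixed seed. The claims do not say how the draw is made, and all function names here are hypothetical.

```python
import math
import random
from collections import Counter


def word_selection_probs(counts, m=0.5):
    """Per-word probability p_i = c_i / f_i = f_i**m / f_i.

    Assumes f_i are raw occurrence counts (>= 1) and 0 < m < 1, so p_i is in (0, 1].
    """
    return {word: (f ** m) / f for word, f in counts.items()}


def utterance_prob(words, probs, mean="arithmetic"):
    """Combine the per-word probabilities of one transcription (claims 7 and 8)."""
    ps = [probs[w] for w in words]
    if not ps:
        return 0.0  # empty transcription: never selected (an illustrative choice)
    if mean == "arithmetic":
        return sum(ps) / len(ps)
    if mean == "geometric":
        # exp of the mean log equals the n-th root of the product; all p_i > 0
        return math.exp(sum(math.log(p) for p in ps) / len(ps))
    raise ValueError(f"unknown mean: {mean!r}")


def select_samples(mappings, m=0.5, mean="arithmetic", seed=0):
    """Keep each (audio_id, text) mapping with its combined selection probability."""
    counts = Counter(w for _, text in mappings for w in text.split())
    probs = word_selection_probs(counts, m)
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    return [
        (audio, text)
        for audio, text in mappings
        if rng.random() < utterance_prob(text.split(), probs, mean)
    ]
```

For example, a word seen 4 times gets $p = 4^{0.5} / 4 = 0.5$, and a word seen once gets $p = 1$; a transcription containing both is kept with probability 0.75 under the arithmetic mean, or about 0.707 under the geometric mean, which penalizes frequent words more strongly.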

10. An article of manufacture including a non-transitory computer-readable storage medium, having stored thereon program instructions that, upon execution by a computing device, cause the computing device to perform operations comprising:
- obtaining respective word frequencies from a corpus of utterance-to-text-string mappings, wherein the corpus of utterance-to-text-string mappings contains associations between audio utterances and respective text string transcriptions of the audio utterances, and wherein the respective word frequencies are based on occurrences of words in the text string transcriptions;
  
  determining respective compressed word frequencies based on the respective word frequencies, wherein a first distribution of the respective word frequencies has a higher variance than a second distribution of the respective compressed word frequencies;
  
  selecting sample utterance-to-text-string mappings from the corpus of utterance-to-text-string mappings based on the compressed word frequencies; and
  
  training an automatic speech recognition (ASR) system with the sample utterance-to-text-string mappings.
- Dependent Claims (11, 12, 13, 14, 15, 16, 17, 18, 19)
- - 11. The article of manufacture of claim 10, wherein determining the respective compressed word frequencies based on the respective word frequencies comprises:
    - raising the respective word frequencies to a power to form the respective compressed word frequencies, wherein the power is less than 1.
  - 12. The article of manufacture of claim 11, wherein the power is also greater than 0.
  - 13. The article of manufacture of claim 12, wherein the power is also greater than or equal to 0.5.
  - 14. The article of manufacture of claim 10, wherein obtaining the respective word frequencies comprises:
    - developing a histogram that associates words in the text string transcriptions in the corpus of utterance-to-text-string mappings with respective counts of occurrences of the words; and
      
      obtaining the respective word frequencies from the respective counts.
  - 15. The article of manufacture of claim 10, wherein selecting the sample utterance-to-text-string mappings from the corpus of utterance-to-text-string mappings based on the compressed word frequencies comprises:
    - determining a first word selection probability for a first word based on a first compressed word frequency of the first word divided by a first word frequency of the first word, wherein the sample utterance-to-text-string mappings include a particular utterance mapped to a particular text string, and wherein the particular text string contains the first word; and
      
      selecting the particular utterance based on the first word selection probability.
  - 16. The article of manufacture of claim 15, wherein selecting the sample utterance-to-text-string mappings from the corpus of utterance-to-text-string mappings based on the compressed word frequencies further comprises:
    - determining a second word selection probability for a second word based on a second compressed word frequency of the second word divided by a second word frequency of the second word, wherein the particular text string also contains the second word; and
      
      selecting the particular utterance based on the first word selection probability and second word selection probability.
  - 17. The article of manufacture of claim 16, wherein selecting the particular utterance based on the first word selection probability and second word selection probability comprises:
    - calculating an arithmetic mean of the first word selection probability and the second word selection probability; and
      
      selecting the particular utterance with a probability of the arithmetic mean.
  - 18. The article of manufacture of claim 16, wherein selecting the particular utterance based on the first word selection probability and second word selection probability comprises:
    - calculating a geometric mean of the first word selection probability and the second word selection probability; and
      
      selecting the particular utterance with a probability of the geometric mean.
  - 19. The article of manufacture of claim 10, wherein the program instructions further cause the computing device to perform operations comprising:
    - before training the ASR system with the sample utterance-to-text-string mappings, training the ASR system with the entire corpus of utterance-to-text-string mappings.
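
Why dividing a compressed frequency by the raw frequency (claims 5 and 15) yields a usable probability, and why a power below 1 (claim 11) satisfies the lower-variance condition of claim 10, can be worked out directly. The derivation below assumes $f_i$ is a raw occurrence count ($f_i \ge 1$) and, as a simplification, treats each occurrence of word $i$ as if it were kept independently with probability $p_i$. The claims select whole utterances, so words that share an utterance only approximately follow this.

```latex
\begin{aligned}
p_i &= \frac{c_i}{f_i} = \frac{f_i^{m}}{f_i} = f_i^{\,m-1} \in (0, 1]
  && \text{for } f_i \ge 1,\ 0 < m < 1 \\
\mathbb{E}[k_i] &= f_i \, p_i = f_i^{m} = c_i
  && k_i = \text{retained occurrences of word } i \\
m = 0.5,\ f_i = 10\,000 &\;\Rightarrow\; p_i = 0.01,\quad \mathbb{E}[k_i] = 100 \\
\lvert f_i^{m} - f_j^{m} \rvert &\le m \, \lvert f_i - f_j \rvert
  \;\Rightarrow\; \operatorname{Var}(c) \le m^{2} \operatorname{Var}(f) < \operatorname{Var}(f)
  && \text{if } \operatorname{Var}(f) > 0
\end{aligned}
```

The bound on the last line follows from the mean value theorem (the derivative $m x^{m-1}$ is at most $m$ for $x \ge 1$) together with the fact that variance is proportional to the sum of squared pairwise differences. A word seen once is always kept, while a word seen 10,000 times keeps about 100 of its occurrences at $m = 0.5$.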

20. A computing system comprising:
- at least one processor;
  
  data storage; and
  
  program instructions in the data storage that, upon execution by the at least one processor, cause the computing system to:
  
  obtain respective word frequencies from a corpus of utterance-to-text-string mappings, wherein the corpus of utterance-to-text-string mappings contains associations between audio utterances and respective text string transcriptions of audio utterances, and wherein the respective word frequencies are based on occurrences of words in the text string transcriptions,
  
  determine respective compressed word frequencies based on the respective word frequencies, wherein a first distribution of the respective word frequencies has a higher variance than a second distribution of the respective compressed word frequencies,
  
  select sample utterance-to-text-string mappings from the corpus of utterance-to-text-string mappings based on the compressed word frequencies, and
  
  train an automatic speech recognition (ASR) system with the sample utterance-to-text-string mappings.
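
Claim 20 recites the whole pipeline, and claims 9 and 19 add a first training pass over the entire corpus before the pass over the sample. The following is a minimal orchestration sketch, not the patented system. It assumes a hypothetical `train` callback in place of a real ASR trainer (the claims name no toolkit), whitespace tokenization, raw counts as $f_i$, and the arithmetic-mean combination of claim 7; $m = 0.5$, the fixed seed, and the audio ids are illustrative choices.

```python
import random
from collections import Counter
from typing import Callable, List, Tuple

# (audio utterance id, text string transcription); ids are hypothetical placeholders
Mapping = Tuple[str, str]


def sample_by_compressed_frequency(corpus: List[Mapping], m: float = 0.5,
                                   seed: int = 0) -> List[Mapping]:
    """Count words (f_i), compress (c_i = f_i**m), and keep each mapping with
    the arithmetic mean of c_i / f_i over its words (claim 7)."""
    counts = Counter(w for _, text in corpus for w in text.split())
    ratio = {w: f ** m / f for w, f in counts.items()}
    rng = random.Random(seed)
    kept = []
    for audio, text in corpus:
        words = text.split()
        if words and rng.random() < sum(ratio[w] for w in words) / len(words):
            kept.append((audio, text))
    return kept


def train_two_stage(corpus: List[Mapping],
                    train: Callable[[List[Mapping]], None],
                    m: float = 0.5) -> List[Mapping]:
    """Claims 9 and 19: a pass over the entire corpus, then a pass over the sample.

    `train` is a hypothetical callback standing in for a real ASR trainer.
    """
    train(corpus)
    sample = sample_by_compressed_frequency(corpus, m)
    train(sample)
    return sample


if __name__ == "__main__":
    # Toy corpus: one very frequent phrase and one rare one.
    corpus = [(f"utt{i}.wav", "call mom") for i in range(50)]
    corpus.append(("utt50.wav", "play jazz"))
    batch_sizes = []
    sample = train_two_stage(corpus, lambda batch: batch_sizes.append(len(batch)))
    print(batch_sizes)  # first entry is 51; the second depends on the random draw
```

Because "call mom" occurs 50 times, each copy is kept with probability about $50^{-0.5} \approx 0.14$, whereas the single "play jazz" utterance is always kept, so the second training pass sees a much flatter word distribution than the first.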

Current Assignee
Google LLC (Alphabet Inc.)
Original Assignee
Google Inc. (Alphabet Inc.)
Inventors
Strope, Brian, Weintraub, Mitchel
Primary Examiner(s)
AZAD, ABUL K

Application Number

US13/666,223
Time in Patent Office

327 Days
Field of Search

704/200-278
US Class Current

704/235
CPC Class Codes

G10L 15/063 Training
