Error reduction in speech processing

US 9,697,827 B1
Filed: 12/11/2012
Issued: 07/04/2017
Est. Priority Date: 12/11/2012
Status: Active Grant

1 Assignment

0 Petitions

Abstract

Features are disclosed for reducing errors in speech recognition processing. Methods for reducing errors can include receiving multiple speech recognition hypotheses based on an utterance indicative of a command or query of a user and determining a command or query within a grammar having a least amount of difference from one of the speech recognition hypotheses. The determination of the least amount of difference may be based at least in part on a comparison of individual subword units along at least some of the sequence paths of the speech recognition hypotheses and the grammar. For example, the comparison may be performed on the phoneme level instead of the word level.
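
The phoneme-level comparison described in the abstract can be illustrated with a minimal sketch. It assumes a tiny hypothetical pronunciation lexicon (`LEXICON`) and plain, unweighted Levenshtein distance; the helper names are illustrative and are not taken from the patent.

```python
# Hypothetical pronunciation lexicon (ARPAbet-style symbols); not from the patent.
LEXICON = {
    "play": ["P", "L", "EY"],
    "pay": ["P", "EY"],
    "pause": ["P", "AO", "Z"],
    "the": ["DH", "AH"],
    "beatles": ["B", "IY", "T", "AH", "L", "Z"],
    "beetles": ["B", "IY", "T", "AH", "L", "Z"],
    "music": ["M", "Y", "UW", "Z", "IH", "K"],
}


def levenshtein(a, b):
    """Unweighted edit distance between two sequences."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1,              # drop x
                           cur[j - 1] + 1,           # add y
                           prev[j - 1] + (x != y)))  # match or substitute
        prev = cur
    return prev[-1]


def to_phonemes(text):
    """Flatten a string of words into one phoneme sequence."""
    return [p for word in text.split() for p in LEXICON[word]]


def closest_command(hypotheses, grammar):
    """Return (distance, command, hypothesis) with the least phoneme-level difference."""
    return min(
        (levenshtein(to_phonemes(h), to_phonemes(c)), c, h)
        for h in hypotheses
        for c in grammar
    )


hypotheses = ["pay the beetles", "play the beetles"]
grammar = ["play the beatles", "pause music"]
best = closest_command(hypotheses, grammar)

# The same comparison at the word level, for contrast.
word_level = levenshtein("play the beetles".split(), "play the beatles".split())
```

In this toy setup the homophone "beetles" counts as a difference when whole words are compared, while the phoneme sequences of the hypothesis and the grammar entry are identical.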

207 Citations

30 Claims

1. A system for reducing errors in speech processing, comprising:
- an automatic speech recognition module configured to:
  - receive an utterance; and
  - generate an input word lattice based on the utterance;
- an electronic data store in communication with the automatic speech recognition module, the electronic data store configured to store the input word lattice, a grammar and a phoneme confusion table, wherein the input word lattice comprises a plurality of speech recognition hypotheses, the grammar comprises a plurality of commands and the phoneme confusion table comprises a plurality of insertion, deletion and substitution probabilities; and
- a natural language understanding module in communication with the electronic data store, the natural language understanding module configured to:
  - generate an input finite state transducer (FST) based at least in part on the input word lattice, wherein the input FST comprises sequences of phonemes organized into input FST paths, and wherein a path of the input FST paths corresponds with a speech recognition hypothesis of the speech recognition hypotheses;
  - generate an edit FST based at least in part on the phoneme confusion table;
  - generate a grammar FST based at least in part on the grammar, wherein the grammar FST comprises sequences of phonemes organized into grammar FST paths, and wherein a path of the grammar FST paths corresponds to a command of the plurality of commands;
  - generate an output FST using the input FST, the edit FST and the grammar FST, wherein a path of the output FST corresponds to a command of the plurality of commands, and wherein a first path of the output FST is indicative of a difference between a first path of the input FST paths and a first path of the grammar FST paths;
  - compute a first difference score using the first path of the output FST;
  - determine a command representative of the received utterance based at least in part on the first difference score; and
  - initiate an action based at least in part on the determined command.
- Dependent claims (2, 3, 4, 5):
  - 2. The system of claim 1, wherein the first difference score is smaller than a difference score for any other pair of paths from the grammar FST and the input FST.
  - 3. The system of claim 1, wherein the output FST comprises a second path indicative of a difference between a second path of the input FST and a second path of the grammar FST.
  - 4. The system of claim 1, wherein the natural language understanding module is further configured to select a path from the output FST and wherein the selected path comprises information about the action.
  - 5. The system of claim 1, wherein the natural language understanding module is further configured to generate the output FST by performing a composition operation using the grammar FST, the edit FST and the input FST.
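
Claims 1 and 5 describe composing the input, edit and grammar FSTs and scoring a path of the result. The sketch below is one way to picture that step, not the patented implementation. It assumes a single-state edit transducer, so each state of the composed machine is simply a pair (input state, grammar state), and it explores those pairs lazily with Dijkstra's algorithm instead of calling an FST library. Arc weights are assumed to be negative log probabilities from a hypothetical confusion table, and for brevity each command is searched separately rather than through one union grammar FST. All names and numbers are illustrative.

```python
import heapq
import itertools
import math


def linear(phonemes):
    """Build a single-path acceptor {state: [(phoneme, next_state)]} and its final state."""
    return {k: [(p, k + 1)] for k, p in enumerate(phonemes)}, len(phonemes)


def arc_cost(table, a, b, match=0.9, floor=1e-3):
    """Negative log probability of pairing input unit a with grammar unit b.

    None stands for epsilon. `match` and `floor` are assumed default probabilities.
    """
    default = match if a == b else floor
    return -math.log(table.get((a, b), default))


def best_path(inp, inp_final, gram, gram_final, table):
    """Cheapest path through the lazily composed input/edit/grammar machine."""
    tie = itertools.count()  # tie-breaker so the heap never compares paths
    heap = [(0.0, next(tie), (0, 0), ())]
    done = set()
    while heap:
        d, _, (i, g), path = heapq.heappop(heap)
        if (i, g) in done:
            continue
        done.add((i, g))
        if i == inp_final and g == gram_final:
            return d, list(path)
        for a, i2 in inp.get(i, []):
            # Input phoneme with no grammar counterpart.
            heapq.heappush(heap, (d + arc_cost(table, a, None), next(tie),
                                  (i2, g), path + ((a, None),)))
            for b, g2 in gram.get(g, []):
                # Match (a == b) or substitution (a != b).
                heapq.heappush(heap, (d + arc_cost(table, a, b), next(tie),
                                      (i2, g2), path + ((a, b),)))
        for b, g2 in gram.get(g, []):
            # Grammar phoneme that is missing from the input.
            heapq.heappush(heap, (d + arc_cost(table, None, b), next(tie),
                                  (i, g2), path + ((None, b),)))
    return math.inf, []


# Input lattice encoding two hypotheses: "P L EY" and "P EY".
lattice = {0: [("P", 1)], 1: [("L", 2), ("EY", 3)], 2: [("EY", 3)]}
lattice_final = 3
# Hypothetical confusion probabilities keyed by (input unit, grammar unit).
confusion = {("EY", "AO"): 0.05, ("L", None): 0.02, (None, "L"): 0.02}
commands = {"play": ["P", "L", "EY"], "pause": ["P", "AO", "Z"]}

scores = {}
for name, phonemes in commands.items():
    gram, gram_final = linear(phonemes)
    scores[name] = best_path(lattice, lattice_final, gram, gram_final, confusion)
best_command = min(scores, key=lambda name: scores[name][0])
```

Each returned path is a list of (input phoneme, grammar phoneme) pairs, which mirrors the idea that a path of the output FST records how an input path differs from a grammar path.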

6. A computer-implemented method, comprising:
- under control of one or more computing devices configured with specific computer-executable instructions,
  - receiving an utterance;
  - generating an input finite state transducer (FST) comprising sequences of subword units organized into input FST paths, wherein a path of the input FST paths corresponds to a speech recognition hypothesis of a plurality of speech recognition hypotheses that are based on the received utterance;
  - obtaining a grammar of utterances, wherein each utterance of the grammar of utterances comprises a sequence of subword units;
  - generating, using the grammar of utterances, an utterance FST comprising sequences of subword units organized into utterance FST paths, wherein a path of the utterance FST paths corresponds to a command;
  - generating an output FST using the input FST and the utterance FST, wherein the output FST comprises a first path indicative of a difference between a first path of the input FST paths and a first path of the utterance FST paths, and a second path indicative of a difference between the first path of the input FST paths and a second path of the utterance FST paths;
  - computing a first difference score using the first path of the output FST;
  - computing a second difference score using the second path of the output FST; and
  - determining a first command representative of the received utterance based at least in part on the first difference score and the second difference score.
- Dependent claims (7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24):
  - 7. The method of claim 6, wherein the output FST comprises a second path indicative of a difference between a second path of the input FST and a second path of the utterance FST.
  - 8. The method of claim 7, further comprising computing a second difference score using the second path of the output FST.
  - 9. The method of claim 8, further comprising selecting a path from the output FST based at least in part on the first difference score and the second difference score.
  - 10. The method of claim 6, wherein the input FST was generated from one of a lattice of speech recognition results or an N-best list of speech recognition results.
  - 11. The method of claim 6, wherein said subword units comprise phonemes.
  - 12. The method of claim 6, wherein said computing a first difference score using the first path of the output FST comprises computing a Levenshtein distance.
  - 13. The method of claim 6, wherein said generating an output FST using the input FST and the utterance FST further comprises generating an edit FST based at least in part on a subword unit confusion table.
  - 14. The method of claim 13, wherein the subword unit confusion table comprises a plurality of insertion, deletion and substitution probabilities.
  - 15. The method of claim 14, wherein the difference score is based on the probabilities in the subword unit confusion table.
  - 16. The method of claim 15, wherein the difference score corresponding to substitution of a first subword unit with a second subword unit having a high confusion probability with the first subword unit is lower than the difference score corresponding to substitution of a third subword unit having a low confusion probability with the first subword unit.
  - 17. The method of claim 14, wherein the probabilities are context-dependent based at least in part on neighboring phonemes.
  - 18. The method of claim 13, wherein said generating an output FST comprises performing a composition operation using the utterance FST, the edit FST and the input FST.
  - 19. The method of claim 6, further comprising initiating an action based at least in part on the determined command representative of the received utterance.
  - 20. The method of claim 6, further comprising generating the plurality of speech recognition hypotheses by performing automatic speech recognition on the utterance.
  - 21. The method of claim 6, wherein:
    - determining the first command representative of the received utterance is further based on a capability of the one or more computing devices; and
    - the method further comprises performing, with the one or more computing devices, an action based on the first command.
  - 22. The method of claim 6, wherein:
    - the first path of the output FST comprises a plurality of nodes and one or more arcs between nodes of the plurality of nodes;
    - an arc of the one or more arcs is associated with a subword included in the first path of the input FST, a subword included in the first path of the utterance FST, and a difference score; and
    - the difference score of the arc is based on a difference between the subword included in the first path of the input FST and the subword included in the first path of the utterance FST.
  - 23. The method of claim 22, wherein the first difference score is based on one or more difference scores associated with the one or more arcs of the first path of the output FST.
  - 24. The method of claim 6, wherein the determining the first command is further based at least in part on at least one of: historical user data, a user preference, availability of hardware, availability of a file, or a capability of a computing device to process the first command.
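
Claims 16 and 21 to 24 can be pictured together with a small sketch: each arc of an output path carries an input unit, a grammar unit and a difference score, the path score is the sum of its arc scores, and the final choice may also depend on what the device can do. The `Arc` class, the use of negative log probability as the score, the probabilities and the capability names are all assumptions made for illustration; the claims do not prescribe them.

```python
import math
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Arc:
    input_unit: Optional[str]    # None means no input unit on this arc
    grammar_unit: Optional[str]  # None means no grammar unit on this arc
    score: float                 # difference score of this arc


def edit_score(probability):
    """Assumed scoring rule: a more probable confusion gets a lower score."""
    return -math.log(probability)


def path_score(arcs):
    """Difference score of a path, taken here as the sum of its arc scores."""
    return sum(arc.score for arc in arcs)


def pick_command(paths, required, device_capabilities):
    """Pick the lowest-scoring command the device is able to carry out.

    paths: {command: [Arc, ...]}, required: {command: capability name}.
    Returns None when no command is usable.
    """
    usable = [c for c in paths if required[c] in device_capabilities]
    if not usable:
        return None
    return min(usable, key=lambda c: path_score(paths[c]))


# Hypothetical confusion probabilities.
b_for_p = edit_score(0.2)     # "B" heard for "P": frequent confusion, low score
s_for_p = edit_score(0.01)    # "S" heard for "P": rare confusion, high score
missing_l = edit_score(0.05)  # grammar "L" absent from the input

# Output paths for the input hypothesis "B EY" against two grammar commands.
paths = {
    "pay": [Arc("B", "P", b_for_p), Arc("EY", "EY", 0.0)],
    "play": [Arc("B", "P", b_for_p), Arc(None, "L", missing_l), Arc("EY", "EY", 0.0)],
}
required = {"pay": "payments", "play": "speaker"}

full_device = pick_command(paths, required, {"payments", "speaker"})
speaker_only = pick_command(paths, required, {"speaker"})
```

With both capabilities available the lower-scoring command wins; on a device that only has a speaker, the command that needs payments is filtered out before scores are compared.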

25. A non-transitory computer-readable medium comprising one or more modules configured to execute in one or more processors of a computing device, the one or more modules being further configured to:
- receive an utterance;
- generate a plurality of speech recognition hypotheses based on the received utterance, wherein each hypothesis of the plurality of speech recognition hypotheses comprises a sequence of subword units;
- obtain a grammar of utterances, wherein each utterance of the plurality of utterances comprises a sequence of subword units;
- generate an input finite state transducer (FST) from the plurality of hypotheses;
- generate a grammar FST from the grammar of utterances, wherein the grammar FST comprises a plurality of paths of subword units, and wherein a path of the grammar FST corresponds to a command;
- generate an output FST using the input FST and the grammar FST, wherein the output FST comprises:
  - a first path indicative of a difference between a first path of the input FST and a first path of the grammar FST; and
  - a second path indicative of a difference between a second path of the input FST and a second path of the grammar FST;
- compute a first difference score using the first path of the output FST;
- compute a second difference score using the second path of the output FST; and
- determine a command representative of the received utterance based at least in part on the first difference score and the second difference score.
- Dependent claims (26, 27, 28, 29, 30):
  - 26. The non-transitory computer readable medium of claim 25, wherein said utterances comprise commands or queries.
  - 27. The non-transitory computer readable medium of claim 25, wherein said computing the first difference score and said computing the second difference score further comprises generating an edit FST that assigns penalties associated with insertions, deletions and substitutions of subword units between the input FST and the grammar FST.
  - 28. The non-transitory computer readable medium of claim 27, wherein said edit FST is composed with the input FST and the grammar FST by performing a composition operation to generate an output FST.
  - 29. The non-transitory computer readable medium of claim 27, wherein substitution penalties are based on phonetic similarity between subword units.
  - 30. The non-transitory computer readable medium of claim 25, wherein said first difference score and said second difference score are performed by computing a Levenshtein distance.
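
Claims 27, 29 and 30 mention insertion, deletion and substitution penalties, substitution penalties based on phonetic similarity, and a Levenshtein distance. A minimal sketch of how these could fit together follows. The feature sets in `FEATURES`, the use of Jaccard distance as the similarity measure, and the flat insertion/deletion penalty are assumptions chosen for illustration and are not specified by the claims.

```python
# Hypothetical articulatory features per phoneme; illustrative only.
FEATURES = {
    "P": {"bilabial", "stop", "voiceless"},
    "B": {"bilabial", "stop", "voiced"},
    "T": {"alveolar", "stop", "voiceless"},
    "S": {"alveolar", "fricative", "voiceless"},
    "AE": {"vowel", "front", "low"},
}


def substitution_penalty(a, b):
    """Jaccard distance between feature sets: 0.0 if identical, 1.0 if disjoint."""
    shared = FEATURES[a] & FEATURES[b]
    combined = FEATURES[a] | FEATURES[b]
    return 1.0 - len(shared) / len(combined)


def weighted_distance(hyp, ref, indel=1.0):
    """Levenshtein-style distance with similarity-based substitution penalties."""
    prev = [j * indel for j in range(len(ref) + 1)]
    for i, a in enumerate(hyp, 1):
        cur = [i * indel]
        for j, b in enumerate(ref, 1):
            cur.append(min(prev[j] + indel,                          # extra unit in hyp
                           cur[j - 1] + indel,                       # unit missing from hyp
                           prev[j - 1] + substitution_penalty(a, b)))
        prev = cur
    return prev[-1]


# Two recognition hypotheses and a two-command grammar, all as phoneme sequences.
n_best = [["B", "AE", "T"], ["S", "AE", "B"]]
grammar = {"pat": ["P", "AE", "T"], "sat": ["S", "AE", "T"]}

# Difference score for every (hypothesis index, command) pair.
scores = {
    (i, name): weighted_distance(hyp, ref)
    for i, hyp in enumerate(n_best)
    for name, ref in grammar.items()
}
best_pair = min(scores, key=scores.get)
```

Because "B" and "P" share place and manner of articulation in this toy feature table, swapping one for the other is penalized less than swapping "B" for "S", so the first hypothesis paired with "pat" receives the lowest score.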


Current Assignee: Amazon Technologies, Inc. (Amazon.com, Inc.)
Original Assignee: Amazon Technologies, Inc. (Amazon.com, Inc.)
Inventors: Lilly, Jeffrey Paul; Adams, Jeffrey Penrod; Thomas, Ryan Paul
Primary Examiner(s): Dorvil, Richemond
Assistant Examiner(s): Le, Thuykhanh
Application Number: US13/711,478
Time in Patent Office: 1,666 Days
Field of Search: None
CPC Class Codes:
G10L 15/14   using statistical models, e...
G10L 15/18   using natural language mode...
G10L 15/19   Grammatical context, e.g. d...
