Error reduction in speech processing
First Claim
1. A system for reducing errors in speech processing, comprising:
an automatic speech recognition module configured to:
receive an utterance; and
generate an input word lattice based on the utterance;
an electronic data store in communication with the automatic speech recognition module, the electronic data store configured to store the input word lattice, a grammar and a phoneme confusion table, wherein the input word lattice comprises a plurality of speech recognition hypotheses, the grammar comprises a plurality of commands and the phoneme confusion table comprises a plurality of insertion, deletion and substitution probabilities; and
a natural language understanding module in communication with the electronic data store, the natural language understanding module configured to:
generate an input finite state transducer (FST) based at least in part on the input word lattice, wherein the input FST comprises sequences of phonemes organized into input FST paths, and wherein a path of the input FST paths corresponds with a speech recognition hypothesis of the speech recognition hypotheses;
generate an edit FST based at least in part on the phoneme confusion table;
generate a grammar FST based at least in part on the grammar, wherein the grammar FST comprises sequences of phonemes organized into grammar FST paths, and wherein a path of the grammar FST paths corresponds to a command of the plurality of commands;
generate an output FST using the input FST, the edit FST and the grammar FST, wherein a path of the output FST corresponds to a command of the plurality of commands, and wherein a first path of the output FST is indicative of a difference between a first path of the input FST paths and a first path of the grammar FST paths;
compute a first difference score using the first path of the output FST;
determine a command representative of the received utterance based at least in part on the first difference score; and
initiate an action based at least in part on the determined command.
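The steps recited in claim 1 can be sketched in a few lines of Python. This is a minimal illustration, not the patented implementation: it swaps the FST composition for an ordinary dynamic-programming weighted edit distance, which gives the same cheapest-path score when one hypothesis is compared with one command. The confusion probabilities, the phoneme spellings, the two commands, and the names `difference_score` and `determine_command` are all assumptions made for the example. Costs are negative log probabilities, and a matching phoneme is assumed to cost nothing.

```python
import math

# Hypothetical phoneme confusion table; every probability here is invented.
SUBSTITUTION = {("N", "M"): 0.2, ("M", "N"): 0.2, ("S", "Z"): 0.3, ("Z", "S"): 0.3}
DEFAULT_SUBSTITUTION = 0.01   # assumed probability for unlisted confusions
INSERTION = 0.02              # recognizer emitted a phoneme that was not spoken
DELETION = 0.02               # recognizer dropped a phoneme that was spoken

INS_COST = -math.log(INSERTION)
DEL_COST = -math.log(DELETION)


def substitution_cost(heard, intended):
    """Cost of reading `heard` where the command has `intended`."""
    if heard == intended:
        return 0.0  # assumption: exact matches are free
    return -math.log(SUBSTITUTION.get((heard, intended), DEFAULT_SUBSTITUTION))


def difference_score(heard, intended):
    """Weighted edit distance between two phoneme sequences."""
    rows, cols = len(heard) + 1, len(intended) + 1
    d = [[0.0] * cols for _ in range(rows)]
    for i in range(1, rows):
        d[i][0] = d[i - 1][0] + INS_COST
    for j in range(1, cols):
        d[0][j] = d[0][j - 1] + DEL_COST
    for i in range(1, rows):
        for j in range(1, cols):
            d[i][j] = min(
                d[i - 1][j] + INS_COST,   # skip an extra heard phoneme
                d[i][j - 1] + DEL_COST,   # restore a dropped phoneme
                d[i - 1][j - 1] + substitution_cost(heard[i - 1], intended[j - 1]),
            )
    return d[-1][-1]


def determine_command(hypotheses, grammar):
    """Return the (command, score) pair with the lowest difference score."""
    return min(
        ((command, difference_score(heard, phonemes))
         for heard in hypotheses
         for command, phonemes in grammar.items()),
        key=lambda pair: pair[1],
    )


# Hypothetical grammar of commands with ARPAbet-like pronunciations.
GRAMMAR = {
    "play music": ["P", "L", "EY", "M", "Y", "UW", "Z", "IH", "K"],
    "pause music": ["P", "AO", "Z", "M", "Y", "UW", "Z", "IH", "K"],
}
# One recognition hypothesis in which "music" was misheard with an initial N.
HYPOTHESES = [["P", "L", "EY", "N", "Y", "UW", "Z", "IH", "K"]]

command, score = determine_command(HYPOTHESES, GRAMMAR)
print(command, round(score, 3))
```

In this sketch the selected command would then be passed to whatever component initiates the corresponding action; that step is outside the example.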
Abstract
Features are disclosed for reducing errors in speech recognition processing. Methods for reducing errors can include receiving multiple speech recognition hypotheses based on an utterance indicative of a command or query of a user and determining a command or query within a grammar having a least amount of difference from one of the speech recognition hypotheses. The determination of the least amount of difference may be based at least in part on a comparison of individual subword units along at least some of the sequence paths of the speech recognition hypotheses and the grammar. For example, the comparison may be performed on the phoneme level instead of the word level.
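To make the last point of the abstract concrete, the following sketch compares a misrecognized hypothesis with a grammar command twice: once word by word and once phoneme by phoneme. The lexicon, its ARPAbet-like symbols, and the function names are assumptions made for illustration; the abstract does not specify a distance measure, so a plain unweighted Levenshtein distance is used here.

```python
def edit_distance(a, b):
    """Unweighted Levenshtein distance between two sequences."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,              # delete x
                curr[j - 1] + 1,          # insert y
                prev[j - 1] + (x != y),   # substitute, or match at no cost
            ))
        prev = curr
    return prev[-1]


# Hypothetical pronunciation lexicon (illustrative symbols only).
LEXICON = {
    "play": ["P", "L", "EY"],
    "the": ["DH", "AH"],
    "beatles": ["B", "IY", "T", "AH", "L", "Z"],
    "beetles": ["B", "IY", "T", "AH", "L", "Z"],
}


def to_phonemes(words):
    """Flatten a word sequence into its phoneme sequence."""
    return [p for w in words for p in LEXICON[w]]


hypothesis = "play the beetles".split()   # what the recognizer produced
command = "play the beatles".split()      # what the grammar contains

word_distance = edit_distance(hypothesis, command)
phoneme_distance = edit_distance(to_phonemes(hypothesis), to_phonemes(command))
print(word_distance, phoneme_distance)    # prints: 1 0
```

At the word level the hypothesis differs from the command in one of three words, while at the phoneme level the two are identical under this lexicon, so a phoneme-level comparison would recover the intended command.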
30 Claims
1. A system for reducing errors in speech processing, as recited in full under First Claim above.
Dependent claims: 2, 3, 4, 5
6. A computer-implemented method, comprising:
under control of one or more computing devices configured with specific computer-executable instructions,
receiving an utterance;
generating an input finite state transducer (FST) comprising sequences of subword units organized into input FST paths, wherein a path of the input FST paths corresponds to a speech recognition hypothesis of a plurality of speech recognition hypotheses that are based on the received utterance;
obtaining a grammar of utterances, wherein each utterance of the grammar of utterances comprises a sequence of subword units;
generating, using the grammar of utterances, an utterance FST comprising sequences of subword units organized into utterance FST paths, wherein a path of the utterance FST paths corresponds to a command;
generating an output FST using the input FST and the utterance FST, wherein the output FST comprises a first path indicative of a difference between a first path of the input FST paths and a first path of the utterance FST paths, and a second path indicative of a difference between the first path of the input FST paths and a second path of the utterance FST paths;
computing a first difference score using the first path of the output FST;
computing a second difference score using the second path of the output FST; and
determining a first command representative of the received utterance based at least in part on the first difference score and the second difference score.
Dependent claims: 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24
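One way to picture an output path that is "indicative of a difference" is as a sequence of (input symbol, output symbol) pairs, the way arcs of a transducer path are labeled. The sketch below is an assumption-laden illustration rather than the claimed method: it uses dynamic programming with unit costs in place of FST composition, compares one input path against two utterance paths as in claim 6, and returns both the difference score and the aligned pairs. The commands, phoneme spellings, `EPS` marker, and `align` function are hypothetical.

```python
EPS = "<eps>"  # stands for "no symbol" on one side of an aligned pair


def align(heard, intended):
    """Return (difference score, list of (heard, intended) symbol pairs)."""
    n, m = len(heard), len(intended)
    # cost[i][j]: cheapest way to turn heard[:i] into intended[:j]
    cost = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = i
    for j in range(1, m + 1):
        cost[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost[i][j] = min(
                cost[i - 1][j] + 1,
                cost[i][j - 1] + 1,
                cost[i - 1][j - 1] + (heard[i - 1] != intended[j - 1]),
            )
    # Walk back from the end to recover one cheapest alignment.
    path, i, j = [], n, m
    while i > 0 or j > 0:
        if (i > 0 and j > 0 and
                cost[i][j] == cost[i - 1][j - 1] + (heard[i - 1] != intended[j - 1])):
            path.append((heard[i - 1], intended[j - 1]))  # match or substitution
            i, j = i - 1, j - 1
        elif i > 0 and cost[i][j] == cost[i - 1][j] + 1:
            path.append((heard[i - 1], EPS))              # extra heard symbol
            i -= 1
        else:
            path.append((EPS, intended[j - 1]))           # symbol missing from input
            j -= 1
    path.reverse()
    return cost[n][m], path


# Hypothetical utterance paths (commands) and one input path.
UTTERANCES = {
    "lights on": ["L", "AY", "T", "S", "AA", "N"],
    "lights off": ["L", "AY", "T", "S", "AO", "F"],
}
heard = ["L", "AY", "T", "AA", "N"]  # the S was dropped by the recognizer

first_score, first_path = align(heard, UTTERANCES["lights on"])
second_score, second_path = align(heard, UTTERANCES["lights off"])
first_command = "lights on" if first_score <= second_score else "lights off"
print(first_score, second_score, first_command)
```

Here the pair `("<eps>", "S")` in `first_path` records that the command contains an S the input path lacks, and the command with the lower of the two scores is selected.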
25. A non-transitory computer-readable medium comprising one or more modules configured to execute in one or more processors of a computing device, the one or more modules being further configured to:
receive an utterance;
generate a plurality of speech recognition hypotheses based on the received utterance, wherein each hypothesis of the plurality of speech recognition hypotheses comprises a sequence of subword units;
obtain a grammar of utterances, wherein each utterance of the plurality of utterances comprises a sequence of subword units;
generate an input finite state transducer (FST) from the plurality of hypotheses;
generate a grammar FST from the grammar of utterances, wherein the grammar FST comprises a plurality of paths of subword units, and wherein a path of the grammar FST corresponds to a command;
generate an output FST using the input FST and the grammar FST, wherein the output FST comprises:
a first path indicative of a difference between a first path of the input FST and a first path of the grammar FST; and
a second path indicative of a difference between a second path of the input FST and a second path of the grammar FST;
compute a first difference score using the first path of the output FST;
compute a second difference score using the second path of the output FST; and
determine a command representative of the received utterance based at least in part on the first difference score and the second difference score.
Dependent claims: 26, 27, 28, 29, 30
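When the input holds several hypotheses and the grammar holds several commands, both can be stored as small graphs that share common prefixes, and the cheapest difference for each command can be found with a shortest-path search over pairs of states. The sketch below is one possible reading under stated assumptions, not the claimed implementation: the graphs are plain dictionaries rather than a real FST library, every edit costs 1, and the state numbering, phoneme labels, commands, and the `score_commands` function are invented for the example. Searching state pairs in this way corresponds to finding shortest paths through the composition of the input graph, a unit-cost edit transducer, and the grammar graph.

```python
import heapq


def score_commands(lattice, lattice_finals, grammar, grammar_finals, start=0):
    """Dijkstra over (lattice state, grammar state) pairs.

    Graphs map a state to a list of (label, next_state) arcs.
    `grammar_finals` maps a final grammar state to its command.
    Returns {command: lowest difference score}.
    """
    best = {}
    dist = {(start, start): 0}
    queue = [(0, start, start)]
    while queue:
        cost, s, g = heapq.heappop(queue)
        if cost > dist.get((s, g), float("inf")):
            continue  # stale queue entry
        if s in lattice_finals and g in grammar_finals:
            command = grammar_finals[g]
            best[command] = min(cost, best.get(command, cost))
        moves = []
        for a, s2 in lattice.get(s, []):
            moves.append((cost + 1, s2, g))                  # drop a heard symbol
            for b, g2 in grammar.get(g, []):
                moves.append((cost + (a != b), s2, g2))      # match or substitute
        for _, g2 in grammar.get(g, []):
            moves.append((cost + 1, s, g2))                  # add a missing symbol
        for c, s2, g2 in moves:
            if c < dist.get((s2, g2), float("inf")):
                dist[(s2, g2)] = c
                heapq.heappush(queue, (c, s2, g2))
    return best


# Hypothetical input graph with two hypotheses: L AY T AA N and L AY T AO N.
LATTICE = {0: [("L", 1)], 1: [("AY", 2)], 2: [("T", 3)],
           3: [("AA", 4), ("AO", 4)], 4: [("N", 5)]}
LATTICE_FINALS = {5}

# Hypothetical grammar graph with two commands sharing the prefix L AY T S.
GRAMMAR = {0: [("L", 1)], 1: [("AY", 2)], 2: [("T", 3)], 3: [("S", 4)],
           4: [("AA", 5), ("AO", 7)], 5: [("N", 6)], 7: [("F", 8)]}
GRAMMAR_FINALS = {6: "lights on", 8: "lights off"}

scores = score_commands(LATTICE, LATTICE_FINALS, GRAMMAR, GRAMMAR_FINALS)
chosen = min(scores, key=scores.get)
print(scores, chosen)
```

Each command's score comes from whichever hypothesis matches it most closely, so the two scores in this example are produced by different paths of the input graph.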
Specification