Using models to detect potential significant errors in speech recognition results
2 Assignments
1 Petition
Abstract
In some embodiments, a recognition result produced by a speech processing system based on an analysis of a speech input is evaluated for indications of potential errors. In some embodiments, sets of words/phrases that may be acoustically similar or otherwise confusable, the misrecognition of which can be significant in the domain, may be used together with a language model to evaluate a recognition result to determine whether the recognition result includes such an indication. In some embodiments, a word/phrase of a set that appears in the result is iteratively replaced with each of the other words/phrases of the set. The result of the replacement may be evaluated using a language model to determine a likelihood of the newly-created string of words appearing in a language and/or domain. The likelihood may then be evaluated to determine whether the result of the replacement is sufficiently likely for an alert to be triggered.
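The iterative replacement described in the abstract can be sketched in Python as follows. This is a minimal illustration, not the patented implementation. The following are all assumptions made for the example: the confusable set, the toy corpus, the add-one-smoothed bigram language model, and the alert criterion. That criterion flags a substitution whose language-model log-likelihood exceeds the original result's by more than a `margin`.

```python
import math
from collections import Counter

# Hypothetical confusable set: confusing its members would matter in a
# medical domain. A real system would use curated, domain-specific sets.
CONFUSABLE_SETS = [{"hypertension", "hypotension"}]


def train_bigram_lm(corpus):
    """Return a scorer giving the add-one-smoothed bigram log-probability of a token list."""
    unigrams, bigrams = Counter(), Counter()
    for sentence in corpus:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    vocab_size = len(unigrams)

    def log_prob(tokens):
        padded = ["<s>"] + list(tokens) + ["</s>"]
        return sum(
            math.log((bigrams[(a, b)] + 1) / (unigrams[a] + vocab_size))
            for a, b in zip(padded, padded[1:])
        )

    return log_prob


def check_result(result, log_prob, margin=0.0):
    """Return modified results the LM prefers over the original by more than margin.

    Comparing against the original result's score is an assumed alert criterion.
    """
    tokens = result.split()
    original_score = log_prob(tokens)
    alerts = []
    for i, word in enumerate(tokens):
        for members in CONFUSABLE_SETS:
            if word not in members:
                continue
            # Replace the word with each other member of its set in turn.
            for other in sorted(members - {word}):
                modified = tokens[:i] + [other] + tokens[i + 1:]
                if log_prob(modified) - original_score > margin:
                    alerts.append(" ".join(modified))
    return alerts
```

Trained on a corpus in which "history of hypertension" occurs, `check_result("patient has a history of hypotension", lm)` flags the "hypertension" reading, while "treated hypertension with medication" produces no alert because the substitution lowers its likelihood.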
39 Citations
20 Claims
1. A method of processing a result of a recognition by an automatic speech recognition (ASR) system on a speech input, the method comprising:
determining whether the result includes a member of a set of words or phrases, the set of words or phrases comprising a plurality of members and each member of the set comprising a word or phrase; and when it is determined that the result includes a word or phrase of a first member of the set, producing a modified result by substituting a word or phrase of a second member of the set for the word or phrase of the first member in the result, evaluating the modified result using at least one language model and/or at least one acoustic model, and determining, based on a result of the evaluating, whether to trigger presentation of an alert to a user via a user interface.
8. At least one non-transitory computer-readable storage medium having encoded thereon computer-executable instructions that, when executed by at least one computer, cause the at least one computer to carry out a method of processing a result of a recognition by an automatic speech recognition (ASR) system on a speech input, the method comprising:
determining whether the result includes a member of a set of words or phrases, the set of words or phrases comprising a plurality of members and each member of the set comprising a word or phrase; and when it is determined that the result includes a word or phrase of a first member of the set, producing a modified result by substituting a word or phrase of a second member of the set for the word or phrase of the first member in the result, evaluating the modified result using at least one language model and/or at least one acoustic model, and determining, based on a result of the evaluating, whether to trigger presentation of an alert to a user via a user interface.
12. An apparatus comprising:
at least one processor; and at least one storage medium having encoded thereon processor-executable instructions that, when executed by the at least one processor, cause the at least one processor to carry out a method of processing a result of a recognition by an automatic speech recognition (ASR) system on a speech input, the method comprising: determining whether the result includes a member of a set of words or phrases, the set of words or phrases comprising a plurality of members and each member of the set comprising a word or phrase; and when it is determined that the result includes a word or phrase of a first member of the set, producing a modified result by substituting a word or phrase of a second member of the set for the word or phrase of the first member in the result, evaluating the modified result using at least one language model and/or at least one acoustic model, and determining, based on a result of the evaluating, whether to trigger presentation of an alert to a user via a user interface.
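The first two steps of the independent claims (determining whether the result includes a member of the set, then substituting a second member for it) must cope with members that are multi-word phrases. Below is a minimal Python sketch of just those two steps. It assumes whitespace tokenization and exact token matching. The function names and the dosage-phrase example are hypothetical.

```python
from typing import Iterator, List, Sequence, Tuple


def find_member(tokens: Sequence[str], member: Sequence[str]) -> Iterator[int]:
    """Yield every start index at which the (possibly multi-token) member occurs."""
    n = len(member)
    for start in range(len(tokens) - n + 1):
        if list(tokens[start:start + n]) == list(member):
            yield start


def modified_results(result: str, confusable_set: List[str]) -> List[Tuple[str, str, str]]:
    """Substitute every other member for each occurrence of a member in result.

    Returns (first_member, second_member, modified_result) triples; each modified
    result would then be evaluated with a language and/or acoustic model.
    """
    tokens = result.split()
    members = [m.split() for m in confusable_set]
    out = []
    for first in members:
        for start in find_member(tokens, first):
            for second in members:
                if second == first:
                    continue
                # Splice the second member in place of the first member's tokens.
                modified = tokens[:start] + second + tokens[start + len(first):]
                out.append((" ".join(first), " ".join(second), " ".join(modified)))
    return out
```

For example, `modified_results("give fifteen milligrams daily", ["fifteen milligrams", "fifty milligrams"])` yields a single triple whose modified result is "give fifty milligrams daily".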
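The claims leave open how a language model and an acoustic model are combined when both are used to evaluate the modified result. One common choice, assumed here, is the log-linear score used in ASR decoding: log P(audio | words) + λ·log P(words). In the sketch below, the `Scores` container, the LM weight λ = 10, and the alert margin are all hypothetical. The acoustic score is assumed to come from rescoring the audio against the modified word sequence, which the sketch does not perform.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Scores:
    """Log-domain scores for one hypothesis (hypothetical container)."""
    lm_logprob: float                   # log P(words) from a language model
    am_logprob: Optional[float] = None  # log P(audio | words); None if no acoustic model is used


def combined_score(s: Scores, lm_weight: float = 10.0) -> float:
    """Log-linear combination; uses the weighted LM score alone when no acoustic score exists."""
    if s.am_logprob is None:
        return lm_weight * s.lm_logprob
    return s.am_logprob + lm_weight * s.lm_logprob


def should_alert(original: Scores, modified: Scores,
                 lm_weight: float = 10.0, margin: float = 2.0) -> bool:
    """Trigger an alert when the modified result scores within `margin` of the original."""
    if (original.am_logprob is None) != (modified.am_logprob is None):
        raise ValueError("score both hypotheses with the same set of models")
    return combined_score(modified, lm_weight) >= combined_score(original, lm_weight) - margin
```

Requiring both hypotheses to be scored by the same models keeps the comparison meaningful. An LM-only score and a combined LM-plus-AM score are on different scales.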
Specification