Reducing false positives in speech recognition systems
First Claim
1. A method comprising:
receiving, by a computing device, a spoken utterance;
performing, by the computing device, speech recognition processing on the spoken utterance and generating a recognition result;
determining, by the computing device, consistency of duration of component sounds of the recognition result, the determining comprising:
calculating a speaker rate by dividing a total duration of the spoken utterance by a sum of expected durations for the component sounds of the recognition result;
for each component sound:
calculating a modified expected duration by multiplying the component sound's expected duration by the speaker rate; and
calculating a delta value corresponding to a difference between the component sound's duration in the spoken utterance and the component sound's modified expected duration; and
calculating a duration consistency score by taking a sum of squares of the delta values and dividing the sum by the total number of component sounds; and
validating, by the computing device, the recognition result based on the duration consistency score.
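The arithmetic recited in the determining step can be illustrated with a minimal Python sketch. It is not taken from the patent's specification. The names `ComponentSound`, `duration_consistency_score`, `validate`, and the `max_score` threshold are hypothetical, and durations are assumed to be in seconds. The claim only says that validation is "based on" the score, so accepting a result when the score falls at or below a threshold is an assumption made here for illustration.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class ComponentSound:
    """One component sound of the recognition result (hypothetical type)."""
    label: str
    observed: float  # duration measured in the spoken utterance, in seconds
    expected: float  # expected duration from a reference model, in seconds


def duration_consistency_score(sounds: Sequence[ComponentSound],
                               total_duration: float) -> float:
    """Mean of squared deltas between observed and rate-adjusted expected durations."""
    if not sounds:
        raise ValueError("at least one component sound is required")
    expected_sum = sum(s.expected for s in sounds)
    if expected_sum <= 0:
        raise ValueError("sum of expected durations must be positive")

    # Speaker rate: total utterance duration divided by the sum of expected durations.
    speaker_rate = total_duration / expected_sum

    # Delta per sound: observed duration minus modified (rate-scaled) expected duration.
    deltas = [s.observed - s.expected * speaker_rate for s in sounds]

    # Sum of squares of the deltas, divided by the number of component sounds.
    return sum(d * d for d in deltas) / len(sounds)


def validate(sounds: Sequence[ComponentSound], total_duration: float,
             max_score: float = 0.01) -> bool:
    """Assumed rule: accept the result when the score does not exceed max_score."""
    return duration_consistency_score(sounds, total_duration) <= max_score
```

In this sketch, a speaker who stretches every sound by the same factor yields a score of zero, because the speaker rate absorbs the uniform slowdown. Only uneven stretching of individual sounds raises the score.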
Abstract
Embodiments of the present invention improve methods of performing speech recognition. In one embodiment, the present invention includes a method comprising receiving a spoken utterance, processing the spoken utterance in a speech recognizer to generate a recognition result, determining consistencies of one or more parameters of component sounds of the spoken utterance, wherein the parameters are selected from the group consisting of duration, energy, and pitch, and wherein each component sound of the spoken utterance has a corresponding value of said parameter, and validating the recognition result based on the consistency of at least one of said parameters.
18 Claims
1. A method comprising:
receiving, by a computing device, a spoken utterance;
performing, by the computing device, speech recognition processing on the spoken utterance and generating a recognition result;
determining, by the computing device, consistency of duration of component sounds of the recognition result, the determining comprising:
calculating a speaker rate by dividing a total duration of the spoken utterance by a sum of expected durations for the component sounds of the recognition result;
for each component sound:
calculating a modified expected duration by multiplying the component sound's expected duration by the speaker rate; and
calculating a delta value corresponding to a difference between the component sound's duration in the spoken utterance and the component sound's modified expected duration; and
calculating a duration consistency score by taking a sum of squares of the delta values and dividing the sum by the total number of component sounds; and
validating, by the computing device, the recognition result based on the duration consistency score.
Dependent claims: 2, 3, 4, 5, 6, 7, 8
9. A system comprising:
a processor; and
a non-transitory computer readable storage medium having stored thereon program code that, when executed by the processor, causes the processor to:
receive a spoken utterance;
perform speech recognition processing on the spoken utterance and generate a recognition result;
determine consistency of duration of component sounds of the recognition result, the determining comprising:
calculating a speaker rate by dividing a total duration of the spoken utterance by a sum of expected durations for the component sounds of the recognition result;
for each component sound:
calculating a modified expected duration by multiplying the component sound's expected duration by the speaker rate; and
calculating a delta value corresponding to a difference between the component sound's duration in the spoken utterance and the component sound's modified expected duration; and
calculating a duration consistency score by taking a sum of squares of the delta values and dividing the sum by the total number of component sounds; and
validate the recognition result based on the duration consistency score.
Dependent claims: 10, 11, 12, 13
14. A non-transitory computer readable storage medium having stored thereon program code executable by a processor, the program code comprising:
code that causes the processor to receive a spoken utterance;
code that causes the processor to perform speech recognition processing on the spoken utterance and generate a recognition result;
code that causes the processor to determine consistency of duration of component sounds of the recognition result, the determining comprising:
calculating a speaker rate by dividing a total duration of the spoken utterance by a sum of expected durations for the component sounds of the recognition result;
for each component sound:
calculating a modified expected duration by multiplying the component sound's expected duration by the speaker rate; and
calculating a delta value corresponding to a difference between the component sound's duration in the spoken utterance and the component sound's modified expected duration; and
calculating a duration consistency score by taking a sum of squares of the delta values and dividing the sum by the total number of component sounds; and
code that causes the processor to validate the recognition result based on the duration consistency score.
Dependent claims: 15, 16, 17, 18
Specification