Combined engine system and method for voice recognition
Abstract
A method and system that combines voice recognition engines and resolves any differences between the results of the individual engines. A speaker independent (SI) Hidden Markov Model (HMM) engine, a speaker independent Dynamic Time Warping (DTW-SI) engine and a speaker dependent Dynamic Time Warping (DTW-SD) engine are combined. Combining and resolving the results of these engines yields a system with better recognition accuracy and lower rejection rates than using the results of only one engine.
15 Claims
1. A voice recognition system, comprising:
an acoustic processor configured to extract speech parameters from digitized speech samples of an utterance;
a plurality of voice recognition engines coupled to the acoustic processor, each voice recognition engine configured to produce a plurality of hypotheses; and
decision logic configured to compare a most likely hypothesis of a first voice recognition engine to a second most likely hypothesis of the first voice recognition engine to form a first difference, delta 1;
compare a most likely hypothesis of the second voice recognition engine to a second most likely hypothesis of the second voice recognition engine to form a second difference, delta 2;
add delta 1 and delta 2 to form a delta sum; and
accept the most likely hypothesis of the first voice recognition engine if the most likely hypothesis of the first voice recognition engine is equal in likeliness to the most likely hypothesis of the first voice recognition engine and the delta sum is greater than a first predetermined threshold.
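The decision logic recited in claim 1 can be illustrated with a minimal Python sketch. Several points here are assumptions rather than anything the claim states: each hypothesis is modeled as a word with a distance-like cost (lower is more likely), "difference" is taken to be the cost gap between an engine's best and second-best hypothesis, and the agreement test is read as comparing the first engine's best word with the second engine's best word, although the claim as printed compares the first engine's hypothesis with itself. The names `Hypothesis`, `delta`, and `accept_by_delta_sum` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Hypothesis:
    word: str    # candidate vocabulary entry (hypothetical field)
    cost: float  # assumed distance-like score: lower means more likely


def delta(n_best: List[Hypothesis]) -> float:
    """Cost gap between the best and second-best hypothesis of one engine."""
    ranked = sorted(n_best, key=lambda h: h.cost)
    return ranked[1].cost - ranked[0].cost


def accept_by_delta_sum(
    engine1: List[Hypothesis],
    engine2: List[Hypothesis],
    threshold1: float,
) -> Optional[str]:
    """Return engine 1's best word if both engines agree and the summed
    deltas exceed the first threshold; otherwise return None."""
    best1 = min(engine1, key=lambda h: h.cost)
    best2 = min(engine2, key=lambda h: h.cost)
    delta_sum = delta(engine1) + delta(engine2)  # delta 1 + delta 2
    # Assumption: agreement means the two engines rank the same word first.
    if best1.word == best2.word and delta_sum > threshold1:
        return best1.word
    return None


if __name__ == "__main__":
    e1 = [Hypothesis("call", 10.0), Hypothesis("cancel", 18.0)]  # delta 1 = 8
    e2 = [Hypothesis("call", 4.0), Hypothesis("clear", 9.0)]     # delta 2 = 5
    print(accept_by_delta_sum(e1, e2, threshold1=12.0))
```

In this sketch a large delta sum means both engines separate their top choice clearly from the runner-up, which is why the sum, and not either delta alone, is compared with the threshold.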
10. A method for voice recognition, comprising:
extracting speech parameters with an acoustic processor from digitized speech samples of an utterance;
coupling a plurality of voice recognition engines to the acoustic processor; and
producing a plurality of hypotheses from each voice recognition engine;
comparing the most likely hypothesis of the first voice recognition engine to the second most likely hypothesis of the first voice recognition engine to form a first difference, delta 1;
comparing the most likely hypothesis of the second voice recognition engine to the second most likely hypothesis of the second voice recognition engine to form a second difference, delta 2;
adding delta 1 and delta 2 to form a delta sum; and
accepting the most likely hypothesis of the first voice recognition engine if the most likely hypothesis of the first voice recognition engine is equal in likeliness to the most likely hypothesis of the first voice recognition engine and the delta sum is greater than a first predetermined threshold.
comparing the most likely hypothesis of the first voice recognition engine to the second most likely hypothesis of the second voice recognition engine and, if the likeliness of the most likely hypothesis of the first voice recognition engine is equal to the likeliness of the second most likely hypothesis of the second voice recognition engine and delta 1 is greater than a second predetermined threshold, accepting the most likely hypothesis of the first voice recognition engine.
12. A method as in claim 11 wherein the most likely hypothesis of the first voice recognition engine is not equal in likeliness to the most likely hypothesis of the first voice recognition engine and/or the delta sum is not greater than a predetermined threshold, the method further comprising:
comparing the most likely hypothesis of the second voice recognition engine to the second most likely hypothesis of the first voice recognition engine and, if the likeliness of the most likely hypothesis of the second voice recognition engine is equal to the likeliness of the second most likely hypothesis of the first voice recognition engine and delta 2 is greater than a third predetermined threshold, accepting the most likely hypothesis of the second voice recognition engine.
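The two fallback comparisons recited above (the second and third predetermined thresholds) can be sketched as follows. This is an illustrative reading, not the patented implementation: hypotheses are assumed to be `(word, cost)` pairs with lower cost meaning more likely, "equal" is assumed to mean that the two hypotheses name the same word, and the function assumes the caller has already found that the delta-sum test failed. The names `top_two` and `fallback_decision` are hypothetical.

```python
from typing import List, Optional, Tuple

# Assumption: a hypothesis is a (word, cost) pair; lower cost = more likely.
Hyp = Tuple[str, float]


def top_two(n_best: List[Hyp]) -> Tuple[Hyp, Hyp]:
    """Return the best and second-best hypotheses of one engine."""
    ranked = sorted(n_best, key=lambda h: h[1])
    return ranked[0], ranked[1]


def fallback_decision(
    engine1: List[Hyp],
    engine2: List[Hyp],
    threshold2: float,
    threshold3: float,
) -> Optional[str]:
    """Apply the two cross-engine checks in order; None means no acceptance."""
    first1, second1 = top_two(engine1)
    first2, second2 = top_two(engine2)
    delta1 = second1[1] - first1[1]  # gap inside engine 1
    delta2 = second2[1] - first2[1]  # gap inside engine 2

    # Engine 1's best matches engine 2's runner-up, and engine 1 is decisive.
    if first1[0] == second2[0] and delta1 > threshold2:
        return first1[0]
    # Engine 2's best matches engine 1's runner-up, and engine 2 is decisive.
    if first2[0] == second1[0] and delta2 > threshold3:
        return first2[0]
    return None


if __name__ == "__main__":
    e1 = [("call", 10.0), ("cancel", 18.0)]  # delta 1 = 8
    e2 = [("clear", 4.0), ("call", 5.0)]     # delta 2 = 1
    print(fallback_decision(e1, e2, threshold2=6.0, threshold3=6.0))
```

Checking engine 1 before engine 2 follows the order in which the two comparisons are recited; a different ordering would be an equally plausible design if the engines were weighted differently.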
13. The method of claim 10 wherein the voice recognition engines are selected from the group consisting of speaker independent Dynamic Time Warping, speaker independent Hidden Markov Model, speaker dependent Dynamic Time Warping, speaker dependent Hidden Markov Model.
14. The method of claim 11 wherein the voice recognition engines are selected from the group consisting of speaker independent Dynamic Time Warping, speaker independent Hidden Markov Model, speaker dependent Dynamic Time Warping, speaker dependent Hidden Markov Model.
15. The method of claim 12 wherein the voice recognition engines are selected from the group consisting of speaker independent Dynamic Time Warping, speaker independent Hidden Markov Model, speaker dependent Dynamic Time Warping, speaker dependent Hidden Markov Model.
Specification