Voice recognition apparatus
Abstract
Disclosed is a voice recognition apparatus which can prevent an erroneous manipulation due to erroneous voice recognition from being carried out even in a noisy environment. As long as a duration of utterance acquired based on the level of a voice signal uttered by an operator (user) approximately coincides with a duration of utterance acquired based on mouth image data acquired by capturing the mouth of the operator, the voice recognition apparatus outputs vocal-manipulation phrase data as the result of voice recognition.
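The gating idea in the abstract can be sketched as follows. This is a minimal illustration, not the patented implementation: the function names, the `(start, end)` tuple representation in seconds, and the `tolerance_s` value are all assumptions, since the abstract only says the two durations must "approximately coincide".

```python
from typing import Optional, Tuple

Span = Tuple[float, float]  # (start, end) of an utterance, in seconds (assumed format)


def durations_coincide(audio: Span, video: Span, tolerance_s: float = 0.25) -> bool:
    """True when start and end times from both sources differ by at most tolerance_s.

    The tolerance is a hypothetical parameter; the source gives no numeric value.
    """
    return (abs(audio[0] - video[0]) <= tolerance_s
            and abs(audio[1] - video[1]) <= tolerance_s)


def gate_recognition(phrase: str, audio: Span, video: Span,
                     tolerance_s: float = 0.25) -> Optional[str]:
    """Pass the recognized phrase through only if both durations agree."""
    return phrase if durations_coincide(audio, video, tolerance_s) else None
```

With these assumptions, a phrase recognized while the mouth was not moving (for example, background noise) yields `None` instead of a command.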
Claims
-
1. A speech recognition apparatus for recognizing speech uttered by an operator, comprising:
-
a portion for performing a speech recognition process on a voice signal corresponding to said speech to thereby acquire vocal phrase data indicating the uttered phrase;
a portion for detecting a point of time when said operator has started uttering said speech and a point of time when said operator has ended uttering said speech on the basis of a signal level of said voice signal to thereby output first utterance duration information;
a portion for capturing a mouth of said operator to acquire mouth image data;
a portion for detecting a point of time when said operator has started uttering said speech and a point of time when said operator has ended uttering said speech on the basis of said mouth image data to thereby output second utterance duration information; and
a controller for outputting said vocal phrase data as long as said first utterance duration information is approximate to said second utterance duration information.
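One way to picture the level-based detection of utterance start and end recited in claim 1 is a simple threshold over per-frame signal levels. The sketch below is an assumption-laden illustration: `detect_utterance`, the fixed `threshold`, and the frame-based representation are hypothetical and are not taken from the claims.

```python
from typing import Optional, Sequence, Tuple


def detect_utterance(levels: Sequence[float], frame_s: float,
                     threshold: float) -> Optional[Tuple[float, float]]:
    """Return (start, end) in seconds spanning the first to the last active frame.

    levels    -- one signal-level value per frame (assumed input format)
    frame_s   -- duration of one frame in seconds
    threshold -- level above which a frame counts as active (hypothetical value)
    """
    active = [i for i, level in enumerate(levels) if level > threshold]
    if not active:
        return None  # no frame exceeded the threshold, so no utterance was found
    start = active[0] * frame_s
    end = (active[-1] + 1) * frame_s  # end of the last active frame
    return start, end


# Usage: seven frames of 0.5 s each, with frames 2 to 4 above the threshold.
span = detect_utterance([0.0, 0.1, 0.8, 0.9, 0.7, 0.1, 0.0], frame_s=0.5, threshold=0.5)
```

The same function could, under a further assumption, be applied to a per-frame mouth-opening measure derived from the mouth image data to obtain the second utterance duration information.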
-
-
2. A speech recognition apparatus for recognizing speech uttered by an operator to thereby acquire vocal phrase data representing a phrase indicated by said speech, comprising:
-
a portion for performing a speech recognition process on a voice signal corresponding to said speech to thereby acquire a plurality of vocal phrase data candidates;
a portion for detecting a point of time when said operator has started uttering said speech and a point of time when said operator has ended uttering said speech on the basis of a signal level of said voice signal to thereby generate first utterance duration information;
a portion for capturing a mouth of said operator to acquire mouth image data;
a portion for detecting a point of time when said operator has started uttering said speech and a point of time when said operator has ended uttering said speech on the basis of said mouth image data to thereby generate second utterance duration information;
a portion for counting the number of changes in a shape of said mouth in a duration of utterance indicated by said second utterance duration information on the basis of said mouth image data to thereby generate number-of-mouth-shape-change information; and
a portion for selecting that one of said vocal phrase data candidates which has a count of changes in said mouth equal to the count indicated by said number-of-mouth-shape-change information and outputting said selected vocal phrase data candidate as said vocal phrase data, as long as said first utterance duration information is approximate to said second utterance duration information.
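The counting and selection steps of claim 2 might look like the following sketch. It assumes that each video frame has already been classified into a mouth-shape label and that each candidate phrase comes with an expected change count; both assumptions, along with the names `count_shape_changes` and `select_candidate`, are hypothetical and not specified in the claim.

```python
from typing import Optional, Sequence, Tuple


def count_shape_changes(shapes: Sequence[str]) -> int:
    """Count transitions between consecutive, differing mouth-shape labels."""
    return sum(1 for prev, cur in zip(shapes, shapes[1:]) if prev != cur)


def select_candidate(candidates: Sequence[Tuple[str, int]],
                     observed_changes: int) -> Optional[str]:
    """Return the first candidate whose expected change count equals the observed one.

    candidates -- (phrase, expected_change_count) pairs; the expected counts are
                  assumed to come from a lookup that the claim does not describe.
    """
    for phrase, expected in candidates:
        if expected == observed_changes:
            return phrase
    return None


# Usage: per-frame labels within the video-derived utterance duration.
observed = count_shape_changes(["closed", "open", "open", "round", "closed"])
chosen = select_candidate([("zoom in", 2), ("zoom out", 3)], observed)
```

In the claim, this selection is performed only while the first and second utterance duration information are approximate to each other; the sketch leaves that check to the caller.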
-
-
3. A speech recognition apparatus for recognizing words uttered by a speaker, comprising:
-
a first detection circuit which detects a talk start time and a talk end time of the speaker on the basis of a speech signal, and thereafter outputs first utterance duration information;
a second detection circuit which detects a talk start time and a talk end time of the speaker on the basis of mouth image data, and thereafter outputs second utterance duration information; and
a controller which receives the outputted first and second utterance duration information and compares at least a portion of the first utterance duration information to at least a portion of the second utterance duration information.
-
4. A speech recognition apparatus for recognizing words uttered by a speaker according to claim 3, further comprising:
-
a processing circuit which determines the number of mouth shape changes of the speaker on the basis of the speech signal and thereafter outputs first mouth shape change information to the controller; and
an analyzing circuit which determines the number of mouth shape changes of the speaker on the basis of the mouth image data and thereafter outputs second mouth shape change information to the controller.
-
-
5. A speech recognition apparatus for recognizing words uttered by a speaker according to claim 4, wherein, when the controller determines that the first utterance duration information and second utterance duration information have a certain relationship, the controller compares the first mouth shape change information to the second mouth shape change information.
-
6. A speech recognition apparatus for recognizing words uttered by a speaker according to claim 5, wherein, when the controller determines that the first mouth shape change information and the second mouth shape change information do not have a certain relationship, the controller outputs a signal requesting the speaker to reutter the words.
-
7. A speech recognition apparatus for recognizing words uttered by a speaker according to claim 5, wherein, when the controller determines that the first utterance duration information and the second utterance duration information do not have a certain relationship, the controller outputs a signal requesting the speaker to reutter the words.
-
8. A speech recognition apparatus for recognizing words uttered by a speaker according to claim 5, further comprising a circuit which acquires vocal phrase data corresponding to the words uttered by the speaker.
-
9. A speech recognition apparatus for recognizing words uttered by a speaker according to claim 8, wherein, when the controller determines that the first mouth shape change information and the second utterance duration information have a certain relationship, the controller outputs said vocal phrase data.
-
10. A speech recognition apparatus for recognizing words uttered by a speaker according to claim 9, wherein, when the controller determines that the first mouth shape change information and the second utterance duration information do not have a certain relationship, the controller outputs a signal requesting the speaker to reutter the words.
-
11. A speech recognition apparatus for recognizing words uttered by a speaker according to claim 3, wherein, when the controller determines that the first utterance duration information and the second utterance duration information do not have a certain relationship, the controller outputs a signal requesting the speaker to reutter the words.
-
12. A speech recognition apparatus for recognizing words uttered by a speaker according to claim 3, further comprising a circuit which acquires vocal phrase data corresponding to the words uttered by the speaker.
-
13. A speech recognition apparatus for recognizing words uttered by a speaker according to claim 12, wherein, when the controller determines that the first utterance duration information and the second utterance duration information have a certain relationship, the controller outputs said vocal phrase data.
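The controller behavior described across claims 3 to 13 can be summarized in a short decision sketch. The claims do not define the "certain relationship", so this example assumes agreement within a tolerance for the durations and exact equality for the mouth shape change counts, following the comparison order of claim 5. The `Observation` type, the `REUTTER` marker, and `tolerance_s` are hypothetical names introduced only for illustration.

```python
from dataclasses import dataclass
from typing import Tuple

REUTTER = "REQUEST_REUTTER"  # hypothetical marker for the re-utterance request signal


@dataclass
class Observation:
    audio_span: Tuple[float, float]  # first utterance duration information (seconds)
    video_span: Tuple[float, float]  # second utterance duration information (seconds)
    audio_changes: int               # first mouth shape change information
    video_changes: int               # second mouth shape change information
    phrase: str                      # vocal phrase data from the recognizer


def controller(obs: Observation, tolerance_s: float = 0.25) -> str:
    """Return the phrase when both checks pass, otherwise the re-utter marker."""
    a, v = obs.audio_span, obs.video_span
    # Step 1: compare durations (assumed relationship: within tolerance_s).
    if abs(a[0] - v[0]) > tolerance_s or abs(a[1] - v[1]) > tolerance_s:
        return REUTTER
    # Step 2: only then compare change counts (assumed relationship: equality).
    if obs.audio_changes != obs.video_changes:
        return REUTTER
    return obs.phrase
```

Checking the durations first mirrors claim 5, where the mouth shape change comparison happens only after the duration comparison succeeds.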
Specification