Interactive speech recognition combining speaker-independent and speaker-specific word recognition, and having a response-creation capability
First Claim
1. An interactive speech recognition apparatus, comprising:
a voice input unit to receive voice and translate the received voice into digital form;
a voice analysis unit in communication with said voice input unit to generate characteristic voice data for the received digitized voice;
a non-specific speaker word identification unit in communication with said voice analysis unit to determine whether the characteristic voice data substantially matches standard characteristic voice information corresponding to pre-registered expressions and generate identified non-specific speaker expression data in response thereto;
a specific speaker word enrollment unit in communication with said voice analysis unit to register individual expressions spoken by a specific speaker and generate identified specific speaker expression data based on the characteristic voice data;
a speech recognition and dialogue management unit in communication with said non-specific speaker word identification unit and said specific speaker word enrollment unit to receive the identified non-specific speaker expression data and the identified specific speaker expression data respectively therefrom, for selecting one of said non-specific speaker word identification unit and said specific speaker word enrollment unit to recognize a meaning from the received voice based on the received identified expression data, and to formulate an appropriate response from existing response data corresponding to the recognized meaning;
a response creation function in communication with said speech recognition and dialogue management unit to enable a user to create response data based on speaker input; and
a voice synthesizer in communication with said speech recognition and dialogue management unit to generate synthesized audio corresponding to the appropriate response formulated by said speech recognition and dialogue management unit.
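The arrangement recited in the claim above can be sketched as a small pipeline of cooperating units. This is a hypothetical illustration of the claimed structure, not an implementation from the patent: all class and method names are invented, features are represented as plain tuples, and an exact-equality check stands in for the claim's "substantially matches" test.

```python
# Hypothetical sketch of the claimed apparatus. Names and the exact-match
# comparison are assumptions; the patent does not specify an implementation.

class NonSpecificWordIdentifier:
    """Matches voice features against pre-registered, speaker-independent expressions."""
    def __init__(self, registered):
        self.registered = registered          # expression -> reference features

    def identify(self, features):
        for expr, ref in self.registered.items():
            if features == ref:               # stand-in for "substantially matches"
                return expr
        return None


class SpecificSpeakerEnroller:
    """Registers, and later recognizes, a specific speaker's own expressions."""
    def __init__(self):
        self.enrolled = {}

    def enroll(self, expr, features):
        self.enrolled[expr] = features

    def identify(self, features):
        for expr, ref in self.enrolled.items():
            if features == ref:
                return expr
        return None


class DialogueManager:
    """Selects one identification unit and formulates a response from response data."""
    def __init__(self, non_specific, specific, responses):
        self.non_specific = non_specific
        self.specific = specific
        self.responses = responses            # recognized expression -> response text

    def respond(self, features):
        # Try the speaker-independent unit first, then the speaker-specific one.
        expr = self.non_specific.identify(features) or self.specific.identify(features)
        if expr is None:
            return "unrecognized"
        return self.responses.get(expr, "no response registered")
```

In this sketch the response table passed to `DialogueManager` plays the role of the claim's "existing response data"; the response creation function of the claim would correspond to letting the user add entries to that table.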
1 Assignment
0 Petitions
Abstract
A technique for improving speech recognition in low-cost, speech-interactive devices. The technique implements a speaker-specific word enrollment and detection unit in parallel with a word detection unit, permitting comprehension of spoken commands or messages through binary questions when no recognizable words are found. Preferably, specific-speaker detection is based on the speaker's own personal list of words or expressions. Other facets include complementing non-specific, pre-registered word characteristic information with individual, speaker-specific verbal characteristics to improve recognition in cases where the speaker has unusual speech mannerisms or an accent, and response alteration, in which speaker-specific registration functions are leveraged to provide access to, and permit changes in, a predefined responses table according to user needs and tastes.
93 Citations
11 Claims
1. An interactive speech recognition apparatus, comprising:
a voice input unit to receive voice and translate the received voice into digital form;
a voice analysis unit in communication with said voice input unit to generate characteristic voice data for the received digitized voice;
a non-specific speaker word identification unit in communication with said voice analysis unit to determine whether the characteristic voice data substantially matches standard characteristic voice information corresponding to pre-registered expressions and generate identified non-specific speaker expression data in response thereto;
a specific speaker word enrollment unit in communication with said voice analysis unit to register individual expressions spoken by a specific speaker and generate identified specific speaker expression data based on the characteristic voice data;
a speech recognition and dialogue management unit in communication with said non-specific speaker word identification unit and said specific speaker word enrollment unit to receive the identified non-specific speaker expression data and the identified specific speaker expression data respectively therefrom, for selecting one of said non-specific speaker word identification unit and said specific speaker word enrollment unit to recognize a meaning from the received voice based on the received identified expression data, and to formulate an appropriate response from existing response data corresponding to the recognized meaning;
a response creation function in communication with said speech recognition and dialogue management unit to enable a user to create response data based on speaker input; and
a voice synthesizer in communication with said speech recognition and dialogue management unit to generate synthesized audio corresponding to the appropriate response formulated by said speech recognition and dialogue management unit.
- View Dependent Claims (2, 3, 4, 5, 6, 7)
8. An interactive speech recognition method, comprising the steps of:
perceiving voice;
translating the perceived voice into corresponding digital form;
generating characteristic voice data for the perceived digitized voice;
enrolling specific speaker expressions spoken by a specific speaker;
determining whether the characteristic voice data generated in said characteristic voice data generating step substantially matches standard characteristic voice information corresponding to at least one of: the specific speaker expressions enrolled in said specific speaker expression enrolling step; and non-specific speaker expressions prestored in memory;
generating identified expression data if it is determined in said determining step that the characteristic voice data generated in said characteristic voice data generating step substantially matches standard characteristic voice information by selecting one member of the group consisting of the specific-speaker and non-specific speaker expressions;
assimilating a contextual meaning based on the identified expression data generated in said identified expression data generating step;
formulating an appropriate response from existing response data based on said assimilated contextual meaning;
enabling a user to create response data based on speaker input; and
synthesizing audio corresponding to the appropriate formulated response.
- View Dependent Claims (9, 10, 11)
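The determining and selecting steps of the method claim can be sketched as a single best-match search over both expression stores. This is a minimal sketch under stated assumptions: features are numeric vectors, "substantially matches" is taken to mean Euclidean distance under a threshold, and the function name, stores, and threshold value are all invented for illustration.

```python
# Assumed interpretation of the claim's matching step: find the closest
# expression across the speaker-specific (enrolled) and non-specific
# (prestored) stores, accepting it only within a distance threshold.

def best_match(features, specific, non_specific, threshold=1.0):
    """Return (expression, source) for the closest stored expression within
    the threshold, else (None, None)."""
    def distance(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

    best = (None, None)
    best_d = threshold
    for source, store in (("specific", specific), ("non-specific", non_specific)):
        for expr, ref in store.items():
            d = distance(features, ref)
            if d <= best_d:
                best, best_d = (expr, source), d
    return best
```

The returned `source` tag mirrors the claim's "selecting one member of the group consisting of the specific-speaker and non-specific speaker expressions."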
Specification