Speech recognition system, speech recognition method, speech synthesis system, speech synthesis method, and program product having increased accuracy
Abstract
The object of the present invention is to maintain a high recognition success rate even when the sound signal is low in volume, without being affected by noise.
The speech recognition system comprises a sound signal processor 10 configured to acquire a sound signal, and to calculate a sound signal parameter based on the acquired sound signal; an electromyographic signal processor 13 configured to acquire potential changes on a surface of the object as an electromyographic signal, and to calculate an electromyographic signal parameter based on the acquired electromyographic signal; an image information processor 16 configured to acquire image information by taking an image of the object, and to calculate an image information parameter based on the acquired image information; a speech recognizer 20 configured to recognize a speech signal vocalized by the object, based on the sound signal parameter, the electromyographic signal parameter and the image information parameter; and a recognition result provider 21 configured to provide a result recognized by the speech recognizer 20.
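The abstract does not state which parameters each processor calculates. The following Python sketch shows one possible way the three processors could reduce a single analysis frame to numeric parameters and concatenate them into one input vector for the speech recognizer. The specific features (RMS energy and zero-crossing rate for the sound signal, per-channel RMS for the electromyographic signal, and the fraction of dark pixels in a grayscale mouth-region crop for the image information) and all function names are hypothetical choices made for illustration only. They are not taken from the patent.

```python
import math
from typing import List, Sequence


def sound_signal_parameter(samples: Sequence[float]) -> List[float]:
    """Hypothetical sound parameters for one frame: RMS energy and zero-crossing rate."""
    n = len(samples)
    rms = math.sqrt(sum(s * s for s in samples) / n)
    # Count sign changes between consecutive samples.
    crossings = sum(1 for a, b in zip(samples, samples[1:]) if (a < 0) != (b < 0))
    return [rms, crossings / max(n - 1, 1)]


def emg_signal_parameter(channels: Sequence[Sequence[float]]) -> List[float]:
    """Hypothetical EMG parameters: RMS of the surface potential on each electrode channel."""
    return [math.sqrt(sum(v * v for v in ch) / len(ch)) for ch in channels]


def image_information_parameter(image: Sequence[Sequence[float]],
                                dark: float = 0.3) -> List[float]:
    """Hypothetical image parameter: fraction of pixels darker than `dark` in a
    grayscale mouth-region crop (values in [0, 1]), a crude mouth-opening proxy."""
    pixels = [p for row in image for p in row]
    return [sum(1 for p in pixels if p < dark) / len(pixels)]


def fused_parameters(samples: Sequence[float],
                     channels: Sequence[Sequence[float]],
                     image: Sequence[Sequence[float]]) -> List[float]:
    """Concatenate the three parameter sets into one input vector for the recognizer."""
    return (sound_signal_parameter(samples)
            + emg_signal_parameter(channels)
            + image_information_parameter(image))


if __name__ == "__main__":
    frame = [0.5, -0.5, 0.5, -0.5]    # one audio frame
    emg = [[0.1, -0.1], [0.2, 0.2]]   # two electrode channels
    mouth = [[0.1, 0.9], [0.2, 0.8]]  # 2x2 grayscale mouth crop
    print(fused_parameters(frame, emg, mouth))
```

Concatenating the parameters into a single vector reflects the claim language, under which all three parameter sets are inputted together to the most upstream non-linear components of the recognizer. Because the electromyographic and image parameters do not depend on the acoustic level, they remain informative when the sound signal is quiet or noisy.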
13 Claims
1. A speech recognition system, comprising:
a sound signal processor configured to acquire a sound signal from an object, and to calculate a sound signal parameter based on the acquired sound signal;
an electromyographic signal processor configured to acquire potential changes on a surface of the object as an electromyographic signal, and to calculate an electromyographic signal parameter based on the acquired electromyographic signal;
an image information processor configured to acquire image information by taking an image of the object, and to calculate an image information parameter based on the acquired image information;
a speech recognizer configured to recognize a speech signal vocalized by the object, based on the sound signal parameter, the electromyographic signal parameter, and the image information parameter, wherein
the speech recognizer includes a hierarchical network in which a plurality of non-linear components including an input unit and an output unit are located from upstream to downstream hierarchically;
the output unit of the upstream non-linear component is connected to the input unit of the downstream non-linear component within adjacent non-linear components;
a weight value is assigned to the connection or a combination of the connections,
each of the non-linear components is configured to calculate data which is outputted from the output unit and to determine the connection to which the calculated data is outputted, in accordance with data inputted to the input unit and the weight value assigned to the connection or the combination,
the sound signal parameter, the electromyographic signal parameter, and the image information parameter are inputted to the most upstream non-linear components in the hierarchical network as the inputted data,
the recognized speech signals are outputted from the output unit of the most downstream non-linear components in the hierarchical network as the outputted data; and
the speech recognizer recognizes the speech signal based on the outputted data; and
a recognition result provider configured to provide a result recognized by the speech recognizer.
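The hierarchical network of weighted, non-linear components recited in claim 1 corresponds to what is commonly called a feedforward neural network. The Python sketch below assumes a fully connected network with tanh units in the intermediate layers and a softmax output layer, so that every output unit of one layer feeds every input unit of the next layer through its own weight value. The claim's per-component determination of outgoing connections is simplified here to this all-to-all case. Layer sizes, the random placeholder weights, and the names `HierarchicalNetwork` and `recognize` are illustrative assumptions; in practice the weight values would be obtained by training, which this sketch does not cover.

```python
import math
import random
from typing import List, Sequence, Tuple


def softmax(z: Sequence[float]) -> List[float]:
    """Normalize the most downstream outputs into scores that sum to 1."""
    m = max(z)  # subtract the max for numerical stability
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


class HierarchicalNetwork:
    """Hypothetical fully connected feedforward network.

    weights[k][j][i] is the weight assigned to the connection from output
    unit i of layer k to input unit j of layer k + 1.
    """

    def __init__(self, layer_sizes: Sequence[int], seed: int = 0) -> None:
        rng = random.Random(seed)  # placeholder weights; a real system would train them
        self.weights = [
            [[rng.uniform(-1.0, 1.0) for _ in range(n_in)] for _ in range(n_out)]
            for n_in, n_out in zip(layer_sizes, layer_sizes[1:])
        ]
        self.biases = [[0.0] * n_out for n_out in layer_sizes[1:]]

    def forward(self, x: Sequence[float]) -> List[float]:
        if len(x) != len(self.weights[0][0]):
            raise ValueError("input length does not match the most upstream layer")
        activations = list(x)  # data inputted to the most upstream layer
        last = len(self.weights) - 1
        for k, (w, b) in enumerate(zip(self.weights, self.biases)):
            # Weighted sum over all incoming connections of each downstream unit.
            z = [sum(wi * ai for wi, ai in zip(row, activations)) + bj
                 for row, bj in zip(w, b)]
            # Non-linear step: tanh in intermediate layers, softmax at the last layer.
            activations = softmax(z) if k == last else [math.tanh(v) for v in z]
        return activations


def recognize(network: HierarchicalNetwork, sound: Sequence[float],
              emg: Sequence[float], image: Sequence[float],
              labels: Sequence[str]) -> Tuple[str, List[float]]:
    """Feed all three parameter sets in together and pick the highest-scoring label."""
    scores = network.forward(list(sound) + list(emg) + list(image))
    best = max(range(len(labels)), key=lambda i: scores[i])
    return labels[best], scores


if __name__ == "__main__":
    net = HierarchicalNetwork([5, 8, 3], seed=1)
    label, scores = recognize(net, [0.5, 1.0], [0.1, 0.2], [0.5], ["a", "i", "u"])
    print(label, [round(s, 3) for s in scores])
```

Because the three parameter sets enter the same network, the weights can learn to rely more on the electromyographic and image inputs when the acoustic input is weak, which is the effect the abstract describes.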
12. A speech recognition method, comprising:
acquiring a sound signal from an object, and calculating a sound signal parameter based on the acquired sound signal;
acquiring potential changes on a surface of the object as an electromyographic signal, and calculating an electromyographic signal parameter based on the acquired electromyographic signal;
acquiring image information by taking an image of the object, and calculating an image information parameter based on the acquired image information;
recognizing a speech signal vocalized by the object using a speech recognizer, based on the sound signal parameter, the electromyographic signal parameter, and the image information parameter, the speech recognizer including a hierarchical network in which a plurality of non-linear components including an input unit and an output unit are located from upstream to downstream hierarchically, wherein recognizing a speech signal vocalized by the object includes
connecting the output unit of the upstream non-linear component to the input unit of the downstream non-linear component within adjacent non-linear components,
assigning a weight value to the connection or a combination of the connections,
calculating data which is outputted from the output unit and determining the connection to which the calculated data is outputted with each of the non-linear components, in accordance with data inputted to the input unit and the weight value assigned to the connection or the combination,
inputting the sound signal parameter, the electromyographic signal parameter, and the image information parameter to the most upstream non-linear components in the hierarchical network as the inputted data,
outputting the recognized speech signals from the output unit of the most downstream non-linear components in the hierarchical network as the outputted data, and
recognizing the speech signal based on the outputted data; and
providing a result recognized by the recognizing.
13. A computer readable medium encoded with computer executable instructions, wherein the instructions, when executed by a processor, cause the processor to perform a method, comprising:
acquiring a sound signal from an object, and calculating a sound signal parameter based on the acquired sound signal;
acquiring potential changes on a surface of the object as an electromyographic signal, and calculating an electromyographic signal parameter based on the acquired electromyographic signal;
acquiring image information by taking an image of the object, and calculating an image information parameter based on the acquired image information;
recognizing a speech signal vocalized by the object using a speech recognizer, based on the sound signal parameter, the electromyographic signal parameter, and the image information parameter, the speech recognizer including a hierarchical network in which a plurality of non-linear components including an input unit and an output unit are located from upstream to downstream hierarchically, wherein recognizing a speech signal vocalized by the object includes
connecting the output unit of the upstream non-linear component to the input unit of the downstream non-linear component within adjacent non-linear components,
assigning a weight value to the connection or a combination of the connections,
calculating data which is outputted from the output unit and determining the connection to which the calculated data is outputted with each of the non-linear components, in accordance with data inputted to the input unit and the weight value assigned to the connection or the combination,
inputting the sound signal parameter, the electromyographic signal parameter, and the image information parameter to the most upstream non-linear components in the hierarchical network as the inputted data,
outputting the recognized speech signals from the output unit of the most downstream non-linear components in the hierarchical network as the outputted data, and
recognizing the speech signal based on the outputted data; and
providing a result recognized by the recognizing.
Specification