Speech recognition system, speech recognition method, speech synthesis system, speech synthesis method, and program product having increased accuracy

US 7,369,991 B2
Filed: 03/04/2003
Issued: 05/06/2008
Est. Priority Date: 03/04/2002
Status: Active Grant

Abstract

The object of the present invention is to maintain a high recognition success rate even with a low-volume sound signal, without being affected by noise.

The speech recognition system comprises a sound signal processor 10 configured to acquire a sound signal, and to calculate a sound signal parameter based on the acquired sound signal; an electromyographic signal processor 13 configured to acquire potential changes on a surface of the object as an electromyographic signal, and to calculate an electromyographic signal parameter based on the acquired electromyographic signal; an image information processor 16 configured to acquire image information by taking an image of the object, and to calculate an image information parameter based on the acquired image information; a speech recognizer 20 configured to recognize a speech signal vocalized by the object, based on the sound signal parameter, the electromyographic signal parameter and the image information parameter; and a recognition result provider 21 configured to provide a result recognized by the speech recognizer 20.

13 Claims

1. A speech recognition system, comprising:
- a sound signal processor configured to acquire a sound signal from an object, and to calculate a sound signal parameter based on the acquired sound signal;
  
  an electromyographic signal processor configured to acquire potential changes on a surface of the object as an electromyographic signal, and to calculate an electromyographic signal parameter based on the acquired electromyographic signal;
  
  an image information processor configured to acquire image information by taking an image of the object, and to calculate an image information parameter based on the acquired image information;
  
  a speech recognizer configured to recognize a speech signal vocalized by the object, based on the sound signal parameter, the electromyographic signal parameter, and the image information parameter, wherein
  
  the speech recognizer includes a hierarchical network in which a plurality of non-linear components including an input unit and an output unit are located from upstream to downstream hierarchically;
  
  the output unit of the upstream non-linear component is connected to the input unit of the downstream non-linear component within adjacent non-linear components;
  
  a weight value is assigned to the connection or a combination of the connections,
  
  each of the non-linear components is configured to calculate data which is outputted from the output unit and to determine the connection to which the calculated data is outputted, in accordance with data inputted to the input unit and the weight value assigned to the connection or the combination,
  
  the sound signal parameter, the electromyographic signal parameter, and the image information parameter are inputted to the most upstream non-linear components in the hierarchical network as the inputted data,
  
  the recognized speech signals are outputted from the output unit of the most downstream non-linear components in the hierarchical network as the outputted data; and
  
  the speech recognizer recognizes the speech signal based on the outputted data; and
  
  a recognition result provider configured to provide a result recognized by the speech recognizer.
- Dependent claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11):
  - 2. The speech recognition system according to claim 1, wherein the speech recognizer is configured to recognize the speech signal based on each of the sound signal parameter, the electromyographic signal parameter, and the image information parameter, to compare each of the recognized speech signals, and to recognize the speech signal based on the compared result.
  - 3. The speech recognition system according to claim 1, wherein the speech recognizer is configured to recognize the speech signal using the sound signal parameter, the electromyographic signal parameter, and the image information parameter simultaneously.
  - 4. The speech recognition system according to claim 1, wherein the speech recognizer includes a learning function configured to change the weight assigned to the non-linear components by inputting sampling data which is transferred from downstream to upstream.
  - 5. The speech recognition system according to claim 1, the system further comprising:
    - a positioning device and a holding device;
      
      wherein the sound signal processor includes a microphone configured to acquire the sound signal from a sound source;
      
      the electromyographic signal processor includes electrodes configured to acquire the potential changes on a surface around the sound source as the electromyographic signal;
      
      the image information processor includes a camera configured to acquire the image information by taking an image of the motion of the sound source;
      
      the positioning device fixes the microphone and the electrodes adjacent to the sound source; and
      
      the holding device holds the camera and the positioning device.
  - 6. The speech recognition system according to claim 1, wherein the speech recognizer is configured to recognize a predetermined phoneme or pattern, and to recognize the speech signal based only on the sound signal parameter when the predetermined phoneme or pattern is recognized.
  - 7. The speech recognition system according to claim 1, wherein the speech recognizer is configured to recognize a predetermined phoneme or pattern, and to ignore the electromyographic signal parameter when the predetermined phoneme or pattern is recognized.
  - 8. The speech recognition system according to claim 1, wherein the speech recognizer is configured to recognize the speech signal based only on the sound signal parameter, when speech based on the sound signal parameter is recognized above a predetermined level.
  - 9. The speech recognition system according to claim 1, wherein the sound signal processor includes a microphone configured to acquire the sound signal from a sound source, the microphone configured to communicate with a communications device;
    - the electromyographic signal processor includes electrodes configured to acquire the potential changes on a surface around the sound source as the electromyographic signal, the electrodes being installed on a surface of the communications device;
      
      the image information processor includes a camera configured to acquire the image information by taking an image of the motion of the sound source, the camera being installed on a terminal separated from the communications device; and
      
      the communications device is configured to transmit and to receive data from the terminal.
  - 10. The speech recognition system according to claim 9, wherein the terminal includes a body on which the camera is installed, and a belt for fixing the body; and
    - the recognition result provider is a display configured to display the result, the display being installed on a surface of the body.
  - 11. The speech recognition system according to claim 9, wherein the recognition result provider is configured to display the result in a translucent display, the recognition result provider being installed in the holding device.
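
Claims 2 and 8 describe decision logic that can be illustrated with a short sketch. The following Python is only one possible reading, not the patented implementation: the `Result` type, the `recognize` function, the confidence scores, the `sound_only_level` threshold of 0.9, the majority vote, and the most-confident fallback are all assumptions introduced for illustration. The claims say only that the per-modality results are compared and that sound alone is used when it is recognized above a predetermined level.

```python
from collections import Counter
from typing import Tuple

# Hypothetical per-modality result: (recognized label, confidence in [0, 1]).
Result = Tuple[str, float]


def recognize(sound: Result, emg: Result, image: Result,
              sound_only_level: float = 0.9) -> str:
    """Combine per-modality results (illustrative reading of claims 2 and 8)."""
    # Claim 8 style: rely on the sound channel alone when it is confident enough.
    # The threshold value is an assumption, not a figure from the patent.
    if sound[1] >= sound_only_level:
        return sound[0]

    # Claim 2 style: compare the three recognized results.
    # A majority vote is one assumed way to "compare"; the claim does not fix it.
    votes = Counter(label for label, _ in (sound, emg, image))
    label, count = votes.most_common(1)[0]
    if count >= 2:
        return label

    # No two modalities agree: fall back to the single most confident result.
    return max((sound, emg, image), key=lambda r: r[1])[0]
```

For example, in a noisy environment the sound channel might return a low-confidence label while the electromyographic and image channels agree with each other, in which case this sketch follows the two agreeing channels.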

12. A speech recognition method, comprising:
- acquiring a sound signal from an object, and calculating a sound signal parameter based on the acquired sound signal;
  
  acquiring potential changes on a surface of the object as an electromyographic signal, and calculating an electromyographic signal parameter based on the acquired electromyographic signal;
  
  acquiring image information by taking an image of the object, and calculating an image information parameter based on the acquired image information;
  
  recognizing a speech signal vocalized by the object using a speech recognizer, based on the sound signal parameter, the electromyographic signal parameter, and the image information parameter, the speech recognizer including a hierarchical network in which a plurality of non-linear components including an input unit and an output unit are located from upstream to downstream hierarchically, wherein recognizing a speech signal vocalized by the object includes
  
  connecting the output unit of the upstream non-linear component to the input unit of the downstream non-linear component within adjacent non-linear components,
  
  assigning a weight value to the connection or a combination of the connections,
  
  calculating data which is outputted from the output unit and determining the connection to which the calculated data is outputted with each of the non-linear components, in accordance with data inputted to the input unit and the weight value assigned to the connection or the combination,
  
  inputting the sound signal parameter, the electromyographic signal parameter, and the image information parameter to the most upstream non-linear components in the hierarchical network as the inputted data,
  
  outputting the recognized speech signals from the output unit of the most downstream non-linear components in the hierarchical network as the outputted data, and
  
  recognizing the speech signal based on the outputted data; and
  
  providing a result recognized by the recognizing.
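
The hierarchical network recited in the method can be pictured as a small feed-forward network. The sketch below is a simplified illustration under stated assumptions, not the claimed implementation: the sigmoid non-linearity, the bias terms, the full connectivity between adjacent levels, the function names `layer`, `forward`, and `pick_label`, and every numeric weight are hypothetical. In particular, the claim lets each component determine the connection to which its output goes, whereas this sketch simply sends every output to every downstream component.

```python
import math


def layer(inputs, weights, biases):
    """One level of non-linear components: weighted sum, then a sigmoid."""
    return [
        1.0 / (1.0 + math.exp(-(sum(w * x for w, x in zip(row, inputs)) + b)))
        for row, b in zip(weights, biases)
    ]


def forward(sound, emg, image, layers):
    """Pass the three parameter vectors from upstream to downstream."""
    # The most upstream components receive all three parameters together.
    data = list(sound) + list(emg) + list(image)
    for weights, biases in layers:
        data = layer(data, weights, biases)
    # Output of the most downstream components, one score per candidate.
    return data


def pick_label(scores, labels):
    """Recognize the speech signal as the label with the highest output."""
    best = max(range(len(scores)), key=scores.__getitem__)
    return labels[best]


# Toy network with hypothetical, hand-picked weights (not learned values).
layers = [
    ([[1.0, 0.0, -1.0], [0.0, 1.0, 1.0]], [0.0, 0.0]),  # 3 inputs -> 2 hidden
    ([[2.0, -2.0], [-2.0, 2.0]], [0.0, 0.0]),           # 2 hidden -> 2 outputs
]
scores = forward([0.8], [0.1], [0.2], layers)
result = pick_label(scores, ["a", "i"])
print(result)
```

Each parameter is a one-element list here only to keep the example short; in practice each modality would presumably contribute a longer feature vector.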

13. A computer readable medium encoded with computer executable instructions, wherein the instructions, when executed by a processor, cause the processor to perform a method, comprising:
- acquiring a sound signal from an object, and calculating a sound signal parameter based on the acquired sound signal;
  
  acquiring potential changes on a surface of the object as an electromyographic signal, and calculating an electromyographic signal parameter based on the acquired electromyographic signal;
  
  acquiring image information by taking an image of the object, and calculating an image information parameter based on the acquired image information;
  
  recognizing a speech signal vocalized by the object using a speech recognizer, based on the sound signal parameter, the electromyographic signal parameter, and the image information parameter, the speech recognizer including a hierarchical network in which a plurality of non-linear components including an input unit and an output unit are located from upstream to downstream hierarchically, wherein recognizing a speech signal vocalized by the object includes
  
  connecting the output unit of the upstream non-linear component to the input unit of the downstream non-linear component within adjacent non-linear components,
  
  assigning a weight value to the connection or a combination of the connections,
  
  calculating data which is outputted from the output unit and determining the connection to which the calculated data is outputted with each of the non-linear components, in accordance with data inputted to the input unit and the weight value assigned to the connection or the combination,
  
  inputting the sound signal parameter, the electromyographic signal parameter, and the image information parameter to the most upstream non-linear components in the hierarchical network as the inputted data,
  
  outputting the recognized speech signals from the output unit of the most downstream non-linear components in the hierarchical network as the outputted data, and
  
  recognizing the speech signal based on the outputted data; and
  
  providing a result recognized by the recognizing.

Current Assignee
NTT Docomo Incorporated (Nippon Telegraph and Telephone Corporation)
Original Assignee
NTT Docomo Incorporated (Nippon Telegraph and Telephone Corporation)
Inventors
Sugimura, Toshiaki; Hiraiwa, Akira; Manabe, Hiroyuki
Primary Examiner(s)
OPSASNICK, MICHAEL N

Application Number

US10/377,822
Publication Number

US 20030171921A1
Time in Patent Office

1,890 Days
Field of Search

704/236, 704/235
US Class Current

704/235
CPC Class Codes

G06F 18/256   of results relating to diff...

G06V 10/811   the classifiers operating o...

G06V 40/20   Movements or behaviour, e.g...

G10L 13/033   Voice editing, e.g. manipul...

G10L 15/24   Speech recognition using no...

G10L 2021/0135   Voice conversion or morphing
