Robotics visual and auditory system
Abstract
A robotics visual and auditory system is provided with an auditory module (20), a face module (30), a stereo module (37), a motor control module (40), and an association module (50) for controlling these modules. Based on accurate sound source directional information from the association module (50), the auditory module (20) collects sub-bands having an interaural phase difference (IPD) or interaural intensity difference (IID) within a predetermined range by means of an active direction-pass filter (23a) whose pass range, according to auditory characteristics, is minimum in the frontal direction and becomes larger as the angle widens to the left and right. The auditory module conducts sound source separation by reconstructing the waveform of each sound source, conducts speech recognition of the separated sound signals from the respective sound sources using a plurality of acoustic models (27d), integrates the speech recognition results from each acoustic model by a selector, and judges the most reliable speech recognition result among them.
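The direction-dependent pass range of the active direction-pass filter described in the abstract can be illustrated with a minimal sketch. The linear widening model, its parameters, and the function names below are assumptions for illustration, not the patent's actual implementation:

```python
def pass_range(theta_deg, base=10.0, slope=0.2):
    # Hypothetical pass-range model: narrowest (base, in degrees) at the
    # frontal direction (0 deg), widening as |theta| grows toward the
    # sides, mirroring the auditory characteristics in the abstract.
    return base + slope * abs(theta_deg)

def collect_subbands(subband_dirs, source_dir):
    # Keep the indices of sub-bands whose IPD/IID-derived direction
    # falls inside the direction-dependent pass range around the
    # localized source direction.
    delta = pass_range(source_dir)
    return [k for k, d in enumerate(subband_dirs)
            if abs(d - source_dir) <= delta]
```

For a frontal source the filter is most selective; for a source at 90 degrees the pass range roughly triples under these illustrative parameters.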
13 Claims
1. A robotics visual and auditory system comprising:

a plurality of acoustic models;

a speech recognition engine for executing speech recognition processes on separated sound signals from respective sound sources by using the acoustic models; and

a selector for integrating a plurality of speech recognition process results obtained by the speech recognition processes and selecting any one of the speech recognition process results,

wherein, in order to handle the case where a plurality of speakers speak to said robot from different directions, with the robot's front direction as the base, the acoustic models are provided for each speaker and each direction so as to correspond to each direction, and

wherein the speech recognition engine uses each of said acoustic models separately for one sound signal separated by sound source separation, and executes said speech recognition processes in parallel.

(Dependent claims: 2, 3)
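The parallel recognition and selection scheme of claim 1 can be sketched as follows. The `recognize` method, the `(text, confidence)` result shape, and the use of a thread pool are assumptions for illustration; the claim does not specify how the recognizers are run or how reliability is scored:

```python
from concurrent.futures import ThreadPoolExecutor

def recognize_parallel(signal, acoustic_models):
    # Run a (hypothetical) recognizer once per acoustic model on the
    # same separated sound signal, in parallel; each model is assumed
    # to return a (text, confidence) pair.
    with ThreadPoolExecutor() as pool:
        return list(pool.map(lambda m: m.recognize(signal),
                             acoustic_models))

def select_best(results):
    # Selector: integrate the per-model results and pick the one with
    # the highest confidence (reliability) score.
    return max(results, key=lambda r: r[1])
```

Each per-speaker, per-direction acoustic model votes on the same separated signal, and the selector keeps the most reliable hypothesis.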
4. A robotics visual and auditory system comprising:

an auditory module which is provided with at least a pair of microphones to collect external sounds and, based on sound signals from the microphones, determines a direction of at least one speaker by sound source separation and localization, using grouping based on pitch extraction and harmonic structure;

a face module which is provided with a camera to take images of the robot's front, identifies each speaker, and extracts a face event from each speaker's face recognition and localization, based on the images taken by the camera;

a motor control module which is provided with a drive motor to rotate the robot in the horizontal direction and extracts a motor event based on a rotational position of the drive motor;

an association module which determines each speaker's direction from said auditory, face, and motor events, based on directional information of the sound source localization of the auditory event and the face localization of the face event, generates an auditory stream and a face stream by connecting said events in the temporal direction using a Kalman filter, and further generates an association stream associating these streams; and

an attention control module which conducts attention control based on said streams and drive-controls the motor based on action planning results accompanying the attention control,

wherein, in order for the auditory module to handle the case where a plurality of speakers speak to said robot from different directions, with the robot's front direction as the base, acoustic models are provided so as to correspond to each speaker and each direction, and

wherein the auditory module collects sub-bands having an interaural phase difference (IPD) or interaural intensity difference (IID) within a predetermined range by an active direction-pass filter having a pass range which, according to auditory characteristics, is minimum in the frontal direction and becomes larger as the angle widens to the left and right, based on accurate sound source directional information from the association module; conducts sound source separation by reconstructing the waveform of each sound source; conducts speech recognition in parallel on one sound signal separated by the sound source separation using a plurality of the acoustic models; integrates the speech recognition results from each acoustic model by a selector; and judges the most reliable speech recognition result among the speech recognition results.

(Dependent claims: 6, 7, 10, 11, 12, 13)
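The claim's use of a Kalman filter to connect events in the temporal direction into streams can be sketched with a minimal one-dimensional tracker of a speaker's azimuth. The noise parameters, the gating threshold, and the class itself are illustrative assumptions, not the patent's stated design:

```python
class DirectionKalman:
    # Minimal 1-D Kalman filter sketch: tracks a speaker's azimuth so
    # that successive auditory/face events can be connected over time
    # into a stream. All parameters are illustrative.
    def __init__(self, theta0, var0=25.0, q=1.0, r=16.0):
        self.theta, self.var = theta0, var0
        self.q, self.r = q, r              # process / measurement noise

    def update(self, measured_theta):
        self.var += self.q                 # predict: uncertainty grows
        k = self.var / (self.var + self.r) # Kalman gain
        self.theta += k * (measured_theta - self.theta)
        self.var *= (1.0 - k)              # correct: uncertainty shrinks
        return self.theta

    def gates(self, measured_theta, gate=15.0):
        # A new event joins this stream only if its direction falls
        # within the gate around the current estimate.
        return abs(measured_theta - self.theta) <= gate
```

Events passing the gate extend the stream and refine the direction estimate; events failing it seed a new stream, which is one common way such temporal association is realized.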
5. A robotics visual and auditory system comprising:

an auditory module which is provided with at least a pair of microphones to collect external sounds and, based on sound signals from the microphones, determines a direction of at least one speaker by sound source separation and localization, using grouping based on pitch extraction and harmonic structure;

a face module which is provided with a camera to take images of the robot's front, identifies each speaker, and extracts a face event from each speaker's face recognition and localization, based on the images taken by the camera;

a stereo module which extracts and localizes a longitudinally long object, based on a parallax extracted from images taken by a stereo camera, and extracts a stereo event;

a motor control module which is provided with a drive motor to rotate the robot in the horizontal direction and extracts a motor event based on a rotational position of the drive motor;

an association module which determines each speaker's direction from said auditory, face, stereo, and motor events, based on directional information of the sound source localization of the auditory event and the face localization of the face event, generates an auditory stream, a face stream, and a stereo visual stream by connecting said events in the temporal direction using a Kalman filter, and further generates an association stream associating these streams; and

an attention control module which conducts attention control based on said streams and drive-controls the motor based on action planning results accompanying the attention control,

wherein, in order for the auditory module to handle the case where a plurality of speakers speak to said robot from different directions, with the robot's front direction as the base, acoustic models are provided so as to correspond to each speaker and each direction, and

wherein the auditory module collects sub-bands having an interaural phase difference (IPD) or interaural intensity difference (IID) within a predetermined range by an active direction-pass filter having a pass range which, according to auditory characteristics, is minimum in the frontal direction and becomes larger as the angle widens to the left and right, based on accurate sound source directional information from the association module; conducts sound source separation by reconstructing the waveform of each sound source; conducts speech recognition in parallel on one sound signal separated by the sound source separation using a plurality of the acoustic models; integrates the speech recognition results from each acoustic model by a selector; and judges the most reliable speech recognition result among the speech recognition results.

(Dependent claims: 8, 9)
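The stereo module in claim 5 localizes an object from the parallax between two camera images. Assuming a standard calibrated stereo rig (the function name and parameters below are illustrative), the depth follows from the usual disparity relation:

```python
def depth_from_parallax(disparity_px, focal_px, baseline_m):
    # Standard stereo relation: depth = f * B / d, where f is the focal
    # length in pixels, B the camera baseline in meters, and d the
    # disparity (parallax) in pixels between the left and right images.
    if disparity_px <= 0:
        raise ValueError("disparity must be positive")
    return focal_px * baseline_m / disparity_px
```

A nearer object produces a larger disparity, so a tall, close object such as a standing speaker can be segmented and localized from the disparity map.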
Specification