Robotics visual and auditory system
Abstract
A robotics visual and auditory system is provided with an auditory module (20), a face module (30), a stereo module (37), a motor control module (40), and an association module (50) for controlling these modules. Based on accurate sound source directional information from the association module (50), the auditory module (20) collects sub-bands having an interaural phase difference (IPD) or interaural intensity difference (IID) within a predetermined range by means of an active direction-pass filter (23a) whose pass range, according to auditory characteristics, is minimum in the frontal direction and becomes larger as the angle widens to the left and right. The auditory module conducts sound source separation by reconstructing the waveform of each sound source, conducts speech recognition of the separated sound signals from the respective sound sources using a plurality of acoustic models (27d), integrates the speech recognition results from each acoustic model by a selector, and judges the most reliable speech recognition result among them.
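The direction-dependent pass range of the active direction-pass filter described in the abstract can be illustrated with a minimal sketch. The linear widening model, its parameters, and the function names below are assumptions for illustration, not the patent's actual implementation:

```python
def pass_range(theta_deg, base=10.0, slope=0.2):
    # Hypothetical pass-range model: narrowest (base, in degrees) at the
    # frontal direction (0 deg), widening as |theta| grows toward the
    # sides, mirroring the auditory characteristics in the abstract.
    return base + slope * abs(theta_deg)

def collect_subbands(subband_dirs, source_dir):
    # Keep the indices of sub-bands whose IPD/IID-derived direction
    # falls inside the direction-dependent pass range around the
    # localized source direction.
    delta = pass_range(source_dir)
    return [k for k, d in enumerate(subband_dirs)
            if abs(d - source_dir) <= delta]
```

For a frontal source the filter is most selective; for a source at 90 degrees the pass range roughly triples under these illustrative parameters.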
13 Claims
1. A robotics visual and auditory system comprising:

a plurality of acoustic models;

a speech recognition engine for executing speech recognition processes on separated sound signals from respective sound sources by using the acoustic models; and

a selector for integrating a plurality of speech recognition process results obtained by the speech recognition processes and selecting any one of the speech recognition process results,

wherein, in order to handle the case where a plurality of speakers speak to said robot from different directions, with the robot's front direction as the base, the acoustic models are provided for each speaker and each direction so as to correspond to each direction, and

wherein the speech recognition engine uses each of said acoustic models separately for one sound signal separated by sound source separation, and executes said speech recognition processes in parallel.

(Dependent claims: 2, 3)
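The parallel recognition and selection scheme of claim 1 can be sketched as follows. The `recognize` method, the `(text, confidence)` result shape, and the use of a thread pool are assumptions for illustration; the claim does not specify how the recognizers are run or how reliability is scored:

```python
from concurrent.futures import ThreadPoolExecutor

def recognize_parallel(signal, acoustic_models):
    # Run a (hypothetical) recognizer once per acoustic model on the
    # same separated sound signal, in parallel; each model is assumed
    # to return a (text, confidence) pair.
    with ThreadPoolExecutor() as pool:
        return list(pool.map(lambda m: m.recognize(signal),
                             acoustic_models))

def select_best(results):
    # Selector: integrate the per-model results and pick the one with
    # the highest confidence (reliability) score.
    return max(results, key=lambda r: r[1])
```

Each per-speaker, per-direction acoustic model votes on the same separated signal, and the selector keeps the most reliable hypothesis.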
4. A robotics visual and auditory system comprising:

an auditory module which is provided with at least a pair of microphones to collect external sounds and, based on sound signals from the microphones, determines a direction of at least one speaker by sound source separation and localization, using grouping based on pitch extraction and harmonic structure;

a face module which is provided with a camera to take images of the robot's front, identifies each speaker, and extracts a face event from each speaker's face recognition and localization, based on the images taken by the camera;

a motor control module which is provided with a drive motor to rotate the robot in the horizontal direction and extracts a motor event based on a rotational position of the drive motor;

an association module which determines each speaker's direction from said auditory, face, and motor events, based on directional information of the sound source localization of the auditory event and the face localization of the face event, generates an auditory stream and a face stream by connecting said events in the temporal direction using a Kalman filter, and further generates an association stream associating these streams; and

an attention control module which conducts attention control based on said streams and drive-controls the motor based on action planning results accompanying the attention control,

wherein, in order for the auditory module to handle the case where a plurality of speakers speak to said robot from different directions, with the robot's front direction as the base, acoustic models are provided so as to correspond to each speaker and each direction, and

wherein the auditory module collects sub-bands having an interaural phase difference (IPD) or interaural intensity difference (IID) within a predetermined range by an active direction-pass filter having a pass range which, according to auditory characteristics, is minimum in the frontal direction and becomes larger as the angle widens to the left and right, based on accurate sound source directional information from the association module; conducts sound source separation by reconstructing the waveform of each sound source; conducts speech recognition in parallel on one sound signal separated by the sound source separation using a plurality of the acoustic models; integrates the speech recognition results from each acoustic model by a selector; and judges the most reliable speech recognition result among the speech recognition results.

(Dependent claims: 6, 7, 10, 11, 12, 13)
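The claim's use of a Kalman filter to connect events in the temporal direction into streams can be sketched with a minimal one-dimensional tracker of a speaker's azimuth. The noise parameters, the gating threshold, and the class itself are illustrative assumptions, not the patent's stated design:

```python
class DirectionKalman:
    # Minimal 1-D Kalman filter sketch: tracks a speaker's azimuth so
    # that successive auditory/face events can be connected over time
    # into a stream. All parameters are illustrative.
    def __init__(self, theta0, var0=25.0, q=1.0, r=16.0):
        self.theta, self.var = theta0, var0
        self.q, self.r = q, r              # process / measurement noise

    def update(self, measured_theta):
        self.var += self.q                 # predict: uncertainty grows
        k = self.var / (self.var + self.r) # Kalman gain
        self.theta += k * (measured_theta - self.theta)
        self.var *= (1.0 - k)              # correct: uncertainty shrinks
        return self.theta

    def gates(self, measured_theta, gate=15.0):
        # A new event joins this stream only if its direction falls
        # within the gate around the current estimate.
        return abs(measured_theta - self.theta) <= gate
```

Events passing the gate extend the stream and refine the direction estimate; events failing it seed a new stream, which is one common way such temporal association is realized.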
5. A robotics visual and auditory system comprising:

an auditory module which is provided with at least a pair of microphones to collect external sounds and, based on sound signals from the microphones, determines a direction of at least one speaker by sound source separation and localization, using grouping based on pitch extraction and harmonic structure;

a face module which is provided with a camera to take images of the robot's front, identifies each speaker, and extracts a face event from each speaker's face recognition and localization, based on the images taken by the camera;

a stereo module which extracts and localizes a longitudinally long object, based on a parallax extracted from images taken by a stereo camera, and extracts a stereo event;

a motor control module which is provided with a drive motor to rotate the robot in the horizontal direction and extracts a motor event based on a rotational position of the drive motor;

an association module which determines each speaker's direction from said auditory, face, stereo, and motor events, based on directional information of the sound source localization of the auditory event and the face localization of the face event, generates an auditory stream, a face stream, and a stereo visual stream by connecting said events in the temporal direction using a Kalman filter, and further generates an association stream associating these streams; and

an attention control module which conducts attention control based on said streams and drive-controls the motor based on action planning results accompanying the attention control,

wherein, in order for the auditory module to handle the case where a plurality of speakers speak to said robot from different directions, with the robot's front direction as the base, acoustic models are provided so as to correspond to each speaker and each direction, and

wherein the auditory module collects sub-bands having an interaural phase difference (IPD) or interaural intensity difference (IID) within a predetermined range by an active direction-pass filter having a pass range which, according to auditory characteristics, is minimum in the frontal direction and becomes larger as the angle widens to the left and right, based on accurate sound source directional information from the association module; conducts sound source separation by reconstructing the waveform of each sound source; conducts speech recognition in parallel on one sound signal separated by the sound source separation using a plurality of the acoustic models; integrates the speech recognition results from each acoustic model by a selector; and judges the most reliable speech recognition result among the speech recognition results.

(Dependent claims: 8, 9)
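The stereo module in claim 5 localizes an object from the parallax between two camera images. Assuming a standard calibrated stereo rig (the function name and parameters below are illustrative), the depth follows from the usual disparity relation:

```python
def depth_from_parallax(disparity_px, focal_px, baseline_m):
    # Standard stereo relation: depth = f * B / d, where f is the focal
    # length in pixels, B the camera baseline in meters, and d the
    # disparity (parallax) in pixels between the left and right images.
    if disparity_px <= 0:
        raise ValueError("disparity must be positive")
    return focal_px * baseline_m / disparity_px
```

A nearer object produces a larger disparity, so a tall, close object such as a standing speaker can be segmented and localized from the disparity map.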
Specification