Speech Recognition Apparatus, Speech Recognition Apparatus and Program Thereof
Abstract
Provided is a method for canceling background noise of a sound source other than a target direction sound source in order to realize highly accurate speech recognition, and a system using the same. In terms of directional characteristics of a microphone array, due to a capability of approximating a power distribution of each angle of each of possible various sound source directions by use of a sum of coefficient multiples of a base form angle power distribution of a target sound source measured beforehand by base form angle by using a base form sound, and power distribution of a non-directional background sound by base form, only a component of the target sound source direction is extracted at a noise suppression part. In addition, when the target sound source direction is unknown, at a sound source localization part, a distribution for minimizing the approximate residual is selected from base form angle power distributions of various sound source directions to assume a target sound source direction. Further, maximum likelihood estimation is executed by using voice data of the component of the sound source direction passed through these processes, and a voice model obtained by predetermined modeling of the voice data, and speech recognition is carried out based on an obtained assumption value.
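The approximation described in the abstract can be illustrated with a small sketch. It assumes the observed angle power distribution is modelled as a non-negative weighted sum of one premeasured base form profile and the non-directional background profile. The two coefficients are fitted by least squares for each candidate direction, and the direction with the smallest residual is taken as the estimate. The function names, the example profiles, and the non-negativity constraint are assumptions made for illustration only; they are not taken from the patent.

```python
# Illustrative sketch only: names, data, and the non-negativity constraint
# are hypothetical and are not taken from the patent text.

def fit_two_profiles(observed, base, background):
    """Fit observed ~= a*base + b*background with a, b >= 0 (least squares).

    Returns (a, b, residual), where residual is the sum of squared errors.
    """
    def dot(u, v):
        return sum(x * y for x, y in zip(u, v))

    def residual(a, b):
        return sum((o - a * p - b * q) ** 2
                   for o, p, q in zip(observed, base, background))

    pp, qq, pq = dot(base, base), dot(background, background), dot(base, background)
    po, qo = dot(base, observed), dot(background, observed)

    # Candidate solutions: the origin, the unconstrained optimum (if it is
    # feasible), and the optimum on each boundary (a = 0 or b = 0).
    candidates = [(0.0, 0.0)]
    det = pp * qq - pq * pq
    if det > 1e-12:
        a = (po * qq - pq * qo) / det
        b = (pp * qo - pq * po) / det
        if a >= 0.0 and b >= 0.0:
            candidates.append((a, b))
    if pp > 0.0:
        candidates.append((max(po / pp, 0.0), 0.0))
    if qq > 0.0:
        candidates.append((0.0, max(qo / qq, 0.0)))

    a, b = min(candidates, key=lambda ab: residual(*ab))
    return a, b, residual(a, b)


def localize(observed, base_profiles, background):
    """Pick the candidate direction whose fitted profile leaves the smallest residual.

    Returns (direction, a, b, directional_component), where the directional
    component is a * base profile of the chosen direction.
    """
    best_dir, best_fit = None, None
    for direction, base in base_profiles.items():
        fit = fit_two_profiles(observed, base, background)
        if best_fit is None or fit[2] < best_fit[2]:
            best_dir, best_fit = direction, fit
    a, b, _ = best_fit
    directional = [a * p for p in base_profiles[best_dir]]
    return best_dir, a, b, directional


# Hypothetical premeasured profiles over five steering angles,
# keyed by source direction in degrees.
base_profiles = {
    -45: [1.0, 0.6, 0.2, 0.1, 0.05],
    0: [0.2, 0.6, 1.0, 0.6, 0.2],
    45: [0.05, 0.1, 0.2, 0.6, 1.0],
}
background = [0.5, 0.5, 0.5, 0.5, 0.5]

# Synthetic observation: a source at 0 degrees plus some background.
observed = [2.0 * p + 0.4 * q for p, q in zip(base_profiles[0], background)]

if __name__ == "__main__":
    print(localize(observed, base_profiles, background))
```

In this sketch the background term is simply dropped once the coefficients are known, which mirrors the idea of extracting only the component of the target direction. A real system would work on measured profiles and would likely fit them separately for each frequency band.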
21 Claims
1. A speech recognition apparatus comprising:
a microphone array comprising at least 3 microphones for measuring a profile of a base form sound from possible various sound source directions and a profile of a nondirectional background sound prior to recording a voice;
wherein each microphone measures a delay and a sum of peak power for each of a plurality of angles from a horizontal axis and from a vertical axis in response to a sound source located at a plurality of locations about said microphone array;
a database for storing said profile of said base form sound from said possible various sound source directions and said profile of said nondirectional background sound measured prior to said recording of said voice;
a sound source localization part for comparing a profile of the voice recorded by the microphone array with the profile of the base form sound from said possible various sound source directions and said profile of said nondirectional background sounds measured prior to said recording of said voice and stored in the database to estimate a sound source direction of the recorded voice; and
a speech recognition part for executing speech recognition of voice data of a component of the sound source direction estimated by the sound source localization part.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10

11. A speech recognition method for recognizing a voice inputted through a microphone array comprising at least 3 microphones by controlling a computer, comprising:
a voice inputting step of recording a voice by using the microphone array, and storing voice data in a memory;
wherein each microphone measures a delay and a sum of peak power for each of a plurality of angles from a horizontal axis and from a vertical axis in response to a white noise source located at a plurality of locations about said microphone array;
a sound source localization step of estimating a sound source direction of the recorded voice based on the voice data stored in the memory, and storing a result of the estimation in a memory;
a noise suppression step of decomposing the recorded voice into a component of a sound of the estimated sound source location, and a component of a nondirectional background sound based on the result of the estimation stored in the memory and information regarding premeasured profile of a predetermined voice, and storing voice data in which the component of the background sound from the recorded voice is canceled into a memory; and
a speech recognition step of recognizing the recorded voice based on the voice data in which the component of the background sound is canceled stored in the memory.
Dependent claims: 12, 13, 14, 15, 16, 17, 18, 19

20. A speech recognition method for recognizing a voice by use of a microphone array comprising at least 3 microphones by controlling a computer, comprising:
a voice inputting step of recording a voice by using the microphone array, and storing voice data in a memory, wherein each microphone measures a delay and a sum of peak power for each of a plurality of angles from a horizontal axis and from a vertical axis in response to a white noise source located at a plurality of locations about said microphone array;
a sound source localization step of obtaining profile for various voice input directions by combining profiles of base form and nondirectional background sounds from a premeasured specific sound source direction, comparing the obtained profile with profile of the recorded voice obtained from the voice data stored in the memory to estimate a sound source direction of the recorded voice, and storing a result of the estimation in a memory;
a noise suppression step of extracting and storing voice data of the component of the estimated sound source direction of the recorded voice based on the estimation result of the sound source direction stored in the memory, and the voice data; and
a speech recognition step of recognizing the recorded voice based on voice data in which the component of the background sound is canceled stored in the memory.

21. A computer-readable medium encoded with a computer program for recognizing a voice by using a microphone array comprising at least 3 microphones by controlling a computer, making the computer execute:
a voice inputting process of recording a voice by using the microphone array, and storing voice data in a memory;
wherein each microphone measures a delay and a sum of peak power for each of a plurality of angles from a horizontal axis and from a vertical axis in response to a white noise source located at a plurality of locations about said microphone array;
a sound source localization process of estimating a sound source direction of the recorded voice based on the voice data stored in the memory, and storing a result of the estimation in a memory;
a noise suppression process of decomposing the recorded voice into a component of a sound of the estimated sound source direction and a component of a nondirectional background sound based on the result of the estimation stored in the memory and information regarding premeasured profile of a predetermined voice, and storing voice data in which the component of the background sound is canceled from the recorded voice in a memory; and
a speech recognition process of recognizing the recorded voice based on the voice data the component of the background sound is canceled stored in the memory.
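
The claims refer to measuring "a delay and a sum of peak power" for each of a plurality of angles in response to a white noise source. One plausible reading, assumed here and not confirmed by the claim text, is a delay-and-sum beamformer steered across candidate angles. The sketch below simulates a hypothetical 3-microphone linear array, steers it over a single angle (the claims mention both horizontal and vertical axes), and records the output power at each steering angle. The array geometry, sample rate, and all names are illustrative assumptions.

```python
import math
import random

# All constants below are assumptions chosen for illustration.
SPEED_OF_SOUND = 343.0           # metres per second
SAMPLE_RATE = 48000              # samples per second
MIC_POSITIONS = [0.0, 0.1, 0.2]  # metres along a line (hypothetical 3-mic array)


def sample_delays(angle_deg):
    """Integer per-microphone offsets (in samples) for a far-field source.

    The angle is measured from the array axis; offsets are rounded to whole samples.
    """
    c = math.cos(math.radians(angle_deg))
    return [round(SAMPLE_RATE * x * c / SPEED_OF_SOUND) for x in MIC_POSITIONS]


def shift(signal, d):
    """Return out[i] = signal[i + d], padding with zeros outside the signal."""
    n = len(signal)
    return [signal[i + d] if 0 <= i + d < n else 0.0 for i in range(n)]


def simulate_array(source, angle_deg):
    """Simulate what each microphone records for a plane wave from angle_deg."""
    return [shift(source, d) for d in sample_delays(angle_deg)]


def angle_power_profile(channels, angles_deg):
    """Delay-and-sum output power for each steering angle."""
    profile = []
    for angle in angles_deg:
        # Undo the offsets expected for this steering angle, then sum channels.
        aligned = [shift(ch, -d) for ch, d in zip(channels, sample_delays(angle))]
        summed = [sum(samples) for samples in zip(*aligned)]
        profile.append(sum(v * v for v in summed) / len(summed))
    return profile


random.seed(0)
white_noise = [random.gauss(0.0, 1.0) for _ in range(4000)]

ANGLES = [0, 30, 60, 90, 120, 150, 180]
channels = simulate_array(white_noise, 60)        # source placed at 60 degrees
profile = angle_power_profile(channels, ANGLES)   # one power value per angle
estimated = ANGLES[profile.index(max(profile))]   # steering angle with peak power

if __name__ == "__main__":
    print(estimated, [round(p, 2) for p in profile])
```

Repeating this measurement with the white noise source placed at each candidate location would give one angle power profile per source direction, which is the kind of premeasured profile the localization step compares against.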
Specification