Method and system for microphone array input type speech recognition using band-pass power distribution for sound source position/direction estimation
Abstract
A microphone array input type speech recognition scheme that achieves high-precision sound source position or direction estimation with a small amount of computation, and thereby high-precision speech recognition. A band-pass waveform, which is the waveform for one frequency bandwidth, is obtained from the input signals of the microphone array, and the band-pass power of the sound source is obtained directly from that band-pass waveform. The obtained band-pass power is then used as the speech parameter. The amount of computation can be reduced further, while keeping the sound source estimation and the band-pass power estimation at high precision, by using a sound source position search that combines a low-resolution position estimation with a high-resolution position estimation.
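The combination of a low-resolution and a high-resolution position estimation mentioned in the abstract can be illustrated with a minimal coarse-to-fine search sketch. The abstract does not say how the two stages are combined, so everything here is an assumption made for illustration: the function names `coarse_to_fine_search` and `toy_power`, the grid steps, and the single-peaked toy power curve standing in for a synthesized band-pass power distribution.

```python
def coarse_to_fine_search(power_at, lo, hi, coarse_step, fine_step):
    """Return (best_position, number_of_power_evaluations).

    Hypothetical helper: scan the whole range on a coarse grid, then scan
    only the neighbourhood of the coarse peak on a fine grid.
    """
    def scan(start, stop, step):
        n = int(round((stop - start) / step))
        points = [start + i * step for i in range(n + 1)]
        return max(points, key=power_at), len(points)

    # Stage 1: low-resolution scan over the full search range.
    coarse_best, n_coarse = scan(lo, hi, coarse_step)

    # Stage 2: high-resolution scan limited to one coarse cell on each side.
    fine_lo = max(lo, coarse_best - coarse_step)
    fine_hi = min(hi, coarse_best + coarse_step)
    fine_best, n_fine = scan(fine_lo, fine_hi, fine_step)
    return fine_best, n_coarse + n_fine


def toy_power(angle_deg):
    """Assumed stand-in for a power distribution: one smooth peak at 23 degrees."""
    return 1.0 / (1.0 + (angle_deg - 23.0) ** 2)


best, evals = coarse_to_fine_search(toy_power, -90.0, 90.0, 10.0, 1.0)
full_evals = int(round(180.0 / 1.0)) + 1  # cost of a fine scan over the full range
print(best, evals, full_evals)
```

In this sketch the two-stage search evaluates the power far fewer times than a fine scan of the whole range. It only finds the right peak when the coarse grid is dense enough to land on the main lobe, so the coarse step would have to be chosen with the width of the actual power distribution in mind.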
117 Citations
26 Claims
1. A microphone array input type speech recognition system, comprising:
a speech input unit for inputting speeches in a plurality of channels using a microphone array formed by a plurality of microphones;
a frequency analysis unit for analyzing an input speech of each channel inputted by the speech input unit, and obtaining band-pass waveforms for each channel, each band-pass waveform being a waveform for each frequency bandwidth;
a sound source position search unit for calculating a band-pass power distribution for each frequency bandwidth from the band-pass waveforms for each frequency bandwidth obtained by the frequency analysis unit, synthesizing calculated band-pass power distributions for a plurality of frequency bandwidths, and estimating a sound source position or direction from a synthesized band-pass power distribution;
a speech parameter extraction unit for extracting a speech parameter for speech recognition, from the band-pass power distribution for each frequency bandwidth calculated by the sound source position search unit, according to the sound source position or direction estimated by the sound source position search unit; and
a speech recognition unit for obtaining a speech recognition result by matching the speech parameter extracted by the speech parameter extraction unit with a recognition dictionary.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9.
10. A microphone array input type speech analysis system, comprising:
a speech input unit for inputting speeches in a plurality of channels using a microphone array formed by a plurality of microphones;
a frequency analysis unit for analyzing an input speech of each channel inputted by the speech input unit, and obtaining band-pass waveforms for each channel, each band-pass waveform being a waveform for each frequency bandwidth;
a sound source position search unit for calculating a band-pass power distribution for each frequency bandwidth from the band-pass waveforms for each frequency bandwidth obtained by the frequency analysis unit, synthesizing calculated band-pass power distributions for a plurality of frequency bandwidths, and estimating a sound source position or direction from a synthesized band-pass power distribution; and
a speech parameter extraction unit for extracting a speech parameter from the band-pass power distribution for each frequency bandwidth estimated by the sound source position search unit, according to the sound source position or direction estimated by the sound source position search unit.
Dependent claims: 11.
12. A microphone array input type speech analysis system, comprising:
a speech input unit for inputting speeches in a plurality of channels using a microphone array formed by a plurality of microphones;
a frequency analysis unit for analyzing an input speech of each channel inputted by the speech input unit, and obtaining band-pass waveforms for each channel, each band-pass waveform being a waveform for each frequency bandwidth; and
a sound source position search unit for calculating a band-pass power distribution for each frequency bandwidth from the band-pass waveforms for each frequency bandwidth obtained by the frequency analysis unit, synthesizing calculated band-pass power distributions for a plurality of frequency bandwidths, and estimating a sound source position or direction from a synthesized band-pass power distribution.
Dependent claims: 13.
14. A microphone array input type speech recognition method, comprising the steps of:
inputting speeches in a plurality of channels using a microphone array formed by a plurality of microphones;
analyzing an input speech of each channel inputted by the inputting step, and obtaining band-pass waveforms for each channel, each band-pass waveform being a waveform for each frequency bandwidth;
calculating a band-pass power distribution for each frequency bandwidth from the band-pass waveforms for each frequency bandwidth obtained by the analyzing step, synthesizing calculated band-pass power distributions for a plurality of frequency bandwidths, and estimating a sound source position or direction from a synthesized band-pass power distribution;
extracting a speech parameter for speech recognition, from the band-pass power distribution for each frequency bandwidth calculated by the calculating step, according to the sound source position or direction estimated by the calculating step; and
obtaining a speech recognition result by matching the speech parameter extracted by the extracting step with a recognition dictionary.
Dependent claims: 15, 16, 17, 18, 19, 20, 21, 22.
23. A microphone array input type speech analysis method, comprising the steps of:
inputting speeches in a plurality of channels using a microphone array formed by a plurality of microphones;
analyzing an input speech of each channel inputted by the inputting step, and obtaining band-pass waveforms for each channel, each band-pass waveform being a waveform for each frequency bandwidth;
calculating a band-pass power distribution for each frequency bandwidth from the band-pass waveforms for each frequency bandwidth obtained by the analyzing step, synthesizing calculated band-pass power distributions for a plurality of frequency bandwidths, and estimating a sound source position or direction from a synthesized band-pass power distribution; and
extracting a speech parameter from the band-pass power distribution for each frequency bandwidth calculated by the calculating step, according to the sound source position or direction estimated by the calculating step.
Dependent claims: 24.
25. A microphone array input type speech analysis method, comprising the steps of:
inputting speeches in a plurality of channels using a microphone array formed by a plurality of microphones;
analyzing an input speech of each channel inputted by the inputting step, and obtaining band-pass waveforms for each channel, each band-pass waveform being a waveform for each frequency bandwidth; and
calculating a band-pass power distribution for each frequency bandwidth from the band-pass waveforms for each frequency bandwidth obtained by the analyzing step, synthesizing calculated band-pass power distributions for a plurality of frequency bandwidths, and estimating a sound source position or direction from a synthesized band-pass power distribution.
Dependent claims: 26.
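The steps recited in the independent claims (per-band power distribution, synthesis across bands, position or direction estimation, and extraction of the band-pass power at the estimated direction) can be illustrated with the rough sketch below. It is not the estimator described in the specification: it swaps in plain delay-and-sum steering as a stand-in, and the far-field linear array geometry, the complex-tone test signals, the summation used for synthesis, and all function names (`steering_delays`, `simulate_bandpass`, `bandpass_power_distribution`, `analyze`) are assumptions made only for illustration.

```python
import cmath
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed value


def steering_delays(mic_x, angle_deg):
    """Far-field arrival delay in seconds at each microphone of a linear array."""
    s = math.sin(math.radians(angle_deg))
    return [x * s / SPEED_OF_SOUND for x in mic_x]


def simulate_bandpass(mic_x, angle_deg, freq, n_samples=32, fs=16000.0):
    """Toy band-pass waveforms: one complex tone per channel, delayed per microphone."""
    return [[cmath.exp(2j * math.pi * freq * (n / fs - d)) for n in range(n_samples)]
            for d in steering_delays(mic_x, angle_deg)]


def bandpass_power_distribution(waves, mic_x, freq, candidates):
    """Delay-and-sum output power for every candidate direction in one band."""
    n_samples = len(waves[0])
    dist = []
    for angle in candidates:
        # Phase weights that undo the delays expected for this candidate.
        weights = [cmath.exp(2j * math.pi * freq * d)
                   for d in steering_delays(mic_x, angle)]
        power = 0.0
        for n in range(n_samples):
            y = sum(w * ch[n] for w, ch in zip(weights, waves))
            power += abs(y) ** 2
        dist.append(power / n_samples)
    return dist


def analyze(band_waves, mic_x, band_freqs, candidates):
    """Return (estimated_direction, per-band power at that direction)."""
    dists = [bandpass_power_distribution(w, mic_x, f, candidates)
             for w, f in zip(band_waves, band_freqs)]
    # Synthesis is taken here to be a plain sum over bands (an assumption).
    synthesized = [sum(col) for col in zip(*dists)]
    best = max(range(len(candidates)), key=synthesized.__getitem__)
    # Speech parameter: the band-pass power of each band at the estimate.
    return candidates[best], [d[best] for d in dists]


mic_x = [0.04 * i for i in range(8)]        # 8 microphones, 4 cm spacing (assumed)
band_freqs = [500.0, 1000.0, 2000.0, 3000.0]
candidates = list(range(-90, 91, 5))        # candidate directions in degrees
band_waves = [simulate_bandpass(mic_x, 20.0, f) for f in band_freqs]
estimate, speech_param = analyze(band_waves, mic_x, band_freqs, candidates)
print(estimate, speech_param)
```

In this sketch the per-band distributions are computed once and then reused twice, first for the direction estimate and then as the source of the speech parameter, which mirrors the structure of the claims: no separate beamformed waveform is reconstructed before the band-pass power is read out.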
Specification