Speech recognition apparatus and method in noisy circumstances
First Claim
1. A speech recognition apparatus for recognizing an input speech under noisy circumstances comprising:
a noise model memory for storing a noise model;
a speech model memory for storing a noise-free speech model;
a reference model memory for storing a plurality of speech models for collation;
an acoustic analyzer for receiving the input speech, acoustically analyzing a noise-superimposed speech signal of the input speech, and outputting a time-series feature vector of noise-superimposed speech;
a superimposed-noise estimating unit for estimating a superimposed noise based on the time-series feature vector of noise-superimposed speech by using the noise model stored in the noise model memory and the noise-free speech model stored in the speech model memory, and outputting an estimated superimposed-noise spectrum;
a spectrum calculator for receiving the input speech, analyzing a spectrum of the noise-superimposed speech signal of the input speech, and outputting a time-series noise-superimposed speech spectrum;
a noise spectrum eliminator for eliminating a spectrum component of a noise speech in the noise-superimposed speech signal for the time-series noise-superimposed speech spectrum output from the spectrum calculator by using the estimated superimposed-noise spectrum output from the superimposed-noise estimating unit, and outputting a time-series noise-eliminated speech spectrum;
a feature vector calculator for calculating a first feature vector from the time-series noise-eliminated speech spectrum and outputting a time-series feature vector of noise-eliminated speech; and
a collating unit for collating the time-series feature vector of noise-eliminated speech with the plurality of speech models for collation stored in the reference model memory, selecting a speech model out of the plurality of speech models for collation, whose likelihood is highest, and outputting the speech model as a recognition result.
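The noise spectrum eliminator recited above can be illustrated with a minimal Python sketch. It assumes plain spectral subtraction on power spectra, which is one common way to remove an estimated noise spectrum; the claim itself does not name the technique. The function name, the list-of-lists spectrum representation, and the `floor` parameter are hypothetical choices made for this example.

```python
from typing import List, Sequence


def eliminate_noise_spectrum(
    noisy_frames: Sequence[Sequence[float]],
    noise_estimate: Sequence[float],
    floor: float = 0.01,
) -> List[List[float]]:
    """Subtract an estimated noise power spectrum from each frame, bin by bin.

    Assumption: spectral subtraction with a spectral floor. `floor` is a
    hypothetical parameter that keeps every bin at no less than a small
    fraction of the noisy value, so the result never goes negative.
    """
    cleaned = []
    for frame in noisy_frames:
        if len(frame) != len(noise_estimate):
            raise ValueError("frame and noise estimate must have the same number of bins")
        cleaned.append(
            [max(x - n, floor * x) for x, n in zip(frame, noise_estimate)]
        )
    return cleaned
```

In this sketch the output plays the role of the time-series noise-eliminated speech spectrum, which would then be passed to the feature vector calculator.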
Abstract
An estimated SN (signal-to-noise) ratio is calculated for a time-series feature vector of noise-superimposed speech by using a noise-free speech model and a noise model. A noise-superimposed model is generated based on the estimated SN ratio. A likelihood between the time-series feature vector of noise-superimposed speech and the noise-superimposed model is calculated to obtain likelihood information. A noise spectrum included in the noise-superimposed speech is estimated from the likelihood information.
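The estimation flow in the abstract can be sketched roughly in Python as follows. This is only an illustration under strong assumptions: features are taken to be power spectra, the speech and noise models are reduced to lists of mean spectra, the SN ratio is picked from a hypothetical grid of candidate values rather than computed directly, and the likelihood is a fixed-variance Gaussian score. None of these choices, nor any of the names, come from the abstract.

```python
from typing import List, Optional, Sequence, Tuple


def _normalize(spec: Sequence[float]) -> List[float]:
    """Scale a spectrum so that its bins sum to 1 (spectral shape only)."""
    total = sum(spec)
    if total <= 0:
        raise ValueError("spectrum must have positive total power")
    return [v / total for v in spec]


def estimate_superimposed_noise(
    frame: Sequence[float],
    speech_means: Sequence[Sequence[float]],
    noise_means: Sequence[Sequence[float]],
    candidate_snrs: Sequence[float],
    variance: float = 1.0,
) -> Tuple[List[float], float]:
    """Return (estimated noise spectrum, estimated linear SN ratio) for one frame.

    For each speech mean, noise mean, and candidate SN ratio, build a
    noise-superimposed model with the same total power as the frame, score
    the frame against it, and keep the noise part of the best-scoring model.
    """
    frame_power = sum(frame)
    best_score = float("-inf")
    best_noise: Optional[List[float]] = None
    best_snr = 0.0
    for speech in speech_means:
        s = _normalize(speech)
        for noise in noise_means:
            n = _normalize(noise)
            for snr in candidate_snrs:
                speech_share = snr / (1.0 + snr)  # fraction of power from speech
                noise_share = 1.0 / (1.0 + snr)   # fraction of power from noise
                model = [
                    frame_power * (speech_share * sv + noise_share * nv)
                    for sv, nv in zip(s, n)
                ]
                # Gaussian log-likelihood up to a constant (fixed variance assumed).
                score = -0.5 * sum((x - m) ** 2 for x, m in zip(frame, model)) / variance
                if score > best_score:
                    best_score = score
                    best_snr = snr
                    best_noise = [frame_power * noise_share * nv for nv in n]
    if best_noise is None:
        raise ValueError("need at least one speech mean, noise mean, and candidate SN ratio")
    return best_noise, best_snr
```

A full system would score whole sequences against richer models; the sketch only shows how likelihood information can select the noise component attributed to a frame.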
52 Citations
13 Claims
1. A speech recognition apparatus for recognizing an input speech under noisy circumstances comprising:
a noise model memory for storing a noise model;
a speech model memory for storing a noise-free speech model;
a reference model memory for storing a plurality of speech models for collation;
an acoustic analyzer for receiving the input speech, acoustically analyzing a noise-superimposed speech signal of the input speech, and outputting a time-series feature vector of noise-superimposed speech;
a superimposed-noise estimating unit for estimating a superimposed noise based on the time-series feature vector of noise-superimposed speech by using the noise model stored in the noise model memory and the noise-free speech model stored in the speech model memory, and outputting an estimated superimposed-noise spectrum;
a spectrum calculator for receiving the input speech, analyzing a spectrum of the noise-superimposed speech signal of the input speech, and outputting a time-series noise-superimposed speech spectrum;
a noise spectrum eliminator for eliminating a spectrum component of a noise speech in the noise-superimposed speech signal for the time-series noise-superimposed speech spectrum output from the spectrum calculator by using the estimated superimposed-noise spectrum output from the superimposed-noise estimating unit, and outputting a time-series noise-eliminated speech spectrum;
a feature vector calculator for calculating a first feature vector from the time-series noise-eliminated speech spectrum and outputting a time-series feature vector of noise-eliminated speech; and
a collating unit for collating the time-series feature vector of noise-eliminated speech with the plurality of speech models for collation stored in the reference model memory, selecting a speech model out of the plurality of speech models for collation, whose likelihood is highest, and outputting the speech model as a recognition result.
Dependent claims: 2, 3, 4, 5, 6, 7, 8
9. A speech recognition method for recognizing an input speech under noisy circumstances, including a noise model memory for storing a noise model, a speech model memory for storing a noise-free speech model, and a reference model memory for storing a plurality of speech models for collation, the method comprising the steps of:
analyzing a noise-superimposed speech signal of the input speech acoustically to output a time-series feature vector of noise-superimposed speech;
estimating a superimposed-noise for the time-series feature vector of noise-superimposed speech by using the noise model stored in the noise model memory and the noise-free speech model stored in the speech model memory to output an estimated superimposed-noise spectrum;
calculating a spectrum of the noise-superimposed speech signal in the input speech by performing a spectrum-analysis to output a time-series noise-superimposed speech spectrum;
eliminating a spectrum component of a noise speech in the noise-superimposed speech signal for the time-series noise-superimposed speech spectrum output from the step of calculating the spectrum by using the estimated superimposed-noise spectrum output from the step of estimating the superimposed-noise to output a time-series noise-eliminated speech spectrum;
calculating a first feature vector from the time-series noise-eliminated speech spectrum to output a time-series feature vector of noise-eliminated speech; and
collating the time-series feature vector of noise-eliminated speech with the plurality of speech models for collation stored in the reference model memory to select and output a speech model out of the plurality of speech models for collation, whose likelihood is highest, as a recognition result.
Dependent claims: 10, 11, 12, 13
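The final collating step of the method, selecting the speech model whose likelihood is highest, can be illustrated with a short Python sketch. It assumes each reference model is a single diagonal-covariance Gaussian given as a (mean, variance) pair and that frames are scored independently; practical recognizers typically use sequence models such as hidden Markov models, and the claim does not specify the model type. The names `log_likelihood`, `collate`, and the dictionary layout are hypothetical.

```python
import math
from typing import Dict, Sequence, Tuple

# Hypothetical model representation: label -> (mean vector, variance vector).
GaussianModel = Tuple[Sequence[float], Sequence[float]]


def log_likelihood(
    frames: Sequence[Sequence[float]],
    mean: Sequence[float],
    var: Sequence[float],
) -> float:
    """Sum of per-frame diagonal-Gaussian log-likelihoods."""
    total = 0.0
    for frame in frames:
        for x, m, v in zip(frame, mean, var):
            total += -0.5 * (math.log(2.0 * math.pi * v) + (x - m) ** 2 / v)
    return total


def collate(
    frames: Sequence[Sequence[float]],
    reference_models: Dict[str, GaussianModel],
) -> str:
    """Return the label of the reference model with the highest likelihood."""
    if not reference_models:
        raise ValueError("at least one reference model is required")
    return max(
        reference_models,
        key=lambda label: log_likelihood(frames, *reference_models[label]),
    )
```

Here `frames` stands in for the time-series feature vector of noise-eliminated speech, and the returned label stands in for the recognition result.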
Specification