Preprocessing system for speech recognition
First Claim
1. A system for preprocessing the speech of a speaker to provide a normalized signal for subsequent processing, said system comprising:
means for generating speaker specific gain settings, speaker specific spectral settings, speaker specific pitch settings and speaker specific peak normalization settings for the speech of a particular speaker, said settings being generated during an enrollment for said particular speaker wherein words spoken during said enrollment may be a different set relative to words spoken by said speaker after said enrollment;
means coupled to said generating means for generating said normalized signal using said settings, which normalized signal represents the speech of the speaker which is to be processed;
wherein said means for generating said speaker specific settings comprises;
a) gain enrollment means for generating said speaker specific settings of the gain for controlling an overall signal level;
b) spectral and pitch enrollment means for generating said speaker specific spectral settings and said speaker specific pitch settings;
c) peak normalization enrollment means for generating said speaker specific peak normalization settings;
wherein said normalized signal includes a set of parameters, said set of parameters including spectral parameters, temporal parameters, pitch parameters, said normalized signal further including a nasal energy signal, an oral energy signal and a pitch epoch timing signal, and wherein said normalized signal generating means includes data acquisition means for generating from the speech of the speaker said oral energy signal, said nasal energy signal and an oral amplitude signal, wherein said oral amplitude signal is input to;
(i) spectral analyzer means for generating said spectral parameters;
(ii) temporal analyzer means for generating said temporal parameters; and
(iii) pitch analyzer means for generating said pitch parameters and said pitch epoch timing signal and wherein said data acquisition means comprises;
(a) an oral microphone for converting sound emanating from the speaker's mouth into a first electrical signal;
(b) a nasal microphone for converting sound emanating from the speaker's nose into a second electrical signal;
(c) first gain control means coupled to said oral microphone for producing a digitally controlled gain of said first electrical signal;
(d) second gain control means coupled to said nasal microphone for producing a digitally controlled gain of said second electrical signal;
(e) first band limiting means coupled to said first gain control means for producing a voiced band oral amplitude signal from said gain controlled first electrical signal;
(f) second band limiting means coupled to said second gain control means for producing a voiced band nasal amplitude signal from said gain controlled second electrical signal;
(g) energy computation means coupled to said first and second band limiting means for performing a wide band RMS to DC conversion on the output from each of said first and second band limiting means;
(h) first filter means coupled to said first band limiting means for producing a low pass Nyquist filtered output from said voiced band oral amplitude signal;
(i) second filter means coupled to said energy computation means for producing a low pass Nyquist filtered output from each of said DC converted outputs from said energy computation means;
(j) analog to digital converter means coupled to said first and second filter means for generating a digitalized oral amplitude signal from the output of said first filter means, and a digitalized oral energy signal and a digitalized nasal energy signal from the outputs of said second filter means; and
wherein said means for generating said speaker specific settings comprises;
a) gain enrollment means for generating said speaker specific settings of the gain for controlling an overall signal level;
b) spectral and pitch enrollment means for generating said speaker specific spectral settings and said speaker specific pitch settings;
(c) peak normalization enrollment means for generating said speaker specific peak normalization settings.
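The data acquisition chain in elements (c) through (j) can be illustrated with a rough digital sketch. This is only an approximation under stated assumptions: the claim describes gain control, RMS to DC conversion and low pass filtering ahead of the analog to digital converter, whereas the code below works on already sampled values. The function names, the sliding window RMS, the one-pole smoother and all parameter values are hypothetical, and the band limiting of elements (e) and (f) is omitted.

```python
import math


def apply_gain(samples, gain):
    """Scale samples by a digitally selected gain (elements (c) and (d))."""
    return [gain * s for s in samples]


def rms_energy(samples, window):
    """Sliding-window RMS, a digital stand-in for RMS to DC conversion (element (g))."""
    squares = [s * s for s in samples]
    out = []
    acc = 0.0
    for i, v in enumerate(squares):
        acc += v
        if i >= window:
            acc -= squares[i - window]  # drop the sample that left the window
        n = min(i + 1, window)
        out.append(math.sqrt(max(acc, 0.0) / n))
    return out


def one_pole_lowpass(samples, alpha):
    """First-order smoother standing in for the low pass filters (elements (h) and (i))."""
    out = []
    y = 0.0
    for s in samples:
        y += alpha * (s - y)
        out.append(y)
    return out


def acquire(oral, nasal, oral_gain=1.0, nasal_gain=1.0, window=4, alpha=0.5):
    """Return the three signals named in the claim (hypothetical layout)."""
    oral_amp = apply_gain(oral, oral_gain)
    nasal_amp = apply_gain(nasal, nasal_gain)
    return {
        "oral_amplitude": one_pole_lowpass(oral_amp, alpha),
        "oral_energy": one_pole_lowpass(rms_energy(oral_amp, window), alpha),
        "nasal_energy": one_pole_lowpass(rms_energy(nasal_amp, window), alpha),
    }
```

Only the oral channel yields an amplitude output here, mirroring the claim, in which the nasal channel contributes an energy signal but no amplitude signal to the analyzers.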
Abstract
The present invention processes an independent body of speech during an enrollment process and creates a set of speaker specific enrollment parameters for normalizing analysis parameters including the speaker's pitch, the frequency spectrum of the speech as a function of time, and certain measurements of the speech signal in the time-domain. A particular objective of the invention is to make these analysis parameters have the same meaning from speaker to speaker. Thus after the pre-processing performed by this invention, the parameters would look much the same for the same word independent of speaker. In this manner, variations in the speech signal caused by the physical makeup of a speaker's throat, mouth, lips, teeth, and nasal cavity would be, at least in part, reduced by the pre-processing.
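The idea of enrollment followed by normalization can be sketched as follows. The abstract does not say which statistics are computed, so the median pitch and maximum peak used here are assumptions chosen for illustration, as are the function names and the dictionary layout.

```python
from statistics import median


def enroll(pitch_hz, peak_amplitudes):
    """Derive speaker specific reference values from an enrollment recording.

    Assumption: median pitch and largest peak serve as the references.
    """
    return {"pitch_ref": median(pitch_hz), "peak_ref": max(peak_amplitudes)}


def normalize(pitch_hz, amplitudes, settings):
    """Express later speech relative to the speaker's own references."""
    norm_pitch = [p / settings["pitch_ref"] for p in pitch_hz]
    norm_amp = [a / settings["peak_ref"] for a in amplitudes]
    return norm_pitch, norm_amp


# Two hypothetical speakers, one pitched an octave above the other.
low = enroll([100, 110, 120], [0.2, 0.4])
high = enroll([200, 220, 240], [0.4, 0.8])

# The same pitch contour, scaled per speaker, maps to the same normalized values.
low_word = normalize([110, 121], [0.2, 0.4], low)
high_word = normalize([220, 242], [0.4, 0.8], high)
```

The enrollment recordings and the later words differ, consistent with the abstract's "independent body of speech"; only the per-speaker reference values carry over.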
10 Claims
1. Independent claim, set out in full under First Claim above. Dependent claims: 2, 3, 4, 5, 6, 7, 8.
9. A system for preprocessing the speech of a speaker to provide a normalized signal for subsequent processing, said system comprising:
means for generating speaker specific gain settings, speaker specific spectral settings, speaker specific pitch settings and speaker specific peak normalization settings for the speech of a particular speaker, said settings being generated during an enrollment for said particular speaker wherein words spoken during said enrollment may be a disjoint set relative to words spoken by said speaker after said enrollment;
means coupled to said generating means for generating said normalized signal using said settings, which normalized signal represents the speech of the speaker which is to be processed,
wherein said means for generating said speaker specific settings comprises;
a) gain enrollment means for generating said speaker specific settings of the gain for controlling an overall signal level;
b) spectral and pitch enrollment means for generating said speaker specific spectral settings and said speaker specific pitch settings;
c) peak normalization enrollment means for generating said speaker specific peak normalization settings
wherein said gain enrollment means comprises;
data acquisition means for generating gain settings from the speech of the speaker during said enrollment using said speech and predetermined default gain settings;
means for performing statistical analysis of said generated gain settings thereby generating said speaker specific gain settings;
wherein said data acquisition means comprises;
a) an oral microphone for converting sound emanating from the speaker's mouth into a first electrical signal;
b) a nasal microphone for converting sound emanating from the speaker's nose into a second electrical signal;
c) first gain control means coupled to said oral microphone for producing a digitally controlled gain of said first electrical signal;
d) second gain control means coupled to said nasal microphone for producing a digitally controlled gain of said second electrical signal;
e) first band limiting means coupled to said first gain control means for producing a voiced band oral amplitude signal from said gain controlled first electrical signal;
f) second band limiting means coupled to said second gain control means for producing a voiced band nasal amplitude signal from said gain controlled second electrical signal;
g) energy computation means coupled to said first and second band limiting means for performing a wide band RMS to DC conversion on the output from each of said first and second band limiting means;
h) first filter means coupled to said first band limiting means for producing a low pass Nyquist filtered output from said voiced band oral amplitude signal;
i) second filter means coupled to said energy computation means for producing a low pass Nyquist filtered output from each of said DC converted outputs from said energy computation means;
j) analog to digital converter means coupled to said first and second filter means for generating a digitalized oral amplitude signal from the output of said first filter means, and a digitalized oral energy signal and a digitalized nasal energy signal from the outputs of said second filter means.
Dependent claim: 10
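The gain enrollment recited in claim 9 (acquire speech at predetermined default gain settings, derive gain settings, then analyze them statistically) might look roughly like the sketch below. The claim does not name the statistic or the target level, so the median, the `target_peak` value and the function names are all hypothetical.

```python
from statistics import median


def gain_for_utterance(samples, default_gain, target_peak):
    """Gain that would bring this utterance's peak to target_peak.

    Assumption: the utterance is first observed at default_gain.
    """
    peak = max(abs(default_gain * s) for s in samples)
    if peak == 0:
        return default_gain  # silence carries no information; keep the default
    return default_gain * target_peak / peak


def enroll_gain(utterances, default_gain=1.0, target_peak=0.5):
    """Combine per-utterance gains into one speaker specific gain.

    Assumption: the median is used as the statistical analysis step.
    """
    gains = [gain_for_utterance(u, default_gain, target_peak) for u in utterances]
    return median(gains)


# Three hypothetical enrollment utterances with peaks 0.5, 0.25 and 1.0.
speaker_gain = enroll_gain([[0.25, -0.5], [0.1, 0.25], [1.0]])
```

A median is used in this sketch because a single unusually loud or quiet utterance then has little effect on the resulting setting; a mean or a percentile would fit the claim language equally well.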
Specification