Preprocessing system for speech recognition

US 5,054,085 A
Filed: 11/19/1990
Issued: 10/01/1991
Est. Priority Date: 05/18/1983
Status: Expired due to Fees

Abstract

The present invention processes an independent body of speech during an enrollment process and creates a set of speaker specific enrollment parameters for normalizing analysis parameters including the speaker's pitch, the frequency spectrum of the speech as a function of time, and certain measurements of the speech signal in the time-domain. A particular objective of the invention is to make these analysis parameters have the same meaning from speaker to speaker. Thus after the pre-processing performed by this invention, the parameters would look much the same for the same word independent of speaker. In this manner, variations in the speech signal caused by the physical makeup of a speaker's throat, mouth, lips, teeth, and nasal cavity would be, at least in part, reduced by the pre-processing.

10 Claims

1. A system for preprocessing the speech of a speaker to provide a normalized signal for subsequent processing, said system comprising:
- means for generating speaker specific gain settings, speaker specific spectral settings, speaker specific pitch settings and speaker specific peak normalization settings for the speech of a particular speaker, said settings being generated during an enrollment for said particular speaker wherein words spoken during said enrollment may be a different set relative to words spoken by said speaker after said enrollment;
  
  means coupled to said generating means for generating said normalized signal using said settings, which normalized signal represents the speech of the speaker which is to be processed;
  
  wherein said means for generating said speaker specific settings comprises;
  
  a) gain enrollment means for generating said speaker specific settings of the gain for controlling an overall signal level;
  
  b) spectral and pitch enrollment means for generating said speaker specific spectral settings and said speaker specific pitch settings;
  
  c) peak normalization enrollment means for generating said speaker specific peak normalization settings;
  
  wherein said normalized signal includes a set of parameters, said set of parameters including spectral parameters, temporal parameters, pitch parameters, said normalized signal further including a nasal energy signal, an oral energy signal and a pitch epoch timing signal, and wherein said normalized signal generating means includes data acquisition means for generating from the speech of the speaker said oral energy signal, said nasal energy signal and an oral amplitude signal, wherein said oral amplitude signal is input to;
  
  (i) spectral analyzer means for generating said spectral parameters;
  
  (ii) temporal analyzer means for generating said temporal parameters; and
  
  (iii) pitch analyzer means for generating said pitch parameters and said pitch epoch timing signal and wherein said data acquisition means comprises;
  
  (a) an oral microphone for converting sound emanating from the speaker's mouth into a first electrical signal;
  
  (b) a nasal microphone for converting sound emanating from the speaker's nose into a second electrical signal;
  
  (c) first gain control means coupled to said oral microphone for producing a digitally controlled gain of said first electrical signal;
  
  (d) second gain control means coupled to said nasal microphone for producing a digitally controlled gain of said second electrical signal;
  
  (e) first band limiting means coupled to said first gain control means for producing a voiced band oral amplitude signal from said gain controlled first electrical signal;
  
  (f) second band limiting means coupled to said second gain control means for producing a voiced band nasal amplitude signal from said gain controlled second electrical signal;
  
  (g) energy computation means coupled to said first and second band limiting means for performing a wide band RMS to DC conversion on the output from each of said first and second band limiting means;
  
  (h) first filter means coupled to said first band limiting means for producing a low pass Nyquist filtered output from said voiced band oral amplitude signal;
  
  (i) second filter means coupled to said energy computation means for producing a low pass Nyquist filtered output from each of said DC converted outputs from said energy computation means;
  
  (j) analog to digital converter means coupled to said first and second filter means for generating a digitalized oral amplitude signal from the output of said first filter means, and a digitalized oral energy signal and a digitalized nasal energy signal from the outputs of said second filter means; and
  
  wherein said means for generating said speaker specific settings comprises;
  
  a) gain enrollment means for generating said speaker specific settings of the gain for controlling an overall signal level;
  
  b) spectral and pitch enrollment means for generating said speaker specific spectral settings and said speaker specific pitch settings;
  
  (c) peak normalization enrollment means for generating said speaker specific peak normalization settings.
- - 2. The system defined by claim 1 wherein said spectral analyzer means comprises:
    - (a) pre-emphasis filter means for emphasizing in said oral amplitude signal frequencies between approximately 600 Hz and 6000 Hz;
      
      (b) critical band filter bank means coupled to said pre-emphasis filter means for separating the output from said pre-emphasis filter means into a plurality of non-overlapping frequency bands based upon said speaker specific spectral settings.
  - 3. The system defined by claim 1 wherein said pitch analyzer means comprises:
    - (a) pitch filter means for low pass filtering said oral amplitude signal based upon said speaker specific pitch settings;
      
      (b) peak detector means coupled to said pitch filter means for generating a pulse at each peak in said oral amplitude signal;
      
      (c) trough detector means for generating a pulse at each trough in said oral amplitude signal representing potential pitch period beginnings;
      
      (d) temporal thresholding means coupled to said peak detector means and said trough detector means for generating said pitch epoch timing signal based upon the pulses generated by said peak detector means and said trough detector means and said speaker specific pitch settings, said temporal thresholding means also generating said pitch parameters based upon the number of samples which occurred between consecutive pitch epochs.
  - 4. The system defined by claim 1 wherein said temporal analyzer means comprises:
    - (a) means for generating positive half-wave rectification of said oral amplitude signal;
      
      (b) means for generating negative half-wave rectification of said oral amplitude signal;
      
      (c) means for generating an absolute first difference signal of said oral amplitude signal.
  - 5. The system defined by claim 1 further comprising:
    - (a) pitch synchronous peak detector means for storing during each pitch period the peak sampled values of each of said nasal energy signal, oral energy signal, spectral parameters and temporal parameters;
      
      (b) normalization means coupled to said peak synchronous peak detection means for normalizing the peak sampled values using said speaker specific peak normalization settings.
  - 6. The system defined by claim 1 wherein said speaker specific spectral settings are generated by spectral enrollment means comprising:
    - data acquisition means for generating from the speech of the speaker an oral amplitude signal, a nasal energy signal and an oral energy signal;
      
      spectral analyzer means for generating spectral parameters based upon said oral amplitude signal and predetermined default spectral settings, which generated spectral parameters are stored as said speaker specific spectral settings.
  - 7. The system defined by claim 6 wherein said speaker specific pitch settings are generated by pitch enrollment means comprising:
    - pitch analyzer means for generating pitch parameters based upon said oral amplitude signal and predetermined default pitch settings, which generated pitch parameters are stored as said speaker specific pitch settings.
  - 8. The system defined by claim 7 wherein said speaker specific peak normalization settings are generated by peak normalization means comprising:
    - temporal analyzer means for generating temporal parameters based upon said oral amplitude signal;
      
      pitch synchronization means coupled to said spectral analyzer means, said pitch analyzer means and said temporal analyzer means for generating peak spectral parameters and peak temporal parameters;
      
      means for computing the extreme quantiles of said peak spectral parameters and said peak temporal parameters which coupled values are stored as the speaker specific peak normalization settings.
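
As an illustration only, the temporal analysis of claim 4 and the peak handling of claims 5 and 8 can be sketched in Python. The function names, the 5% and 95% quantile levels, the choice to report the negative half-wave as magnitudes, and the linear scaling to the range 0 to 1 are all assumptions made for this sketch; the claims do not specify them, and the patented system is described as dedicated signal-processing means rather than software of this form.

```python
def temporal_parameters(samples):
    """Three temporal signals of the kind named in claim 4, from a list of samples."""
    # Positive half-wave rectification: keep positive samples, zero the rest.
    positive = [max(s, 0) for s in samples]
    # Negative half-wave rectification, reported as magnitudes (an assumption).
    negative = [max(-s, 0) for s in samples]
    # Absolute first difference; the first value is 0 (no previous sample).
    difference = [0] + [abs(b - a) for a, b in zip(samples, samples[1:])]
    return positive, negative, difference


def pitch_synchronous_peaks(signal, epochs):
    """Peak value of `signal` inside each pitch period.

    `epochs` is a list of sample indices marking pitch period beginnings;
    each period runs from one epoch up to (not including) the next.
    """
    return [max(signal[start:end])
            for start, end in zip(epochs, epochs[1:]) if end > start]


def extreme_quantiles(peaks, low=0.05, high=0.95):
    """Hypothetical enrollment step: low and high quantiles of observed peaks."""
    ordered = sorted(peaks)
    last = len(ordered) - 1
    return ordered[round(low * last)], ordered[round(high * last)]


def normalize_peaks(peaks, settings):
    """Scale peaks linearly so the stored quantiles map to 0 and 1, clipping outside."""
    lo, hi = settings
    span = hi - lo
    if span == 0:
        return [0.0 for _ in peaks]
    return [min(1.0, max(0.0, (p - lo) / span)) for p in peaks]
```

Under these assumptions, enrollment would call `extreme_quantiles` once per parameter on peaks gathered from the enrollment speech, and later speech would pass through `normalize_peaks` with the stored pair, so that each speaker's typical peak range maps onto the same scale.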

9. A system for preprocessing the speech of a speaker to provide a normalized signal for subsequent processing, said system comprising:
- means for generating speaker specific gain settings, speaker specific spectral settings, speaker specific pitch settings and speaker specific peak normalization settings for the speech of a particular speaker, said settings being generated during an enrollment for said particular speaker wherein words spoken during said enrollment may be a disjoint set relative to words spoken by said speaker after said enrollment;
  
  means coupled to said generating means for generating said normalized signal using said settings, which normalized signal represents the speech of the speaker which is to be processed, wherein said means for generating said speaker specific settings comprises;
  
  a) gain enrollment means for generating said speaker specific settings of the gain for controlling an overall signal level;
  
  b) spectral and pitch enrollment means for generating said speaker specific spectral settings and said speaker specific pitch settings;
  
  c) peak normalization enrollment means for generating said speaker specific peak normalization settings wherein said gain enrollment means comprises;
  
  data acquisition means for generating gain settings from the speech of the speaker during said enrollment using said speech and predetermined default gain settings;
  
  means for performing statistical analysis of said generated gain settings thereby generating said speaker specific gain settings;
  
  wherein said data acquisition means comprises;
  
  a) an oral microphone for converting sound emanating from the speaker's mouth into a first electrical signal;
  
  b) a nasal microphone for converting sound emanating from the speaker's nose into a second electrical signal;
  
  c) first gain control means coupled to said oral microphone for producing a digitally controlled gain of said first electrical signal;
  
  d) second gain control means coupled to said nasal microphone for producing a digitally controlled gain of said second electrical signal;
  
  e) first band limiting means coupled to said first gain control means for producing a voiced band oral amplitude signal from said gain controlled first electrical signal;
  
  f) second band limiting means coupled to said second gain control means for producing a voiced band nasal amplitude signal from said gain controlled second electrical signal;
  
  g) energy computation means coupled to said first and second band limiting means for performing a wide band RMS to DC conversion on the output from each of said first and second band limiting means;
  
  h) first filter means coupled to said first band limiting means for producing a low pass Nyquist filtered output from said voiced band oral amplitude signal;
  
  i) second filter means coupled to said energy computation means for producing a low pass Nyquist filtered output from each of said DC converted outputs from said energy computation means;
  
  j) analog to digital converter means coupled to said first and second filter means for generating a digitalized oral amplitude signal from the output of said first filter means, and a digitalized oral energy signal and a digitalized nasal energy signal from the outputs of said second filter means.
- - 10. The system defined by claim 9 wherein said statistical analysis means comprises:
    - means for measuring over time the values of the generated settings until such values stabilize, which stabilized gain settings are stored as the speaker specific gain settings.
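
One possible reading of the stabilization step in claim 10 is sketched below in Python. The window length, the tolerance, the use of the window mean as the stored setting, and the function name `stabilized_gain` are assumptions for illustration; the claim says only that the generated settings are measured over time until they stabilize.

```python
def stabilized_gain(readings, window=5, tolerance=1):
    """Return a stabilized gain setting, or None if the readings never settle.

    The readings are treated as stable once `window` consecutive values
    differ by no more than `tolerance`; the mean of that window is returned.
    """
    recent = []
    for value in readings:
        recent.append(value)
        if len(recent) > window:
            recent.pop(0)  # keep only the most recent `window` readings
        if len(recent) == window and max(recent) - min(recent) <= tolerance:
            return sum(recent) / window
    return None
```

In this sketch, a return value of `None` would mean enrollment should continue collecting speech before a speaker specific gain setting is stored.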

Current Assignee: Speech Systems, Inc. (FAB Universal Corp.)
Original Assignee: Speech Systems, Inc. (FAB Universal Corp.)
Inventors: Meisel, William S.; Wittenstein, W. Andreas
Primary Examiner(s): Harkcom, Gary V.
Assistant Examiner(s): Knepper, David D.
Application Number: US07/614,991
Time in Patent Office: 316 Days
Field of Search: 381/29-43, 381/45-50, 364/513.5
US Class Current: 704/207

CPC Class Codes

G10L 15/02   Feature extraction for spee...
G10L 15/07   to the speaker
G10L 2021/02166   Microphone arrays; Beamforming
G10L 21/00   Speech or voice signal proc...
H04R 1/08   Mouthpieces; Microphones; A...
H04R 2201/403   Linear arrays of transducers
