Speech recognition system and method for variable noise environment

US 4,610,023 A
Filed: 01/06/1983
Issued: 09/02/1986
Est. Priority Date: 06/04/1982
Status: Expired due to Term

Abstract

An automotive vehicle speech recognition system sets a recording-mode reference threshold level to be lower than a recognition-mode reference threshold level. Therefore, a spoken instruction supplied to the system in a low voice while the vehicle is parked in a quiet place during a record mode can be correlated with an instruction uttered in a loud voice during a recognition mode while the vehicle is running in a noisy environment. During the record and recognition modes reference threshold signals derived by smoothing a spoken instruction are multiplied by two different factors such that the reference threshold in the recognition mode has a greater multiplication factor than the reference threshold in the record mode. While the driver is uttering a command a first smoothed version of the utterance power spectrum is compared with a fixed reference threshold level, set to the value of a second smoothed version of the utterance power spectrum at the time the utterance began. While no command is being uttered the reference threshold varies in accordance with the second smoothed version. The first smoothed version includes higher frequency components than the second smoothed version.

42 Citations

20 Claims

1. A speech recognition system for activating an automotive vehicle actuator in response to a command spoken instruction signal coupled through and transduced by a microphone while the system is responsive to a recognition-mode command signal, the system responding to the instruction to determine the extent to which the instruction resembles a reference spoken instruction signal previously coupled to the system through the microphone while the system was responsive to a recording-mode command signal, which comprises:
- (a) first means for smoothing the spoken instruction signal coupled through the microphone;
  
  (b) second means for smoothing the spoken instruction signal coupled through the microphone, said second smoothing means having a time constant longer than that of said first smoothing means;
  
  (c) means for switching a smoothed spoken instruction signal derived by said second smoothing means to first and second terminals while the system is respectively activated to the recording and recognition modes in response to the recognition-mode and recording-mode command signals;
  
  (d) first means connected to said first terminal for multiplying the level of the spoken instruction signal smoothed by said second smoothing means while the system is in the recording mode and for deriving a recording-mode threshold level signal corresponding thereto;
  
  (e) second means connected to said second terminal for multiplying the level of the spoken instruction signal smoothed by said second smoothing means while the system is in the recognition mode for deriving a recognition-mode threshold level signal corresponding thereto, the multiplication rate of said second multiplying means being higher than that of said first multiplying means;
  
  (f) means for comparing the smoothed spoken instruction signal level derived by said first smoothing means with (i) the recording-mode threshold level signal derived from said first voltage level multiplying means while the system is in the recording mode and (ii) the recognition-mode threshold level signal derived from said second voltage level multiplying means while the system is in the recognition mode and for deriving (i) a spoken instruction start command signal when the smoothed spoken instruction signal level derived by said first smoothing means exceeds one of the recording-mode and recognition-mode threshold levels for more than a reference start time and (ii) a spoken instruction end command signal when the smoothed spoken instruction signal level derived by said first smoothing means drops below one of the recording-mode and recognition-mode threshold levels for more than a reference end time; and
  
  (g) a speech recognizer for starting recognition of the spoken instruction signal coupled through the microphone in response to the start command signal and stopping recognition of the same signal in response to the end command signal.
- - 3. A speech recognition system for an automatic vehicle as set forth in either claim 1 or 2, wherein the ratio of multiplication rate in recognition mode to that in recording mode is from 1.5 to 2.0.
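
As a rough illustration of how elements (a) through (g) of claim 1 fit together, the following Python sketch models the two smoothers as first-order exponential filters and the two multipliers as plain gains. The filter form, the `alpha` values, the gains 1.5 and 3.0 (a ratio of 2.0, the upper end of the range in claim 3), the sample counts standing in for the reference start and end times, and all names are assumptions made for this example; the claim specifies none of them.

```python
from dataclasses import dataclass


@dataclass
class Smoother:
    """First-order exponential smoother; a smaller alpha gives a longer time constant."""
    alpha: float
    value: float = 0.0

    def step(self, x: float) -> float:
        self.value += self.alpha * (x - self.value)
        return self.value


def detect_endpoints(samples, mode, start_n=3, end_n=5,
                     record_gain=1.5, recognition_gain=3.0):
    """Return (start_index, end_index) of the first utterance, or None.

    All numeric defaults are hypothetical values chosen for illustration.
    """
    if not samples:
        return None
    first = abs(samples[0])
    fast = Smoother(alpha=0.5, value=first)    # element (a): short time constant
    slow = Smoother(alpha=0.005, value=first)  # element (b): longer time constant
    # Elements (c)-(e): the mode selects which multiplier scales the slow signal.
    gain = record_gain if mode == "record" else recognition_gain
    run = 0
    start = None
    for i, s in enumerate(samples):
        level = fast.step(abs(s))              # abs() stands in for rectification
        threshold = gain * slow.step(abs(s))
        if start is None:
            # Element (f)(i): level above threshold for more than the start time.
            run = run + 1 if level > threshold else 0
            if run > start_n:
                start, run = i, 0
        else:
            # Element (f)(ii): level below threshold for more than the end time.
            run = run + 1 if level < threshold else 0
            if run > end_n:
                return start, i                # element (g) would stop recognition here
    return None
```

Because the threshold in this sketch keeps tracking the slow smoother during the utterance, a long utterance could raise the threshold enough to end detection early; the holding circuit recited in claim 2 addresses that case.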

2. A speech recognition system for activating an automotive vehicle actuator in response to a command spoken instruction signal coupled through and transduced by a microphone while the system is responsive to a recognition-mode command signal, the system responding to the instruction to determine the extent to which the instruction resembles a reference spoken instruction signal previously coupled to the system through the microphone while the system was responsive to a recording-mode command signal, which comprises:
- (a) a spectrum normalizing amplifier connected to said microphone for amplifying and spectrum-normalizing the spoken instruction signal transduced by the microphone;
  
  (b) a rectifier connected to said spectrum normalizing amplifier for rectifying the amplified spoken instruction signal;
  
  (c) a first smoother connected to said rectifier for smoothing the rectified spoken instruction signal and deriving a first smoothed signal;
  
  (d) a second smoother connected to said rectifier for smoothing the rectified spoken instruction signal with a time constant longer than that of said first smoother and deriving a second smoothed signal;
  
  (e) an analog switch having a movable contact connected to said second smoother, the movable contact contacting a first fixed contact in response to the recording-mode command signal and to a second fixed contact in response to the recognition-mode command signal;
  
  (f) a record multiplier connected to the first fixed contact of said analog switch for multiplying the second smoothed spoken instruction signal while the recording mode instruction command signal is derived and for deriving the recording-mode threshold level signal;
  
  (g) a recognition multiplier connected to the second fixed contact of said analog switch for multiplying the second smoothed spoken instruction signal while the recognition mode instruction command signal is derived and for deriving the recognition-mode threshold level signal, the multiplication rate of said recognition multiplier being greater than that of said record multiplier;
  
  (h) a holding circuit connected to said record and recognition multipliers for (i) passing the multiplied spoken instruction signal as a reference start threshold level when no holding signal is applied thereto and (ii) holding the multiplied spoken instruction signal as a constant end threshold level when the holding signal is applied thereto and for deriving the held signal thereafter until no holding signal is applied thereto;
  
  (i) a level comparator having first and second input terminals respectively connected to said first smoother and to said holding circuit for comparing (i) the first smoothed spoken instruction signal with the reference start threshold level when no holding signal is applied to said holding circuit, and (ii) the first smoothed spoken instruction signal with the reference end threshold level when the holding signal is applied to said holding circuit, said comparator deriving a signal respectively having H and L levels when the amplitude of the signal at the first terminal is greater than the amplitude of the signal at the second terminal and vice versa, in each of the recording mode and recognition mode;
  
  (j) a duration comparator connected to said level comparator and said holding circuit for comparing the pulse width of the H-voltage level signal with a reference start time and for deriving a spoken instruction start command signal when the H-voltage level pulse width exceeds the reference start time and for comparing the pulse width of the L-voltage level signal with a reference end time and for deriving a spoken instruction end command signal when the L-voltage level pulse width exceeds the reference end time, the H-voltage level signal from said duration comparator being applied to said holding circuit as the holding signal; and
  
  (k) a speech recognizer connected to said duration comparator for starting recognition of the spoken instruction signal coupled through the microphone in response to the start command signal and stopping recognition of the same signal in response to the end command signal.
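
The holding circuit of element (h) can be pictured as a pass-through that freezes its output while the holding signal is applied. The short Python sketch below shows only that behavior; the class name, the method name, and the choice to freeze the most recently passed value are assumptions for illustration, not details taken from the claim.

```python
class HoldingCircuit:
    """Pass the input through until `hold` is asserted, then keep the last passed value."""

    def __init__(self):
        self.held = None

    def step(self, value: float, hold: bool) -> float:
        # With no holding signal the multiplied level passes through and serves
        # as the start threshold; with the holding signal applied the output
        # stays constant and serves as the end threshold.
        if not hold or self.held is None:
            self.held = value
        return self.held
```

In a full detector the `hold` argument would be driven by the H-level output of the duration comparator, so the threshold stops following the slowly smoothed signal for as long as an utterance is in progress.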

4. A speech recognition system for activating an automotive vehicle actuator in response to a command spoken instruction signal coupled through and transduced by a microphone while the system is responsive to a recognition-mode command signal, the system responding to the instruction to determine the extent to which the instruction resembles a reference spoken instruction signal previously coupled to the system through the microphone while the system was responsive to a recording-mode command signal, which comprises:
- (a) a record switch for deriving a recording-mode command signal when closed;
  
  (b) a recognition switch for deriving a recognition-mode command signal when closed; and
  
  (c) a microcomputer connected to said record switch and said recognition switch for:
  
  (1) deriving a spoken instruction and sampling spoken instruction signal level data Sn for each of plural sampling time intervals,
  
  (2) calculating the power Pn of a spoken instruction signal as a first function of the sampled signal level data Sn,
  
  (3) determining whether the system is in recording mode or in recognition mode in response to the value of at least one of the recording-mode command signal and the recognition-mode command signal,
  
  (4) calculating a recording-mode reference threshold level E_n1 as a second function of the calculated sample power level data Pn when the microcomputer determines that the system is in the recording-mode,
  
  (5) calculating a recognition-mode reference threshold level E_n2 as a third function of the calculated sample power level data Pn when the microcomputer determines that the system is in the recognition mode, levels E_n1 and E_n2 being calculated in such a way that E_n2 is higher than E_n1,
  
  (6) comparing the sample power level data Pn with the calculated start reference threshold voltage level En,
  
  (7) counting the number M₁ of sample power level data Pn exceeding the reference threshold voltage level En while a sampled sample power level data Pn exceeds the calculated start reference threshold voltage level En,
  
  (8) comparing the counted number M₁ with a reference start number W₁,
  
  (9) deriving a spoken instruction start command signal M₃ and storing the sampled signal power level data Pn in a memory sequentially, while the counted number M₁ exceeds the reference start number W₁,
  
  (10) sampling the spoken instruction signal power level data Pn again while the counted number M₁ does not exceed the reference start number W₁,
  
  (11) counting the number M₂ of the sample power level data Pn dropping below the threshold level En while the sample power level data Pn is less than the calculated start threshold level En,
  
  (12) comparing the counted number M₂ with a reference end number W₂,
  
  (13) deriving a spoken instruction end command signal and storing the sample power level data Pn in a memory sequentially while the counted number M₁ exceeds the reference end number W₂,
  
  (14) sampling the spoken instruction signal power level data Pn while the counted number M₂ does not exceed the reference end number W₂, and
  
  (15) starting recognition of the spoken instruction signal coupled through the microphone in response to the start command signal and stopping recognition of the same signal in response to the end command signal.

5. A method of detecting the start and end of a spoken instruction coupled through a microphone to a speech recognition system capable of operating in a recognition mode and in a recording mode, the system previously being responsive to a spoken instruction coupled to it while comprising the steps of:
- (a) sampling spoken instruction signal level data Sn during each of plural sampling time intervals;
  
  (b) calculating the power Pn of a spoken instruction signal as a first function of the sampled signal voltage level data Sn;
  
  (c) determining whether the system is in recording mode or in recognition mode;
  
  (d) calculating a recording-mode reference threshold level E_n1 as a second function of the calculated sample power level data Pn while the system is determined to be in the recording-mode;
  
  (e) calculating a recognition-mode reference threshold level E_n2 as a third function of the calculated sample power level data Pn while the system is determined to be in the recognition-mode, the second and third functions having constants causing the calculated recording-mode reference threshold level to be lower than the calculated recognition-mode reference threshold level;
  
  (f) comparing the sampled power level data Pn with the calculated start reference threshold voltage levels En;
  
  (g) counting the number M₁ of power level data Pn exceeding the threshold level En while the sample power level data Pn exceeds the calculated start reference threshold level En;
  
  (h) comparing the counted number M₁ with a reference start number W₁ ;
  
  (i) deriving a spoken instruction start command signal M₃ and storing the sampled power level data Pn in a memory sequentially while the counted number M₁ exceeds the reference start number W₁ ;
  
  (j) returning to step (a) above in response to the counted number M₁ not exceeding the reference start number W₁ in step (i);
  
  (k) counting the number M₂ of power level data Pn dropping below the threshold level En while the sample power level data Pn drops below the calculated end reference threshold voltage level En;
  
  (l) comparing the counted number M₂ with a reference end number W₂ ;
  
  (m) deriving a spoken instruction end command signal and storing the sample power level data Pn in a memory sequentially while the counted number M₂ exceeds the reference end number W₂ ; and
  
  (n) returning to step (k) in response to the counted number M₂ not exceeding the reference end number W₂ in step (m).
- - 6. The detecting method of claim 5, wherein the first function for calculating the power level data Pn of a spoken instruction signal is ##EQU5## where q denotes the number of bandpass filters for analyzing the sampled spoken instruction signal level data Sn, and Wi denotes weighting coefficients for each of said filters.
  - 7. The detecting method of claim 5, wherein the second function for calculating the recording-mode reference threshold level is ##EQU6## and the third function for calculating the recognition-mode reference threshold level is ##EQU7## where p denotes the time period of the sampling time intervals and α₁, α₂ and β₁, β₂ denote constants determined in such a way that α₁ is smaller than α₂, and β₁ is smaller than β₂.
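
The counting procedure of claim 5 can be sketched in Python as a small state machine. The equations of claims 6 and 7 are not reproduced in this text, so the threshold below is a stand-in: a linear function of the mean of the last `p` below-threshold power samples, with hypothetical constants chosen only so that α₁ < α₂ and β₁ < β₂. Resetting a counter when the run is broken and freezing the threshold once the start command is derived are likewise assumptions of this sketch, as are all names.

```python
from collections import deque

# Hypothetical (alpha, beta) pairs; only the ordering alpha1 < alpha2 and
# beta1 < beta2 comes from claim 7.
RECORD = (2.0, 0.5)
RECOGNITION = (3.0, 1.0)


def find_utterance(power, mode, w1=3, w2=5, p=8):
    """Return (start_index, end_index) for the first utterance in `power`, or None."""
    alpha, beta = RECORD if mode == "record" else RECOGNITION   # steps (c)-(e)
    noise = deque(maxlen=p)   # recent samples judged to be background
    m1 = m2 = 0
    start = None
    threshold = None
    for n, pn in enumerate(power):
        if start is None:
            mean = sum(noise) / len(noise) if noise else pn
            threshold = alpha * mean + beta       # stand-in for E_n
            if pn > threshold:                    # steps (f)-(g)
                m1 += 1
                if m1 > w1:                       # steps (h)-(i)
                    start = n
            else:                                 # step (j): keep sampling
                m1 = 0
                noise.append(pn)
        else:
            if pn < threshold:                    # step (k)
                m2 += 1
                if m2 > w2:                       # steps (l)-(m)
                    return start, n
            else:                                 # step (n): keep counting
                m2 = 0
    return None
```

With these constants, a quiet utterance whose power is 3.0 over a background of 1.0 is detected in record mode but ignored in recognition mode, which mirrors the lower record-mode threshold described in the abstract.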

8. A speech recognition system capable of operating in environments having different background noise levels such that during a record operating mode the noise level has a tendency to be considerably lower than during a recognition operating mode, the system comprising means for establishing the record and recognition operating modes, means for deriving first and second signals respectively representing power in two different low pass spectra as derived in response to speech uttered for recognition and recording by the system in the presence of background noise in the environment where the utterance occurs, the first signal including frequency components higher than those of the second signal, means for changing the relative values of the first and second signals, signal amplitude comparison means, means coupled to the establishing means, the deriving means, the changing means and the comparison means for activating the comparison means so it derives:
- (a) a first signal level indicating that an utterance is occurring when (i) a first function of the magnitude associated with the first signal exceeds the magnitude of the second signal for more than a predetermined interval while the system is in the record operating mode, and (ii) a second function of the magnitude associated with the first signal exceeds the magnitude of the second signal for more than the predetermined interval while the system is in the recognition operation mode,
  
  (b) a second signal level indicating that an utterance is no longer occurring when (i) the first function of the magnitude associated with the first signal is less than the magnitude of the second signal for a set interval while the system is in the recording operating mode, and (ii) the second function of the magnitude associated with the first signal is less than the magnitude of the second signal for the set interval while the system is in the recognition operating mode, the first and second functions being respectively of the form
  
  f₁ = α₁P + β₁
  
  f₂ = α₂P + β₂
  
  where
  f₁ = the first function,
  f₂ = the second function,
  α₁ = a first predetermined, non-zero constant,
  α₂ = a second predetermined, non-zero constant,
  α₁ > α₂,
  P = a function of the power spectrum of the utterance and the background noise,
  β₁ = a third predetermined constant that may be zero,
  β₂ = a fourth predetermined constant that may be zero,
  β₁ > β₂ unless β₁ = β₂ = 0,
  
  means for recording signals representing tonal characteristics of the utterance while the first signal level is derived and while the system is in the record operating mode, so that several different tonal characteristics representing signals are recorded for several different utterances,
  
  and means for comparing signals representing tonal characteristics of the utterance with the recorded tonal characteristics for the several different utterances while the first signal level is derived and while in the recognition operating mode,
  
  the tonal recording and tonal comparing means being disabled while the second signal level is derived.
- - 9. The system of claim 8 wherein the first and second signal deriving means includes a microphone responsive to the utterance and the background noise for deriving a replica of the utterance and the background noise, means responsive to the replica for deriving an additional signal representing single polarity variations of the replica, means responsive to the additional signal for deriving the first and second signals, the first and second signals being first and second smoothing functions of the additional signal such that the first smoothing function has a time constant much less than the second smoothing function.
  - 10. The system of claim 9 wherein the activating means includes a signal holding means, the signal holding means being coupled to the another signal deriving means, the comparison means and the means for deriving first and second signals for maintaining a first input to the comparison means responsive to the second signal at a constant value while the first signal level is derived, the constant value being the amplitude of the second signal when the speech utterance begins.
  - 11. The system of claim 9 wherein the activating means includes a signal holding means, the signal holding means being coupled to the another signal deriving means, the comparison means and the means for deriving first and second signals for maintaining a first input to the comparison means responsive to the second signal at a constant value while the first signal level is derived and for causing said first input to be a replica of the second signal while the another signal indicates that the speech utterance is not occurring.
  - 12. The system of claim 8 wherein the activating means includes a signal holding means, the signal holding means being coupled to the another signal deriving means, the comparison means and the means for deriving first and second signals for maintaining a first input to the comparison means responsive to the second signal at a constant value while the first signal level is derived, the constant value being the amplitude of the second signal when the speech utterance begins.
  - 13. The system of claim 8 wherein the activating means includes a signal holding means, the signal holding means being coupled to the another signal deriving means, the comparison means and the means for deriving first and second signals for maintaining a first input to the comparison means responsive to the second signal at a constant value while the first signal level is derived and for causing said first input to be a replica of the second signal while the another signal indicates that the speech utterance is not occurring.
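
One way to see why claim 8 requires the record-mode constant α₁ to exceed the recognition-mode constant α₂ is to solve the comparison αP + β > second signal for P. The Python sketch below does this under the assumption that P can be read as the magnitude of the first (faster) signal; the function names and the numeric constants are hypothetical and serve only to illustrate the ordering.

```python
def utterance_present(first: float, second: float, alpha: float, beta: float) -> bool:
    """Comparison recited in claim 8: a linear function of the first signal exceeds the second."""
    return alpha * first + beta > second


def equivalent_threshold(second: float, alpha: float, beta: float) -> float:
    """Level the first signal must exceed, from solving alpha*P + beta > second (alpha > 0)."""
    return (second - beta) / alpha


# Hypothetical constants satisfying alpha1 > alpha2 with beta1 = beta2 = 0.
ALPHA_RECORD, BETA_RECORD = 1.0, 0.0
ALPHA_RECOGNITION, BETA_RECOGNITION = 0.5, 0.0
```

For the same second-signal magnitude, the larger record-mode α gives the smaller equivalent threshold, so a quieter utterance is accepted while recording than while recognizing.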

14. A speech recognition system capable of operating in environments having different background noise levels, the system comprising means for establishing record and recognition operating modes, means for deriving first and second signals respectively representing power in two different low pass spectra as derived in response to speech uttered for recognition and recording by the system in the presence of background noise in the environment where the utterance occurs, the first signal including frequency components higher than those of the second signal, signal amplitude comparison means, means coupled to the establishing means, the deriving means, and the comparison means for activating the comparison means so it derives:
- (a) a first signal level indicating that an utterance is occurring when a function of the magnitude associated with the first signal exceeds the magnitude of the second signal for more than a predetermined interval while the system is in the record and recognition operating modes, and
  
  (b) a second signal level indicating that an utterance is no longer occurring when the function of the magnitude associated with the first signal is less than the magnitude of the second signal for a set interval while the system is in the recording and recognition modes, the function f being of the form
  
  f = αP + β
  
  where
  α = a first predetermined non-zero constant,
  β = a second predetermined constant that may be zero,
  P = a function of the power spectrum of the utterance and the background noise such that P is a replica of the utterance and background noise power spectrum while the second signal level is derived, P being a constant value commensurate with the power spectrum of the utterance and background noise at the time the first signal level is initially derived, the constant value of P being derived throughout the interval of the first signal level;
  
  means for recording signals representing a voice power spectrum of the utterance while the first signal level is derived and while the system is in the record operating mode so that several different tonal characteristics representing signals are recorded for several different utterances,
  
  and means for comparing signals representing voice power spectrum of the utterance with the recorded voice power spectrum for the several different utterances while the first signal level is derived and while the system is in the recognition operating mode,
  
  the voice power spectrum recording and voice power spectrum comparing means being disabled while the second signal level is derived.
- - 15. The system of claim 14 wherein the first and second signal deriving means includes a microphone responsive to the utterance and the background noise for deriving a replica of the utterance and the background noise, means responsive to the replica for deriving an additional signal representing single polarity variations of the replica, means responsive to the additional signal for deriving the first and second signals, the first and second signals being first and second smoothed functions of the additional signal such that the first smoothed function has a time constant much less than the second smoothed function.

16. A speech recognition method that is performed by a system in environments having different background noise levels such that during a record operating mode the noise level has a tendency to be considerably lower than during a recognition operating mode, the method comprising:
- establishing the record and recognition operating modes in the system,
  
  deriving first and second signals respectively representing power in two different low pass spectra as derived in response to speech uttered for recognition and recording by the system in the presence of background noise in the environment where the utterance is occurring,
  
  deriving a first signal level indicating that an utterance is occurring when (i) a first function of the magnitude associated with the first signal exceeds the magnitude of the second signal for more than a predetermined interval while the system is in the record operating mode, and (ii) a second function of the magnitude associated with the first signal exceeds the magnitude of the second signal for more than the predetermined interval while the system is in the recognition operation mode,
  
  deriving a second signal level indicating that an utterance is no longer occurring when (i) the first function of the magnitude associated with the first signal is less than the magnitude of the second signal for a set interval while the system is in the recording operating mode, and (ii) the second function of the magnitude associated with the first signal is less than the magnitude of the second signal for the set interval while the system is in the recognition operating mode, the first and second functions being respectively of the form
  
  f₁ = α₁P + β₁
  
  f₂ = α₂P + β₂
  
  where
  f₁ = the first function,
  f₂ = the second function,
  α₁ = a first predetermined, non-zero constant,
  α₂ = a second predetermined, non-zero constant,
  α₁ > α₂,
  P = a function of the power spectrum of the utterance and the background noise,
  β₁ = a third predetermined constant that may be zero,
  β₂ = a fourth predetermined constant that may be zero,
  β₁ > β₂ unless β₁ = β₂ = 0,
  
  recording signals representing tonal characteristics of the utterance while the first signal level is derived and while the system is in the record operating mode, so that several different tonal characteristics representing signals are recorded for several different utterances,
  
  comparing signals representing tonal characteristics of the utterance with the recorded tonal characteristics for the several different utterances while the first signal level is derived and while the system is in the recognition operating mode,
  
  the tonal recording and comparing steps being disabled while the second signal level is derived.
- - 17. The method of claim 16 wherein P is a replica of the utterance and background noise power spectrum while the second signal level is derived, and P is a constant value commensurate with the power spectrum of the utterance and background noise at the time the first signal level is initially derived, the constant value of P being derived throughout the interval of the first signal level.

18. A speech recognition method that is performed by a system in environments having different background noise levels, the method comprising:
- establishing record and recognition operating modes in the system,
  deriving first and second signals respectively representing power in two different low pass spectra as derived in response to speech uttered for recognition and recording by the system in the presence of background noise in the environment where the utterance is occurring,
  deriving a first signal level indicating that an utterance is occurring when a function of the magnitude associated with the first signal exceeds the magnitude of the second signal for more than a predetermined interval while the system is in the record and recognition operating modes,
  deriving a second signal level indicating that an utterance is no longer occurring when the function of the magnitude associated with the first signal is less than the magnitude of the second signal for a set interval while the system is in the recording and recognition operating modes, the function f being of the form
  f = αP + β
  where
  α = a first predetermined non-zero constant
  β = a second predetermined constant that may be zero
  P = a function of the power spectrum of the utterance and the background noise such that P is a replica of the utterance and background noise power spectrum while the second signal level is derived, and P is a constant value commensurate with the power spectrum of the utterance and background noise at the time the first signal level is initially derived, the constant value of P being derived throughout the interval of the first signal level,
  recording signals representing a voice power spectrum of the utterance while the first signal level is derived and while the system is in the record operating mode so that several different voice power spectrum representing signals are recorded for several different utterances,
  and comparing signals representing a voice power spectrum of the utterance with the recorded voice power spectrum for the several different utterances while the first signal level is derived and while the system is in the recognition operating mode,
  the voice power spectrum recording and voice power spectrum comparing steps being disabled while the second signal level is derived.
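
The start and end detection steps of claim 18 can be sketched as a small state machine. This is one possible reading, not the patented circuit: following claim 1, it assumes a quickly smoothed power value is compared against f = αP + β, where P follows a slowly smoothed power value between utterances and is held during an utterance. The class name, the frame-based intervals and the default values are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class EndpointDetector:
    alpha: float            # non-zero gain in f = alpha * P + beta
    beta: float = 0.0       # offset in f, may be zero
    start_frames: int = 3   # assumed "predetermined interval", in frames
    end_frames: int = 5     # assumed "set interval", in frames
    active: bool = False    # True while an utterance is flagged
    p: float = 0.0          # current value of P
    _run: int = 0           # consecutive frames satisfying the pending condition

    def step(self, fast_power, slow_power):
        """Process one frame and return True while an utterance is flagged."""
        if not self.active:
            self.p = slow_power              # P tracks while no utterance is flagged
        level = self.alpha * self.p + self.beta
        if not self.active:
            self._run = self._run + 1 if fast_power > level else 0
            if self._run > self.start_frames:    # above for more than the interval
                self.active, self._run = True, 0
        else:
            self._run = self._run + 1 if fast_power < level else 0
            if self._run >= self.end_frames:     # below for the whole set interval
                self.active, self._run = False, 0
        return self.active


# Illustrative run with a constant background level of 1.0.
det = EndpointDetector(alpha=2.0, start_frames=2, end_frames=2)
flags = [det.step(fast, 1.0) for fast in [1, 1, 5, 5, 5, 5, 1, 1, 1]]
```

Requiring the condition to persist for several frames keeps short noise bursts from opening an utterance and short pauses inside a word from closing it.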

19. A speech recognition system capable of operating in environments having different background noise levels, the system previously having recorded therein signals representing a voice power spectrum of speech utterances to be recognized thereby, the system comprising means for deriving first and second signals respectively representing power in two different low pass spectra as derived in response to speech uttered for recognition by the system in the presence of background noise in the environment where the utterance occurs, the first signal including frequency components higher than those of the second signal, signal amplitude comparison means, means coupled to the deriving means and the comparison means for activating the comparison means so it:
- derives (a) a first signal level indicating that an utterance is occurring when a function of the magnitude associated with the first signal exceeds the magnitude of the second signal for more than a predetermined interval and (b) a second signal level indicating that an utterance is no longer occurring when the function of the magnitude associated with the first signal is less than the magnitude of the second signal for a set interval, the function f being of the form
  f = αP + β
  where
  α = a first predetermined non-zero constant
  β = a second predetermined constant that may be zero
  P = a function of the power spectrum of the utterance and the background noise such that P is a replica of the utterance and background noise power spectrum while the second signal level is derived, P being a constant value commensurate with the power spectrum of the utterance and background noise at the time the first signal is initially derived, the constant value of P being derived throughout the interval of the first signal level;
  
  means for comparing signals representing the voice power spectrum of the utterance with the recorded voice power spectrum for the several different utterances while the first signal level is derived; and
  
  means for disabling the voice power spectrum comparing means while the second signal level is derived.
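
The gated comparison in claim 19 can be illustrated as follows. The claim does not specify a distance measure or how frames are combined, so the squared Euclidean distance and the frame averaging used here are assumptions, as are the function names and the example data.

```python
def closest_template(spectrum, templates):
    """Return the label of the stored spectrum nearest to `spectrum`.

    Assumption: nearness is squared Euclidean distance between band powers.
    """
    def distance(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    return min(templates, key=lambda label: distance(spectrum, templates[label]))


def recognize(frames, active_flags, templates):
    """Compare only frames flagged as utterance; ignore all other frames."""
    voiced = [frame for frame, on in zip(frames, active_flags) if on]
    if not voiced:
        return None                      # comparison stays disabled
    count = len(voiced)
    mean = [sum(band) / count for band in zip(*voiced)]
    return closest_template(mean, templates)


# Hypothetical two-band spectra for two previously recorded utterances.
templates = {"open": [1.0, 0.0], "close": [0.0, 1.0]}
frames = [[9.0, 9.0], [0.9, 0.1], [1.1, 0.0], [9.0, 9.0]]
result = recognize(frames, [False, True, True, False], templates)
```

The unflagged frames, here given deliberately large values, never reach the comparison, which mirrors the claim's requirement that the comparing means be disabled while the second signal level is derived.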

20. A speech recognition method that is performed by a system in environments having different background noise levels, the system previously having recorded therein signals representing a voice power spectrum of speech utterances to be recognized thereby, the method comprising:
- deriving first and second signals respectively representing power in two different low pass spectra as derived in response to speech uttered for recognition by the system in the presence of background noise in the environment where the utterance is occurring;
  
  deriving a first signal level indicating that an utterance is occurring when a function of the magnitude associated with the first signal exceeds the magnitude of the second signal for more than a predetermined interval,
  deriving a second signal level indicating that an utterance is no longer occurring when the function of the magnitude associated with the first signal is less than the magnitude of the second signal for a set interval, the function f being of the form
  f = αP + β
  where
  α = a first predetermined non-zero constant
  β = a second predetermined constant that may be zero
  P = a function of the power spectrum of the utterance and the background noise such that P is a replica of the utterance and background noise power spectrum while the second signal level is derived, and P is a constant value commensurate with the power spectrum of the utterance and background noise at the time the first signal level is initially derived, the constant value of P being derived throughout the interval of the first signal level,
  and comparing signals representing the voice power spectrum of the utterance with signals representing the recorded voice power spectrum for the several different utterances while the first signal level is derived,
  the voice power spectrum comparing step being disabled while the second signal level is derived.
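
The "two different low pass spectra" of the first deriving step can be read, in line with the two smoothing means of claim 1, as the same power signal smoothed with a short and a long time constant. The sketch below assumes one-pole digital smoothers; the function names and coefficient values are hypothetical, and the patent itself describes analog circuit means.

```python
def smooth(samples, coefficient):
    """One-pole low-pass filter: y[n] = y[n-1] + coefficient * (x[n] - y[n-1])."""
    if not 0.0 < coefficient <= 1.0:
        raise ValueError("coefficient must be in (0, 1]")
    out, y = [], 0.0
    for x in samples:
        y += coefficient * (x - y)
        out.append(y)
    return out


def power_envelopes(samples, fast=0.5, slow=0.05):
    """Return (fast, slow) smoothed power; a smaller coefficient means a longer time constant."""
    power = [s * s for s in samples]
    return smooth(power, fast), smooth(power, slow)


# A step in signal amplitude: the fast envelope rises well before the slow one.
fast_env, slow_env = power_envelopes([0.0, 0.0, 2.0, 2.0])
```

The slowly smoothed value approximates the background level, while the quickly smoothed value responds to the onset of speech, which is what makes comparing the two useful for detecting an utterance.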

Current Assignee: Nissan Motor Co., Ltd.
Original Assignee: Nissan Motor Co., Ltd.
Inventors: Futami, Toru; Kishi, Norimasa; Noso, Kazunori
Primary Examiner(s): Kemeny, E. S. Matt
Application Number: US06/456,326
Time in Patent Office: 1,335 Days
Field of Search: 381/41-47, 381/110, 364/513, 364/513.5
US Class Current: 704/233
CPC Class Codes:
B60R 16/0373 Voice control in general G10L
G10L 15/20 Speech recognition techniqu...

1 Assignment

0 Petitions

42 Citations

20 Claims
