Speech detection and recognition apparatus for use with background noise of varying levels

US 4,829,578 A
Filed: 10/02/1986
Issued: 05/09/1989
Est. Priority Date: 10/02/1986
Status: Expired due to Fees

Abstract

A speech detection system compares the amplitude of an audio signal during successive time periods with speech detection thresholds, and generates an indication of whether the signal contains speech. It derives a background amplitude level from portions of the signal which it indicates do not contain speech, and improves its speech detection by altering the amplitude of the audio signal relative to the speech detection thresholds as a function of this background level. Preferably the background amplitude level is a moving average, which is repeatedly recalculated and repeatedly used to alter the relative amplitude of the audio signal and the detection thresholds. The apparatus uses a measure of the variability of the background amplitude to improve its speech detection. It generates start-of-speech and end-of-speech indications when the amplitude crosses respective thresholds for specified numbers of frames. The background amplitude level is calculated from frames which precede the start-of-speech indication by a predetermined amount and which follow the end-of-speech indication. The invention also provides a speech recognition system which compares the amplitudes of an audio signal against the amplitudes of acoustic models of vocabulary words to determine which vocabulary words correspond to the signal. The system compensates for background noise by using the background amplitude level, described above, to alter the audio signal amplitudes relative to the acoustic model amplitudes.
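
The adaptive thresholding described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the class name `BackgroundTracker`, the exponential form of the moving average, the use of mean absolute deviation as the spread measure, and the constants `alpha`, `base_threshold`, and `spread_gain` are all assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class BackgroundTracker:
    """Tracks a moving average of non-speech amplitude and its spread.

    Hypothetical helper: the exponential average and the default
    smoothing weight are assumptions, not taken from the patent.
    """
    level: float = 0.0   # moving-average background amplitude
    spread: float = 0.0  # moving average of |amplitude - level|
    alpha: float = 0.1   # smoothing weight (assumed value)

    def update(self, amplitude: float) -> None:
        """Fold one non-speech frame amplitude into the running estimates."""
        deviation = abs(amplitude - self.level)
        self.level += self.alpha * (amplitude - self.level)
        self.spread += self.alpha * (deviation - self.spread)

    def normalize(self, amplitude: float) -> float:
        """Shift a frame amplitude so the background level sits at zero."""
        return amplitude - self.level


def is_above_threshold(tracker: BackgroundTracker, amplitude: float,
                       base_threshold: float = 10.0,
                       spread_gain: float = 2.0) -> bool:
    """Compare a background-normalized amplitude with a spread-adjusted threshold.

    A noisier background (larger spread) raises the effective threshold.
    """
    threshold = base_threshold + spread_gain * tracker.spread
    return tracker.normalize(amplitude) > threshold
```

In this sketch, `update` would be called only for frames that the detector has classified as non-speech, so the background estimate follows changes in ambient noise without being pulled upward by the speech itself.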

15 Claims

1. Apparatus for detecting whether a portion of an audio signal generated over successive time periods contains speech to be recognized, said apparatus comprising:
- speech detection means for comparing the amplitude of the audio signal during successive time periods with one or more amplitude thresholds, and for generating, in response to said comparisons, an indication of whether or not a given portion of said audio signal contains speech to be recognized;
  
  means for deriving a background amplitude level from the amplitude of said audio signal for one or more time periods in which the signal does not contain speech to be recognized, which level indicates the amplitude of the audio signal when it does not represent speech to be recognized;
  
  means for deriving a measure of the spread of the distribution of the background amplitude level; and
  
  means for altering, for purposes of the comparisons of the speech detection means, the relative magnitude of the audio signal amplitudes and the amplitude thresholds as a function of the background amplitude level and spread.
- Dependent claims (2 to 12):
  - 2. Apparatus as described in claim 1 wherein said means for deriving a background amplitude level includes means, responsive to said speech detection means, for deriving said background amplitude level from the amplitudes of said audio signal for one or more periods of time which the speech detection means indicates do not contain speech to be recognized.
  - 3. Apparatus as described in claim 2 in which:
    - said means for deriving a background amplitude level includes means for repeatedly recalculating said background amplitude level in response to changes in the amplitude of the audio signal for time periods which the speech detection means indicates do not contain speech to be recognized; and
      
      said means for altering includes means for repeatedly altering the magnitude of said audio signal amplitudes relative to said amplitude thresholds in response to changes in said background amplitude level.
  - 4. Apparatus as described in claim 2 wherein said means for deriving a background amplitude level includes means for calculating that level as an average of said amplitudes during periods of time indicated by said speech detection means as not corresponding to speech.
  - 5. Apparatus as described in claim 4 wherein said means for calculating the background amplitude level includes means for calculating that level as a weighted average of such amplitudes.
  - 6. Apparatus as described in claim 1 wherein:
    - said speech detection means includes means for indicating an end of speech in the audio signal when a portion of that signal has audio signal amplitudes for one or more time periods below an end-of-speech threshold;
      
      said speech detection means includes means for raising and decreasing said end-of-speech threshold relative to said amplitude measurements in correspondence to rises and decreases in said measurement of spread of the background amplitude level.
  - 7. Apparatus as described in claim 1 wherein said speech detection means includes means for generating a start-of-speech indication for a portion of the signal in which audio signal amplitudes for a plurality of time periods exceed a certain threshold amplitude.
  - 8. Apparatus as described in claim 1 wherein said speech detection means includes means for generating an end-of-speech indication for a portion of the signal in which audio signal amplitudes for a plurality of time periods are below a certain threshold amplitude.
  - 9. Apparatus as described in claim 1 wherein said speech detection means includes:
    - means for generating a start-of-speech indication for a portion of the signal in which audio signal amplitudes for a plurality of time periods exceed a speech threshold amplitude; and
      
      means for generating an end-of-speech indication for a second portion of the signal in which audio signal amplitudes for a plurality of time periods are below a no-speech threshold amplitude.
  - 10. Apparatus as described in claim 9 wherein:
    - said means for generating a start-of-speech indication includes means for generating that indication when a portion of the signal having a first maximum duration has amplitudes which exceed said speech threshold amplitude during a first number of time periods; and
      
      said means for generating an end-of-speech indication includes means for generating that indication when a portion of the signal having a second maximum duration, which is longer than the first maximum duration, has amplitudes which exceed said speech threshold amplitude during a second number of time periods, which is greater than said first number of time periods.
  - 11. Apparatus as described in claim 9 wherein said means for deriving a background amplitude level includes selecting means for causing it to derive said level only from audio signal amplitudes for time periods which do not occur between a start-of-speech indication and its following end-of-speech indication generated by said means for generating.
  - 12. Apparatus as described in claim 11 wherein said selecting means includes means for causing the means for deriving a background level not to derive said level from audio signal amplitudes for time periods which occur within a certain time period before a time period associated with a start-of-speech indication.
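
The start-of-speech and end-of-speech logic of claims 7 to 9, and the frame selection of claims 11 and 12, can be sketched as follows. This is a simplified illustration under stated assumptions, not the patent's implementation: the function names `find_utterance` and `background_frames`, the window sizes, the frame counts, and the pre-start margin are hypothetical, and the end-of-speech test follows claim 9 (amplitudes below a no-speech threshold).

```python
def find_utterance(amps, speech_thr, silence_thr,
                   start_window=5, start_count=3,
                   end_window=10, end_count=8):
    """Return (start, end) frame indices of the first utterance.

    Start is indicated when at least `start_count` of the last
    `start_window` frames exceed `speech_thr`. End is indicated when at
    least `end_count` of the last `end_window` frames fall below
    `silence_thr`. Either value is None if no such indication occurs.
    All default values are assumptions.
    """
    start = None
    for i in range(len(amps)):
        if start is None:
            window = amps[max(0, i - start_window + 1): i + 1]
            if sum(a > speech_thr for a in window) >= start_count:
                start = i
        elif i - end_window + 1 > start:
            # Only test windows that lie entirely after the start frame.
            window = amps[i - end_window + 1: i + 1]
            if sum(a < silence_thr for a in window) >= end_count:
                return start, i
    return start, None


def background_frames(n_frames, start, end, pre_start_margin=4):
    """Indices of frames that may feed the background estimate.

    Excludes frames between the start and end indications, and frames
    within `pre_start_margin` frames before the start indication.
    """
    if start is None:
        return list(range(n_frames))
    if end is None:
        end = n_frames - 1
    cutoff = max(0, start - pre_start_margin)
    return [i for i in range(n_frames) if i < cutoff or i > end]
```

Because the start indication is only raised after several loud frames have been seen, the frames just before it may already contain the onset of speech. Excluding a margin before the start indication keeps those frames out of the background estimate.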

13. A speech recognition system comprising:
- means for receiving a representation of an audio signal, including amplitude measurements of successive parts of said signal;
  
  means for storing acoustic models, including amplitude descriptions, associated with the sounds of vocabulary words;
  
  recognition means for comparing the representation of a portion of the audio signal against the acoustic models, and for determining, as a result of those comparisons, which one or more vocabulary words most probably correspond to that representation, the comparison being based, at least in part, on the comparison of the amplitude measurements of the signal representation against the amplitude descriptions of the acoustic models;
  
  means for deriving a background amplitude description from one or more amplitude measurements taken from a portion of the signal representation which does not contain speech to be recognized, which description provides a model of said one or more amplitude measurements; and
  
  normalization means for altering the magnitude of the amplitude measurements from the signal representation relative to the magnitude of the amplitude descriptions from the acoustic models as a function of the background amplitude description.
- Dependent claims (14, 15):
  - 14. A speech recognition system as described in claim 13 wherein:
    - said system further includes speech detection means for generating an indication of whether or not a given portion of the signal representation contains speech to be recognized; and
      
      said means for producing a background amplitude description includes means for responding to the indication of whether or not the signal representation contains speech to be recognized in determining from which portions of the signal representation to take amplitude measurements used to derive said background amplitude description.
  - 15. A speech recognition system as described in claim 14 wherein:
    - said speech detection means includes means for comparing the amplitude measurements of the signal representation against one or more amplitude thresholds and for generating said indication of whether or not a given portion of the signal representation contains speech to be recognized in response to such comparisons; and
      
      said normalization means includes means for altering the magnitude of the amplitude measurements from the signal representation relative to the magnitude of the one or more amplitude thresholds used by said speech detecting means as a function of the background amplitude description.
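
The normalization step of claim 13 can be illustrated with a short sketch. It assumes, purely for illustration, that each acoustic model is a fixed-length list of amplitude values, that the background amplitude description is the mean of some non-speech frames, and that the comparison is a sum of absolute differences. The names `normalize` and `recognize` are hypothetical, and a real recognizer would also need time alignment between the signal and the models.

```python
from statistics import mean


def normalize(amplitudes, background):
    """Subtract the mean background amplitude from each frame amplitude."""
    level = mean(background)
    return [a - level for a in amplitudes]


def recognize(amplitudes, background, models):
    """Return the vocabulary word whose model is closest to the signal.

    `models` maps each word to a list of amplitude values with the same
    length as `amplitudes` (an assumption of this sketch).
    """
    normalized = normalize(amplitudes, background)

    def distance(template):
        # Sum of absolute amplitude differences, frame by frame.
        return sum(abs(a - t) for a, t in zip(normalized, template))

    return min(models, key=lambda word: distance(models[word]))
```

For example, with `models = {"yes": [10, 30, 10], "no": [30, 10, 30]}`, a signal of `[25, 45, 25]` recorded over a background of `[15, 15, 15]` normalizes to `[10, 30, 10]` and matches "yes".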

Current Assignee
Dragon Systems, Inc. (Microsoft Corporation)
Original Assignee
Dragon Systems, Inc. (Microsoft Corporation)
Inventors
Roberts, Jed M.
Primary Examiner(s)
Salce, Patrick R.
Assistant Examiner(s)
Hoff, Marc S.

Application Number

US06/914,667
Time in Patent Office

950 Days
Field of Search

381/43, 381/46, 381/47, 364/513.5
US Class Current

704/233
CPC Class Codes

G10L 15/20 Speech recognition techniqu...

G10L 25/87 Detection of discrete point...
