Apparatus and method to classify sound to detect speech

US 9,299,344 B2
Filed: 07/01/2015
Issued: 03/29/2016
Est. Priority Date: 03/12/2013
Status: Active Grant

Abstract

Audio frames are classified as speech, non-transient background noise, or transient noise events. Probabilities of speech or of a transient noise event, or other metrics, may be calculated to indicate confidence in the classification. Frames classified as speech or as transient noise events are not used in updating models (e.g., spectral subtraction noise estimates, silence model, background energy estimates, signal-to-noise ratio) of non-transient background noise. Frame classification affects acceptance or rejection of a recognition hypothesis. Classifications and other audio-related information may be determined by circuitry in a headset and sent (e.g., wirelessly) to a separate processor-based recognition device.
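The abstract says frame classification affects whether a recognition hypothesis is accepted or rejected, and claim 15 adjusts the acceptance threshold on that basis. The patent text shown here gives no formula, so the following Python sketch assumes a hypothetical linear rule in which the threshold rises with the share of frames labeled as transient noise. The function names, the label strings, and the default weights are illustrative assumptions, not the patented method.

```python
from typing import Sequence

# Hypothetical label strings for the three frame classes named in the abstract.
SPEECH, BACKGROUND, TRANSIENT = "speech", "background", "transient"


def adjusted_threshold(labels: Sequence[str],
                       base_threshold: float,
                       transient_weight: float) -> float:
    """Raise the acceptance threshold in proportion to the fraction of
    frames labeled as transient noise (assumed linear rule)."""
    if not labels:
        return base_threshold
    transient_share = sum(1 for lab in labels if lab == TRANSIENT) / len(labels)
    return base_threshold + transient_weight * transient_share


def accept_hypothesis(confidence: float,
                      labels: Sequence[str],
                      base_threshold: float = 0.5,
                      transient_weight: float = 0.3) -> bool:
    """Accept a hypothesis only if its confidence clears the
    classification-adjusted threshold."""
    return confidence >= adjusted_threshold(labels, base_threshold, transient_weight)
```

With six speech frames and four transient frames, the threshold rises from 0.5 to about 0.62, so a hypothesis with confidence 0.6 is rejected even though it would be accepted if every frame were speech.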

282 Citations

20 Claims

1. A method of operating a system comprising memory and a processor for executing instructions stored in the memory, the instructions comprising a sound classifier, the method comprising:
- receiving an audio signal from an audio input device;
- generating a plurality of frames from the audio signal;
- analyzing, using the sound classifier, each of the plurality of frames of audio;
- classifying, using the sound classifier, a first number of the frames of audio as non-transient background noise;
- classifying, using the sound classifier, a second number of the frames of audio as transient noise events;
- updating, using the system, a background noise estimate using the audio corresponding to the frames classified as non-transient background noise and not using the audio corresponding to the frames classified as transient noise events; and
- providing, using the sound classifier, signals indicative of at least the classifications of the frames of audio to the system.

Dependent Claims (2, 3, 4, 5)
- 2. The method of operation of claim 1, comprising providing input to the sound classifier from at least two microphones.
- 3. The method of operation of claim 1, wherein providing the signals indicative of at least the classifications of the frames of audio includes wirelessly providing the signals with at least a logical relationship to respective data that represents audio of at least some of the frames of audio.
- 4. The method of operation of claim 3, wherein the respective data that represents audio of at least some of the frames of audio includes at least one of autocorrelation coefficients or digitized audio fragments.
- 5. The method of operation of claim 3, wherein classifying a first number of the frames of audio as non-transient background noise by the sound classifier includes, for each frame of audio determining a metric for the respective frame, comparing the determined metric for the respective frame to an average metric for a plurality of frames of audio, classifying the respective frame as a transient noise if the determined metric for the respective frame exceeds the average metric for the plurality of frames of audio by at least a first threshold, and otherwise classifying the respective frame as a non-transient background noise.
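Claims 1 and 5 together describe a loop: frame the signal, compute a metric per frame, flag a frame as transient noise when its metric exceeds an average metric by a threshold, and update the background noise estimate only from frames classified as non-transient background noise. The Python sketch below is one possible reading under stated assumptions: the metric is mean-square energy in dB, the "average metric" is an exponentially weighted average of background frames that doubles as the noise estimate, and the first frame is assumed to be background. None of these choices, nor the parameter values, comes from the claims; speech detection is left out.

```python
import math
from typing import List, Optional, Sequence, Tuple


def make_frames(signal: Sequence[float], frame_len: int, hop: int) -> List[List[float]]:
    """Split a sampled signal into frames of frame_len samples, hop samples apart."""
    return [list(signal[i:i + frame_len])
            for i in range(0, len(signal) - frame_len + 1, hop)]


def frame_energy_db(frame: Sequence[float]) -> float:
    """Assumed per-frame metric: mean-square energy in dB."""
    mean_sq = sum(x * x for x in frame) / len(frame)
    return 10.0 * math.log10(mean_sq + 1e-12)  # small floor avoids log10(0)


def classify_and_update(frames: Sequence[Sequence[float]],
                        threshold_db: float = 6.0,
                        alpha: float = 0.1) -> Tuple[List[str], Optional[float]]:
    """Label each frame "transient" or "background" and return the labels
    with the final background-noise estimate in dB."""
    labels: List[str] = []
    noise_db: Optional[float] = None  # running average metric of background frames
    for frame in frames:
        metric = frame_energy_db(frame)
        if noise_db is None:
            noise_db = metric  # assumption: the first frame is background
            labels.append("background")
        elif metric - noise_db >= threshold_db:
            labels.append("transient")  # excluded from the noise update
        else:
            labels.append("background")
            noise_db = (1 - alpha) * noise_db + alpha * metric
    return labels, noise_db
```

For a quiet signal (amplitude 0.01) with one loud frame inserted, the loud frame is labeled transient and the estimate stays near -40 dB; if that frame had been averaged in, the estimate would have been pulled several dB upward.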

6. A headset, comprising:
- a first microphone for receiving audio input;
- a memory; and
- a processor for executing instructions stored in the memory, the instructions comprising a sound classifier, wherein, when executing the sound classifier, the processor is configured for:
  - receiving a plurality of frames of audio generated from the audio input received by the first microphone;
  - analyzing each of the plurality of frames of audio;
  - classifying a first number of the frames of audio as speech;
  - classifying a second number of the frames of audio as non-transient background noise;
  - classifying a third number of the frames of audio as transient noise events; and
  - transmitting signals indicative of at least the classifications of the frames of audio to a speech recognition system.

Dependent Claims (7, 8, 9, 10, 11, 12, 13, 14, 15, 16)
- 7. The headset of claim 6, comprising a second microphone for receiving audio input, wherein, when executing the sound classifier, the processor is configured for receiving a plurality of frames of audio generated from the audio input received by the second microphone.
- 8. The headset of claim 6, wherein transmitting signals indicative of at least the classifications of the frames of audio comprises wirelessly providing the signals with at least a logical relationship to respective data that represents audio of at least some of the frames of audio.
- 9. The headset of claim 8, wherein the respective data that represents audio of at least some of the frames of audio includes at least one of autocorrelation coefficients or digitized audio fragments.
- 10. The headset of claim 8, wherein classifying a second number of the frames of audio as non-transient background noise comprises for each frame of audio:
  - determining a metric for the respective frame;
  - comparing the determined metric for the respective frame to an average metric for a plurality of frames of audio; and
  - classifying the respective frame as a transient noise if the determined metric for the respective frame exceeds the average metric for the plurality of frames of audio by at least a first threshold and otherwise classifying the respective frame as a non-transient background noise.
- 11. The headset of claim 6, wherein:
  - the headset wirelessly transmits the signals indicative of at least the classifications of the frames of audio to a speech recognition system separate from the headset that implements a speech recognizer; and
  - the speech recognizer comprises a speech detector configured for detecting, using the classifications, at least one of a start or a stop of speech.
- 12. The headset of claim 11, wherein detecting at least one of a start or a stop of speech comprises for each of a set of two or more fragments:
  - determining how many of the fragments in the set are classified as a first one of the classifications; and
  - treating the entire set as either speech or non-speech based on how many of the fragments in the set are classified as the first one of the classifications.
- 13. The headset of claim 12, wherein detecting a start of speech comprises identifying a set of fragments in which the number of fragments individually classified as speech exceeds a threshold as constituting speech.
- 14. The headset of claim 13, comprising for sets of fragments identified as speech, at least one of prepending or postpending additional fragments of audio to the respective set for processing, where the additional fragments of audio occurred immediately before or immediately after the audio fragments of the respective set of fragments.
- 15. The headset of claim 6, wherein:
  - the headset wirelessly transmits the signals indicative of at least the classifications of the frames of audio to a speech recognition system separate from the headset that implements a speech recognizer; and
  - the speech recognizer is configured for adjusting a threshold at which a recognized hypothesis based on the audio is either rejected or accepted based at least in part on distinguishing among speech events, non-transient background noise, and transient noise events.
- 16. The headset of claim 6, wherein:
  - the headset wirelessly transmits the signals indicative of at least the classifications of the frames of audio to a speech recognition system separate from the headset that implements a speech recognizer; and
  - the speech recognizer is configured for adjusting a confidence value of a hypothesis or portion thereof based at least in part on distinguishing among speech events, non-transient background noise and transient noise events.
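Claims 11 through 14 describe a recognizer-side speech detector that works on sets of classified fragments: count the fragments labeled as speech in each set, treat the whole set as speech when that count exceeds a threshold, and pad detected speech with neighboring fragments. The Python sketch below assumes consecutive, non-overlapping sets and treats everything from the first speech set to the last as one region, a simplification that ignores multiple utterances. The function and parameter names (`detect_speech_region`, `set_size`, `min_speech`, `pad`) are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple


def detect_speech_region(labels: Sequence[str], set_size: int,
                         min_speech: int, pad: int) -> Optional[Tuple[int, int]]:
    """Return (start, end) fragment indices, end exclusive, of the detected
    speech region, or None if no set qualifies as speech."""
    speech_sets: List[Tuple[int, int]] = []
    # Group fragments into consecutive, non-overlapping sets of set_size.
    for start in range(0, len(labels), set_size):
        block = labels[start:start + set_size]
        n_speech = sum(1 for lab in block if lab == "speech")
        # The entire set is treated as speech when its count exceeds the threshold.
        if n_speech > min_speech:
            speech_sets.append((start, start + len(block)))
    if not speech_sets:
        return None
    begin = speech_sets[0][0]  # start of speech: first speech set
    end = speech_sets[-1][1]   # stop of speech: end of the last speech set
    # Prepend and append up to `pad` neighboring fragments, clipped to the input.
    return max(0, begin - pad), min(len(labels), end + pad)
```

Counting per set rather than reacting to individual fragments keeps a single misclassified fragment from starting or stopping speech on its own, which is the smoothing effect claim 12 describes.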

17. A method of operating a system comprising (i) a headset comprising a microphone, memory, and a processor for executing instructions stored in the memory, the instructions comprising a sound classifier and (ii) a speech recognition device comprising memory and a processor for executing instructions stored in the memory, the instructions comprising a speech recognizer, the method comprising:
- analyzing, with the headset processor, each of a plurality of frames of audio from the microphone;
- classifying, with the headset processor, a first number of the frames of audio as speech;
- classifying, with the headset processor, a second number of the frames of audio as non-transient background noise;
- classifying, with the headset processor, a third number of the frames of audio as transient noise events;
- generating, with the headset processor, signals indicative of at least the classifications of the frames of audio;
- receiving, with the speech recognition device processor, the generated signals indicative of at least the classifications of the frames of audio;
- analyzing, with the speech recognition device processor, the audio from the microphone using the classifications of the frames of audio, stored models, and stored grammars;
- updating, with the speech recognition device processor, a stored model of the non-transient background noise based on the classifications of the frames of audio; and
- transmitting, with the speech recognition device processor, recognized text and/or metadata.

Dependent Claims (18, 19, 20)
- 18. The method of claim 17, comprising providing the plurality of frames with at least two microphones.
- 19. The method of claim 17, wherein the headset comprises a second microphone and the method comprises analyzing, with the headset processor, each of a plurality of frames of audio from the second microphone.
- 20. The method of claim 17, wherein classifying a second number of the frames of audio as non-transient background noise by the sound classifier comprises, for each frame of audio:
  - determining a metric for the respective frame;
  - comparing the determined metric for the respective frame to an average metric for a plurality of frames of audio; and
  - classifying the respective frame as a transient noise if the determined metric for the respective frame exceeds the average metric for the plurality of frames of audio by at least a first threshold, and otherwise classifying the respective frame as a non-transient background noise.
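Claim 17 splits the work between two devices: the headset classifies frames and generates signals indicative of the classifications, and the recognizer updates a stored model of non-transient background noise from them. Claims 4 and 9 name autocorrelation coefficients as one kind of per-frame audio data that may accompany the classifications. The Python sketch below combines these ideas under assumptions: the `FramePacket` message layout, the autocorrelation order, and a background model defined as the running mean of background-frame autocorrelation vectors are all hypothetical, and the wireless transport is omitted.

```python
from dataclasses import dataclass
from typing import List, Sequence


def autocorrelation(frame: Sequence[float], order: int) -> List[float]:
    """Autocorrelation coefficients r[0..order] of one frame (r[0] is its energy)."""
    n = len(frame)
    return [sum(frame[i] * frame[i + k] for i in range(n - k))
            for k in range(order + 1)]


@dataclass
class FramePacket:
    """Hypothetical per-frame message sent from headset to recognizer."""
    index: int
    label: str              # "speech", "background", or "transient"
    autocorr: List[float]


def headset_encode(frames: Sequence[Sequence[float]], labels: Sequence[str],
                   order: int = 2) -> List[FramePacket]:
    """Headset side: pair each frame's classification with its coefficients."""
    return [FramePacket(i, lab, autocorrelation(f, order))
            for i, (f, lab) in enumerate(zip(frames, labels))]


class BackgroundModel:
    """Recognizer side: running mean of background-frame autocorrelations."""

    def __init__(self, order: int) -> None:
        self.mean = [0.0] * (order + 1)
        self.count = 0

    def update(self, packet: FramePacket) -> None:
        if packet.label != "background":
            return  # speech and transient frames leave the model unchanged
        self.count += 1
        self.mean = [m + (r - m) / self.count
                     for m, r in zip(self.mean, packet.autocorr)]
```

Sending a few coefficients per frame instead of raw samples keeps the wireless payload small, while still giving the recognizer enough to maintain a background model from the frames the headset marked as non-transient noise.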

Current Assignee: Intermec IP Corporation (Honeywell International Inc.)
Original Assignee: Intermec IP Corporation (Honeywell International Inc.)
Inventors: Braho, Keith P.; Hardek, David D.
Primary Examiner(s): Baker, Charlotte M.
Application Number: US14/789,267
Publication Number: US 20150302853A1
Time in Patent Office: 272 Days
Field of Search: 704/233
US Class Current: 1/1
CPC Class Codes:
- G10L 15/20 Speech recognition techniqu...
- G10L 25/78 Detection of presence or ab...
