Apparatus and method to classify sound to detect speech
First Claim
1. A method of operating a system comprising memory and a processor for executing instructions stored in the memory, the instructions comprising a sound classifier, the method comprising:
receiving an audio signal from an audio input device;
generating a plurality of frames from the audio signal;
analyzing, using the sound classifier, each of the plurality of frames of audio;
classifying, using the sound classifier, a first number of the frames of audio as non-transient background noise;
classifying, using the sound classifier, a second number of the frames of audio as transient noise events;
updating, using the system, a background noise estimate using the audio corresponding to the frames classified as non-transient background noise and not using the audio corresponding to the frames classified as transient noise events; and
providing, using the sound classifier, signals indicative of at least the classifications of the frames of audio to the system.
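The steps of this claim can be illustrated with a minimal Python sketch. It is not the patented implementation: the claim does not say how the classifier decides, so the energy-ratio rule in `classify`, the exponential smoothing in `process`, and every name and parameter value below (`frame_len`, `ratio`, `alpha`, `initial_estimate`) are assumptions chosen only to show the background noise estimate being updated from non-transient frames and left untouched by transient ones.

```python
from enum import Enum
from typing import List, Sequence, Tuple


class FrameClass(Enum):
    BACKGROUND = "non-transient background noise"
    TRANSIENT = "transient noise event"


def make_frames(samples: Sequence[float], frame_len: int) -> List[List[float]]:
    """Split the signal into consecutive, non-overlapping frames (any tail is dropped)."""
    count = len(samples) // frame_len
    return [list(samples[i * frame_len:(i + 1) * frame_len]) for i in range(count)]


def frame_energy(frame: Sequence[float]) -> float:
    """Mean squared amplitude of one frame."""
    return sum(x * x for x in frame) / len(frame)


def classify(energy: float, noise_estimate: float, ratio: float = 4.0) -> FrameClass:
    """Toy stand-in classifier (assumption): energy far above the estimate is transient."""
    if energy > ratio * noise_estimate:
        return FrameClass.TRANSIENT
    return FrameClass.BACKGROUND


def process(samples: Sequence[float], frame_len: int = 4,
            initial_estimate: float = 2.0,
            alpha: float = 0.5) -> Tuple[List[FrameClass], float]:
    """Classify every frame; update the noise estimate from BACKGROUND frames only."""
    estimate = initial_estimate
    labels: List[FrameClass] = []
    for frame in make_frames(samples, frame_len):
        energy = frame_energy(frame)
        label = classify(energy, estimate)
        labels.append(label)
        if label is FrameClass.BACKGROUND:
            # Exponential smoothing; transient frames never reach this line.
            estimate = (1 - alpha) * estimate + alpha * energy
    return labels, estimate


# Quiet frame, loud burst, quiet frame.
labels, estimate = process([1.0] * 4 + [10.0] * 4 + [1.0] * 4)
```

In this run the loud middle frame is labelled transient and skipped, so the estimate is shaped only by the two quiet frames.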
Abstract
Audio frames are classified as either speech, non-transient background noise, or transient noise events. Probabilities of speech or transient noise event, or other metrics may be calculated to indicate confidence in classification. Frames classified as speech or noise events are not used in updating models (e.g., spectral subtraction noise estimates, silence model, background energy estimates, signal-to-noise ratio) of non-transient background noise. Frame classification affects acceptance/rejection of recognition hypothesis. Classifications and other audio related information may be determined by circuitry in a headset, and sent (e.g., wirelessly) to a separate processor-based recognition device.
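One way frame classification could affect acceptance or rejection of a recognition hypothesis is sketched below. The abstract does not state the decision rule, so the speech-fraction threshold, the string labels, and the function name `accept_hypothesis` are all hypothetical and serve only as an illustration.

```python
from typing import Sequence

SPEECH, BACKGROUND, TRANSIENT = "speech", "background", "transient"


def accept_hypothesis(frame_labels: Sequence[str], start: int, end: int,
                      min_speech_fraction: float = 0.5) -> bool:
    """Accept a hypothesis spanning frames [start, end) only if enough are speech.

    The threshold rule is an assumption, not taken from the patent.
    """
    span = frame_labels[start:end]
    if not span:
        return False  # nothing to support the hypothesis
    speech_count = sum(1 for label in span if label == SPEECH)
    return speech_count / len(span) >= min_speech_fraction


labels = [BACKGROUND, SPEECH, SPEECH, SPEECH, TRANSIENT, TRANSIENT, BACKGROUND]
spoken = accept_hypothesis(labels, 1, 4)   # frames 1-3 are all speech
clatter = accept_hypothesis(labels, 4, 6)  # frames 4-5 are transient noise
```

A hypothesis aligned with the transient frames is rejected even if the recognizer scored it well, which is the kind of false recognition the classification is meant to suppress.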
282 Citations
20 Claims
1. A method of operating a system comprising memory and a processor for executing instructions stored in the memory, the instructions comprising a sound classifier, the method comprising:
receiving an audio signal from an audio input device;
generating a plurality of frames from the audio signal;
analyzing, using the sound classifier, each of the plurality of frames of audio;
classifying, using the sound classifier, a first number of the frames of audio as non-transient background noise;
classifying, using the sound classifier, a second number of the frames of audio as transient noise events;
updating, using the system, a background noise estimate using the audio corresponding to the frames classified as non-transient background noise and not using the audio corresponding to the frames classified as transient noise events; and
providing, using the sound classifier, signals indicative of at least the classifications of the frames of audio to the system.
Dependent claims: 2, 3, 4, 5
6. A headset, comprising:
a first microphone for receiving audio input;
a memory; and
a processor for executing instructions stored in the memory, the instructions comprising a sound classifier, wherein, when executing the sound classifier, the processor is configured for:
receiving a plurality of frames of audio generated from the audio input received by the first microphone;
analyzing each of the plurality of frames of audio;
classifying a first number of the frames of audio as speech;
classifying a second number of the frames of audio as non-transient background noise;
classifying a third number of the frames of audio as transient noise events; and
transmitting signals indicative of at least the classifications of the frames of audio to a speech recognition system.
Dependent claims: 7, 8, 9, 10, 11, 12, 13, 14, 15, 16
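The "signals indicative of at least the classifications" could take many forms. The Python sketch below shows one hypothetical encoding in which the headset packs a frame index and a one-byte class code per frame, and the recognition side unpacks them. The record layout, class codes, and function names are assumptions made for illustration; the claim does not define a wire format.

```python
import struct
from typing import List, Sequence, Tuple

# Hypothetical class codes; the claim names only the three classes.
CLASS_CODES = {"speech": 0, "background": 1, "transient": 2}
CODE_NAMES = {code: name for name, code in CLASS_CODES.items()}

# Little-endian, unpadded record: uint32 frame index + uint8 class code (5 bytes).
_RECORD = struct.Struct("<IB")


def encode_classifications(records: Sequence[Tuple[int, str]]) -> bytes:
    """Headset side: pack (frame_index, class_name) pairs into a byte payload."""
    return b"".join(_RECORD.pack(index, CLASS_CODES[name]) for index, name in records)


def decode_classifications(payload: bytes) -> List[Tuple[int, str]]:
    """Recognizer side: recover the (frame_index, class_name) pairs."""
    return [(index, CODE_NAMES[code]) for index, code in _RECORD.iter_unpack(payload)]


sent = [(0, "background"), (1, "speech"), (2, "transient")]
payload = encode_classifications(sent)
received = decode_classifications(payload)
```

Sending compact per-frame labels rather than leaving classification to the recognizer lets the receiving device use the headset's decisions directly, for example when deciding which frames may update its background noise model.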
17. A method of operating a system comprising (i) a headset comprising a microphone, memory, and a processor for executing instructions stored in the memory, the instructions comprising a sound classifier and (ii) a speech recognition device comprising memory and a processor for executing instructions stored in the memory, the instructions comprising a speech recognizer, the method comprising:
analyzing, with the headset processor, each of a plurality of frames of audio from the microphone;
classifying, with the headset processor, a first number of the frames of audio as speech;
classifying, with the headset processor, a second number of the frames of audio as non-transient background noise;
classifying, with the headset processor, a third number of the frames of audio as transient noise events;
generating, with the headset processor, signals indicative of at least the classifications of the frames of audio;
receiving, with the speech recognition device processor, the generated signals indicative of at least the classifications of the frames of audio;
analyzing, with the speech recognition device processor, the audio from the microphone using the classifications of the frames of audio, stored models, and stored grammars;
updating, with the speech recognition device processor, a stored model of the non-transient background noise based on the classifications of the frames of audio; and
transmitting, with the speech recognition device processor, recognized text and/or metadata.
Dependent claims: 18, 19, 20
Specification