Detection of voice inactivity within a sound stream
First Claim
1. A method of identifying end-of-speech within an audio stream, comprising:
analyzing each window of the audio stream in a speech discriminator;
assigning a classification to said each window based on speech discriminator output corresponding to said each window, the classification being selected from a classification set comprising a first classification label corresponding to presence of speech within said each window, a second classification label corresponding to silence within said each window, and a third classification label corresponding to noise in said each window;
incrementing a speech counter when said each window is assigned the first classification label;
incrementing a silence counter when said each window is assigned the second classification label;
incrementing a noise counter when said each window is assigned the third classification label;
clearing the speech counter, the silence counter, and the noise counter when the speech counter exceeds a first limit;
weighting at least one of the silence counter and the noise counter to obtain weighted silence and noise values;
combining the weighted silence and noise values in a result;
comparing the result to a second limit; and
identifying end-of-speech within the audio stream when the non-voice counter reaches a second limit;
wherein the steps of analyzing, assigning, incrementing a speech counter, incrementing a silence counter, incrementing a noise counter, clearing, weighting, combining, comparing, and identifying are performed by at least one processor.
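The weighting, combining, and comparing steps recited above can be written compactly. The notation below is an assumption introduced for illustration and does not appear in the claim: C_speech, C_silence, and C_noise denote the three counters, w_s and w_n the weights, L_1 the first limit, and L_2 the second limit. Because the claim requires weighting at least one of the two counters, either weight may equal 1.

```latex
\[
  R = w_s \, C_{\text{silence}} + w_n \, C_{\text{noise}},
  \qquad
  \text{end-of-speech is identified when } R \ge L_2 ,
\]
\[
  C_{\text{speech}} > L_1
  \;\Longrightarrow\;
  C_{\text{speech}} = C_{\text{silence}} = C_{\text{noise}} = 0
  \quad \text{(counters cleared)} .
\]
```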
Abstract
A method for identifying end of voiced speech within an audio stream of a noisy environment employs a speech discriminator. The discriminator analyzes each window of the audio stream, producing an output corresponding to the window. The output is used to classify the window in one of several classes, for example, (1) speech, (2) silence, or (3) noise. A state machine processes the window classifications, incrementing counters as each window is classified: speech counter for speech windows, silence counter for silence, and noise counter for noise. If the speech counter indicates a predefined number of windows, the state machine clears all counters. Otherwise, the state machine appropriately weights the values in the silence and noise counters, adds the weighted values, and compares the sum to a limit imposed on the number of non-voice windows. When the non-voice limit is reached, the state machine terminates processing of the audio stream.
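The state machine described in the abstract can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the names `Label`, `EndOfSpeechDetector`, and `find_end_of_speech`, the default weights, and the limit values used in the usage note are all hypothetical. The speech discriminator itself is out of scope here; the sketch starts from window labels that have already been assigned.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Iterable, Optional


class Label(Enum):
    """Per-window classification labels (names are illustrative)."""
    SPEECH = 1
    SILENCE = 2
    NOISE = 3


@dataclass
class EndOfSpeechDetector:
    speech_limit: int             # "first limit" on the speech counter
    non_voice_limit: float        # "second limit" on the weighted sum
    silence_weight: float = 1.0   # assumed weight, not from the source
    noise_weight: float = 0.5     # assumed weight, not from the source
    speech: int = 0
    silence: int = 0
    noise: int = 0

    def update(self, label: Label) -> bool:
        """Process one window label; return True when end-of-speech is identified."""
        if label is Label.SPEECH:
            self.speech += 1
        elif label is Label.SILENCE:
            self.silence += 1
        else:
            self.noise += 1

        # Enough speech windows seen: clear all three counters.
        if self.speech > self.speech_limit:
            self.speech = self.silence = self.noise = 0
            return False

        # Weight the non-voice counters, add them, and compare to the limit.
        result = self.silence_weight * self.silence + self.noise_weight * self.noise
        return result >= self.non_voice_limit


def find_end_of_speech(labels: Iterable[Label],
                       detector: EndOfSpeechDetector) -> Optional[int]:
    """Return the index of the window at which end-of-speech is identified, if any."""
    for index, label in enumerate(labels):
        if detector.update(label):
            return index
    return None
```

For example, with `speech_limit=2` and `non_voice_limit=3.0`, three speech windows clear the counters, and a following silence, noise, silence, noise sequence accumulates a weighted sum of 1.0, 1.5, 2.5, and 3.0, so end-of-speech is identified on the last of those windows.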
27 Claims
21. A method of identifying end-of-speech within an audio stream, comprising:
step for analyzing each window of the audio stream in a speech discriminator;
step for assigning a classification to said each window based on speech discriminator output corresponding to said each window, the classification being selected from a classification set comprising a first classification label corresponding to presence of speech within said each window, a second classification label corresponding to silence within said each window, and a third classification label corresponding to noise in said each window;
incrementing a speech counter in response to said each window being assigned the first classification label;
incrementing a silence counter in response to said each window being assigned the second classification label;
incrementing a noise counter in response to said each window being assigned the third classification label;
step for determining when the speech counter exceeds a first limit;
clearing the speech counter, the silence counter, and the noise counter in response to the speech counter exceeding a first limit;
step for weighting at least one of the silence counter and the noise counter to obtain weighted silence and noise values;
step for combining the weighted silence and noise values in a result;
step for comparing the result to a second limit; and
step for identifying end-of-speech within the audio stream in response to the result reaching the second limit;
wherein the steps for analyzing, assigning are performed by at least one processor.
23. Apparatus for processing an audio stream, comprising:
a memory storing program code; and
a digital processor under control of the program code;
wherein the program code comprises:
instructions to cause the processor to receive the audio stream in digitized blocks;
instructions to segment the digitized blocks into windows;
instructions to cause the processor to analyze each window in a speech discriminator;
instructions to cause the processor to assign a classification to said each window based on speech discriminator output corresponding to said each window, the classification being selected from a classification set comprising a first classification label corresponding to presence of speech within said each window, a second classification label corresponding to silence in said each window, and a third classification label corresponding to noise in said each window;
instructions to cause the processor to increment a speech counter in response to said each window being assigned the first classification label;
instructions to cause the processor to increment a silence counter in response to said each window being assigned the second classification label;
instructions to cause the processor to increment a noise counter in response to said each window being assigned the third classification label;
instructions to cause the processor to clear the speech counter, the silence counter, and the noise counter in response to the speech counter exceeding a first limit;
instructions to cause the processor to weight at least one of the silence counter and the noise counter to obtain weighted silence and noise values;
instructions to cause the processor to combine the weighted silence and noise values in a result;
instructions to cause the processor to compare the result to a second limit; and
instructions to cause the processor to identify end-of-speech within the audio stream in response to the result reaching the second limit.
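The step of receiving the audio stream in digitized blocks and segmenting the blocks into windows might look like the following sketch. The claim does not say how windows are formed, so several details here are assumptions: the function name `segment_into_windows` is hypothetical, windows are fixed-size and non-overlapping, samples are plain integers, and a trailing partial window is discarded.

```python
from typing import Iterable, Iterator, List


def segment_into_windows(blocks: Iterable[List[int]],
                         window_size: int) -> Iterator[List[int]]:
    """Yield fixed-size, non-overlapping windows from a sequence of sample blocks."""
    if window_size <= 0:
        raise ValueError("window_size must be positive")

    pending: List[int] = []  # samples received but not yet emitted in a window
    for block in blocks:
        pending.extend(block)
        # Emit as many complete windows as the buffered samples allow.
        while len(pending) >= window_size:
            yield pending[:window_size]
            pending = pending[window_size:]
    # Assumption: leftover samples shorter than one window are dropped.
```

Because windows are cut from a running buffer, window boundaries do not need to line up with block boundaries; each yielded window would then be passed to the speech discriminator for classification.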
Specification