Detection of voice inactivity within a sound stream
Abstract
A method for identifying the end of voiced speech within an audio stream from a noisy environment employs a speech discriminator. The discriminator analyzes each window of the audio stream and produces an output corresponding to that window. The output is used to classify the window into one of several classes, for example (1) speech, (2) silence, or (3) noise. A state machine processes the window classifications, incrementing a counter as each window is classified: the speech counter for speech windows, the silence counter for silence windows, and the noise counter for noise windows. If the speech counter reaches a predefined number of windows, the state machine clears all counters. Otherwise, the state machine weights the values in the silence and noise counters, adds the weighted values, and compares the sum to a limit on the number of non-voice windows. When the non-voice limit is reached, the state machine terminates processing of the audio stream.
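The state machine described in the abstract can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the names Label, EndOfSpeechDetector and find_end, the weight values, and the limits are all hypothetical, and the window labels are assumed to come from some upstream speech discriminator that the abstract does not specify.

```python
from enum import Enum
from typing import Iterable, Optional


class Label(Enum):
    """Hypothetical window classes: (1) speech, (2) silence, (3) noise."""
    SPEECH = "speech"
    SILENCE = "silence"
    NOISE = "noise"


class EndOfSpeechDetector:
    """Hypothetical three-counter state machine; limits and weights are illustrative."""

    def __init__(self, speech_limit: int, non_voice_limit: float,
                 silence_weight: float = 1.0, noise_weight: float = 1.0) -> None:
        self.speech_limit = speech_limit        # assumed "predefined number" of speech windows
        self.non_voice_limit = non_voice_limit  # assumed limit on weighted non-voice windows
        self.silence_weight = silence_weight
        self.noise_weight = noise_weight
        self.reset()

    def reset(self) -> None:
        """Clear the speech, silence and noise counters."""
        self.speech = 0
        self.silence = 0
        self.noise = 0

    def process(self, label: Label) -> bool:
        """Consume one window label; return True once end-of-speech is identified."""
        if label is Label.SPEECH:
            self.speech += 1
        elif label is Label.SILENCE:
            self.silence += 1
        else:
            self.noise += 1

        if self.speech > self.speech_limit:
            # Enough speech observed: clear all counters and keep listening.
            self.reset()
            return False

        # Otherwise weight the non-voice counters, add them, and compare to the limit.
        weighted = self.silence_weight * self.silence + self.noise_weight * self.noise
        return weighted >= self.non_voice_limit


def find_end(labels: Iterable[Label], detector: EndOfSpeechDetector) -> Optional[int]:
    """Return the index of the window at which processing would stop, or None."""
    for index, label in enumerate(labels):
        if detector.process(label):
            return index
    return None
```

Note that in this sketch, as in the abstract, isolated speech windows below the speech limit do not clear the silence and noise counters; only a run of speech that pushes the speech counter past its limit resets the state machine.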
69 Claims
1. A method of identifying end-of-speech within an audio stream, comprising:
analyzing each window of the audio stream in a speech discriminator;
assigning a classification to said each window based on speech discriminator output corresponding to said each window, the classification being selected from a classification set comprising a first classification label corresponding to presence of speech within said each window, and one or more classification labels corresponding to absence of speech in said each window;
incrementing a speech counter when said each window is assigned the first classification label;
incrementing a non-voice counter when said each window is assigned a classification label corresponding to absence of speech;
clearing the speech counter and the non-voice counter when the speech counter exceeds a first limit; and
identifying end-of-speech within the audio stream when the non-voice counter reaches a second limit.
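The counter logic of claim 1, in which every non-speech label feeds a single non-voice counter, can be sketched as below. This is a hypothetical illustration only: the function name end_of_speech_index and the boolean per-window input are assumptions, and the limits are supplied by the caller because the claim leaves their values open.

```python
from typing import Iterable, Optional


def end_of_speech_index(is_speech: Iterable[bool], first_limit: int,
                        second_limit: int) -> Optional[int]:
    """Return the index of the window at which end-of-speech is identified, or None.

    Hypothetical sketch: True means the window received the speech label; any
    other label (silence, noise, ...) counts as absence of speech.
    """
    speech = 0
    non_voice = 0
    for index, window_is_speech in enumerate(is_speech):
        if window_is_speech:
            speech += 1
        else:
            non_voice += 1
        if speech > first_limit:
            # Speech counter exceeded the first limit: clear both counters.
            speech = 0
            non_voice = 0
        elif non_voice >= second_limit:
            # Non-voice counter reached the second limit: end-of-speech.
            return index
    return None
```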
35. A method of identifying end-of-speech within an audio stream, comprising:
analyzing each window in a speech discriminator;
assigning a classification to said each window based on speech discriminator output corresponding to said each window, the classification being selected from a classification set comprising a first classification label corresponding to presence of speech within said each window, a second classification label corresponding to silence within said each window, and a third classification label corresponding to noise in said each window;
incrementing a speech counter when said each window is assigned the first classification label;
incrementing a silence counter when said each window is assigned the second classification label;
incrementing a noise counter when said each window is assigned the third classification label;
clearing the speech counter, the silence counter, and the noise counter when the speech counter exceeds a first limit;
weighting at least one of the silence counter and the noise counter to obtain weighted silence and noise values;
combining the weighted silence and noise values in a result;
comparing the result to a second limit; and
identifying end-of-speech within the audio stream when the result reaches the second limit.
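The weighting and combining steps of claim 35 can be written as a single comparison. The symbols below are assumptions introduced for illustration: after window k, S_k, N_k and C_k denote the silence, noise and speech counters, w_s and w_n the weights (one of them may simply be 1, since the claim requires weighting only at least one counter), and L_1 and L_2 the first and second limits. The numbers in the worked example are hypothetical.

```latex
R_k = w_s\,S_k + w_n\,N_k, \qquad
\text{end-of-speech at the first } k \text{ with } R_k \ge L_2, \qquad
S_k = N_k = C_k = 0 \ \text{whenever}\ C_k > L_1 .

\text{Example: } w_s = 1,\ w_n = 0.5,\ L_2 = 40:\quad
S = 20,\ N = 30 \;\Rightarrow\; R = 20 + 15 = 35 < 40; \qquad
S = 25,\ N = 30 \;\Rightarrow\; R = 25 + 15 = 40 \ge 40 .
```

With these assumed weights, five further silence windows are enough to reach the limit, whereas ten further noise windows would be needed, so silence is treated as stronger evidence that the speaker has stopped.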
55. A method of identifying end-of-speech within an audio stream, comprising:
step for analyzing each window of the audio stream in a speech discriminator;
step for assigning a classification to said each window based on speech discriminator output corresponding to said each window, the classification being selected from a classification set comprising a first classification label corresponding to presence of speech within said each window, and one or more classification labels corresponding to absence of speech in said each window;
incrementing a speech counter when said each window is assigned the first classification label;
incrementing a non-voice counter when said each window is assigned a classification label corresponding to absence of speech;
step for determining when the speech counter exceeds a first limit;
clearing the speech counter and the non-voice counter when the speech counter exceeds the first limit;
step for determining when the non-voice counter reaches a second limit; and
step for identifying end-of-speech within the audio stream when the non-voice counter reaches the second limit.
57. A method of identifying end-of-speech within an audio stream, comprising:
step for analyzing each window of the audio stream in a speech discriminator;
step for assigning a classification to said each window based on speech discriminator output corresponding to said each window, the classification being selected from a classification set comprising a first classification label corresponding to presence of speech within said each window, a second classification label corresponding to silence within said each window, and a third classification label corresponding to noise in said each window;
incrementing a speech counter when said each window is assigned the first classification label;
incrementing a silence counter when said each window is assigned the second classification label;
incrementing a noise counter when said each window is assigned the third classification label;
step for determining when the speech counter exceeds a first limit;
clearing the speech counter, the silence counter, and the noise counter when the speech counter exceeds the first limit;
step for weighting at least one of the silence counter and the noise counter to obtain weighted silence and noise values;
step for combining the weighted silence and noise values in a result;
step for comparing the result to a second limit; and
step for identifying end-of-speech within the audio stream when the result reaches the second limit.
59. Apparatus for processing an audio stream, comprising:
a memory storing program code; and
a digital processor under control of the program code;
wherein the program code comprises:
instructions to cause the processor to receive the audio stream in digitized blocks;
instructions to segment the digitized blocks into windows;
instructions to cause the processor to analyze each window in a speech discriminator;
instructions to cause the processor to assign a classification to said each window based on speech discriminator output corresponding to said each window, the classification being selected from a classification set comprising a first classification label corresponding to presence of speech within said each window, and one or more classification labels corresponding to absence of speech in said each window;
instructions to cause the processor to increment a speech counter when said each window is assigned the first classification label;
instructions to cause the processor to increment a non-voice counter when said each window is assigned a classification label corresponding to absence of speech;
instructions to cause the processor to clear the speech counter and the non-voice counter when the speech counter exceeds a first limit; and
instructions to cause the processor to identify end-of-speech within the audio stream when the non-voice counter reaches a second limit.
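Claim 59 separates receiving the audio stream in digitized blocks from segmenting those blocks into windows. One way to do this is to buffer samples across block boundaries, as in the hypothetical sketch below. The function name windows_from_blocks, the fixed window size, and the choice of non-overlapping windows are assumptions; the claim does not state a window length or whether windows overlap.

```python
from typing import Iterable, Iterator, List, Sequence


def windows_from_blocks(blocks: Iterable[Sequence[float]],
                        window_size: int) -> Iterator[List[float]]:
    """Re-segment digitized blocks of arbitrary length into fixed-size windows.

    Hypothetical sketch: windows do not overlap, samples left over at the end of
    one block are carried into the next, and a trailing partial window is dropped.
    """
    if window_size <= 0:
        raise ValueError("window_size must be positive")
    pending: List[float] = []
    for block in blocks:
        pending.extend(block)
        while len(pending) >= window_size:
            yield pending[:window_size]  # slicing copies, so later edits to pending are safe
            del pending[:window_size]
```

For example, at an assumed sampling rate of 8 kHz, a 20 ms window corresponds to window_size=160 samples; each yielded window would then be passed to the speech discriminator.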
64. Apparatus for processing an audio stream, comprising:
a memory storing program code; and
a digital processor under control of the program code;
wherein the program code comprises:
instructions to cause the processor to receive the audio stream in digitized blocks;
instructions to segment the digitized blocks into windows;
instructions to cause the processor to analyze each window in a speech discriminator;
instructions to cause the processor to assign a classification to said each window based on speech discriminator output corresponding to said each window, the classification being selected from a classification set comprising a first classification label corresponding to presence of speech within said each window, a second classification label corresponding to silence in said each window, and a third classification label corresponding to noise in said each window;
instructions to cause the processor to increment a speech counter when said each window is assigned the first classification label;
instructions to cause the processor to increment a silence counter when said each window is assigned the second classification label;
instructions to cause the processor to increment a noise counter when said each window is assigned the third classification label;
instructions to cause the processor to clear the speech counter, the silence counter, and the noise counter when the speech counter exceeds a first limit;
instructions to cause the processor to weight at least one of the silence counter and the noise counter to obtain weighted silence and noise values;
instructions to cause the processor to combine the weighted silence and noise values in a result;
instructions to cause the processor to compare the result to a second limit; and
instructions to cause the processor to identify end-of-speech within the audio stream when the result reaches the second limit.
69. An article of manufacture comprising a machine-readable storage medium with instruction code stored in the medium, said instruction code, when executed by a data processing apparatus comprising a processor receiving an audio stream in digitized blocks, causes the processor to:
segment the digitized blocks into windows;
analyze each window in a speech discriminator;
assign a classification to said each window based on speech discriminator output corresponding to said each window, the classification being selected from a classification set comprising a first classification label corresponding to presence of speech within said each window, and one or more classification labels corresponding to absence of speech in said each window;
increment a speech counter when said each window is assigned the first classification label;
increment a non-voice counter when said each window is assigned a classification label corresponding to absence of speech;
clear the speech counter and the non-voice counter when the speech counter exceeds a first limit; and
identify end-of-speech within the audio stream when the non-voice counter reaches a second limit.
Specification