System for detecting voice activity
Abstract
A system for detection of voice activity in a communications signal, employing a nonlinear two filter voice detection algorithm, in which one filter has a low time constant (the fast filter) and one filter has a high time constant (the slow filter). The slow filter serves to provide a noise floor estimate for the incoming signal, and the fast filter serves to more closely represent the total energy in the signal. The absolute value of incoming data is presented to both filters, and the difference in filter outputs is integrated over each of a series of successive frames, thereby giving an indication of the energy level above the noise floor in each frame of the incoming signal. Voice activity is detected if the measured energy level for a frame exceeds a specified threshold level. Silence (e.g., leaving only noise) is detected if the measured energy level for each of a specified number of successive frames does not exceed a specified threshold level. The system enables voice activity to be distinguished from common noise such as pops, clicks and low level cross-talk.
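The processing chain described in the abstract can be sketched roughly as follows. This is only an illustrative reading, not the patented implementation: the one-pole smoothing form, the function name `detect_voice`, the label strings, and every default value (`frame_len`, `fast_alpha`, `slow_alpha`, `threshold`, `hangover_frames`) are assumptions chosen for the example and do not come from the source.

```python
def detect_voice(samples, frame_len=80, fast_alpha=0.2, slow_alpha=0.002,
                 threshold=1000.0, hangover_frames=3):
    """Label each full frame as 'speech', 'quiescent', or 'silence'.

    All parameter values are hypothetical; the abstract only says that one
    filter has a low time constant and the other a high time constant.
    """
    fast = 0.0       # fast filter state: tracks total signal energy
    slow = 0.0       # slow filter state: tracks the noise floor
    quiet_run = 0    # successive frames at or below the threshold
    labels = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        energy = 0.0
        for x in samples[start:start + frame_len]:
            mag = abs(x)                       # absolute value of incoming data
            fast += fast_alpha * (mag - fast)  # low time constant
            slow += slow_alpha * (mag - slow)  # high time constant
            energy += fast - slow              # integrate the difference
        if energy > threshold:
            quiet_run = 0
            labels.append("speech")
        else:
            quiet_run += 1
            # Silence only after enough successive sub-threshold frames.
            labels.append("silence" if quiet_run >= hangover_frames
                          else "quiescent")
    return labels
```

With these assumed settings, a short burst of noise such as a click would have to lift the integrated difference above `threshold` for a whole frame before being reported as speech, which is one way to read the abstract's remark about pops and clicks.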
18 Claims
1. A method for detecting voice activity in a communications signal comprising, in combination:
passing a representation of said communications signal through a first filter and a second filter, whereby the first filter provides a first output that represents a noise floor estimate for said communications signal, and whereby the second filter provides a second output that represents an energy level estimate for said communications signal;
integrating a difference between said first output and said second output over blocks of time, thereby establishing a reference value for each such block;
for each such block, determining whether said reference value represents voice activity;
outputting speech-indicia in response to a determination that said reference value represents voice activity; and
outputting silence-indicia in response to a determination that the reference values established for each of a predetermined number of blocks do not represent voice activity.
5. A method for detecting voice activity in a communications signal comprising, in combination, the following steps:
receiving said communications signal;
rectifying said communications signal, thereby establishing a rectified signal;
passing said rectified signal through at least a first low-pass filter and a second low-pass filter, said first low-pass filter providing a slow filter output representing a noise floor in said rectified signal, and said second low pass filter providing a fast filter output representing an energy level in said rectified signal, whereby a difference between said fast filter output and said slow filter output at a given time defines a filter output difference at said given time;
over a block of time, integrating said filter output difference, thereby establishing a reference value for said block of time;
determining whether said reference value represents voice activity; and
in response to a determination that said reference value represents voice activity, providing an output signal indicating that voice activity is present in said communication signal.
9. A method for detecting voice activity in a communications signal, said communications signal defining a plurality of successive frames, said method comprising, in combination:
(A) receiving as an input signal at least a plurality of said frames;
(B) rectifying said input signal, thereby establishing a rectified signal;
(C) passing said rectified signal through at least a first low-pass filter and a second low-pass filter, said first low-pass filter providing a slow filter output representing a noise floor in said communications signal, and said second low pass filter providing a fast filter output representing an energy level in said communications signal, whereby a difference between said fast filter output and said slow filter output at a given time defines a filter output difference at said given time;
(D) over each of a plurality of said frames, (i) integrating said filter output difference, thereby establishing a reference value for said frame, (ii) determining whether said reference value represents voice activity, (iii) in response to a determination that said reference value represents voice activity, providing a speech-indicia signal, and (iv) in response to a determination that said reference value does not represent voice activity, providing a quiescence-indicia signal; and
(E) in response to more than a predetermined number of successive quiescence-indicia signals, providing a silence-indicia signal.
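Steps (D)(ii) through (E) of claim 9 amount to a small per-frame state machine. The sketch below is one possible reading: it assumes a reference value "represents voice activity" when it exceeds a threshold (as the abstract describes), and the names `frame_indicia`, `threshold`, and `max_quiescent` are hypothetical.

```python
def frame_indicia(reference_values, threshold, max_quiescent):
    """Return (signal, silence_flag) for each frame's reference value.

    signal is 'speech' or 'quiescence'; silence_flag turns True once MORE
    THAN max_quiescent successive quiescence signals have been produced.
    """
    results = []
    run = 0  # length of the current run of quiescence signals
    for ref in reference_values:
        if ref > threshold:      # assumed test for voice activity
            run = 0
            results.append(("speech", False))
        else:
            run += 1
            results.append(("quiescence", run > max_quiescent))
    return results
```

Note the strict comparison `run > max_quiescent`, which follows the claim's wording "more than a predetermined number" rather than "at least".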
10. A system for detecting voice activity in a communications signal, said system comprising a processor and a set of machine language instructions stored in a storage medium and executed by said processor for performing a set of functions comprising, in combination:
passing a representation of said communications signal through a first filter and a second filter, whereby the first filter provides a first output that represents a noise floor estimate for said communications signal, and whereby the second filter provides a second output that represents an energy level estimate for said communications signal;
integrating a difference between said first output and said second output over blocks of time, thereby establishing a reference value for each such block;
for each such block, determining whether said reference value represents voice activity;
outputting speech-indicia in response to a determination that said reference value represents voice activity; and
outputting silence-indicia in response to a determination that the reference values established for each of a predetermined number of blocks do not represent voice activity.
14. An apparatus for detecting voice activity in a communications signal comprising, in combination:
a rectifier for rectifying said signal, thereby providing a rectified signal;
a first filter for filtering said rectified signal and providing a first filter output representing a noise floor for said communications signal;
a second filter for filtering said rectified signal and providing a second filter output representing an energy level for said communications signal;
an integrator for summing the difference between said first filter output and said second filter output over each of a plurality of frames of said communications signal, thereby providing a sum for each such frame; and
a comparator for determining whether said sum for a given frame exceeds a threshold value indicative of voice activity, whereby said apparatus finds voice activity in said communications signal in response to the sum for a given frame exceeding said threshold value.
15. [...] whereby said apparatus finds silence in said communications signal in response to said count reaching a specified value.
16. An apparatus as claimed in claim 14 further comprising means for resetting said first filter output to the lesser of said first filter output and said second filter output.
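The reset in claim 16 can be illustrated with a single filter-update step. This is a minimal sketch under assumptions: the claim does not say when the reset is applied, so the example applies it after every sample, and the one-pole filter form, the name `update_filters`, and the coefficient values are hypothetical.

```python
def update_filters(mag, fast, slow, fast_alpha=0.2, slow_alpha=0.002):
    """Advance both filters by one rectified sample, then clamp the floor.

    slow is the first filter output (noise floor estimate) and fast is the
    second filter output (energy level estimate).
    """
    fast += fast_alpha * (mag - fast)
    slow += slow_alpha * (mag - slow)
    # Reset the noise floor to the lesser of the two filter outputs, so the
    # floor estimate never sits above the current energy estimate.
    slow = min(slow, fast)
    return fast, slow
```

Under this reading, the clamp keeps the difference between the energy estimate and the noise floor from going negative after a loud passage ends, since the slow filter would otherwise take a long time to decay.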
17. A method as claimed in claim 14, wherein the blocks of time are defined by a sliding window over time.
18. A method as claimed in claim 14, wherein the blocks of time comprise successive blocks of time.
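The two block definitions in claims 17 and 18 differ only in how far the integration window advances between reference values. As a rough illustration, assuming `diff` is a list of per-sample filter output differences (the function names, `n`, and `hop` are hypothetical):

```python
def successive_blocks(diff, n):
    """Sum diff over back-to-back, non-overlapping blocks of n samples."""
    return [sum(diff[i:i + n]) for i in range(0, len(diff) - n + 1, n)]


def sliding_window(diff, n, hop=1):
    """Sum diff over a window of n samples that advances by hop samples."""
    return [sum(diff[i:i + n]) for i in range(0, len(diff) - n + 1, hop)]
```

For example, with `diff = [1, 2, 3, 4, 5, 6]` and `n = 2`, successive blocks give `[3, 7, 11]`, while a sliding window with a hop of one sample gives `[3, 5, 7, 9, 11]`. The sliding form produces more frequent decisions at the cost of more summation work.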
Specification