SYSTEMS, METHODS, AND APPARATUS FOR SPEECH FEATURE DETECTION
Abstract
Implementations and applications are disclosed for detection of a transition in a voice activity state of an audio signal, based on a change in energy that is consistent in time across a range of frequencies of the signal.
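The idea in the abstract, a change in energy that occurs at the same time across a range of frequencies, can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the patent: the function name `detect_offset_frames`, the input layout `energies[t][k]` (energy of frame `t` in frequency bin `k`), the 6 dB drop threshold, and the 80% bin-agreement threshold.

```python
import math

def detect_offset_frames(energies, drop_db=6.0, min_fraction=0.8):
    """Flag frames where energy falls in most frequency bins at once.

    energies[t][k] is the energy of frame t in frequency bin k
    (hypothetical layout). drop_db and min_fraction are illustrative
    thresholds, not values from the patent.
    """
    flags = [False]  # the first frame has no predecessor to compare with
    for prev, cur in zip(energies, energies[1:]):
        drops = 0
        for e_prev, e_cur in zip(prev, cur):
            # Frame-to-frame change of this bin's energy, in dB.
            delta_db = 10.0 * math.log10((e_cur + 1e-12) / (e_prev + 1e-12))
            if delta_db <= -drop_db:
                drops += 1
        # Require the drop to be consistent across the frequency range.
        flags.append(drops / len(cur) >= min_fraction)
    return flags

# Broadband drop at frame 3: every bin falls by about 20 dB.
broadband = [[1.0] * 8] * 3 + [[0.01] * 8] * 2
print(detect_offset_frames(broadband))

# Narrowband drop at frame 1: only 2 of 8 bins fall, so no flag is raised.
narrowband = [[1.0] * 8, [0.01] * 2 + [1.0] * 6]
print(detect_offset_frames(narrowband))
```

Requiring agreement across bins is what separates a speech offset from a change confined to a narrow band; an onset detector could be sketched the same way by testing for a rise instead of a drop.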
48 Claims
1. A method of processing an audio signal, said method comprising:
for each of a first plurality of consecutive segments of the audio signal, determining that voice activity is present in the segment;
for each of a second plurality of consecutive segments of the audio signal that occurs immediately after the first plurality of consecutive segments in the audio signal, determining that voice activity is not present in the segment;
detecting that a transition in a voice activity state of the audio signal occurs during one among the second plurality of consecutive segments that is not the first segment to occur among the second plurality; and
producing a voice activity detection signal that has, for each segment in the first plurality and for each segment in the second plurality, a corresponding value that indicates one among activity and lack of activity,
wherein, for each of the first plurality of consecutive segments, the corresponding value of the voice activity detection signal indicates activity, and
wherein, for each of the second plurality of consecutive segments that occurs before the segment in which the detected transition occurs, and based on said determining, for at least one segment of the first plurality, that voice activity is present in the segment, the corresponding value of the voice activity detection signal indicates activity, and
wherein, for each of the second plurality of consecutive segments that occurs after the segment in which the detected transition occurs, and in response to said detecting that a transition in the speech activity state of the audio signal occurs, the corresponding value of the voice activity detection signal indicates a lack of activity.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 45
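The behavior recited in claim 1 can be read as a hangover that is cut short once a transition is detected. The sketch below is one possible reading, not the patented implementation. The names `combine_vad`, `primary`, and `offset_detected` are hypothetical, the `max_hangover` cap is an added assumption that the claim does not recite, and treating the transition segment itself as active is also an assumption, since the claim only specifies the segments before and after it.

```python
def combine_vad(primary, offset_detected, max_hangover=10):
    """Combine per-segment VAD decisions with per-segment transition flags.

    primary[i]         -- True if the first detector finds voice activity
    offset_detected[i] -- True if a transition is detected in segment i
    max_hangover       -- assumed cap on held segments (not in the claim)
    """
    output = []
    hangover_left = 0  # segments for which activity may still be held
    for active, offset in zip(primary, offset_detected):
        if active:
            # First plurality: the output indicates activity.
            hangover_left = max_hangover
            output.append(True)
        elif hangover_left > 0:
            # Second plurality, up to and including the transition segment:
            # hold activity because an earlier segment was active.
            output.append(True)
            hangover_left = 0 if offset else hangover_left - 1
        else:
            # After the detected transition: lack of activity.
            output.append(False)
    return output

primary = [True, True, True, False, False, False, False, False]
offset = [False, False, False, False, False, True, False, False]
print(combine_vad(primary, offset))
```

In this example the first detector stops reporting activity at segment 3, the transition is detected in segment 5 (not the first inactive segment), and the output stays active through segment 5 before dropping to inactive.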
12. An apparatus for processing an audio signal, said apparatus comprising:
means for determining, for each of a first plurality of consecutive segments of the audio signal, that voice activity is present in the segment;
means for determining, for each of a second plurality of consecutive segments of the audio signal that occurs immediately after the first plurality of consecutive segments in the audio signal, that voice activity is not present in the segment;
means for detecting that a transition in a voice activity state of the audio signal occurs during one among the second plurality of consecutive segments; and
means for producing a voice activity detection signal that has, for each segment in the first plurality and for each segment in the second plurality, a corresponding value that indicates one among activity and lack of activity, and
wherein, for each of the first plurality of consecutive segments, the corresponding value of the voice activity detection signal indicates activity, and
wherein, for each of the second plurality of consecutive segments that occurs before the segment in which the detected transition occurs, and based on said determining, for at least one segment of the first plurality, that voice activity is present in the segment, the corresponding value of the voice activity detection signal indicates activity, and
wherein, for each of the second plurality of consecutive segments that occurs after the segment in which the detected transition occurs, and in response to said detecting that a transition in the speech activity state of the audio signal occurs, the corresponding value of the voice activity detection signal indicates a lack of activity.
Dependent claims: 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 46
23. An apparatus for processing an audio signal, said apparatus comprising:
a first voice activity detector configured to determine: for each of a first plurality of consecutive segments of the audio signal, that voice activity is present in the segment, and for each of a second plurality of consecutive segments of the audio signal that occurs immediately after the first plurality of consecutive segments in the audio signal, that voice activity is not present in the segment;
a second voice activity detector configured to detect that a transition in a voice activity state of the audio signal occurs during one among the second plurality of consecutive segments; and
a signal generator configured to produce a voice activity detection signal that has, for each segment in the first plurality and for each segment in the second plurality, a corresponding value that indicates one among activity and lack of activity,
wherein, for each of the first plurality of consecutive segments, the corresponding value of the voice activity detection signal indicates activity, and
wherein, for each of the second plurality of consecutive segments that occurs before the segment in which the detected transition occurs, and based on said determining, for at least one segment of the first plurality, that voice activity is present in the segment, the corresponding value of the voice activity detection signal indicates activity, and
wherein, for each of the second plurality of consecutive segments that occurs after the segment in which the detected transition occurs, and in response to said detecting that a transition in the speech activity state of the audio signal occurs, the corresponding value of the voice activity detection signal indicates a lack of activity.
Dependent claims: 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 47, 48
34. A computer-readable medium having tangible structures that store machine-executable instructions that when executed by one or more processors cause the one or more processors to:
determine, for each of a first plurality of consecutive segments of the multichannel signal, and based on a difference between a first channel of the multichannel signal during the segment and a second channel of the multichannel signal during the segment, that voice activity is present in the segment;
determine, for each of a second plurality of consecutive segments of the multichannel signal that occurs immediately after the first plurality of consecutive segments in the multichannel signal, and based on a difference between a first channel of the multichannel signal during the segment and a second channel of the multichannel signal during the segment, that voice activity is not present in the segment;
detect that a transition in a voice activity state of the multichannel signal occurs during one among the second plurality of consecutive segments that is not the first segment to occur among the second plurality; and
produce a voice activity detection signal that has, for each segment in the first plurality and for each segment in the second plurality, a corresponding value that indicates one among activity and lack of activity,
wherein, for each of the first plurality of consecutive segments, the corresponding value of the voice activity detection signal indicates activity, and
wherein, for each of the second plurality of consecutive segments that occurs before the segment in which the detected transition occurs, and based on said determining, for at least one segment of the first plurality, that voice activity is present in the segment, the corresponding value of the voice activity detection signal indicates activity, and
wherein, for each of the second plurality of consecutive segments that occurs after the segment in which the detected transition occurs, and in response to said detecting that a transition in the speech activity state of the multichannel signal occurs, the corresponding value of the voice activity detection signal indicates a lack of activity.
Dependent claims: 35, 36, 37, 38, 39, 40, 41, 42, 43, 44
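Claim 34 bases the per-segment decision on a difference between two channels of a multichannel signal, without saying which kind of difference. As a rough illustration only, the sketch below assumes an energy level difference in dB between the channels; a phase or gain difference would be an equally valid reading. The names `segment_energy` and `channel_difference_vad` and the 6 dB threshold are hypothetical.

```python
import math

def segment_energy(samples):
    """Mean squared amplitude of one segment."""
    return sum(x * x for x in samples) / len(samples)

def channel_difference_vad(ch1_segments, ch2_segments, threshold_db=6.0):
    """Per-segment decision from the inter-channel level difference.

    Assumes channel 1 is the microphone nearer the talker, so speech
    produces a higher level on channel 1 than on channel 2. The
    threshold is an illustrative value, not one from the patent.
    """
    decisions = []
    for seg1, seg2 in zip(ch1_segments, ch2_segments):
        # Small offsets keep the ratio finite for silent segments.
        ratio = (segment_energy(seg1) + 1e-12) / (segment_energy(seg2) + 1e-12)
        diff_db = 10.0 * math.log10(ratio)
        decisions.append(diff_db > threshold_db)
    return decisions

loud = [1.0, -1.0, 1.0, -1.0]
quiet = [0.1, -0.1, 0.1, -0.1]
# Segment 0: channel 1 is about 20 dB above channel 2 (activity).
# Segment 1: both channels are at the same level (no activity).
print(channel_difference_vad([loud, quiet], [quiet, quiet]))
```

The resulting list of decisions could serve as the per-segment input to the hangover logic that the remainder of the claim describes.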
Specification