SYSTEMS, METHODS, AND APPARATUS FOR SPEECH FEATURE DETECTION

US 20110264447A1
Filed: 04/22/2011
Published: 10/27/2011
Est. Priority Date: 04/22/2010
Status: Active Grant

Abstract

Implementations and applications are disclosed for detection of a transition in a voice activity state of an audio signal, based on a change in energy that is consistent in time across a range of frequencies of the signal.
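
As an illustration of the idea in the abstract, the following minimal Python sketch flags a transition when the energy change between two frames has the same sign in enough frequency bands. The function name, the per-band energy lists, and both threshold parameters are hypothetical choices made for this sketch and are not taken from the patent.

```python
def detect_transition(prev_energy, cur_energy, delta_threshold, min_bands):
    """Return 'onset', 'offset', or None from per-band energy changes.

    prev_energy / cur_energy: per-band energies of two consecutive frames.
    delta_threshold, min_bands: illustrative tuning parameters (assumptions).
    """
    # A first-order difference approximates the time derivative of energy.
    deltas = [cur - prev for prev, cur in zip(prev_energy, cur_energy)]
    # Count bands whose energy rises or falls by more than the threshold.
    rising = sum(1 for d in deltas if d > delta_threshold)
    falling = sum(1 for d in deltas if d < -delta_threshold)
    # Require the change to be consistent across enough bands.
    if rising >= min_bands and rising >= falling:
        return "onset"
    if falling >= min_bands:
        return "offset"
    return None
```

A jump in a single band is ignored under these assumptions, because the change is not consistent across the frequency range.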

48 Claims

1. A method of processing an audio signal, said method comprising:
- for each of a first plurality of consecutive segments of the audio signal, determining that voice activity is present in the segment;
  
  for each of a second plurality of consecutive segments of the audio signal that occurs immediately after the first plurality of consecutive segments in the audio signal, determining that voice activity is not present in the segment;
  
  detecting that a transition in a voice activity state of the audio signal occurs during one among the second plurality of consecutive segments that is not the first segment to occur among the second plurality; and
  
  producing a voice activity detection signal that has, for each segment in the first plurality and for each segment in the second plurality, a corresponding value that indicates one among activity and lack of activity, wherein, for each of the first plurality of consecutive segments, the corresponding value of the voice activity detection signal indicates activity, and wherein, for each of the second plurality of consecutive segments that occurs before the segment in which the detected transition occurs, and based on said determining, for at least one segment of the first plurality, that voice activity is present in the segment, the corresponding value of the voice activity detection signal indicates activity, and wherein, for each of the second plurality of consecutive segments that occurs after the segment in which the detected transition occurs, and in response to said detecting that a transition in the speech activity state of the audio signal occurs, the corresponding value of the voice activity detection signal indicates a lack of activity.
- - 2. The method according to claim 1, wherein said method comprises calculating a time derivative of energy for each of a plurality of different frequency components of the first channel during said one among the second plurality of segments, and wherein said detecting that the transition occurs during said one among the second plurality of segments is based on the calculated time derivatives of energy.
  - 3. The method according to claim 2, wherein said detecting that the transition occurs includes, for each of the plurality of different frequency components, and based on the corresponding calculated time derivative of energy, producing a corresponding indication of whether the frequency component is active, and wherein said detecting that the transition occurs is based on a relation between the number of said indications that indicate that the corresponding frequency component is active and a first threshold value.
  - 4. The method according to claim 3, wherein said method comprises, for a segment that occurs prior to the first plurality of consecutive segments in the audio signal:
    - calculating a time derivative of energy for each of a plurality of different frequency components of the first channel during the segment;
      
      for each of the plurality of different frequency components, and based on the corresponding calculated time derivative of energy, producing a corresponding indication of whether the frequency component is active; and
      
      determining that a transition in a voice activity state of the audio signal does not occur during the segment, based on a relation between (A) the number of said indications that indicate that the corresponding frequency component is active and (B) a second threshold value that is higher than said first threshold value.
  - 5. The method according to claim 3, wherein said method comprises, for a segment that occurs prior to the first plurality of consecutive segments in the audio signal:
    - calculating, for each of a plurality of different frequency components of the first channel during the segment, a second derivative of energy with respect to time;
      
      for each of the plurality of different frequency components, and based on the corresponding calculated second derivative of energy with respect to time, producing a corresponding indication of whether the frequency component is impulsive; and
      
      determining that a transition in a voice activity state of the audio signal does not occur during the segment, based on a relation between the number of said indications that indicate that the corresponding frequency component is impulsive and a threshold value.
  - 6. The method according to claim 1, wherein, for each of the first plurality of consecutive segments of the audio signal, said determining that voice activity is present in the segment is based on a difference between a first channel of the audio signal during the segment and a second channel of the audio signal during the segment, and wherein, for each of the second plurality of consecutive segments of the audio signal, said determining that voice activity is not present in the segment is based on a difference between a first channel of the audio signal during the segment and a second channel of the audio signal during the segment.
  - 7. The method according to claim 6, wherein, for each segment of said first plurality and for each segment of said second plurality, said difference is a difference between a level of the first channel and a level of the second channel during the segment.
  - 8. The method according to claim 6, wherein, for each segment of said first plurality and for each segment of said second plurality, said difference is a difference in time between an instance of a signal in the first channel during the segment and an instance of said signal in the second channel during the segment.
  - 9. The method according to claim 6, wherein, for each segment of said first plurality, said determining that voice activity is present in the segment comprises calculating, for each of a first plurality of different frequency components of the audio signal during the segment, a difference between a phase of the frequency component in the first channel and a phase of the frequency component in the second channel, wherein said difference between the first channel during the segment and the second channel during the segment is one of said calculated phase differences, and wherein, for each segment of said second plurality, said determining that voice activity is not present in the segment comprises calculating, for each of the first plurality of different frequency components of the audio signal during the segment, a difference between a phase of the frequency component in the first channel and a phase of the frequency component in the second channel, wherein said difference between the first channel during the segment and the second channel during the segment is one of said calculated phase differences.
  - 10. The method according to claim 9, wherein said method comprises calculating a time derivative of energy for each of a second plurality of different frequency components of the first channel during said one among the second plurality of segments, and wherein said detecting that the transition occurs during said one among the second plurality of segments is based on the calculated time derivatives of energy, and wherein a frequency band that includes the first plurality of frequency components is separate from a frequency band that includes the second plurality of frequency components.
  - 11. The method according to claim 9, wherein, for each segment of said first plurality, said determining that voice activity is present in the segment is based on a corresponding value of a coherency measure that indicates a degree of coherence among the directions of arrival of at least the plurality of different frequency components, wherein said value is based on information from the corresponding plurality of calculated phase differences, and wherein, for each segment of said second plurality, said determining that voice activity is not present in the segment is based on a corresponding value of the coherency measure that indicates a degree of coherence among the directions of arrival of at least the plurality of different frequency components, wherein said value is based on information from the corresponding plurality of calculated phase differences.
  - 45. The method according to claim 1, wherein said method comprises:
    - calculating a time derivative of energy for each of a plurality of different frequency components of the first channel during a segment of one of the first and second pluralities of segments; and
      
      producing a voice activity detection indication for said segment of one of the first and second pluralities, wherein said producing the voice activity detection indication includes comparing a value of a test statistic for the segment to a value of a threshold, and wherein said producing the voice activity detection indication includes modifying a relation between the test statistic and the threshold, based on said calculated plurality of time derivatives of energy, and wherein a value of said voice activity detection signal for said segment of one of the first and second pluralities is based on said voice activity detection indication.
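
The combination recited in claim 1 can be pictured with a short Python sketch: a primary per-segment decision is held "active" after it drops, until a separately detected transition releases it. This is only an illustration under stated assumptions. The function name and the two boolean input sequences are hypothetical, and marking the transition segment itself as inactive is a choice made here, since the claim specifies only the segments before and after it.

```python
def combine_vad(primary_active, offset_detected):
    """Hold the output active after the primary detector drops,
    until an offset transition is detected (illustrative sketch)."""
    output = []
    hold = False  # True while activity from an earlier segment is carried over
    for active, offset in zip(primary_active, offset_detected):
        if active:
            # Primary detector reports voice: output is active, start holding.
            hold = True
            output.append(True)
        elif hold and offset:
            # Transition detected while holding: release the hold.
            # Assumption: the transition segment itself is marked inactive.
            hold = False
            output.append(False)
        else:
            # No primary activity: output follows the hold state.
            output.append(hold)
    return output
```

With three active segments followed by four inactive ones and a transition detected in the second inactive segment, the first inactive segment is still reported as active and the segments after the transition are reported as inactive.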

12. An apparatus for processing an audio signal, said apparatus comprising:
- means for determining, for each of a first plurality of consecutive segments of the audio signal, that voice activity is present in the segment;
  
  means for determining, for each of a second plurality of consecutive segments of the audio signal that occurs immediately after the first plurality of consecutive segments in the audio signal, that voice activity is not present in the segment;
  
  means for detecting that a transition in a voice activity state of the audio signal occurs during one among the second plurality of consecutive segments; and
  
  means for producing a voice activity detection signal that has, for each segment in the first plurality and for each segment in the second plurality, a corresponding value that indicates one among activity and lack of activity, and wherein, for each of the first plurality of consecutive segments, the corresponding value of the voice activity detection signal indicates activity, and wherein, for each of the second plurality of consecutive segments that occurs before the segment in which the detected transition occurs, and based on said determining, for at least one segment of the first plurality, that voice activity is present in the segment, the corresponding value of the voice activity detection signal indicates activity, and wherein, for each of the second plurality of consecutive segments that occurs after the segment in which the detected transition occurs, and in response to said detecting that a transition in the speech activity state of the audio signal occurs, the corresponding value of the voice activity detection signal indicates a lack of activity.
- - 13. The apparatus according to claim 12, wherein said apparatus comprises means for calculating a time derivative of energy for each of a plurality of different frequency components of the first channel during said one among the second plurality of segments, and wherein said means for detecting that the transition occurs during said one among the second plurality of segments is configured to detect the transition based on the calculated time derivatives of energy.
  - 14. The apparatus according to claim 13, wherein said means for detecting that the transition occurs includes means for producing, for each of the plurality of different frequency components, and based on the corresponding calculated time derivative of energy, a corresponding indication of whether the frequency component is active, and wherein said means for detecting that the transition occurs is configured to detect the transition based on a relation between the number of said indications that indicate that the corresponding frequency component is active and a first threshold value.
  - 15. The apparatus according to claim 14, wherein said apparatus comprises:
    - means for calculating, for a segment that occurs prior to the first plurality of consecutive segments in the audio signal, a time derivative of energy for each of a plurality of different frequency components of the first channel during the segment;
      
      means for producing, for each of said plurality of different frequency components of said segment that occurs prior to the first plurality of consecutive segments in the audio signal, and based on the corresponding calculated time derivative of energy, a corresponding indication of whether the frequency component is active; and
      
      means for determining that a transition in a voice activity state of the audio signal does not occur during said segment that occurs prior to the first plurality of consecutive segments in the audio signal, based on a relation between (A) the number of said indications that indicate that the corresponding frequency component is active and (B) a second threshold value that is higher than said first threshold value.
  - 16. The apparatus according to claim 14, wherein said apparatus comprises:
    - means for calculating, for a segment that occurs prior to the first plurality of consecutive segments in the audio signal, a second derivative of energy with respect to time for each of a plurality of different frequency components of the first channel during the segment;
      
      means for producing, for each of the plurality of different frequency components of said segment that occurs prior to the first plurality of consecutive segments in the audio signal, and based on the corresponding calculated second derivative of energy with respect to time, a corresponding indication of whether the frequency component is impulsive; and
      
      means for determining that a transition in a voice activity state of the audio signal does not occur during said segment that occurs prior to the first plurality of consecutive segments in the audio signal, based on a relation between the number of said indications that indicate that the corresponding frequency component is impulsive and a threshold value.
  - 17. The apparatus according to claim 12, wherein, for each of the first plurality of consecutive segments of the audio signal, said means for determining that voice activity is present in the segment is configured to perform said determining based on a difference between a first channel of the audio signal during the segment and a second channel of the audio signal during the segment, and wherein, for each of the second plurality of consecutive segments of the audio signal, said means for determining that voice activity is not present in the segment is configured to perform said determining based on a difference between a first channel of the audio signal during the segment and a second channel of the audio signal during the segment.
  - 18. The apparatus according to claim 17, wherein, for each segment of said first plurality and for each segment of said second plurality, said difference is a difference between a level of the first channel and a level of the second channel during the segment.
  - 19. The apparatus according to claim 17, wherein, for each segment of said first plurality and for each segment of said second plurality, said difference is a difference in time between an instance of a signal in the first channel during the segment and an instance of said signal in the second channel during the segment.
  - 20. The apparatus according to claim 17, wherein said means for determining that voice activity is present in the segment comprises means for calculating, for each segment of said first plurality and for each segment of said second plurality, and for each of a first plurality of different frequency components of the audio signal during the segment, a difference between a phase of the frequency component in the first channel and a phase of the frequency component in the second channel, wherein said difference between the first channel during the segment and the second channel during the segment is one of said calculated phase differences.
  - 21. The apparatus according to claim 20, wherein said apparatus comprises means for calculating a time derivative of energy for each of a second plurality of different frequency components of the first channel during said one among the second plurality of segments, and wherein said means for detecting that the transition occurs during said one among the second plurality of segments is configured to detect that the transition occurs based on the calculated time derivatives of energy, and wherein a frequency band that includes the first plurality of frequency components is separate from a frequency band that includes the second plurality of frequency components.
  - 22. The apparatus according to claim 20, wherein said means for determining, for each segment of said first plurality, that voice activity is present in the segment is configured to determine that said voice activity is present based on a corresponding value of a coherency measure that indicates a degree of coherence among the directions of arrival of at least the plurality of different frequency components, wherein said value is based on information from the corresponding plurality of calculated phase differences, and wherein said means for determining, for each segment of said second plurality, that voice activity is not present in the segment is configured to determine that voice activity is not present based on a corresponding value of the coherency measure that indicates a degree of coherence among the directions of arrival of at least the plurality of different frequency components, wherein said value is based on information from the corresponding plurality of calculated phase differences.
  - 46. The apparatus according to claim 12, wherein said apparatus comprises:
    - means for calculating a time derivative of energy for each of a plurality of different frequency components of the first channel during a segment of one of the first and second pluralities of segments; and
      
      means for producing a voice activity detection indication for said segment of one of the first and second pluralities, wherein said means for producing the voice activity detection indication includes means for comparing a value of a test statistic for the segment to a threshold value, and wherein said means for producing the voice activity detection indication includes means for modifying a relation between the test statistic and the threshold, based on said calculated plurality of time derivatives of energy, and wherein a value of said voice activity detection signal for said segment of one of the first and second pluralities is based on said voice activity detection indication.
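
One possible reading of "modifying a relation between the test statistic and the threshold" (claims 45 and 46) is sketched below in Python: the threshold is relaxed when enough frequency components show a rising energy derivative. The claims do not say how the relation is modified, so the scaling rule, the function name, and every parameter here are assumptions made for illustration.

```python
def vad_indication(test_statistic, base_threshold, energy_derivatives,
                   derivative_threshold, min_bands, relief=0.5):
    """Compare a test statistic to a threshold that is relaxed when many
    bands show a rising energy derivative (hypothetical rule)."""
    # Count frequency components whose energy is rising quickly.
    rising = sum(1 for d in energy_derivatives if d > derivative_threshold)
    # Assumption: scale the threshold down when a likely onset is seen.
    if rising >= min_bands:
        threshold = base_threshold * relief
    else:
        threshold = base_threshold
    return test_statistic > threshold
```

Under these assumptions, a statistic that falls just below the base threshold still yields an "active" indication when the derivatives suggest an onset; scaling the statistic instead of the threshold would be an equivalent way to change the relation.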

23. An apparatus for processing an audio signal, said apparatus comprising:
- a first voice activity detector configured to determine:
  
  for each of a first plurality of consecutive segments of the audio signal, that voice activity is present in the segment, and for each of a second plurality of consecutive segments of the audio signal that occurs immediately after the first plurality of consecutive segments in the audio signal, that voice activity is not present in the segment;
  
  a second voice activity detector configured to detect that a transition in a voice activity state of the audio signal occurs during one among the second plurality of consecutive segments; and
  
  a signal generator configured to produce a voice activity detection signal that has, for each segment in the first plurality and for each segment in the second plurality, a corresponding value that indicates one among activity and lack of activity, wherein, for each of the first plurality of consecutive segments, the corresponding value of the voice activity detection signal indicates activity, and wherein, for each of the second plurality of consecutive segments that occurs before the segment in which the detected transition occurs, and based on said determining, for at least one segment of the first plurality, that voice activity is present in the segment, the corresponding value of the voice activity detection signal indicates activity, and wherein, for each of the second plurality of consecutive segments that occurs after the segment in which the detected transition occurs, and in response to said detecting that a transition in the speech activity state of the audio signal occurs, the corresponding value of the voice activity detection signal indicates a lack of activity.
- - 24. The apparatus according to claim 23, wherein said apparatus comprises a calculator configured to calculate a time derivative of energy for each of a plurality of different frequency components of the first channel during said one among the second plurality of segments, and wherein said second voice activity detector is configured to detect said transition based on the calculated time derivatives of energy.
  - 25. The apparatus according to claim 24, wherein said second voice activity detector includes a comparator configured to produce, for each of the plurality of different frequency components, and based on the corresponding calculated time derivative of energy, a corresponding indication of whether the frequency component is active, and wherein said second voice activity detector is configured to detect the transition based on a relation between the number of said indications that indicate that the corresponding frequency component is active and a first threshold value.
  - 26. The apparatus according to claim 25, wherein said apparatus comprises:
    - a calculator configured to calculate, for a segment that occurs prior to the first plurality of consecutive segments in the multichannel signal, a time derivative of energy for each of a plurality of different frequency components of the first channel during the segment; and
      
      a comparator configured to produce, for each of said plurality of different frequency components of said segment that occurs prior to the first plurality of consecutive segments in the multichannel signal, and based on the corresponding calculated time derivative of energy, a corresponding indication of whether the frequency component is active, wherein said second voice activity detector is configured to determine that a transition in a voice activity state of the multichannel signal does not occur during said segment that occurs prior to the first plurality of consecutive segments in the multichannel signal, based on a relation between (A) the number of said indications that indicate that the corresponding frequency component is active and (B) a second threshold value that is higher than said first threshold value.
  - 27. The apparatus according to claim 25, wherein said apparatus comprises:
    - a calculator configured to calculate, for a segment that occurs prior to the first plurality of consecutive segments in the multichannel signal, a second derivative of energy with respect to time for each of a plurality of different frequency components of the first channel during the segment; and
      
      a comparator configured to produce, for each of the plurality of different frequency components of said segment that occurs prior to the first plurality of consecutive segments in the multichannel signal, and based on the corresponding calculated second derivative of energy with respect to time, a corresponding indication of whether the frequency component is impulsive, wherein said second voice activity detector is configured to determine that a transition in a voice activity state of the multichannel signal does not occur during said segment that occurs prior to the first plurality of consecutive segments in the multichannel signal, based on a relation between the number of said indications that indicate that the corresponding frequency component is impulsive and a threshold value.
  - 28. The apparatus according to claim 23, wherein said first voice activity detector is configured to determine, for each of the first plurality of consecutive segments of the audio signal, that voice activity is present in the segment, based on a difference between a first channel of the audio signal during the segment and a second channel of the audio signal during the segment, and wherein said first voice activity detector is configured to determine, for each of the second plurality of consecutive segments of the audio signal, that voice activity is not present in the segment, based on a difference between a first channel of the audio signal during the segment and a second channel of the audio signal during the segment.
  - 29. The apparatus according to claim 28, wherein, for each segment of said first plurality and for each segment of said second plurality, said difference is a difference between a level of the first channel and a level of the second channel during the segment.
  - 30. The apparatus according to claim 28, wherein, for each segment of said first plurality and for each segment of said second plurality, said difference is a difference in time between an instance of a signal in the first channel during the segment and an instance of said signal in the second channel during the segment.
  - 31. The apparatus according to claim 28, wherein said first voice activity detector includes a calculator configured to calculate, for each segment of said first plurality and for each segment of said second plurality, and for each of a first plurality of different frequency components of the multichannel signal during the segment, a difference between a phase of the frequency component in the first channel and a phase of the frequency component in the second channel, wherein said difference between the first channel during the segment and the second channel during the segment is one of said calculated phase differences.
  - 32. The apparatus according to claim 31, wherein said apparatus comprises a calculator configured to calculate a time derivative of energy for each of a second plurality of different frequency components of the first channel during said one among the second plurality of segments, and wherein said second voice activity detector is configured to detect that the transition occurs based on the calculated time derivatives of energy, and wherein a frequency band that includes the first plurality of frequency components is separate from a frequency band that includes the second plurality of frequency components.
  - 33. The apparatus according to claim 31, wherein said first voice activity detector is configured to determine, for each segment of said first plurality, that said voice activity is present in the segment based on a corresponding value of a coherency measure that indicates a degree of coherence among the directions of arrival of at least the plurality of different frequency components, wherein said value is based on information from the corresponding plurality of calculated phase differences, and wherein said first voice activity detector is configured to determine, for each segment of said second plurality, that voice activity is not present in the segment based on a corresponding value of the coherency measure that indicates a degree of coherence among the directions of arrival of at least the plurality of different frequency components, wherein said value is based on information from the corresponding plurality of calculated phase differences.
  - 47. The apparatus according to claim 23, wherein said apparatus comprises:
    - a third voice activity detector configured to calculate a time derivative of energy for each of a plurality of different frequency components of the first channel during a segment of one of the first and second pluralities of segments; and
      
      a fourth voice activity detector configured to produce a voice activity detection indication for said segment of one of the first and second pluralities, based on a result of comparing a value of a test statistic for the segment to a threshold value, wherein said fourth voice activity detector is configured to modify a relation between the test statistic and the threshold, based on said calculated plurality of time derivatives of energy, and wherein a value of said voice activity detection signal for said segment of one of the first and second pluralities is based on said voice activity detection indication.
  - 48. The apparatus according to claim 47, wherein the fourth voice activity detector is the first voice activity detector, and wherein said determining that voice activity is present or not present in the segment includes producing said voice activity detection indication.
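
The phase-difference and coherency language of claims 31 and 33 can be illustrated with a small Python sketch. It converts each inter-channel phase difference into an implied time delay and reports the fraction of frequency components whose delay lies near an expected value. Treating coherency as that fraction is an assumption made here, as are the function names, the target delay, the tolerance, and the decision threshold; the claims do not define the measure in this detail.

```python
import cmath
import math


def coherency(ch1_bins, ch2_bins, freqs_hz, target_delay_s, tolerance_s):
    """Fraction of frequency components whose inter-channel delay is near
    the expected delay (an illustrative stand-in for a coherency measure)."""
    in_range = 0
    for x1, x2, f in zip(ch1_bins, ch2_bins, freqs_hz):
        # Phase of channel 1 minus phase of channel 2, wrapped to (-pi, pi].
        phase_diff = cmath.phase(x1 * x2.conjugate())
        # Convert the phase difference to a time delay; f must be nonzero.
        delay = phase_diff / (2.0 * math.pi * f)
        if abs(delay - target_delay_s) <= tolerance_s:
            in_range += 1
    return in_range / len(freqs_hz)


def is_voice(ch1_bins, ch2_bins, freqs_hz, target_delay_s,
             tolerance_s=2e-5, min_coherency=0.75):
    """Declare voice activity when enough components agree on direction."""
    value = coherency(ch1_bins, ch2_bins, freqs_hz, target_delay_s, tolerance_s)
    return value >= min_coherency
```

A source arriving from a single direction produces the same delay at every frequency and therefore a high value, while diffuse noise tends to scatter the implied delays and lower it.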

34. A computer-readable medium having tangible structures that store machine-executable instructions that when executed by one or more processors cause the one or more processors to:
- determine, for each of a first plurality of consecutive segments of the multichannel signal, and based on a difference between a first channel of the multichannel signal during the segment and a second channel of the multichannel signal during the segment, that voice activity is present in the segment;
  
  determine, for each of a second plurality of consecutive segments of the multichannel signal that occurs immediately after the first plurality of consecutive segments in the multichannel signal, and based on a difference between a first channel of the multichannel signal during the segment and a second channel of the multichannel signal during the segment, that voice activity is not present in the segment;
  
  detect that a transition in a voice activity state of the multichannel signal occurs during one among the second plurality of consecutive segments that is not the first segment to occur among the second plurality; and
  
  produce a voice activity detection signal that has, for each segment in the first plurality and for each segment in the second plurality, a corresponding value that indicates one among activity and lack of activity, wherein, for each of the first plurality of consecutive segments, the corresponding value of the voice activity detection signal indicates activity, and wherein, for each of the second plurality of consecutive segments that occurs before the segment in which the detected transition occurs, and based on said determining, for at least one segment of the first plurality, that voice activity is present in the segment, the corresponding value of the voice activity detection signal indicates activity, and wherein, for each of the second plurality of consecutive segments that occurs after the segment in which the detected transition occurs, and in response to said detecting that a transition in the speech activity state of the multichannel signal occurs, the corresponding value of the voice activity detection signal indicates a lack of activity.
- View Dependent Claims (35, 36, 37, 38, 39, 40, 41, 42, 43, 44)
  - 35. The medium according to claim 34, wherein said instructions when executed by the one or more processors cause the one or more processors to calculate a time derivative of energy for each of a plurality of different frequency components of the first channel during said one among the second plurality of segments, and wherein said detecting that the transition occurs during said one among the second plurality of segments is based on the calculated time derivatives of energy.
  - 36. The medium according to claim 35, wherein said detecting that the transition occurs includes, for each of the plurality of different frequency components, and based on the corresponding calculated time derivative of energy, producing a corresponding indication of whether the frequency component is active, and wherein said detecting that the transition occurs is based on a relation between the number of said indications that indicate that the corresponding frequency component is active and a first threshold value.
  - 37. The medium according to claim 36, wherein said instructions when executed by one or more processors cause the one or more processors, for a segment that occurs prior to the first plurality of consecutive segments in the multichannel signal:
    - to calculate a time derivative of energy for each of a plurality of different frequency components of the first channel during the segment;
      
      to produce, for each of the plurality of different frequency components, and based on the corresponding calculated time derivative of energy, a corresponding indication of whether the frequency component is active; and
      
      to determine that a transition in a voice activity state of the multichannel signal does not occur during the segment, based on a relation between (A) the number of said indications that indicate that the corresponding frequency component is active and (B) a second threshold value that is higher than said first threshold value.
  - 38. The medium according to claim 36, wherein said instructions when executed by one or more processors cause the one or more processors, for a segment that occurs prior to the first plurality of consecutive segments in the multichannel signal:
    - to calculate, for each of a plurality of different frequency components of the first channel during the segment, a second derivative of energy with respect to time;
      
      to produce, for each of the plurality of different frequency components, and based on the corresponding calculated second derivative of energy with respect to time, a corresponding indication of whether the frequency component is impulsive; and
      
      to determine that a transition in a voice activity state of the multichannel signal does not occur during the segment, based on a relation between the number of said indications that indicate that the corresponding frequency component is impulsive and a threshold value.
  - 39. The medium according to claim 34, wherein, for each of the first plurality of consecutive segments of the audio signal, said determining that voice activity is present in the segment is based on a difference between a first channel of the audio signal during the segment and a second channel of the audio signal during the segment, and wherein, for each of the second plurality of consecutive segments of the audio signal, said determining that voice activity is not present in the segment is based on a difference between a first channel of the audio signal during the segment and a second channel of the audio signal during the segment.
  - 40. The medium according to claim 39, wherein, for each segment of said first plurality and for each segment of said second plurality, said difference is a difference between a level of the first channel and a level of the second channel during the segment.
  - 41. The medium according to claim 39, wherein, for each segment of said first plurality and for each segment of said second plurality, said difference is a difference in time between an instance of a signal in the first channel during the segment and an instance of said signal in the second channel during the segment.
  - 42. The medium according to claim 39, wherein, for each segment of said first plurality, said determining that voice activity is present in the segment comprises calculating, for each of a first plurality of different frequency components of the multichannel signal during the segment, a difference between a phase of the frequency component in the first channel and a phase of the frequency component in the second channel, wherein said difference between the first channel during the segment and the second channel during the segment is one of said calculated phase differences, and wherein, for each segment of said second plurality, said determining that voice activity is not present in the segment comprises calculating, for each of the first plurality of different frequency components of the multichannel signal during the segment, a difference between a phase of the frequency component in the first channel and a phase of the frequency component in the second channel, wherein said difference between the first channel during the segment and the second channel during the segment is one of said calculated phase differences.
  - 43. The medium according to claim 42, wherein said instructions when executed by one or more processors cause the one or more processors to calculate a time derivative of energy for each of a second plurality of different frequency components of the first channel during said one among the second plurality of segments, and wherein said detecting that the transition occurs during said one among the second plurality of segments is based on the calculated time derivatives of energy, and wherein a frequency band that includes the first plurality of frequency components is separate from a frequency band that includes the second plurality of frequency components.
  - 44. The medium according to claim 42, wherein, for each segment of said first plurality, said determining that voice activity is present in the segment is based on a corresponding value of a coherency measure that indicates a degree of coherence among the directions of arrival of at least the plurality of different frequency components, wherein said value is based on information from the corresponding plurality of calculated phase differences, and wherein, for each segment of said second plurality, said determining that voice activity is not present in the segment is based on a corresponding value of the coherency measure that indicates a degree of coherence among the directions of arrival of at least the plurality of different frequency components, wherein said value is based on information from the corresponding plurality of calculated phase differences.
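
The behavior recited in claims 34 to 36 can be sketched in Python as follows: the output signal keeps indicating activity after the primary detector stops reporting it, until a transition is detected from the per-component energy derivatives. This is a rough illustration under stated assumptions, not the patented implementation. The function names, the first-order difference used as the derivative, the rule that a component is "active" when the magnitude of its derivative exceeds `level`, and the choice to report activity for the transition segment itself (which the claims leave open) are all hypothetical.

```python
def detect_transition(prev_energies, cur_energies, level, count_threshold):
    """Detect a transition by counting 'active' frequency components.

    A component is treated as active when the magnitude of its energy
    difference between consecutive segments exceeds `level` (assumption).
    A transition is reported when the count reaches `count_threshold`.
    """
    derivatives = [cur - prev for prev, cur in zip(prev_energies, cur_energies)]
    active = sum(1 for d in derivatives if abs(d) > level)
    return active >= count_threshold


def vad_signal(raw_decisions, transitions):
    """Combine per-segment primary decisions with per-segment transition flags.

    raw_decisions: booleans from the primary detector (True = voice present).
    transitions:   booleans, True where a transition was detected.
    Returns one boolean per segment (True = activity).
    """
    output = []
    holding = False  # True while activity is extended past the last raw-active segment
    for raw, transition in zip(raw_decisions, transitions):
        if raw:
            holding = True
            output.append(True)
        elif holding:
            # Hold activity up to and including the transition segment (assumption).
            output.append(True)
            if transition:
                holding = False
        else:
            output.append(False)
    return output


# Three raw-active segments, then five raw-inactive ones; the transition
# is detected in the third inactive segment (index 5).
raw = [True, True, True, False, False, False, False, False]
flags = [False, False, False, False, False, True, False, False]
signal = vad_signal(raw, flags)
offset_found = detect_transition([4.0, 4.0, 4.0, 4.0], [1.0, 1.0, 1.0, 4.0], 2.0, 3)
```

The sketch holds activity indefinitely when no transition is ever detected. A practical system would probably also cap the hold period, which the claims shown here do not address.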

Current Assignee: Qualcomm, Inc.
Original Assignee: Qualcomm, Inc.
Inventors: Liu, Ian Ernan; Shin, Jongwon; Visser, Erik
Granted Patent: US 9,165,567 B2
US Class Current: 704/208
CPC Class Codes: G10L 25/78 Detection of presence or ab...
