Apparatus and method for classification and segmentation of audio content, based on the audio signal

US 8,428,949 B2
Filed: 06/30/2009
Issued: 04/23/2013
Est. Priority Date: 06/30/2008
Status: Active Grant

First Claim

1. An apparatus for classifying an input audio signal into audio contents of a first class and of a second class, the apparatus comprising:

an audio segmentation module adapted to segment said input audio signal into one or more segments of a predetermined length;

a feature computation module adapted to calculate for each of said one or more segments one or more features characterizing said audio input signal;

a threshold comparison module adapted to generate a feature vector for each of said one or more segments by comparing the one or more features in each segment with a plurality of predetermined thresholds, the plurality of predetermined thresholds including for each of the audio contents of the first class and of the second class a substantially near certainty threshold, a substantially high certainty threshold, and a substantially low certainty threshold, wherein each threshold of the plurality of thresholds represents a statistical measure relating to the one or more features; and

a classification module adapted to analyze the feature vector and classify each one of said one or more segments as audio contents of the first class, of the second class, or as non-decisive audio contents;

wherein a segment is classified as audio contents of the first class when the feature vector includes at least one feature surpassing the substantially near certainty threshold of the first class and no features surpassing the substantially near certainty threshold and the substantially high certainty threshold of the second class;

wherein the classification module is further adapted, at one or more subsequent intermediate classification stages, to classify a non-decisive segment as audio contents of the first class when a majority of features in the feature vector surpass the substantially high certainty threshold of the first class and no features surpass the substantially high certainty threshold of the second class; and

wherein the classification module is further adapted, at a subsequent separation classification stage, to classify segments of non-decisive audio contents into audio contents of the first class or of the second class.
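The threshold comparison and the first two decision stages of claim 1 can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the assignee's implementation: each feature is assumed to be oriented so that larger values point toward the first class and smaller values toward the second; "surpassing" a threshold is taken to mean reaching or passing it in that direction; and each feature's comparison result is encoded as a hypothetical pair of levels (0 none, 1 low, 2 high, 3 near certainty), one for each class. The names `ClassThresholds`, `feature_vector`, `first_stage` and `intermediate_stage` are hypothetical, and the mirror-image rule for deciding the second class in the first stage is an assumption, since claim 1 spells out only the first-class condition.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple

# Hypothetical level codes: how far a feature reaches toward a class.
NONE, LOW, HIGH, NEAR = 0, 1, 2, 3


@dataclass
class ClassThresholds:
    """Hypothetical per-feature thresholds for one class."""
    low: float   # substantially low certainty
    high: float  # substantially high certainty
    near: float  # substantially near certainty


def level_first(value: float, t: ClassThresholds) -> int:
    # Assumption: larger values indicate the first class, so low < high < near.
    if value >= t.near:
        return NEAR
    if value >= t.high:
        return HIGH
    if value >= t.low:
        return LOW
    return NONE


def level_second(value: float, t: ClassThresholds) -> int:
    # Assumption: smaller values indicate the second class, so low > high > near.
    if value <= t.near:
        return NEAR
    if value <= t.high:
        return HIGH
    if value <= t.low:
        return LOW
    return NONE


# One (first-class level, second-class level) pair per feature.
FeatureVector = List[Tuple[int, int]]


def feature_vector(features: Sequence[float],
                   first: Sequence[ClassThresholds],
                   second: Sequence[ClassThresholds]) -> FeatureVector:
    """Compare every feature of one segment with both classes' thresholds."""
    return [(level_first(f, a), level_second(f, b))
            for f, a, b in zip(features, first, second)]


def first_stage(vec: FeatureVector) -> Optional[int]:
    """Return 1 (first class), 2 (second class) or None (non-decisive)."""
    # Claim 1: some feature passes the first class's near-certainty threshold and
    # no feature passes the second class's high (or near) certainty threshold.
    if any(a == NEAR for a, _ in vec) and all(b < HIGH for _, b in vec):
        return 1
    # Assumed mirror-image rule for the second class.
    if any(b == NEAR for _, b in vec) and all(a < HIGH for a, _ in vec):
        return 2
    return None


def intermediate_stage(vec: FeatureVector) -> Optional[int]:
    """Re-examine a non-decisive segment; return 1 or None."""
    # Claim 1: a majority of features pass the first class's high-certainty
    # threshold (a near-certainty pass counts too) and none pass the second's.
    majority = sum(a >= HIGH for a, _ in vec) > len(vec) / 2
    if majority and all(b < HIGH for _, b in vec):
        return 1
    return None
```

For example, with two features, first-class thresholds `ClassThresholds(low=0.5, high=0.7, near=0.9)` and second-class thresholds `ClassThresholds(low=0.5, high=0.3, near=0.1)`, the feature values `[0.8, 0.75]` give the vector `[(2, 0), (2, 0)]`: the first stage leaves the segment non-decisive, and the intermediate stage assigns it to the first class.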

1 Assignment

0 Petitions

Abstract

An apparatus for classifying an input audio signal into audio contents of a first and second class, comprising an audio segmentation module adapted to segment said input audio signal into segments of a predetermined length; a feature computation module adapted to calculate, for the segments, features characterizing said audio input signal; a threshold comparison module adapted to generate a feature vector for each of said one or more segments based on a plurality of predetermined thresholds, the thresholds including for each of the audio contents of the first class and of the second class a substantially near certainty threshold, a substantially high certainty threshold, and a substantially low certainty threshold; and a classification module adapted to analyze the feature vector and classify each one of said one or more segments as audio contents of the first class, of the second class, or as non-decisive audio contents.
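The segmentation and feature-computation front end summarized in the abstract, together with the framing step of claim 9, might look like the following minimal Python sketch. The text reproduced here does not name the features or the segment and frame lengths, so the two per-segment features (the spread of the per-frame zero-crossing rate and the relative spread of per-frame RMS energy), the helper names, and the default lengths of 1 s segments and 20 ms frames are all hypothetical choices made for illustration. Dropping a trailing partial segment or frame is also an assumption.

```python
import math
import statistics
from typing import List, Sequence


def split(signal: Sequence[float], length: int) -> List[List[float]]:
    """Cut a signal into consecutive, non-overlapping chunks of `length` samples.

    A trailing partial chunk is dropped (an assumption; the claims only say
    the segments and frames have a predetermined length).
    """
    return [list(signal[i:i + length])
            for i in range(0, len(signal) - length + 1, length)]


def zero_crossing_rate(frame: Sequence[float]) -> float:
    """Fraction of adjacent sample pairs whose sign differs."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)


def rms(frame: Sequence[float]) -> float:
    """Root-mean-square amplitude of one frame."""
    return math.sqrt(sum(x * x for x in frame) / len(frame))


def segment_features(segment: Sequence[float], frame_len: int) -> List[float]:
    """Hypothetical per-segment features built from per-frame measurements."""
    frames = split(segment, frame_len)
    zcr = [zero_crossing_rate(f) for f in frames]
    energy = [rms(f) for f in frames]
    return [
        statistics.pstdev(zcr),  # spread of the zero-crossing rate
        # relative spread (coefficient of variation) of the frame energy
        statistics.pstdev(energy) / (statistics.mean(energy) + 1e-12),
    ]


def compute_all(signal: Sequence[float], sample_rate: int,
                segment_s: float = 1.0, frame_s: float = 0.02) -> List[List[float]]:
    """Segment the signal, frame each segment, and return one feature list per segment."""
    seg_len = round(segment_s * sample_rate)
    frame_len = round(frame_s * sample_rate)
    return [segment_features(s, frame_len) for s in split(signal, seg_len)]
```

Each inner list returned by `compute_all` is what a threshold comparison module would then compare against the per-class thresholds. Which way each feature points (toward speech or toward music) would have to be established from labeled data.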

18 Claims

1. An apparatus for classifying an input audio signal into audio contents of a first class and of a second class, the apparatus comprising:
- an audio segmentation module adapted to segment said input audio signal into one or more segments of a predetermined length;
  
  a feature computation module adapted to calculate for each of said one or more segments one or more features characterizing said audio input signal;
  
  a threshold comparison module adapted to generate a feature vector for each of said one or more segments by comparing the one or more features in each segment with a plurality of predetermined thresholds, the plurality of predetermined thresholds including for each of the audio contents of the first class and of the second class a substantially near certainty threshold, a substantially high certainty threshold, and a substantially low certainty threshold, wherein each threshold of the plurality of thresholds represents a statistical measure relating to the one or more features; and
  
  a classification module adapted to analyze the feature vector and classify each one of said one or more segments as audio contents of the first class, of the second class, or as non-decisive audio contents;
  
  wherein a segment is classified as audio contents of the first class when the feature vector includes at least one feature surpassing the substantially near certainty threshold of the first class and no features surpassing the substantially near certainty threshold and the substantially high certainty threshold of the second class;
  
  wherein the classification module is further adapted, at one or more subsequent intermediate classification stages, to classify a non-decisive segment as audio contents of the first class when a majority of features in the feature vector surpass the substantially high certainty threshold of the first class and no features surpass the substantially high certainty threshold of the second class; and
  
  wherein the classification module is further adapted, at a subsequent separation classification stage, to classify segments of non-decisive audio contents into audio contents of the first class or of the second class.
- Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9)
  - 2. The apparatus according to claim 1, wherein the classification module is further adapted to classify a segment as audio contents of the first class when the feature vector includes at least two features surpassing the substantially near certainty threshold of the first class and no features surpassing the substantially near certainty threshold of the second class.
  - 3. The apparatus according to claim 1, wherein the classification module is adapted to implement two or more intermediate classification stages, and wherein classifying segments in the intermediate classification stages includes cascading one or more thresholds between subsequent intermediate classification stages.
  - 4. The apparatus according to claim 1, wherein the classification module is adapted to implement two or more intermediate classification stages, and wherein classifying segments in the intermediate classification stages includes cascading between subsequent intermediate classification stages the number of features in the feature vector that are required to surpass the substantially high certainty threshold of the first class in order for a non-decisive segment to be classified as audio contents of the first class.
  - 5. The apparatus according to claim 1 wherein for each segment of said one or more segments said classification yields a numerical measure of certainty with respect to being either a first or a second type of audio content, where said numerical measure is a number between a first low extreme value and a second high extreme value, wherein the high extreme value is a high indication of first said type and wherein the low extreme value is a high indication of second said type, and wherein numerical measure values in between said extremes indicate each said type with certainty related to the absolute difference between the value and each said extreme.
  - 6. The apparatus according to claim 5 wherein for each segment of said one or more segments said numerical measure is additionally smoothed using a smoothing filter in time, wherein the sequence of said numerical measures for said one or more segments is used as an input signal to the filter, and wherein the final classification decision for each segment is given by:
    - obtaining two thresholds for final classification;
      
      if the output value on a segment of said smoothing filter is greater than first of said thresholds then first said type is concluded;
      
      otherwise if the output value on said segment of said smoothing filter is smaller than second of said thresholds then second said type is concluded;
      
      otherwise the decision is taken with respect to a well-defined function on the history of past decisions, e.g. the direction in time of the output signal of said smoothing filter, wherein upward numerical direction results in conclusion of first said type and wherein downward numerical direction results in conclusion of second said type.
  - 7. The apparatus according to claim 1 wherein the audio contents of the second class is speech.
  - 8. The apparatus according to claim 1 wherein the audio contents of the first class is music, environmental sound, silence, or any combination thereof.
  - 9. The apparatus according to claim 1 further comprising an audio framer module adapted to separate each segment in the one or more segments into frames of a predetermined length.
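Claims 5 and 6 describe a per-segment certainty value lying between two extremes, a temporal smoothing filter applied to the sequence of those values, and a two-threshold final decision that falls back on the history of the smoothed signal. The Python sketch below makes several assumptions. Certainty values lie in [0, 1], with 1 strongly indicating the first type and 0 the second. The filter is a first-order exponential smoother, which the claims do not specify. The history function is the rising or falling direction of the smoothed output, the example the claims give. Keeping the previous decision when the output is flat, and comparing a flat first segment with the midpoint of the two thresholds, are further assumptions. The function names and the `alpha` parameter are hypothetical.

```python
from typing import List, Sequence

FIRST, SECOND = 1, 2  # e.g. non-speech vs. speech content


def smooth(values: Sequence[float], alpha: float = 0.5) -> List[float]:
    """First-order exponential smoother (an assumed choice of filter).

    The filter state starts at the first input value to avoid a start-up transient.
    """
    out: List[float] = []
    y = values[0] if values else 0.0
    for v in values:
        y = alpha * v + (1.0 - alpha) * y
        out.append(y)
    return out


def final_decisions(certainty: Sequence[float], upper: float, lower: float,
                    alpha: float = 0.5) -> List[int]:
    """Two-threshold decision on the smoothed certainty sequence."""
    y = smooth(certainty, alpha)
    decisions: List[int] = []
    for n, value in enumerate(y):
        if value > upper:
            decisions.append(FIRST)
        elif value < lower:
            decisions.append(SECOND)
        else:
            # Between the thresholds: use the direction of the smoothed output.
            slope = value - y[n - 1] if n > 0 else 0.0
            if slope > 0:
                decisions.append(FIRST)
            elif slope < 0:
                decisions.append(SECOND)
            elif decisions:
                decisions.append(decisions[-1])  # flat: keep the last decision
            else:
                decisions.append(FIRST if value >= (upper + lower) / 2 else SECOND)
    return decisions
```

With `upper=0.8` and `lower=0.2`, the certainty sequence `[1.0, 1.0, 0.5, 0.5, 0.0]` smooths to `[1.0, 1.0, 0.75, 0.625, 0.3125]`. The last three values fall between the thresholds but are still declining, so they are assigned to the second type. This direction rule gives the decision some hysteresis instead of flipping on every small fluctuation.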

10. A method for segmenting an input audio signal into audio contents of a first class and of a second class, the method comprising:
- separating said input audio signal into one or more segments of a predetermined length;
  
  calculating for each of said one or more segments one or more features characterizing said audio input signal;
  
  generating a feature vector for each of said one or more segments by comparing the one or more features in each segment with a plurality of predetermined thresholds, the plurality of predetermined thresholds including for each of the audio contents of the first class and of the second class a substantially near certainty threshold, a substantially high certainty threshold, and a substantially low certainty threshold, wherein each threshold of the plurality of thresholds represents a statistical measure relating to the one or more features; and
  
  analyzing the feature vector and classifying each one of said one or more segments as audio contents of the first class, of the second class, or as non-decisive audio contents;
  
  wherein a segment is classified as audio contents of the first class when the feature vector includes at least one feature surpassing the substantially near certainty threshold of the first class and no features surpassing the substantially near certainty threshold and the substantially high certainty threshold of the second class;
  
  wherein the classification module is further adapted, at one or more subsequent intermediate classification stages, to classify a non-decisive segment as audio contents of the first class when a majority of features in the feature vector surpass the substantially high certainty threshold of the first class and no features surpass the substantially high certainty threshold of the second class; and
  
  wherein the classification module is further adapted, at a subsequent separation classification stage, to classify segments of non-decisive audio contents into audio contents of the first class or of the second class.
- Dependent Claims (11, 12, 13, 14, 15, 16, 17)
  - 11. The method according to claim 10 further comprising classifying a segment as audio contents of the first class when the feature vector includes at least two features surpassing the substantially near certainty threshold of the first class and no features surpassing the substantially near certainty threshold of the second class.
  - 12. The method according to claim 10 further comprising implementing two or more intermediate classification stages, and wherein classifying segments in the intermediate classification stages includes cascading between subsequent intermediate classification stages the number of features in the feature vector that are required to surpass the substantially high certainty threshold of the first class in order for a non-decisive segment to be classified as audio contents of the first class.
  - 13. The method according to claim 10 wherein for each segment of said one or more segments said classification yields a numerical measure of certainty with respect to being either a first or a second type of audio content, where said numerical measure is a number between a first low extreme value and a second high extreme value, wherein the high extreme value is a high indication of first said type and wherein the low extreme value is a high indication of second said type, and wherein numerical measure values in between said extremes indicate each said type with certainty related to the absolute difference between the value and each said extreme.
  - 14. The method according to claim 13 wherein for each segment of said one or more segments said numerical measure is additionally smoothed using a smoothing filter in time, wherein the sequence of said numerical measures for said one or more segments is used as an input signal to the filter, and wherein the final classification decision for each segment is given by:
    - obtaining two thresholds for final classification;
      
      if the output value on a segment of said smoothing filter is greater than first of said thresholds then first said type is concluded;
      
      otherwise if the output value on said segment of said smoothing filter is smaller than second of said thresholds then second said type is concluded;
      
      otherwise the decision is taken with respect to a well-defined function on the history of past decisions, e.g. the direction in time of the output signal of said smoothing filter, wherein upward numerical direction results in conclusion of first said type and wherein downward numerical direction results in conclusion of second said type.
  - 15. The method according to claim 10 further comprising implementing two or more intermediate classification stages, and wherein classifying segments in the intermediate classification stages includes cascading one or more thresholds between subsequent intermediate classification stages.
  - 16. The method according to claim 10 wherein the audio contents of the second class is speech.
  - 17. The method according to claim 10 wherein the audio contents of the first class is music, environmental sound, silence, or any combination thereof.
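Claims 4 and 12 cascade, from one intermediate stage to the next, the number of features that must pass the first class's high-certainty threshold. The minimal Python sketch below assumes a hypothetical encoding of each feature as a pair of levels (0 none, 1 low, 2 high, 3 near certainty) for the first and second class. It also assumes that the required count is relaxed stage by stage while staying above half the features, so that claim 1's majority condition still holds. The function name and the `required_counts` schedule are illustrative and are not taken from the patent.

```python
from typing import List, Optional, Sequence, Tuple

HIGH = 2  # hypothetical level code: the feature passes the high-certainty threshold

# One (first-class level, second-class level) pair per feature.
Levels = Sequence[Tuple[int, int]]


def cascade(vectors: Sequence[Levels],
            required_counts: Sequence[int]) -> List[Optional[int]]:
    """Run successive intermediate stages over non-decisive segments.

    Returns, for each segment, the index of the stage that assigned it to the
    first class, or None if it is still non-decisive afterwards (and would be
    passed on to the separation stage).
    """
    stage_of: List[Optional[int]] = [None] * len(vectors)
    for stage, required in enumerate(required_counts):
        for i, vec in enumerate(vectors):
            if stage_of[i] is not None:
                continue  # already decided at an earlier, stricter stage
            n_first = sum(a >= HIGH for a, _ in vec)  # near-certainty passes count too
            blocked = any(b >= HIGH for _, b in vec)  # any strong second-class evidence
            if n_first >= required and not blocked:
                stage_of[i] = stage
    return stage_of
```

With five features, a schedule such as `[5, 4, 3]` first accepts only unanimous segments, then four of five, then a bare majority. The stage index at which a segment is accepted is a rough indication of how confident the decision was.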

18. A system for segmenting audio content into a first class and a second class, the system comprising:
- an apparatus for segmenting an input audio signal into audio contents of a first class and of a second class, the apparatus comprising an audio segmentation module adapted to separate said input audio signal into one or more segments of a predetermined length;
  
  a feature computation module adapted to calculate for each segment in said one or more segments one or more features characterizing said audio input signal;
  
  a threshold comparison module adapted to generate a feature vector for each segment in said one or more segments by comparing the one or more features in each segment with a plurality of predetermined thresholds, the plurality of predetermined thresholds including for each of the audio contents of the first class and of the second class a substantially near certainty threshold, a substantially high certainty threshold, and a substantially low certainty threshold, wherein each threshold of the plurality of thresholds represents a statistical measure relating to the one or more features; and
  
  a classification module adapted to analyze the feature vector and classify each segment in said one or more segments as audio contents of the first class, of the second class, or as non-decisive audio contents;
  
  wherein a segment is classified as audio contents of the first class when the feature vector includes at least one feature surpassing the substantially near certainty threshold of the first class and no features surpassing the substantially near certainty threshold and the substantially high certainty threshold of the second class;
  
  wherein the classification module is further adapted, at one or more subsequent intermediate classification stages, to classify a non-decisive segment as audio contents of the first class when a majority of features in the feature vector surpass the substantially high certainty threshold of the first class and no features surpass the substantially high certainty threshold of the second class; and
  
  wherein the classification module is further adapted, at a subsequent separation classification stage, to classify segments of non-decisive audio contents into audio contents of the first class or of the second class;
  
  an audio interface unit for transferring the input audio signal from an audio source to the apparatus; and
  
  a processing unit for processing the audio content classified into the first class and the second class.
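The claims leave the rule used in the final separation stage open. Purely as an illustration, the Python sketch below forces each remaining non-decisive segment into one class by a hypothetical level-sum vote. It uses an assumed (first-class level, second-class level) encoding per feature, with levels 0 to 3 running from no threshold passed to the near-certainty threshold passed, and it breaks ties using the preceding segment's final label. Neither the vote nor the tie-break comes from the patent, and the function name is hypothetical.

```python
from typing import List, Optional, Sequence, Tuple

FIRST, SECOND = 1, 2

# One (first-class level, second-class level) pair per feature, with levels
# 0 = none, 1 = low, 2 = high, 3 = near certainty (hypothetical encoding).
Levels = Sequence[Tuple[int, int]]


def separation_stage(labels: Sequence[Optional[int]],
                     vectors: Sequence[Levels],
                     default: int = FIRST) -> List[int]:
    """Force every still non-decisive segment (label None) into one class.

    Hypothetical rule: sum the first-class levels minus the second-class levels;
    a positive score gives the first class, a negative one the second, and a tie
    copies the previous segment's final label (or `default` for the first segment).
    """
    resolved: List[int] = []
    for label, vec in zip(labels, vectors):
        if label is not None:
            resolved.append(label)  # decided by an earlier stage
            continue
        score = sum(a for a, _ in vec) - sum(b for _, b in vec)
        if score > 0:
            resolved.append(FIRST)
        elif score < 0:
            resolved.append(SECOND)
        else:
            resolved.append(resolved[-1] if resolved else default)
    return resolved
```

In a system along the lines of claim 18, the list returned here is what the processing unit would receive: every segment carries a first- or second-class label, with no non-decisive segments left.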

Current Assignee: Waves Audio Ltd.
Original Assignee: Waves Audio Ltd.
Inventors: Neoran, Itai; Lavner, Yizhar; Ruinskiy, Dima
Primary Examiner(s): Abebe, Daniel D
Application Number: US12/495,171
Publication Number: US 20100004926A1
Time in Patent Office: 1,393 days
Field of Search: 704/253, 704/254, 381/28
US Class Current: 704/253
CPC Class Codes: G10L 25/48 specially adapted for parti...