Robust detection and classification of objects in audio using limited training data

US 7,263,485 B2
Filed: 05/28/2003
Issued: 08/28/2007
Est. Priority Date: 05/31/2002
Status: Expired due to Fees


Abstract

A method (200) and apparatus (100) for classifying a homogeneous audio segment are disclosed. The homogeneous audio segment comprises a sequence of audio samples (x(n)). The method (200) starts by forming a sequence of frames (701-704) along the sequence of audio samples (x(n)), each frame (701-704) comprising a plurality of the audio samples (x(n)). The homogeneous audio segment is next divided (206) into a plurality of audio clips (711-714), with each audio clip being associated with a plurality of the frames (701-704). The method (200) then extracts (208) at least one frame feature for each clip (711-714). A clip feature vector (f) is next extracted from the frame features of the frames associated with the audio clip (711-714). Finally, the segment is classified based on a continuous function defining the distribution of the clip feature vectors (f).
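
The framing and clip-feature steps in the abstract can be illustrated with a minimal sketch. It is not the patented implementation: the function names, the frame and clip sizes, the choice of three features, and the formula used for volume dynamic range, (max - min) / max, are all assumptions made for illustration.

```python
import math
from statistics import pstdev


def split_into_frames(samples, frame_len, hop):
    """Form a sequence of frames along the sample sequence x(n)."""
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, hop)]


def frame_volume(frame):
    """Frame feature: root-mean-square amplitude of one frame."""
    return math.sqrt(sum(x * x for x in frame) / len(frame))


def frame_zcr(frame):
    """Frame feature: fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)


def clip_feature_vector(clip_frames):
    """Clip feature vector f built from the frame features of one clip.

    Returns [volume standard deviation, volume dynamic range, ZCR standard
    deviation]. The dynamic-range formula is an assumed definition.
    """
    vols = [frame_volume(f) for f in clip_frames]
    zcrs = [frame_zcr(f) for f in clip_frames]
    peak = max(vols)
    vdr = (peak - min(vols)) / peak if peak > 0 else 0.0
    return [pstdev(vols), vdr, pstdev(zcrs)]


def segment_to_clip_features(samples, frame_len, hop, frames_per_clip):
    """Divide a segment into clips and extract one feature vector per clip."""
    all_frames = split_into_frames(samples, frame_len, hop)
    last_start = len(all_frames) - frames_per_clip
    clips = [all_frames[i:i + frames_per_clip]
             for i in range(0, last_start + 1, frames_per_clip)]
    return [clip_feature_vector(c) for c in clips]
```

Each clip yields one vector, so a segment is represented by a set of clip feature vectors whose distribution can then be compared against per-class models.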

16 Claims

1. A method of classifying a homogeneous audio segment into one of a plurality of classes, said method comprising the steps of:
- dividing said homogeneous audio segment into a plurality of sub-segments;
  
  extracting for each sub-segment a feature vector; and
  
  classifying said homogeneous audio segment by comparing said feature vectors of said plurality of sub-segments with a plurality of continuous distribution functions, wherein each continuous distribution function defines one of said plurality of classes.
- - 2. The method as claimed in claim 1 wherein said feature vector includes at least one feature extracted from the group comprising:
    - a bandwidth of a plurality of audio samples of said sub-segment; and
      
      an energy value of a plurality of audio samples of said sub-segment.
  - 3. The method as claimed in claim 1 wherein said feature vector comprises at least two features selected from the group consisting of:
    - Volume standard deviation;
      
      Volume dynamic range;
      
      Volume undulation;
      
      4 Hz volume peak;
      
      Zero-crossing rate STD;
      
      Bandwidth; and
      
      Pitch STD.
  - 4. The method as claimed in claim 1 wherein said classifying step comprises the sub-steps of:
    - calculating a measure of similarity between said feature vectors of said plurality of sub-segments and each of said continuous distribution functions associated with said plurality of classes;
      
      determining the highest measure of similarity;
      
      determining a confidence measure for said highest measure of similarity;
      
      comparing said confidence measure with a confidence threshold; and
      
      classifying said homogeneous audio segment as belonging to the class associated with said highest measure of similarity, upon said confidence measure being greater than said confidence threshold.
  - 5. The method as claimed in claim 4 wherein said classifying step comprises the further sub-step of:
    - classifying said homogeneous audio segment as belonging to an unknown class upon said confidence measure being less than said confidence threshold.
  - 6. The method as claimed in claim 4 wherein said confidence measure includes a measure of separation between said continuous distribution functions.
  - 7. The method as claimed in claim 1 wherein said continuous distribution functions are Gaussian Mixture Models.
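
The classification logic of claims 1, 4, 5 and 7 can be sketched as follows. This is an illustrative sketch under stated assumptions, not the method of the specification: the diagonal-covariance model layout `(weight, mean, var)`, the function names, and the confidence measure (the margin between the best and second-best average log-likelihood) are all hypothetical. In particular, the separation-based confidence measure of claim 6 is not reproduced here.

```python
import math


def log_gauss_diag(x, mean, var):
    """Log density of a Gaussian with diagonal covariance at point x."""
    return sum(-0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
               for xi, m, v in zip(x, mean, var))


def gmm_log_likelihood(x, gmm):
    """Log-likelihood of x under a GMM given as [(weight, mean, var), ...]."""
    terms = [math.log(w) + log_gauss_diag(x, m, v) for w, m, v in gmm]
    top = max(terms)  # log-sum-exp for numerical stability
    return top + math.log(sum(math.exp(t - top) for t in terms))


def classify_segment(clip_vectors, class_models, threshold):
    """Return (label, confidence) for one homogeneous segment.

    class_models maps a class name to its GMM and needs at least two classes.
    Similarity is the average log-likelihood of the clip vectors under each
    class model. Confidence is an assumed stand-in: best score minus
    runner-up score. A confidence equal to the threshold is treated as
    "unknown", a case the claims leave unspecified.
    """
    scores = {
        name: sum(gmm_log_likelihood(x, gmm) for x in clip_vectors) / len(clip_vectors)
        for name, gmm in class_models.items()
    }
    ranked = sorted(scores, key=scores.get, reverse=True)
    best = ranked[0]
    confidence = scores[best] - scores[ranked[1]]
    label = best if confidence > threshold else "unknown"
    return label, confidence


# Hypothetical single-component models for two classes.
models = {
    "speech": [(1.0, [0.0, 0.0], [1.0, 1.0])],
    "music": [(1.0, [5.0, 5.0], [1.0, 1.0])],
}
label, confidence = classify_segment([[0.0, 0.0], [0.1, -0.1]], models, 1.0)
```

Averaging over clips keeps the similarity score comparable across segments of different lengths, which makes a single fixed threshold easier to reason about.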

8. An apparatus for classifying a homogeneous audio segment into one of a plurality of classes, said apparatus comprising:
- means for dividing said homogeneous audio segment into a plurality of sub-segments;
  
  means for extracting for each sub-segment a feature vector; and
  
  means for classifying said homogeneous audio segment by comparing said feature vectors of said plurality of sub-segments with a plurality of continuous distribution functions, wherein each continuous distribution function defines one of said plurality of classes.
- - 9. The apparatus as claimed in claim 8 wherein said feature vector includes at least one feature extracted from the group comprising:
    - a bandwidth of a plurality of audio samples of said sub-segment; and
      
      an energy value of a plurality of audio samples of said sub-segment.
  - 10. The apparatus as claimed in claim 8 wherein said feature vector comprises at least two features selected from the group consisting of:
    - Volume standard deviation;
      
      Volume dynamic range;
      
      Volume undulation;
      
      4 Hz volume peak;
      
      Zero-crossing rate STD;
      
      Bandwidth; and
      
      Pitch STD.
  - 11. The apparatus as claimed in claim 8 wherein said means for classifying comprises:
    - means for calculating a measure of similarity between said feature vectors of said plurality of sub-segments and each of said continuous distribution functions associated with said plurality of classes;
      
      means for determining the highest measure of similarity;
      
      means for determining a confidence measure for said highest measure of similarity;
      
      means for comparing said confidence measure with a confidence threshold; and
      
      means for classifying said homogeneous audio segment as belonging to the class associated with said highest measure of similarity, upon said confidence measure being greater than said confidence threshold.
  - 12. The apparatus as claimed in claim 11 wherein said means for classifying further comprises means for classifying said homogeneous audio segment as belonging to an unknown class upon said confidence measure being less than said confidence threshold.
  - 13. The apparatus as claimed in claim 11 wherein said confidence measure includes a measure of separation between said continuous distribution functions.

14. A program stored in a memory medium for classifying a homogeneous audio segment into one of a plurality of classes, said program comprising:
- code for dividing said homogeneous audio segment into a plurality of sub-segments;
  
  code for extracting for each sub-segment a feature vector; and
  
  code for classifying said homogeneous audio segment by comparing said feature vectors of said plurality of sub-segments with a plurality of continuous distribution functions, wherein each continuous distribution function defines one of said plurality of classes.
- - 15. The program as claimed in claim 14 wherein said code for classifying comprises:
    - code for calculating a measure of similarity between said feature vectors of said plurality of sub-segments and each of said continuous distribution functions associated with said plurality of classes;
      
      code for determining the highest measure of similarity;
      
      code for determining a confidence measure for said highest measure of similarity;
      
      code for comparing said confidence measure with a confidence threshold; and
      
      code for classifying said homogeneous audio segment as belonging to the class associated with said highest measure of similarity, upon said confidence measure being greater than said confidence threshold.
  - 16. The program as claimed in claim 15 wherein said code for classifying further comprises:
    - code for classifying said homogeneous audio segment as belonging to an unknown class upon said confidence measure being less than said confidence threshold.

Current Assignee: Canon Kabushiki Kaisha (Canon Inc.)
Original Assignee: Canon Kabushiki Kaisha (Canon Inc.)
Inventors: Wark, Timothy John
Primary Examiner(s): Azad; Abul K.
Application Number: US10/446,099
Publication Number: US 20030231775A1
Time in Patent Office: 1,553 Days
Field of Search: None
US Class Current: 704/240
CPC Class Codes:
G10L 15/02   Feature extraction for spee...
G10L 15/04   Segmentation; Word boundary...
G10L 25/78   Detection of presence or ab...
