Noise-robust feature extraction using multi-layer principal component analysis

US 7,082,394 B2
Filed: 06/25/2002
Issued: 07/25/2006
Est. Priority Date: 06/25/2002
Status: Expired due to Fees

Abstract

Extracting features from signals for use in classification, retrieval, or identification of data represented by those signals uses a “Distortion Discriminant Analysis” (DDA) of a set of training signals to define parameters of a signal feature extractor. The signal feature extractor takes signals having one or more dimensions with a temporal or spatial structure, applies an oriented principal component analysis (OPCA) to limited regions of the signal, aggregates the output of multiple OPCAs that are spatially or temporally adjacent, and applies OPCA to the aggregate. The steps of aggregating adjacent OPCA outputs and applying OPCA to the aggregated values are performed one or more times for extracting low-dimensional noise-robust features from signals, including audio signals, images, video data, or any other time or frequency domain signal. Such extracted features are useful for many tasks, including automatic authentication or identification of particular signals, or particular elements within such signals.
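
A single OPCA step of the kind the abstract refers to can be sketched as follows. This is a minimal illustration and not the patented implementation. It assumes the common formulation in which OPCA finds directions that maximize signal variance relative to distortion energy, that is, the leading generalized eigenvectors of C v = λ R v, where C is the covariance of the clean signal vectors and R is the correlation matrix of the distortion vectors (distorted copy minus clean signal). The function name `opca`, the array layouts, and the regularization term `reg` are assumptions made for the example.

```python
import numpy as np

def opca(clean, distorted, n_components, reg=1e-6):
    """Return the top generalized eigenvalues and directions of C v = lambda R v.

    clean:     (m, d) array of signal vectors (hypothetical layout).
    distorted: (m, k, d) array holding k distorted copies of each clean vector.
    """
    clean = np.asarray(clean, dtype=float)
    distorted = np.asarray(distorted, dtype=float)
    d = clean.shape[1]

    # Covariance of the clean signal vectors.
    centered = clean - clean.mean(axis=0)
    C = centered.T @ centered / len(clean)

    # Correlation matrix of the distortion vectors; reg keeps it positive definite.
    noise = (distorted - clean[:, None, :]).reshape(-1, d)
    R = noise.T @ noise / len(noise) + reg * np.eye(d)

    # With R = L L^T, the problem becomes the symmetric one L^-1 C L^-T u = lambda u.
    L_inv = np.linalg.inv(np.linalg.cholesky(R))
    eigvals, u = np.linalg.eigh(L_inv @ C @ L_inv.T)
    order = np.argsort(eigvals)[::-1][:n_components]
    return eigvals[order], L_inv.T @ u[:, order]

# Toy data: both axes carry signal, but distortion is concentrated on axis 1.
rng = np.random.default_rng(0)
clean = rng.normal(size=(500, 2))
distorted = clean[:, None, :] + rng.normal(size=(500, 3, 2)) * np.array([0.1, 2.0])
eigvals, directions = opca(clean, distorted, n_components=1)
```

In this toy setup the leading direction should align closely with axis 0, the axis on which the distortion is weakest relative to the signal.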

42 Claims

1. A system for training a feature extractor for extracting features from an input signal comprising:
- receiving at least one training signal;
  
  receiving at least one distorted copy of the at least one training signal;
  
  transforming each training signal and each distorted copy of the at least one training signal into a suitable representation for taking projections;
  
  performing a multi-layer oriented principal component analysis (OPCA) of the at least one transformed training signal and the at least one transformed distorted copy of the at least one training signal to compute a set of training projections for each layer; and
  
  constructing a signal feature extractor from two or more layers of said projections.
  - 2. The system of claim 1 wherein performing a multi-layer OPCA of the at least one transformed training signal and the at least one transformed distorted copy of the training signal to compute the set of training projections for each layer comprises:
    - computing a first OPCA layer directly from the at least one transformed training signal and the at least one transformed distorted copy of the at least one training signal; and
      
      computing at least one subsequent OPCA layer from an aggregate of the projections from an immediately preceding OPCA layer, beginning with an aggregate of the training projections from the first OPCA layer.
  - 3. The system of claim 1 further comprising pre-processing the at least one training signal, and the at least one distorted copy of the at least one training signal, to remove known distortions from the at least one training signal and the at least one distorted copy of the training signal.
  - 4. The system of claim 1 further comprising normalizing the training projections output by each OPCA layer.
  - 5. The system of claim 1 wherein the set of training projections computed for each layer is populated by a predetermined number of highest generalized eigenvalue OPCA projections computed for each layer.
  - 6. The system of claim 1 further comprising applying a suitable normalization to each projection at each layer.
  - 7. The system of claim 1 further comprising transforming each input signal into a representation suitable for projection.
  - 8. The system of claim 1 wherein the at least one training signal and each distorted copy of the at least one training signal comprise audio signals and wherein transforming each training signal and each distorted copy of the at least one training signal into a suitable representation for taking projections comprises transforming the audio signals into a time-frequency representation.
  - 9. The system of claim 8 wherein transforming the audio signals into a time-frequency representation comprises applying Fourier transforms to windowed subsets of the audio signals.
  - 10. The system of claim 7 wherein said at least one input signal comprises an audio signal and said transforming comprises transforming the audio signal into a time-frequency representation.
  - 11. The system of claim 7 further comprising extracting at least one feature from the at least one input signal by passing at least one transformed input signal through each layer of the feature extractor in the order in which the layers were originally computed.
  - 12. The system of claim 2 further comprising:
    - receiving at least one input signal and transforming each input signal into a representation suitable for projection; and
      
      passing at least one transformed input signal through each layer of the feature extractor in the order in which the layers were originally computed.
  - 13. The system of claim 12 wherein passing the at least one transformed input signal through each layer of the feature extractor comprises:
    - computing a first set of output projections by applying the training projections of the first OPCA layer to the at least one transformed input signal;
      
      computing at least one subsequent set of output projections by applying the training projections of each layer of the feature extractor to previous aggregate layers of output projections, wherein each aggregate layer of output projections is generated by collating output projections from adjacent positions in a layer.
  - 14. The system of claim 13 wherein a final set of output projections produced by a last layer of the feature extractor represents features extracted from the input signal.
  - 15. The system of claim 14 wherein at least one of the input signals represents a known data signal.
  - 16. The system of claim 15 wherein at least one of the input signals represents an unknown data signal.
  - 17. The system of claim 16 further comprising comparing the features extracted from the known data signal to the features extracted from the unknown data signal, and wherein one or more portions of the unknown data signal are identified by the comparison of the extracted features.
  - 18. The system of claim 1 wherein transforming each training signal and each distorted copy of the training signal into a representation suitable for projection is performed on sequential frames of the training signal, and wherein performing a multi-layer oriented principal component analysis (OPCA) of the transformed training signal and the at least one transformed distorted copy of the at least one training signal to compute a set of training projections for each layer is performed on each sequential frame of the at least one training signal.
  - 19. The system of claim 7 wherein transforming each input signal into a representation suitable for projection is performed on sequential frames of the input signal, and wherein extracting at least one feature from the at least one input signal by passing at least one transformed input signal through each layer of the feature extractor in the order in which the layers were originally computed is performed on each sequential frame of the input signal.
  - 20. The system of claim 1 wherein the at least one training signal and the input signal are of the same signal type, and wherein the signal type represents any of audio signals, images, and video data.
  - 21. The system of claim 1 further comprising normalizing the training projections for each layer by computing scores on a validation signal such that a mean distance between each training projection and projections computed for the validation signal is one.
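
The normalization of claim 21 and the feature comparison of claims 15 to 17 can be illustrated with a short sketch. It is one possible reading of the claim language, not the patented procedure: it assumes Euclidean distance, a single global scale factor, and a nearest-neighbour lookup. The names `normalization_scale` and `best_match` are hypothetical.

```python
import numpy as np

def normalization_scale(train_feats, valid_feats):
    """Scale factor that makes the mean train-to-validation distance equal to one."""
    diffs = train_feats[:, None, :] - valid_feats[None, :, :]
    mean_dist = np.linalg.norm(diffs, axis=2).mean()
    return 1.0 / mean_dist

def best_match(known_feats, query_feat):
    """Index of, and distance to, the stored feature vector closest to the query."""
    dists = np.linalg.norm(known_feats - query_feat, axis=1)
    index = int(np.argmin(dists))
    return index, float(dists[index])

rng = np.random.default_rng(1)
train = rng.normal(size=(40, 8))    # stand-in for final-layer training projections
valid = rng.normal(size=(30, 8))    # stand-in for validation-signal projections
scale = normalization_scale(train, valid)

# Features of "known" signals, and a slightly perturbed copy of entry 3 as the
# "unknown" signal to be identified.
known = rng.normal(size=(5, 8)) * scale
query = known[3] + 0.01 * rng.normal(size=8)
index, distance = best_match(known, query)
```

After scaling, a distance well below one suggests a match and a distance near one suggests an unrelated signal, which makes a fixed acceptance threshold easier to choose.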

22. A method for training a feature extractor for extracting features from an input signal comprising using a computing device to:
- divide at least one training signal into a set of adjacent frames, each frame having a same size;
  
  apply a first oriented principal component analysis (OPCA) to the adjacent frames to produce a first set of generalized eigenvectors for each frame;
  
  choose a number N of highest value eigenvectors for each frame;
  
  project each frame along the eigenvectors computed for each frame to produce a first set of N projections for each frame;
  
  aggregate the projections for adjacent frames to produce at least one aggregate;
  
  apply a second OPCA to each aggregate, with the second OPCA producing a second set of generalized eigenvectors for each aggregate frame;
  
  choose N highest value eigenvectors produced by the second OPCA for each aggregate frame;
  
  project each aggregate frame along the eigenvectors computed for each aggregate frame to produce a second set of N projections for each aggregate frame; and
  
  train a feature extractor by assigning the first set of N projections to a first feature extractor layer, and assigning the second set of N projections to a second feature extractor layer.
  - 23. The method of claim 22 wherein the at least one training signal is transformed prior to performing the OPCA.
  - 24. The method of claim 22 further comprising normalizing the projections.
  - 25. The method of claim 24 wherein normalizing the projections comprises normalizing the projections for the last layer by computing scores on a validation signal such that a mean distance between each projection computed from the at least one training signal and projections computed for the validation signal is one.
  - 26. The method of claim 22 further comprising:
    - computing at least one subsequent layer of projections by aggregating a number of adjacent projections of an immediately preceding layer, beginning with the second set of projections to produce a subsequent aggregate frame;
      
      applying a subsequent OPCA to this aggregate, with the OPCA outputting a new set of generalized eigenvectors;
      
      choosing N highest value eigenvectors produced by the subsequent OPCA for each subsequent aggregate frame;
      
      projecting each subsequent aggregate frame along the eigenvectors computed for each subsequent aggregate frame to produce a subsequent set of N projections for each subsequent aggregate frame; and
      
      further training the feature extractor by assigning each new subsequent set of N projections to a subsequent feature extractor layer.
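
The two-layer training sequence of claim 22 can be sketched as below. Several points are assumptions made for illustration only: one distorted copy of the training signal supplies the distortion statistics, a single set of directions is shared by all frames in a layer, aggregation means concatenating the projections of `width` adjacent frames, and OPCA is taken to be the generalized eigenproblem C v = λ R v. The helper names are hypothetical.

```python
import numpy as np

def opca_directions(clean, noisy, n, reg=1e-6):
    """Top-n generalized eigenvectors of C v = lambda R v (assumed OPCA form)."""
    centered = clean - clean.mean(axis=0)
    C = centered.T @ centered / len(clean)
    z = noisy - clean                                   # distortion vectors
    R = z.T @ z / len(z) + reg * np.eye(clean.shape[1])
    L_inv = np.linalg.inv(np.linalg.cholesky(R))
    _, u = np.linalg.eigh(L_inv @ C @ L_inv.T)          # eigenvalues ascending
    return L_inv.T @ u[:, ::-1][:, :n]                  # keep the n largest

def aggregate(proj, width):
    """Concatenate the projections of `width` adjacent frames into one row."""
    rows = len(proj) - width + 1
    return np.stack([proj[i:i + width].ravel() for i in range(rows)])

def train_two_layers(clean_frames, noisy_frames, n, width):
    # Layer 1: OPCA on the frames, then project every frame.
    W1 = opca_directions(clean_frames, noisy_frames, n)
    p_clean, p_noisy = clean_frames @ W1, noisy_frames @ W1
    # Layer 2: OPCA on aggregates of adjacent layer-1 projections.
    W2 = opca_directions(aggregate(p_clean, width), aggregate(p_noisy, width), n)
    return W1, W2

rng = np.random.default_rng(2)
signal = rng.normal(size=16 * 300)
noisy_signal = signal + 0.3 * rng.normal(size=signal.size)
clean_frames = signal.reshape(-1, 16)       # adjacent frames of equal size
noisy_frames = noisy_signal.reshape(-1, 16)
W1, W2 = train_two_layers(clean_frames, noisy_frames, n=4, width=3)
```

Further layers, as in claim 26, would repeat the aggregate-then-OPCA step on the output of the preceding layer.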

27. A computer-readable medium having computer executable instructions for extracting features from an input signal, said computer executable instructions comprising:
- applying a multi-layer oriented principal component analysis (OPCA) to a set of at least one training signals for producing a set of training projections for each OPCA layer, wherein each subsequent layer of the OPCA is performed on an aggregate of outputs from an immediately preceding OPCA layer;
  
  training a feature extractor by assigning the set of training projections for each OPCA layer to a corresponding layer of the feature extractor; and
  
  extracting features from at least one input signal by passing each input signal through each layer of the feature extractor in the order in which the layers were originally computed.
  - 28. The computer-readable medium of claim 27 wherein applying a multi-layer OPCA to the set of training signals for producing a set of training projections for each OPCA layer comprises:
    - computing a first OPCA layer by:
      
      transforming each training signal;
      
      computing generalized eigenvectors over the transformed training signals, projecting each training signal over a number of highest value eigenvectors to produce a number of projections from the training signal; and
      
      computing a second OPCA layer by:
      
      collating a number of adjacent projections from the first OPCA layer into an aggregate of projections, computing generalized eigenvectors over the aggregate of projections, and projecting the aggregate of projections over a number of highest value eigenvectors computed from the projections to produce a number of projections from the aggregate of projections.
  - 29. The computer-readable medium of claim 28 further comprising computing at least one additional OPCA layer by applying an OPCA to an aggregate of the projections from an immediately preceding OPCA layer, beginning with the second OPCA layer.
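
The extraction step of claim 27, in which an input signal is passed through the trained layers in the order they were computed, can be sketched as follows. The example assumes each layer is stored as a projection matrix and that aggregation concatenates the outputs of `width` adjacent positions. The function name `extract_features` and the tiny hand-picked matrices are hypothetical and chosen only so the arithmetic is easy to follow.

```python
import numpy as np

def extract_features(frames, layers, width):
    """Apply the first layer to the frames, then each later layer to aggregates
    of the previous layer's adjacent outputs."""
    out = frames @ layers[0]
    for W in layers[1:]:
        rows = len(out) - width + 1
        agg = np.stack([out[i:i + width].ravel() for i in range(rows)])
        out = agg @ W
    return out

# Six 2-sample frames: [0, 1], [2, 3], ..., [10, 11].
frames = np.arange(12.0).reshape(6, 2)
W1 = np.array([[1.0], [1.0]])     # layer 1: sum the two samples of a frame
W2 = np.array([[1.0], [-1.0]])    # layer 2: difference of two adjacent outputs
features = extract_features(frames, [W1, W2], width=2)
```

Here layer 1 yields the frame sums 1, 5, 9, 13, 17, 21, and layer 2 yields the difference of each adjacent pair, so `features` holds five values of -4.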

30. A computer-implemented process for training an audio signal feature extractor, comprising using a computing device to:
- receive an audio input comprising representative audio data;
  
  transform the audio input into a time-frequency representation;
  
  compute generalized eigenvalues over the transformed audio data;
  
  compute at least one eigenvector corresponding to at least one highest value eigenvalue and assign those eigenvectors to a first layer of an audio signal feature extractor;
  
  collate a number of adjacent eigenvectors into an aggregate;
  
  compute generalized eigenvalues over the aggregate;
  
  compute at least one eigenvector corresponding to at least one highest value eigenvalue of the aggregate and assign those eigenvectors to a second layer of the audio feature extractor.
  - 31. The computer-implemented process of claim 30 further comprising extracting features from at least one first audio signal by passing a time-frequency transformation of the first audio signal through each layer of the audio feature extractor.
  - 32. The computer-implemented process of claim 30 wherein the audio input is distorted prior to transforming the audio input into a time-frequency representation.
  - 33. The computer-implemented process of claim 30 wherein at least one copy of the audio input is distorted prior to transforming the audio data.
  - 34. The computer-implemented process of claim 30 wherein at least one copy of the audio input is pre-processed prior to transforming the audio input by combining any multi-channel audio information into a single audio channel.
  - 35. The computer-implemented process of claim 30 wherein the audio input is pre-processed prior to transforming the audio input by downsampling the audio input.
  - 36. The computer-implemented process of claim 30 wherein the audio input is pre-processed prior to transforming the audio input by using a human psychoacoustic masking model for removing audio frequency components from the audio input which can not be heard by a typical human listener.
  - 37. The computer-implemented process of claim 30 wherein the audio input is randomly shifted forward and backwards in time, up to a predefined maximum time offset, to provide at least one temporally misaligned copy of the audio input, and wherein the feature extractor trained using the time-shifted audio data is robust against temporal misalignment.
  - 38. The computer-implemented process of claim 30 wherein the audio input is transformed using a complex modulated lapped transform to produce the transformed audio data.
  - 39. The computer-implemented process of claim 30 wherein the audio input is transformed using a windowed FFT to produce the transformed audio data.
  - 40. The computer-implemented process of claim 31 wherein the first audio signal represents a known audio signal, and wherein each extracted audio feature is stored in an exemplary feature database.
  - 41. The computer-implemented process of claim 40 further comprising extracting at least one second audio feature from at least one second audio signal.
  - 42. The computer-implemented process of claim 41 further comprising comparing the audio features extracted from the first audio signal to the audio features extracted from the second audio signal.
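
Some of the audio preparation steps named in claims 34, 37, and 39 can be sketched in a few lines: mixing channels to mono, a windowed FFT, and a random time shift. This is an illustrative sketch under stated assumptions, not the patented pipeline. The Hann window, the frame and hop sizes, the log-magnitude output, and the circular shift via `np.roll` (which wraps samples around instead of padding) are all choices made for the example, and the function names are hypothetical.

```python
import numpy as np

def to_mono(audio):
    """Average all channels of a (samples, channels) array into one channel."""
    audio = np.asarray(audio, dtype=float)
    return audio if audio.ndim == 1 else audio.mean(axis=1)

def windowed_fft(signal, frame_len=256, hop=128):
    """Log-magnitude spectrum of overlapping Hann-windowed frames."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([signal[i * hop:i * hop + frame_len] * window
                       for i in range(n_frames)])
    return np.log1p(np.abs(np.fft.rfft(frames, axis=1)))

def random_shift(signal, max_offset, rng):
    """Circularly shift the signal by a random offset in [-max_offset, max_offset]."""
    offset = int(rng.integers(-max_offset, max_offset + 1))
    return np.roll(signal, offset), offset

rng = np.random.default_rng(3)
t = np.arange(1024)
tone = np.sin(2 * np.pi * 16 * t / 256)    # sits exactly on FFT bin 16
stereo = np.stack([tone, tone], axis=1)
mono = to_mono(stereo)
spec = windowed_fft(mono)                  # 7 frames by 129 frequency bins
shifted, offset = random_shift(mono, max_offset=50, rng=rng)
```

Each row of `spec` could serve as one transformed frame for the first OPCA layer, and `shifted` as one temporally misaligned training copy.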

Current Assignee
Microsoft Technology Licensing LLC (Microsoft Corporation)
Original Assignee
Microsoft Corporation
Inventors
Platt, John; Burges, Chris
Primary Examiner(s)
Šmits, Talivaldis Ivars
Assistant Examiner(s)
SAINT CYR, LEONARD

Application Number

US10/180,271
Publication Number

US 20030236661A1
Time in Patent Office

1,491 Days
Field of Search

704/205, 704/228, 704/235, 704/243, 382/190
US Class Current

704/243
CPC Class Codes

G06F 18/213   Feature extraction, e.g. by...

G06V 10/507   Summing image-intensity val...

G10L 15/02   Feature extraction for spee...

G10L 15/20   Speech recognition techniqu...
