Adaptive Bayes feature extraction
First Claim
1. A computer-implemented method for extracting discriminately informative features from input patterns, which provide discrimination between two classes, a class-of-interest and a class-other, while reducing the number of features, comprising the steps of:
receiving a training set of class-of-interest patterns, a set of unlabeled patterns from an input-data-set, and an estimate of a class-of-interest a priori probability in said input-data-set, said input-data-set being at least one of an image, video or speech data set;
selecting elements of a predetermined polynomial function;
executing a training stage using said class-of-interest a priori probability, said training set of class-of-interest patterns, and said unlabeled patterns from said input-data-set, said training stage including a step of selecting a set of weights for said polynomial function that ensure a least squares approximation of a class-of-interest posterior distribution function using said polynomial function;
classifying each pattern from said input-data-set as being either said class-of-interest or said class-other in accordance with a conditional test defined by an adaptive Bayes decision rule;
extracting a predetermined percent of said classified patterns that lie near a decision boundary;
locating points lying on said decision boundary using said extracted patterns that lie near said decision boundary;
calculating normal vectors to said decision boundary using said points lying on said decision boundary;
calculating an effective decision boundary feature matrix;
calculating eigenvalues, eigenvectors, and a rank of said effective decision boundary feature matrix;
selecting a set of said eigenvectors for use in a feature extraction matrix; and
extracting a reduced set of features using said feature extraction matrix, whereby said discriminately informative features are extracted from input patterns which provide discrimination between a class-of-interest and a class-other while reducing the number of features, using only said training set of class-of-interest patterns and said unlabeled patterns from said input-data-set, and without any a priori knowledge of said class-other.
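The steps of claim 1 can be sketched end to end. The block below is a minimal illustration, not the patented implementation: it assumes a quadratic polynomial basis, synthetic two-dimensional Gaussian data, a bisection search for boundary points, and the least-squares weight equation E_unl[f f^T] A = P_int E_int[f] commonly associated with adaptive Bayes training. All names, the 20 percent near-boundary fraction, and the data are hypothetical choices for the sketch.

```python
import numpy as np

rng = np.random.default_rng(0)

def poly_basis(X):
    """Elements of the predetermined polynomial function (assumed quadratic):
    f(x) = [1, x1, x2, x1^2, x1*x2, x2^2]."""
    x1, x2 = X[:, 0], X[:, 1]
    return np.column_stack([np.ones(len(X)), x1, x2, x1**2, x1 * x2, x2**2])

# Synthetic stand-ins: class-of-interest centered at (0,0), class-other at (3,0).
X_int = rng.normal([0.0, 0.0], 1.0, (500, 2))                 # labeled training set
X_unl = np.vstack([rng.normal([0.0, 0.0], 1.0, (500, 2)),
                   rng.normal([3.0, 0.0], 1.0, (500, 2))])    # unlabeled input-data-set
p_int = 0.5                                                   # a priori probability estimate

# Training stage: least-squares weights A so that A^T f(x) approximates the
# class-of-interest posterior:  E_unl[f f^T] A = p_int * E_int[f].
F_unl, F_int = poly_basis(X_unl), poly_basis(X_int)
A = np.linalg.solve(F_unl.T @ F_unl / len(F_unl), p_int * F_int.mean(axis=0))

post = lambda X: poly_basis(X) @ A            # estimated posterior A^T f(x)
labels = post(X_unl) >= 0.5                   # adaptive Bayes decision rule

# Extract the (assumed) 20 percent of classified patterns nearest the boundary.
near = np.argsort(np.abs(post(X_unl) - 0.5))[: len(X_unl) // 5]
ints, oths = near[labels[near]], near[~labels[near]]

def boundary_point(xa, xb, iters=40):
    """Bisection along the segment xa -> xb for a point with posterior 1/2;
    xa is classified class-of-interest, xb class-other."""
    for _ in range(iters):
        xm = 0.5 * (xa + xb)
        if (post(xm[None])[0] - 0.5) * (post(xa[None])[0] - 0.5) > 0:
            xa = xm
        else:
            xb = xm
    return 0.5 * (xa + xb)

def unit_normal(x):
    """Normal vector to the decision boundary: gradient of A^T f(x),
    written in closed form for the quadratic basis above."""
    g = np.array([A[1] + 2 * A[3] * x[0] + A[4] * x[1],
                  A[2] + A[4] * x[0] + 2 * A[5] * x[1]])
    return g / np.linalg.norm(g)

K = min(len(ints), len(oths))
N = np.array([unit_normal(boundary_point(X_unl[i], X_unl[j]))
              for i, j in zip(ints[:K], oths[:K])])

# Effective decision boundary feature matrix, its eigen-decomposition and rank.
EDBFM = N.T @ N / K
evals, evecs = np.linalg.eigh(EDBFM)          # eigenvalues in ascending order
rank = int(np.sum(evals > 1e-6 * evals.max()))
W = evecs[:, ::-1][:, :rank]                  # feature extraction matrix
X_reduced = X_unl @ W                         # reduced set of features
```

For two well-separated classes the normals cluster along one direction, so the rank of the EDBFM drops below the input dimension and the projection W discards the non-discriminating directions.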
Abstract
A system and method for extracting "discriminately informative features" from input patterns that provide accurate discrimination between two classes, a class-of-interest and a class-other, while reducing the number of features, under the condition that training samples are provided a priori only for the class-of-interest. This eliminates the requirement for any a priori knowledge of the other classes in the input-data-set while exploiting the potentially robust and powerful feature extraction capability of fully supervised feature extraction approaches. The system and method extract discriminant features by exploiting the ability of the adaptive Bayes classifier to define an optimal Bayes decision boundary between the class-of-interest and the class-other using only labeled samples from the class-of-interest and unlabeled samples from the data to be classified. Optimal features are derived from vectors normal to the decision boundary defined by the adaptive Bayes classifier.
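The training and classification steps summarized above can be stated compactly. This is the standard adaptive Bayes least-squares formulation; the symbols below are introduced for this sketch and are not taken from the patent text. The class-of-interest posterior is approximated by a weighted function of the pattern,

```latex
\hat{P}(C_{\mathrm{int}}\mid X) = A^{T}F(X),
\qquad
J = E\!\left[\bigl(A^{T}F(X)-P(C_{\mathrm{int}}\mid X)\bigr)^{2}\right],
```

and the minimizer of $J$ can be estimated from labeled class-of-interest samples and unlabeled samples alone, since $E[F(X)\,P(C_{\mathrm{int}}\mid X)] = P_{C_{\mathrm{int}}}\,E[F(X)\mid C_{\mathrm{int}}]$:

```latex
E_{\mathrm{unl}}\!\left[F F^{T}\right] A = P_{C_{\mathrm{int}}}\, E_{\mathrm{int}}\!\left[F\right],
\qquad
\text{decide } C_{\mathrm{int}} \;\text{iff}\; A^{T}F(X) \ge \tfrac{1}{2}.
```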
27 Claims
1. (Independent claim, set forth above under First Claim; dependent claims 2-12.)
13. A computer-implemented method for extracting discriminately informative features from input patterns, which provide discrimination between a class-of-interest and a class-other while reducing the number of features, comprising the steps of:
receiving a training set of class-of-interest patterns, a set of unlabeled patterns from an input-data-set, and an estimate of a class-of-interest a priori probability in said input-data-set, said input-data-set being at least one of an image, video or speech data set;
selecting a predetermined number of Gaussian kernel density functions;
selecting parameter values for said Gaussian kernel density functions where said selected parameter values cause said Gaussian kernel densities to approximate the probability density function of said input-data-set;
executing a training stage using said a priori probability of said class-of-interest, said training set of class-of-interest patterns, and said unlabeled patterns from said input-data-set, said training stage including a step of least squares approximation of a class-of-interest posterior distribution function using a linear combination of said weighted Gaussian kernel density functions;
classifying each pattern from said input-data-set as being either said class-of-interest or said class-other in accordance with a conditional test defined by an adaptive Bayes decision rule;
extracting a predetermined percent of said classified patterns that lie near a decision boundary;
locating points lying on said decision boundary using said extracted patterns that lie near said decision boundary;
calculating normal vectors to said decision boundary using said points lying on said decision boundary;
calculating an effective decision boundary feature matrix;
calculating eigenvalues, eigenvectors, and a rank of said effective decision boundary feature matrix;
selecting a set of eigenvectors for use in a feature extraction matrix; and
extracting a reduced set of features using said feature extraction matrix, whereby said discriminately informative features are extracted from input patterns which provide discrimination between a class-of-interest and a class-other while reducing the number of features, using only said training set of class-of-interest patterns and said unlabeled patterns from said input-data-set, and without any a priori knowledge of said class-other.
(Dependent claims 14-27.)
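Claim 13 replaces the polynomial with Gaussian kernel density functions whose weighted sum approximates the probability density of the input-data-set; the least-squares training step is formally unchanged. The sketch below is illustrative only: synthetic two-dimensional Gaussian data, kernel centres drawn from the unlabeled set, a hypothetical bandwidth h = 1, and a small ridge term added for numerical stability.

```python
import numpy as np

rng = np.random.default_rng(1)

def kernel_basis(X, centers, h):
    """f_i(x): Gaussian kernels centered on patterns from the input-data-set,
    with bandwidth h chosen so their weighted sum can track p(x)."""
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / h**2)

# Illustrative data: unlabeled mixture plus labeled class-of-interest samples.
X_unl = np.vstack([rng.normal([0.0, 0.0], 1.0, (300, 2)),
                   rng.normal([3.0, 0.0], 1.0, (300, 2))])
X_int = rng.normal([0.0, 0.0], 1.0, (300, 2))
centers = X_unl[rng.choice(len(X_unl), 25, replace=False)]   # kernel centres (assumed)
h, p_int = 1.0, 0.5                                          # bandwidth, a priori probability

# Identical least-squares step:  E_unl[f f^T] A = p_int * E_int[f],
# with a tiny ridge because overlapping kernels make the Gram matrix ill-conditioned.
F_unl = kernel_basis(X_unl, centers, h)
F_int = kernel_basis(X_int, centers, h)
G = F_unl.T @ F_unl / len(F_unl) + 1e-8 * np.eye(len(centers))
A = np.linalg.solve(G, p_int * F_int.mean(axis=0))

labels = F_unl @ A >= 0.5    # adaptive Bayes decision rule
```

The remaining steps of the claim (near-boundary extraction, boundary points, normals, and the effective decision boundary feature matrix) proceed exactly as for the polynomial variant, with the gradient of the kernel expansion supplying the normal vectors.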
Specification