Machine learning method
Abstract
A method of machine learning utilizing binary classification trees combined with Bayesian classifiers. A training component creates nodes, uses the expectation maximization algorithm to create statistical kernels and mixtures for leaf nodes, uses kappa statistics to identify nodes as leaf nodes or branch nodes, creates hyperplanes for branch nodes, and splits the training data for branch nodes into subsets to be provided to subtrees. An operating component traverses a binary classifier tree with a feature vector to be classified, determining for each branch node whether the feature vector lies to the left or the right of the branch node hyperplane, and finally classifies the feature vector upon arriving at a leaf node by computing log likelihoods for the feature vector for each mixture in the leaf node and determining the classification of the feature vector according to the highest log likelihood. The preferred embodiment described is classification of internal combustion engine cylinder firings as nominal firings or misfires.
47 Citations
8 Claims
1. A method utilizing machine-readable data storage of a set of machine-executable instructions for using a data processing system to perform a method of machine learning in which the learning is an assignment of a feature vector to a classification carried out by recursively creating and using first, a binary classifier tree having nodes, which nodes further comprise branch nodes and leaf nodes, and second, a Bayesian classifier, and the method comprising the steps of:
a. using a binary tree classifier to create a node, wherein the node comprises training data having a dataset size, the training data comprising multiple sets of feature vectors and corresponding classifications;
b. hypothesizing the node just constructed is a leaf node;
c. constructing a Bayesian leaf node classifier for the node;
d. testing the hypothesis by applying the leaf node classifier against the training data and if the hypothesis is correct then the node is a leaf node, if the hypothesis fails, then the node is a branch node;
e. creating, for each node determined to be a branch node, a branch node hyperplane comprising a hyperplane point and a hyperplane normal;
f. splitting for each node determined to be a branch node, the training data for the branch node into two subsets, a left subset and a right subset, according to which side of the branch node hyperplane each element of training data resides; and
g. passing the subsets to the next nodes recursively to create a tree.
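Steps a through g describe a recursive build: hypothesize a leaf, test the hypothesis, and split on a hyperplane when the test fails. The following is a minimal sketch of that recursion; the function names, dictionary layout, and callable parameters are our own illustration, not language from the patent.

```python
def build_node(training_data, is_good_leaf, make_leaf, make_hyperplane, side_of):
    """Recursively build a binary classifier tree (claim 1, steps a-g).

    training_data: list of (feature_vector, classification) pairs.
    The callables stand in for the Bayesian leaf classifier construction,
    the leaf-hypothesis test, and the branch hyperplane construction.
    """
    leaf = make_leaf(training_data)                    # step c: leaf classifier
    if not training_data or is_good_leaf(leaf, training_data):
        return {"kind": "leaf", "classifier": leaf}    # step d: hypothesis holds
    point, normal = make_hyperplane(training_data)     # step e: branch hyperplane
    left = [(x, y) for x, y in training_data if side_of(x, point, normal) < 0]
    right = [(x, y) for x, y in training_data if side_of(x, point, normal) >= 0]
    return {"kind": "branch", "point": point, "normal": normal,   # steps f-g
            "left": build_node(left, is_good_leaf, make_leaf, make_hyperplane, side_of),
            "right": build_node(right, is_good_leaf, make_leaf, make_hyperplane, side_of)}
```

The recursion bottoms out when a node's hypothesized leaf classifier passes the test (or its data is exhausted), matching the leaf/branch determination of step d.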
2. The method of claim 1 wherein the step of testing the hypothesis further comprises the steps of:

a. constructing a leaf node classifier for all of the training data in the node by use of the Expectation Maximization ("EM") algorithm;
b. forming a confusion matrix comprising m² numbers, where m is the number of classes represented by mixtures in a leaf node classifier and the ijth element of the matrix comprises the number of training feature vectors belonging to class i and classified in class j;
c. computing the kappa statistic from the numbers in the confusion matrix;
d. comparing the kappa statistic to the kappa threshold;
e. determining the node to be a branch node if the kappa statistic is less than the kappa threshold;
f. comparing the dataset size of the training data to the minimum dataset size; and
g. determining the node to be a branch node if the dataset size of the training data is less than the minimum dataset size.
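Steps b through d compute what is, in standard form, Cohen's kappa over the confusion matrix. A self-contained sketch (the function name is ours):

```python
def kappa_statistic(confusion):
    """Cohen's kappa from an m-by-m confusion matrix, where confusion[i][j]
    counts training vectors belonging to class i that were classified as class j."""
    m = len(confusion)
    total = sum(sum(row) for row in confusion)
    row_sums = [sum(confusion[i]) for i in range(m)]
    col_sums = [sum(confusion[i][j] for i in range(m)) for j in range(m)]
    observed = sum(confusion[i][i] for i in range(m)) / total       # observed agreement
    expected = sum(row_sums[i] * col_sums[i] for i in range(m)) / total ** 2
    return (observed - expected) / (1 - expected)
```

Per step e, the node would then be declared a branch node when this value falls below the kappa threshold.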
3. The method of claim 2 wherein the step of creating a branch node hyperplane further comprises the steps of:
a. constructing a branch mixture comprising two kernels, a first branch mixture kernel and a second branch mixture kernel, wherein the branch mixture is constructed by using the EM algorithm with all the training data in the branch node;
b. attempting to calculate the hyperplane point by identifying the point along the line connecting the centers of the two branch mixture kernels such that the kernel log likelihoods at the hyperplane point are equal, wherein this step fails if the hyperplane point does not exist;
c. constructing, if the hyperplane point exists, the equivocation hyperconic section comprising a locus of points for which Mahalanobis distances between the points and the branch mixture kernels are the same;
d. calculating, if the hyperplane point exists, the hyperplane normal as the normal to the equivocation hyperconic section at the hyperplane point; and
e. determining the node to be a leaf node if the hyperplane point does not exist.
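Steps b and e can be sketched as a one-dimensional root search along the segment joining the two kernel centers: the point where the two kernel log likelihoods agree is the hyperplane point, and its absence makes the node a leaf. A sketch under the assumption that the two kernel log likelihoods are supplied as callables (names and bisection approach are ours):

```python
def hyperplane_point(ll1, ll2, m1, m2, iters=60):
    """Find the point on the segment from kernel center m1 to kernel center m2
    where the two kernel log likelihoods are equal (claim 3, step b), by bisection.
    Returns None when no such point exists on the segment (claim 3, step e)."""
    def at(t):  # point on the line connecting the two centers
        return [a + t * (b - a) for a, b in zip(m1, m2)]
    def gap(t):  # difference of the two kernel log likelihoods
        p = at(t)
        return ll1(p) - ll2(p)
    lo, hi = 0.0, 1.0
    if gap(lo) * gap(hi) > 0:
        return None                 # the gap never changes sign: no hyperplane point
    for _ in range(iters):
        mid = (lo + hi) / 2.0
        if gap(lo) * gap(mid) <= 0:
            hi = mid
        else:
            lo = mid
    return at((lo + hi) / 2.0)
```

The hyperplane normal of step d would then be the normal to the equal-Mahalanobis-distance surface evaluated at this point; it is omitted from the sketch.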
4. The method of claim 3 wherein the step of splitting the training data further comprises the steps of:
a. computing for each feature vector in the training data a difference vector between the hyperplane point and the feature vector;
b. calculating an inner product of the hyperplane normal and the difference vector calculated in step a;
c. identifying an algebraic sign of the inner product; and
d. determining the side to be left if the algebraic sign is negative and right if the algebraic sign is positive.
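The four steps of claim 4 reduce to the algebraic sign of a single inner product. A sketch (we read "difference vector between the hyperplane point and the feature vector" as point minus feature; the convention only fixes which side is called left):

```python
def side_of_hyperplane(feature, point, normal):
    """Claim 4: decide which side of the branch node hyperplane a feature
    vector lies on from the algebraic sign of an inner product."""
    diff = [p - f for p, f in zip(point, feature)]       # step a: difference vector
    inner = sum(n * d for n, d in zip(normal, diff))     # step b: inner product
    return "left" if inner < 0 else "right"              # steps c-d: sign decides
```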
5. The method of claim 1 further comprising the steps of:
a. traversing the classifier tree with a feature vector to be classified until a leaf node is reached, wherein the tree is traversed to the left or to the right at each branch node according to whether the feature vector resides to the left or the right of the branch node hyperplane;
b. deciding, upon arriving at a leaf node, how to classify the feature vector, wherein the decision how to classify the feature vector is made by computing log likelihoods for the feature vector for each mixture in the leaf node classifier and classifying the feature vector in the class identified by the mixture having the highest log likelihood; and
c. whereby a feature vector is classified.
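The operating component of claim 5 ends with an argmax over per-class mixture log likelihoods at the leaf. A sketch, with each mixture represented as a log-likelihood callable and the class labels borrowed from the engine-misfire embodiment in the abstract (the toy quadratic log likelihoods are ours):

```python
def classify_at_leaf(feature, mixtures):
    """Claim 5, step b: classify the feature vector in the class whose
    mixture yields the highest log likelihood at the leaf node."""
    return max(mixtures, key=lambda label: mixtures[label](feature))

# Toy leaf classifier with one log-likelihood callable per class:
mixtures = {"nominal": lambda x: -(x[0] - 1.0) ** 2,
            "misfire": lambda x: -(x[0] - 5.0) ** 2}
```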
6. A method utilizing machine-readable data storage of a set of machine-executable instructions for using a data processing system to perform a method of machine learning in which the learning is an assignment of a feature vector to a classification carried out by creating and using first a binary classifier tree having nodes, which nodes further comprise branch nodes and leaf nodes, and second a Bayesian classifier to obtain a result, the method comprising the steps of:
a. determining a kappa threshold and a minimum dataset size;
b. using a binary tree classifier to create a node, wherein the node comprises training data having a dataset size, the training data comprising multiple sets of feature vectors and corresponding classifications;
c. assuming that the node just constructed is a leaf node;
d. constructing a Bayesian leaf node classifier for all of the training data in the node by use of the Expectation Maximization ("EM") algorithm;
e. forming a confusion matrix comprising m² numbers, where m is the number of classes represented by mixtures in a leaf node classifier and the ijth element of the matrix comprises the number of training feature vectors belonging to class i and classified in class j;
f. computing a kappa statistic from the numbers in the confusion matrix;
g. comparing the kappa statistic to the kappa threshold;
h. comparing the dataset size of the training data to the minimum dataset size; and
i. determining the node to be a branch node if the dataset size of the training data exceeds the minimum dataset size;
j. creating for each node determined to be a branch node, a branch node hyperplane comprising a hyperplane point and a hyperplane normal, this step further comprising the steps of;
(1) constructing a branch mixture comprising two kernels, a first branch mixture kernel and a second branch mixture kernel, wherein the branch mixture is constructed by using the EM algorithm with all the training data in the branch node;
(2) calculating the hyperplane point by identifying the point along the line connecting the centers of the two branch mixture kernels such that the kernel log likelihoods at the hyperplane point are equal, wherein this step fails if the hyperplane point does not exist;
(3) constructing, if the hyperplane point exists, the equivocation hyperconic section comprising a locus of points for which Mahalanobis distances between the points and the branch mixture kernels are the same; and
(4) calculating the hyperplane normal as the normal to the equivocation hyperconic section at the hyperplane point;
k. determining the node to be a leaf node if the hyperplane point does not exist;
l. splitting the training data for the branch node into two subsets, a left subset and a right subset, according to which side of the branch node hyperplane each element of training data resides, this step comprising the further steps of;
(1) computing for each feature vector in the training data a difference vector between the hyperplane point and the feature vector;
(2) calculating an inner product of the hyperplane normal and the difference vector calculated in step (1);
(3) identifying an algebraic sign of the inner product;
(4) determining the side to be left if the algebraic sign is negative and right if the algebraic sign is positive;
m. passing the subsets to the next nodes recursively to be created by passing the left subset to a next left node and the right subset to a next right node;
n. creating, for each node determined to be a branch node, a left node pointer and a right node pointer for pointing to a next layer of nodes recursively to be created in the binary classifier tree; and
o. repeating steps b through h recursively until the application of step c or step e results in a determination that the current node is a leaf node.
7. The method of claim 6 further comprising the steps of:

a. traversing the classifier tree with a feature vector to be classified until a leaf node is reached, this step comprising the further steps of;
(1) calculating for a branch node the inner product of the hyperplane normal with the difference between the hyperplane point and the feature vector;
(2) determining an algebraic sign of the inner product;
(3) traversing the classifier tree to a next node, wherein the traversal is to the left if the algebraic sign is negative and to the right if the algebraic sign is positive;
(4) repeating steps 7.a.(1) through 7.a.(3) until a leaf node is reached;
b. deciding, upon arriving at a leaf node, how to classify the feature vector, this step comprising the further steps of;
(1) computing a kernel log likelihood for the feature vector with respect to each kernel in each mixture in the leaf node classifier, wherein the kernel log likelihood is computed by use of the Mahalanobis distance between the feature vector and the kernel, the size of the kernel, and normalizing factors, wherein the kernel log likelihood of a feature vector x with respect to a kernel with mean m, covariance C and dimension n is:

LL(x) = -(1/2)(x - m)ᵀ C⁻¹ (x - m) - (1/2) ln|C| - (n/2) ln(2π)
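For a Gaussian kernel, this log likelihood is the standard multivariate normal log density. A sketch restricted to diagonal covariance so it stays dependency-free (the diagonal restriction is ours; the claim places no such limit):

```python
import math

def kernel_log_likelihood(x, mean, var):
    """Log likelihood of feature vector x under a Gaussian kernel with mean
    `mean` and diagonal covariance whose diagonal entries are `var`:
        -(1/2)*Mahalanobis^2 - (1/2)*ln|C| - (n/2)*ln(2*pi)
    i.e. the Mahalanobis distance, the kernel size |C|, and normalizing factors."""
    n = len(x)
    maha2 = sum((xi - mi) ** 2 / vi for xi, mi, vi in zip(x, mean, var))
    log_det = sum(math.log(vi) for vi in var)
    return -0.5 * maha2 - 0.5 * log_det - 0.5 * n * math.log(2.0 * math.pi)
```

With full covariance matrices, the squared Mahalanobis term becomes (x - m)ᵀC⁻¹(x - m) and |C| the determinant of C, as in the formula above.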
8. The method of claim 6 further comprising the step of determining how a feature vector is to be classified in a class identified by a mixture in the leaf node classifier, this step comprising the further steps of:
a. traversing the classifier tree with a feature vector to be classified until a leaf node is reached, this step comprising the further steps of;
(1) calculating for a branch node the inner product of the hyperplane normal with the difference between the hyperplane point and the feature vector;
(2) determining an algebraic sign of the inner product;
(3) traversing the classifier tree to a next node, wherein the traversal is to the left if the algebraic sign is negative and to the right if the algebraic sign is positive;
(4) repeating steps 8.a. (1) through 8.a. (3) until a leaf node is reached;
b. deciding, upon arriving at a leaf node, how to classify the feature vector, this step comprising the further steps of;
(1) Computing a mixture log likelihood for each mixture in the leaf node classifier, this step comprising the further steps of;
1. computing a kernel log likelihood for the feature vector with respect to each kernel in the mixture, wherein the kernel log likelihood is computed by use of the Mahalanobis distance between the feature vector and a kernel, and a kernel size, wherein the kernel log likelihood of a feature vector x with respect to a kernel with mean m, covariance C, and dimension n is:

LL(x) = -(1/2)(x - m)ᵀ C⁻¹ (x - m) - (1/2) ln|C| - (n/2) ln(2π)
Specification