Scalable set oriented classifier
First Claim
1. A method for classifying set-oriented data in a computer by generating a classification tree, the computer being coupled to a data storage device for storing the set-oriented data, the method comprising the steps of:
storing the set-oriented data as a table in a relational database in the data storage device coupled to the computer, the table being comprised of rows having attributes and node identifiers, wherein each node identifier indicates a node in the classification tree to which a row belongs;
iteratively performing a sequence of steps in the computer until all of the rows have been classified, the sequence of steps comprising;
determining a gini index value for each split value of each attribute for each node that can be partitioned in the classification tree;
selecting an attribute and a split value for each node that can be partitioned based on the determined gini index value corresponding to the split value of the attribute; and
growing the classification tree by a new level based on the selected attribute and split value for each node that can be partitioned, further comprising;
using the node identifier associated with a row to locate a node in the classification tree;
identifying the selected split value for that node;
applying the split value to the row; and
updating the node identifier according to the result of the split test.
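The tree-growing step recited above can be illustrated with a minimal sketch. It assumes an SQLite table named records, a less-than-or-equal split test, and a child numbering in which node n has children 2n+1 and 2n+2. None of these details are taken from the claim, and the function name grow_level and the column names are hypothetical.

```python
import sqlite3


def grow_level(conn, splits):
    """Move each row from its current node to one of that node's children.

    splits maps a node id on the current level to (attribute, split_value).
    Assumed numbering: node n has children 2n+1 (test passes) and 2n+2.
    """
    # Column names cannot be bound as SQL parameters, so check them first.
    columns = {row[1] for row in conn.execute("PRAGMA table_info(records)")}
    for node_id, (attribute, split_value) in splits.items():
        if attribute not in columns:
            raise ValueError(f"unknown attribute: {attribute}")
        # Locate rows by node id, apply the split test, store the new id.
        # Child ids always lie one level down, so rows moved here are not
        # touched again while the same level is being processed.
        conn.execute(
            f"UPDATE records SET node_id = CASE WHEN {attribute} <= ? "
            "THEN 2 * node_id + 1 ELSE 2 * node_id + 2 END "
            "WHERE node_id = ?",
            (split_value, node_id),
        )
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE records (age REAL, label TEXT, node_id INTEGER)")
conn.executemany(
    "INSERT INTO records VALUES (?, ?, 0)",
    [(23, "high"), (30, "low"), (40, "low")],
)
grow_level(conn, {0: ("age", 25)})
print(conn.execute("SELECT age, node_id FROM records ORDER BY age").fetchall())
```

Each node on the level is handled by one UPDATE statement over the whole table, so the database engine rather than application code visits the individual rows.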
Abstract
A method, apparatus, and article of manufacture for a computer implemented scalable set-oriented classifier. The scalable set-oriented classifier stores set-oriented data as a table in a relational database. The table is comprised of rows having attributes. The scalable set-oriented classifier classifies the rows by building a classification tree. The scalable set-oriented classifier determines a gini index value for each split value of each attribute for each node that can be partitioned in the classification tree. The scalable set-oriented classifier selects an attribute and a split value for each node that can be partitioned based on the determined gini index value corresponding to the split value. Then, the scalable set-oriented classifier grows the classification tree by another level based on the selected attribute and split value for each node. The scalable set-oriented classifier repeats this process until each row of the table has been classified in the classification tree.
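The gini index computation and split selection described in the abstract can be sketched as follows. The abstract does not give the formula, so this sketch assumes the commonly used definition: the gini index of a set is 1 minus the sum of the squared class frequencies, and the index of a binary split is the size-weighted average over the two partitions, with the lowest value preferred. The function names, the dictionary row layout, and the "label" key are hypothetical.

```python
from collections import Counter


def gini(labels):
    """Gini index of a list of class labels (0.0 for an empty list)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def gini_split(rows, attribute, split_value, label="label"):
    """Size-weighted gini index of splitting rows at attribute <= split_value.

    rows must be non-empty.
    """
    left = [r[label] for r in rows if r[attribute] <= split_value]
    right = [r[label] for r in rows if r[attribute] > split_value]
    n = len(rows)
    return len(left) / n * gini(left) + len(right) / n * gini(right)


def best_split(rows, attributes, label="label"):
    """Return (gini value, attribute, split value) with the lowest gini value.

    Every distinct value of every attribute is tried as a split value.
    """
    candidates = (
        (gini_split(rows, attribute, value, label), attribute, value)
        for attribute in attributes
        for value in sorted({r[attribute] for r in rows})
    )
    return min(candidates)


rows = [
    {"age": 23, "label": "high"},
    {"age": 30, "label": "low"},
    {"age": 40, "label": "low"},
]
print(best_split(rows, ["age"]))
```

In this small example the split age <= 23 separates the two classes completely, so it has the lowest gini value of the three candidates.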
21 Claims
1. A method for classifying set-oriented data in a computer by generating a classification tree, the computer being coupled to a data storage device for storing the set-oriented data, the method comprising the steps of:
storing the set-oriented data as a table in a relational database in the data storage device coupled to the computer, the table being comprised of rows having attributes and node identifiers, wherein each node identifier indicates a node in the classification tree to which a row belongs;
iteratively performing a sequence of steps in the computer until all of the rows have been classified, the sequence of steps comprising;
determining a gini index value for each split value of each attribute for each node that can be partitioned in the classification tree;
selecting an attribute and a split value for each node that can be partitioned based on the determined gini index value corresponding to the split value of the attribute; and
growing the classification tree by a new level based on the selected attribute and split value for each node that can be partitioned, further comprising;
using the node identifier associated with a row to locate a node in the classification tree;
identifying the selected split value for that node;
applying the split value to the row; and
updating the node identifier according to the result of the split test.
Dependent claims: 2, 3, 4, 5, 6, 7
8. An apparatus for classifying set-oriented data, comprising:
a computer coupled to a data storage device for storing the set-oriented data;
means, performed by the computer, for storing the set-oriented data as a table in a relational database in the data storage device coupled to the computer, the table being comprised of rows having attributes and node identifiers, wherein each node identifier indicates a node in the classification tree to which a row belongs; and
means, performed by the computer, for performing a sequence of steps in the computer until all of the rows of the table have been classified in a classification tree, further comprising;
means, performed by the computer, for determining a gini index value for each split value of each attribute for each node that can be partitioned in the classification tree;
means, performed by the computer, for selecting an attribute and a split value for the attribute for each node that can be partitioned based on the determined gini index value corresponding to the split value of the attribute; and
means, performed by the computer, for growing the classification tree by a new level based on the selected attribute and split value for each node that can be partitioned, further comprising;
means, performed by the computer, for using the node identifier associated with a row to locate a node in the classification tree;
means, performed by the computer, for identifying the selected split value for that node;
means, performed by the computer, for applying the split value to the row; and
means, performed by the computer, for updating the node identifier according to the result of the split test.
Dependent claims: 9, 10, 11, 12, 13, 14
15. A program storage device, readable by a computer, tangibly embodying one or more programs of instructions executable by the computer to perform method steps of a classification method for classifying set-oriented data by generating a classification tree, the computer being coupled to a data storage device for storing the set-oriented data, the method comprising the steps of:
storing the set-oriented data as a table in a relational database in the data storage device coupled to the computer, the table being comprised of rows having attributes and node identifiers, wherein each node identifier indicates a node in the classification tree to which a row belongs; and
iteratively performing a sequence of steps in the computer until all of the rows have been classified, the sequence of steps comprising;
determining a gini index value for each split value of each attribute for each node that can be partitioned in the classification tree;
selecting an attribute and a split value for each node that can be partitioned based on the determined gini index value corresponding to the split value of the attribute; and
growing the classification tree by a new level based on the selected attribute and split value for each node that can be partitioned, further comprising;
using the node identifier associated with a row to locate a node in the classification tree;
identifying the selected split value for that node;
applying the split value to the row; and
updating the node identifier according to the result of the split test.
Dependent claims: 16, 17, 18, 19, 20, 21
Specification