Segmenting information records with missing values using multiple partition trees
Abstract
A method and system for predicting the class membership of a record where information for one or more variables in the record is missing. Multiple classification trees are generated. A first classification tree is computed using a substantially complete set of information for all of the variables. Other classification trees are computed for different subsets of the variables. Variables are selected for inclusion in a subset based on how strongly they influence the prediction of class membership. The first classification tree (based on the substantially complete set of information) is applied to a record with missing information. If missing information is needed by this tree in order to classify the record, another classification tree that is not based on the missing variable is selected. The class membership for a record with information missing is predicted more accurately without substantially increasing the complexity of the prediction.
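The fallback behavior described in the abstract can be illustrated with a minimal sketch. It assumes binary threshold trees over numeric variables and a record stored as a dictionary in which a missing value is `None`. The names `Node`, `walk`, `classify`, and the example variables `income` and `age` are hypothetical and are not taken from the patent.

```python
from typing import Dict, List, Optional, Set, Tuple


class Node:
    """Internal nodes test one variable against a threshold; leaves carry a label."""

    def __init__(self, variable=None, threshold=None, left=None, right=None, label=None):
        self.variable = variable
        self.threshold = threshold
        self.left = left
        self.right = right
        self.label = label


class MissingValueNeeded(Exception):
    """Raised when a tree must test a variable the record does not supply."""


def walk(node: Node, record: Dict[str, Optional[float]]) -> str:
    # Follow the tree from the root until a leaf is reached.
    while node.label is None:
        value = record.get(node.variable)
        if value is None:
            raise MissingValueNeeded(node.variable)
        node = node.left if value <= node.threshold else node.right
    return node.label


def classify(record: Dict[str, Optional[float]],
             full_tree: Node,
             subset_trees: List[Tuple[Set[str], Node]]) -> Optional[str]:
    # Try the tree built on all variables first; it may never touch the gap.
    try:
        return walk(full_tree, record)
    except MissingValueNeeded:
        present = {name for name, value in record.items() if value is not None}
        # Assumption: take the first subset tree whose variables are all present.
        for variables, tree in subset_trees:
            if variables <= present:
                return walk(tree, record)
        return None  # no available tree avoids the missing variables


# Hypothetical trees: the full tree uses income and age, the fallback only income.
full = Node("income", 50,
            left=Node(label="low"),
            right=Node("age", 30, left=Node(label="mid"), right=Node(label="high")))
income_only = Node("income", 50, left=Node(label="low"), right=Node(label="high"))
fallbacks = [({"income"}, income_only)]
```

In this sketch a record with `age` missing is still classified by the full tree whenever its path never tests `age`; the fallback tree is consulted only when the full tree actually needs the missing value.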
22 Claims
1. A method for classifying an information record when said record is incomplete, said method comprising the steps of:
a) receiving a record comprising a plurality of variables, wherein said record comprises information for a first portion of said variables and wherein information for a second portion of said variables is incomplete;
b) using a first classification tool to classify said record according to said information from said first portion of said variables; and
c) using a second classification tool to classify said record when said first classification tool requires a particular item of information that is missing from said second portion of said variables.
8. A computer system comprising:
a bus;
a memory unit coupled to said bus; and
a processor coupled to said bus, said processor for executing a method for classifying an information record when said record is incomplete, said method comprising the steps of:
a) receiving a record comprising a plurality of variables, wherein said record comprises information for a first portion of said variables and wherein information for a second portion of said variables is incomplete;
b) using a first classification tool to classify said record according to said information from said first portion of said variables; and
c) using a second classification tool to classify said record when said first classification tool requires a particular item of information that is missing from said second portion of said variables.
15. A computer-usable medium having computer-readable program code embodied therein for causing a computer system to perform the steps of:
a) receiving a record comprising a plurality of variables, wherein said record comprises information for a first portion of said variables and wherein information for a second portion of said variables is incomplete;
b) using a first classification tool to classify said record according to said information from said first portion of said variables; and
c) using a second classification tool to classify said record when said first classification tool requires a particular item of information that is missing from said second portion of said variables.
22. A method for classifying an information record when said record is incomplete, wherein said record comprises a plurality of variables, said method comprising the steps of:
a) ranking said plurality of variables according to their respective influence on said classifying;
b) grouping said plurality of variables into subsets of variables using said ranking, wherein a classification tree is computed for each of said subsets;
c) receiving a record comprising information for a first portion of said variables, wherein information for a second portion of said variables is incomplete;
d) using a first classification tree to classify said record according to said information from said first portion of said variables, wherein said first classification tree is based on a substantially complete set of information for said plurality of variables; and
e) using a second classification tree to classify said record when said first classification tool requires a particular item of information that is missing from said second portion of said variables, wherein said second classification tree is based on information for one of said subsets of variables of said step b), wherein said one of said subsets does not include said particular item of information that is missing.
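Steps a) and b) of claim 22 can be sketched as follows. The claim does not state how influence is measured or how the ranking determines the subsets, so this sketch assumes that influence scores are supplied by the caller and that each subset omits one of the most influential variables. The names `rank_variables`, `group_into_subsets`, and `top_k`, and the example scores, are hypothetical.

```python
from typing import Dict, FrozenSet, List


def rank_variables(influence: Dict[str, float]) -> List[str]:
    # Step a): order variables from most to least influential.
    # Ties are broken by name so the ranking is deterministic.
    return sorted(influence, key=lambda name: (-influence[name], name))


def group_into_subsets(ranked: List[str], top_k: int) -> List[FrozenSet[str]]:
    # Step b): the first subset holds every variable (the "substantially
    # complete" case). Assumption: each further subset leaves out one of the
    # top_k most influential variables, so that a tree trained on it can
    # stand in when that variable is missing from a record.
    everything = frozenset(ranked)
    subsets = [everything]
    for name in ranked[:top_k]:
        subsets.append(everything - {name})
    return subsets


# Hypothetical influence scores for three variables.
scores = {"income": 0.6, "age": 0.3, "zip": 0.1}
ranked = rank_variables(scores)
subsets = group_into_subsets(ranked, top_k=2)
```

A classification tree would then be computed for each subset with whatever tree learner is in use. Limiting the extra subsets to the `top_k` variables is one way to keep the number of trees small, under the assumption that the most influential variables are the ones the first tree is most likely to test.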
Specification