Method and apparatus for representing and generating evaluation functions in a data classification system
Abstract
A unified framework is disclosed for representing and generating evaluation functions for a classification system. The disclosed unified framework provides evaluation functions having characteristics of both traditional or purity-based evaluation functions (class uniformity) and discrimination-based evaluation functions (discrimination power). The disclosed framework is based on a set of configurable parameters and is a function of the distance between examples. By varying the choice of parameters and the distance function, more emphasis is placed on either the class uniformity or the discrimination power of the induced example subsets. A user-configurable function is used to score each of the features based on the class uniformity and discrimination power measures and thereby select the feature having a highest score to partition the data (e.g., using a decision tree or rule-base). This process is recursively applied until all of the examples are partitioned.
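As an illustration of the weighted scoring the abstract describes, the following is a minimal sketch, not the patented formulation. The two measures are assumptions chosen for simplicity: class uniformity is taken as the share of examples that belong to the majority class of their subset, and discrimination power as the mean distance, along the feature, between pairs of examples with different classes. The names `overlap_distance`, `class_uniformity`, `discrimination_power`, `score`, and the weight parameters are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import combinations


def overlap_distance(x, y):
    """Assumed distance between two feature values: 0 if equal, 1 otherwise."""
    return 0.0 if x == y else 1.0


def class_uniformity(examples, feature):
    """Assumed purity measure: share of examples in the majority class
    of the subset induced by their value of `feature`."""
    subsets = defaultdict(list)
    for values, label in examples:
        subsets[values[feature]].append(label)
    majority = sum(max(Counter(labels).values()) for labels in subsets.values())
    return majority / len(examples)


def discrimination_power(examples, feature, distance=overlap_distance):
    """Assumed measure: mean distance along `feature` between pairs of
    examples that have different class values."""
    pairs = [(a, b) for a, b in combinations(examples, 2) if a[1] != b[1]]
    if not pairs:
        return 0.0
    total = sum(distance(a[0][feature], b[0][feature]) for a, b in pairs)
    return total / len(pairs)


def score(examples, feature, w_uniformity=0.5, w_discrimination=0.5,
          distance=overlap_distance):
    """Weighted combination of the two measures (weights are assumptions)."""
    return (w_uniformity * class_uniformity(examples, feature)
            + w_discrimination * discrimination_power(examples, feature, distance))


# Each example is (feature values, class value); the data is made up.
examples = [
    ({"color": "red", "size": "S"}, "A"),
    ({"color": "red", "size": "L"}, "A"),
    ({"color": "blue", "size": "S"}, "B"),
    ({"color": "blue", "size": "L"}, "B"),
]
best = max(["color", "size"], key=lambda f: score(examples, f))
```

In this toy dataset `color` separates the two classes completely, so it receives the higher score under any non-negative weights; raising one weight relative to the other shifts the emphasis between the two measures.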
31 Claims
1. A method for partitioning a domain dataset, said domain dataset having a plurality of examples, each of said examples characterized by at least one feature and one class value, said feature having a plurality of possible feature values, said method comprising the steps of:
establishing an evaluation function to partition said domain dataset, wherein said evaluation function includes a class uniformity measure and a discrimination power measure, and a weight for each of said class uniformity and discrimination power measures; and
partitioning said domain dataset using said evaluation function.
10. A method for partitioning a domain dataset, said domain dataset having a plurality of examples, each of said examples characterized by at least one feature and one class value, said feature having a plurality of possible feature values, said method comprising the steps of:
evaluating a class uniformity measure for each of said examples for every feature value;
evaluating a discrimination power measure for each of said examples for every feature value;
determining a score for each of said features using a function that considers both said class uniformity measure and said discrimination power measure;
selecting a feature having a highest score to use to partition said data; and
recursively applying said two evaluating steps and said determining and selecting steps until all of said examples are partitioned.
20. A method for establishing an evaluation function for partitioning a domain dataset, said domain dataset having a plurality of examples, each of said examples characterized by at least one feature and one class value, said feature having a plurality of possible feature values, said method comprising the steps of:
providing one or more configurable parameters that evaluate a class uniformity measure and a discrimination power measure and provide a weight for each of said class uniformity and discrimination power measures; and
providing a configurable function that is based on said class uniformity measure and said discrimination power measure to determine a score for each of said features, said score used to identify a feature to partition said domain dataset.
26. A system for partitioning a domain dataset, said domain dataset having a plurality of examples, each of said examples characterized by at least one feature and one class value, said feature having a plurality of possible feature values, comprising:
a memory that stores computer-readable code; and
a processor operatively coupled to said memory, said processor configured to implement said computer-readable code, said computer-readable code configured to:
establish an evaluation function to partition said domain dataset, wherein said evaluation function includes a class uniformity measure and a discrimination power measure, and a weight for each of said class uniformity and discrimination power measures; and
partition said domain dataset using said evaluation function.
27. A system for partitioning a domain dataset, said domain dataset having a plurality of examples, each of said examples characterized by at least one feature and one class value, said feature having a plurality of possible feature values, comprising:
a memory that stores computer-readable code; and
a processor operatively coupled to said memory, said processor configured to implement said computer-readable code, said computer-readable code configured to:
evaluate a class uniformity measure for each of said examples for every feature value;
evaluate a discrimination power measure for each of said examples for every feature value;
determine a score for each of said features using a function that considers both said class uniformity measure and said discrimination power measure;
select a feature having a highest score to use to partition said data; and
recursively apply said two evaluating steps and said determining and selecting steps until all of said examples are partitioned.
28. A system for establishing an evaluation function for partitioning a domain dataset, said domain dataset having a plurality of examples, each of said examples characterized by at least one feature and one class value, said feature having a plurality of possible feature values, comprising:
a memory that stores computer-readable code; and
a processor operatively coupled to said memory, said processor configured to implement said computer-readable code, said computer-readable code configured to:
provide one or more configurable parameters that evaluate a class uniformity measure and a discrimination power measure and provide a weight for each of said class uniformity and discrimination power measures; and
provide a configurable function that is based on said class uniformity measure and said discrimination power measure to determine a score for each of said features, said score used to identify a feature to partition said domain dataset.
29. An article of manufacture for partitioning a domain dataset, said domain dataset having a plurality of examples, each of said examples characterized by at least one feature and one class value, said feature having a plurality of possible feature values, comprising:
a computer readable medium having computer readable code means embodied thereon, said computer readable program code means comprising:
a step to establish an evaluation function to partition said domain dataset, wherein said evaluation function includes a class uniformity measure and a discrimination power measure, and a weight for each of said class uniformity and discrimination power measures; and
a step to partition said domain dataset using said evaluation function.
30. An article of manufacture for partitioning a domain dataset, said domain dataset having a plurality of examples, each of said examples characterized by at least one feature and one class value, said feature having a plurality of possible feature values, comprising:
a computer readable medium having computer readable code means embodied thereon, said computer readable program code means comprising:
a step to evaluate a class uniformity measure for each of said examples for every feature value;
a step to evaluate a discrimination power measure for each of said examples for every feature value;
a step to determine a score for each of said features using a function that considers both said class uniformity measure and said discrimination power measure;
a step to select a feature having a highest score to use to partition said data; and
a step to recursively apply said two evaluating steps and said determining and selecting steps until all of said examples are partitioned.
31. An article of manufacture for establishing an evaluation function for partitioning a domain dataset, said domain dataset having a plurality of examples, each of said examples characterized by at least one feature and one class value, said feature having a plurality of possible feature values, comprising:
a computer readable medium having computer readable code means embodied thereon, said computer readable program code means comprising:
a step to provide one or more configurable parameters that evaluate a class uniformity measure and a discrimination power measure and provide a weight for each of said class uniformity and discrimination power measures; and
a step to provide a configurable function that is based on said class uniformity measure and said discrimination power measure to determine a score for each of said features, said score used to identify a feature to partition said domain dataset.
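The recursive procedure recited in claims 10, 27, and 30 (score every feature, select the highest-scoring one, split, and repeat) can be sketched roughly as below. This is an illustrative sketch under stated assumptions rather than the claimed implementation: the scoring function `combined_score`, its equal default weights, the stopping rule (a subset is a leaf when it holds a single class or no features remain), and the nested-dictionary tree format are all hypothetical choices.

```python
from collections import Counter
from itertools import combinations


def combined_score(examples, feature, w_u=0.5, w_d=0.5):
    """Hypothetical score mixing an assumed uniformity measure (majority-class
    share per subset) with an assumed discrimination measure (share of
    different-class pairs whose values of `feature` differ)."""
    groups = {}
    for values, label in examples:
        groups.setdefault(values[feature], []).append(label)
    uniformity = sum(max(Counter(g).values()) for g in groups.values()) / len(examples)
    pairs = [(a, b) for a, b in combinations(examples, 2) if a[1] != b[1]]
    discrimination = (
        sum(a[0][feature] != b[0][feature] for a, b in pairs) / len(pairs)
        if pairs else 0.0
    )
    return w_u * uniformity + w_d * discrimination


def partition(examples, features, score=combined_score):
    """Recursively split `examples` on the highest-scoring feature."""
    if not examples:
        return None
    labels = [label for _, label in examples]
    if len(set(labels)) == 1 or not features:
        # Assumed stopping rule: pure subset, or no features left to split on.
        return Counter(labels).most_common(1)[0][0]
    best = max(features, key=lambda f: score(examples, f))
    remaining = [f for f in features if f != best]
    subsets = {}
    for values, label in examples:
        subsets.setdefault(values[best], []).append((values, label))
    return {best: {value: partition(subset, remaining, score)
                   for value, subset in subsets.items()}}


# Made-up data: each example is (feature values, class value).
examples = [
    ({"shape": "round", "color": "red"}, "A"),
    ({"shape": "round", "color": "green"}, "A"),
    ({"shape": "square", "color": "red"}, "B"),
    ({"shape": "square", "color": "green"}, "C"),
]
tree = partition(examples, ["shape", "color"])
```

Here `shape` scores higher than `color` at the root, so the data is split on `shape` first; the `square` subset still mixes two classes and is split again on `color`. Passing a different `score` callable is one way a user-configurable function could be swapped in.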
Specification