System and method for assessing categorization rule selectivity
First Claim
1. A system for assessing the selectivity of categorization rules, the system comprising:
a computer system including at least one processor, a non-transitory data storage medium interfaced with the at least one processor, and input/output facilities, the data storage medium containing instructions that, when executed by the at least one processor, implement:
a categorization rule application engine configured to apply at least one categorization rule to a set of un-categorized objects to produce a categorization result set representing assignment of objects of the set into at least two categories into which the objects of the set are divided when the categorization rule is applied, the categorization rule application engine further configured to gather statistical information relating to the categorization result set based on properties of objects assigned to each of the at least two categories, and including at least one rule-specific aggregating statistic characterizing the application of the categorization rule to all of the objects and at least one categorization-specific statistic characterizing the objects of one of the at least two categories;
a selectivity determination engine configured to assess a numerical selectivity score for the at least one categorization rule, the numerical selectivity score representing an estimation of selectivity accuracy of the at least one categorization rule to provide an evaluation of the at least one categorization rule, the numerical selectivity score being calculated by the application of at least one trained selectivity determination algorithm to the statistical information including the at least one rule-specific aggregating statistic representing information on the set of files belonging to each of the categories defined in the categorization rule, the application of the at least one trained selectivity determination algorithm to the statistical information including considering each of a plurality of parameters derived from the statistical information and in accordance with the at least one categorization rule, and compare the selectivity score against a predefined selectivity threshold, wherein a selectivity score that exceeds the selectivity threshold is deemed highly selective; and
an algorithm training engine configured to produce each of the at least one trained selectivity determination algorithm based on application of a plurality of specially-selected categorization rules to a set of pre-categorized training data, wherein the application of each one of the specially-selected categorization rules to the set of training data produces at least one uniform grouping of objects in which the objects all meet a predefined similarity criterion, and wherein the trained selectivity determination algorithms are unrelated to the plurality of specially-selected categorization rules.
Abstract
Assessment of selectivity of categorization rules. One or more categorization rules are applied to a set of un-categorized objects to produce a categorization result set representing assignment of objects of the set into at least two categories. A numerical selectivity score for the at least one categorization rule is obtained based on statistical information relating to the categorization result set. The numerical selectivity score represents an estimation of accuracy of the at least one categorization rule, and is produced by applying at least one trained selectivity determination algorithm. Each such algorithm is trained by applying a plurality of specially-selected categorization rules to a set of pre-categorized training data, with the application of each rule producing a uniform grouping of objects.
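The flow summarized in the abstract (apply a rule, gather statistics, score, compare against a threshold) can be illustrated with a minimal Python sketch. Everything named here is an assumption made for illustration and does not come from the patent: the dictionary object model, the `packer` property, the two statistics, the weighted-sum scoring function standing in for the trained selectivity determination algorithm, and the example weights and threshold.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical object model: each un-categorized object is a dict of properties.
Obj = Dict[str, object]


@dataclass
class RuleStats:
    total: int             # rule-specific aggregate: objects the rule was applied to
    matched: int           # category-specific: size of the "match" category
    dominant_share: float  # category-specific: share of the most common property value


def apply_rule(rule: Callable[[Obj], bool], objects: List[Obj]) -> Dict[str, List[Obj]]:
    """Divide the objects into two categories: matched by the rule, and the rest."""
    result: Dict[str, List[Obj]] = {"match": [], "no_match": []}
    for obj in objects:
        result["match" if rule(obj) else "no_match"].append(obj)
    return result


def gather_stats(result: Dict[str, List[Obj]], prop: str = "packer") -> RuleStats:
    """Collect statistics from the properties of the objects in each category."""
    matched = result["match"]
    total = len(matched) + len(result["no_match"])
    share = 0.0
    if matched:
        counts = Counter(obj.get(prop) for obj in matched)
        share = counts.most_common(1)[0][1] / len(matched)
    return RuleStats(total=total, matched=len(matched), dominant_share=share)


def selectivity_score(stats: RuleStats, weights: Dict[str, float]) -> float:
    """Assumed stand-in for the trained algorithm: a weighted sum of two
    parameters derived from the statistics (homogeneity and narrowness)."""
    coverage = stats.matched / stats.total if stats.total else 0.0
    return weights["share"] * stats.dominant_share + weights["narrowness"] * (1.0 - coverage)


def is_highly_selective(score: float, threshold: float) -> bool:
    """A score that exceeds the predefined threshold is deemed highly selective."""
    return score > threshold
```

In this sketch the weights play the role of trained parameters; how they would actually be learned is left open, since the abstract only states that training uses specially-selected rules applied to pre-categorized data.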
22 Claims
1. A system for assessing the selectivity of categorization rules, as recited in full under "First Claim" above.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 21
11. A machine-implemented method for assessing the selectivity of categorization rules, the method comprising:
autonomously applying at least one categorization rule to a set of un-categorized objects to produce a categorization result set representing assignment of objects of the set into at least two categories into which the objects of the set are divided when the categorization rule is applied;
autonomously gathering statistical information relating to the categorization result set based on properties of objects assigned to each of the at least two categories, and including at least one rule-specific aggregating statistic characterizing the application of the categorization rule to all of the objects and at least one categorization-specific statistic characterizing the objects of one of the at least two categories;
autonomously assessing a numerical selectivity score for the at least one categorization rule, the numerical selectivity score representing an estimation of selectivity accuracy of the at least one categorization rule to provide an evaluation of the at least one categorization rule, the numerical selectivity score being calculated by the application of at least one trained selectivity determination algorithm to the statistical information including the at least one rule-specific aggregating statistic representing information on the set of files belonging to each of the categories defined in the categorization rule, the application of the at least one trained selectivity determination algorithm to the statistical information including considering each of a plurality of parameters derived from the statistical information and in accordance with the at least one categorization rule;
autonomously producing each of the at least one trained selectivity determination algorithm based on application of a plurality of specially-selected categorization rules to a set of pre-categorized training data, wherein the application of each one of the specially-selected categorization rules to the set of training data produces at least one uniform grouping of objects in which the objects all meet a predefined similarity criterion, and wherein the trained selectivity determination algorithms are unrelated to the plurality of specially-selected categorization rules; and
comparing the selectivity score against a predefined selectivity threshold, wherein a selectivity score that exceeds the selectivity threshold is deemed highly selective.
Dependent claims: 12, 13, 14, 15, 16, 17, 18, 19, 22
20. A system for assessing the selectivity of categorization rules, the system comprising:
means for applying at least one categorization rule to a set of un-categorized objects to produce a categorization result set representing assignment of objects of the set into at least two categories into which the objects of the set are divided when the categorization rule is applied and for gathering statistical information relating to the categorization result set based on properties of objects assigned to each of the at least two categories, and including at least one rule-specific aggregating statistic characterizing the application of the categorization rule to all of the objects and at least one categorization-specific statistic characterizing the objects of one of the at least two categories;
means for assessing a numerical selectivity score for the at least one categorization rule, the numerical selectivity score representing an estimation of selectivity accuracy of the at least one categorization rule to provide an evaluation of the at least one categorization rule, the numerical selectivity score being calculated by the application of at least one trained selectivity determination algorithm to the statistical information including the at least one rule-specific aggregating statistic representing information on the set of files belonging to each of the categories defined in the categorization rule, the application of the at least one trained selectivity determination algorithm to the statistical information including considering each of a plurality of parameters derived from the statistical information and in accordance with the at least one categorization rule, and comparing the selectivity score against a predefined selectivity threshold, wherein a selectivity score that exceeds the selectivity threshold is deemed highly selective; and
means for producing each of the at least one trained selectivity determination algorithm based on application of a plurality of specially-selected categorization rules to a set of pre-categorized training data, wherein the application of each one of the specially-selected categorization rules to the set of training data produces at least one uniform grouping of objects in which the objects all meet a predefined similarity criterion, and wherein the trained selectivity determination algorithms are unrelated to the plurality of specially-selected categorization rules.
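The training limitation common to the independent claims requires that each specially-selected rule, when applied to the pre-categorized training data, produce at least one uniform grouping of objects. One way this selection might be checked is sketched below in Python. The claims do not define the similarity criterion; the sketch assumes, purely for illustration, that a grouping is uniform when it is non-empty and all of its objects carry the same known label. The `label` and `size` fields, the function names, and the example rules are hypothetical.

```python
from typing import Callable, Dict, List

# Hypothetical object model: pre-categorized training objects carry a known label.
Obj = Dict[str, object]
Rule = Callable[[Obj], bool]


def split_by_rule(rule: Rule, objects: List[Obj]) -> Dict[str, List[Obj]]:
    """Apply a rule, dividing the training objects into two groupings."""
    groups: Dict[str, List[Obj]] = {"match": [], "no_match": []}
    for obj in objects:
        groups["match" if rule(obj) else "no_match"].append(obj)
    return groups


def is_uniform(group: List[Obj], label_key: str = "label") -> bool:
    """Assumed similarity criterion: non-empty, and every object has the same label."""
    return bool(group) and len({obj[label_key] for obj in group}) == 1


def select_training_rules(candidates: Dict[str, Rule], training_data: List[Obj]) -> List[str]:
    """Keep the names of candidate rules that yield at least one uniform grouping."""
    return [
        name
        for name, rule in candidates.items()
        if any(is_uniform(group) for group in split_by_rule(rule, training_data).values())
    ]
```

Rules that pass this check could then supply the statistics on which a scoring algorithm is trained. The claims additionally require that the trained algorithm be unrelated to those particular rules, which suggests it should depend only on the gathered statistics and not on rule identity.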
Specification