
System and method for assessing categorization rule selectivity

  • US 9,501,742 B2
  • Filed: 09/25/2014
  • Issued: 11/22/2016
  • Est. Priority Date: 12/05/2013
  • Status: Active Grant
First Claim

1. A system for assessing the selectivity of categorization rules, the system comprising:

  • a computer system including at least one processor, a non-transitory data storage medium interfaced with the at least one processor, and input/output facilities, the data storage medium containing instructions that, when executed by the at least one processor, implement:

    a categorization rule application engine configured to apply at least one categorization rule to a set of un-categorized objects to produce a categorization result set representing the assignment of the objects of the set to at least two categories into which the objects of the set are divided when the categorization rule is applied, the categorization rule application engine further configured to gather statistical information relating to the categorization result set based on properties of the objects assigned to each of the at least two categories, the statistical information including at least one rule-specific aggregating statistic characterizing the application of the categorization rule to all of the objects and at least one categorization-specific statistic characterizing the objects of one of the at least two categories;

    a selectivity determination engine configured to assess a numerical selectivity score for the at least one categorization rule, the numerical selectivity score representing an estimation of the selectivity accuracy of the at least one categorization rule to provide an evaluation of the at least one categorization rule, the numerical selectivity score being calculated by the application of at least one trained selectivity determination algorithm to the statistical information including the at least one rule-specific aggregating statistic representing information on the set of files belonging to each of the categories defined in the categorization rule, the application of the at least one trained selectivity determination algorithm to the statistical information including considering each of a plurality of parameters derived from the statistical information and in accordance with the at least one categorization rule, and to compare the selectivity score against a predefined selectivity threshold, wherein a selectivity score that exceeds the selectivity threshold is deemed highly selective; and

    an algorithm training engine configured to produce each of the at least one trained selectivity determination algorithm based on application of a plurality of specially-selected categorization rules to a set of pre-categorized training data, wherein the application of each one of the specially-selected categorization rules to the set of training data produces at least one uniform grouping of objects in which the objects all meet a predefined similarity criterion, and wherein the trained selectivity determination algorithms are unrelated to the plurality of specially-selected categorization rules.
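The claim names three engines but leaves open which statistics are gathered, which learning algorithm is trained, and what the similarity criterion is. The Python sketch below is one hypothetical reading for illustration, not the patented implementation. Every name is invented, and so are the statistics (largest-category share and the within-category relative spread of one numeric property), the logistic scoring model, the use of label purity on pre-categorized data as the training target, and the 0.8 threshold. Here `by_size` stands in for a specially-selected rule whose groups are uniform, and `by_parity` stands in for a rule whose groups are not.

```python
import math
from collections import Counter
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical object model (the claim fixes none): an object is a dict of
# numeric properties, and a rule maps an object to a category label.
Obj = Dict[str, float]
Rule = Callable[[Obj], str]


def apply_rule(rule: Rule, objects: Sequence[Obj]) -> Dict[str, List[Obj]]:
    """Rule application engine: divide a non-empty object set into categories."""
    result: Dict[str, List[Obj]] = {}
    for obj in objects:
        result.setdefault(rule(obj), []).append(obj)
    return result


def gather_statistics(result: Dict[str, List[Obj]], prop: str) -> List[float]:
    """Turn a result set into a fixed-length feature vector (assumed statistics)."""
    total = sum(len(objs) for objs in result.values())
    # Rule-specific aggregating statistic: share of all objects in the largest category.
    largest_share = max(len(objs) for objs in result.values()) / total
    # Category-specific statistic: relative spread (std / |mean|) of `prop`
    # inside each category, averaged so the vector length is fixed.
    spreads = []
    for objs in result.values():
        vals = [o[prop] for o in objs]
        mean = sum(vals) / len(vals)
        std = math.sqrt(sum((v - mean) ** 2 for v in vals) / len(vals))
        spreads.append(std / (abs(mean) + 1e-9))
    return [1.0, largest_share, sum(spreads) / len(spreads)]  # 1.0 is a bias term


def selectivity_score(weights: Sequence[float], features: Sequence[float]) -> float:
    """Trained selectivity determination algorithm, assumed here to be logistic."""
    z = sum(w * f for w, f in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))


def purity(rule: Rule, training: Sequence[Tuple[Obj, str]]) -> float:
    """Share of pre-categorized objects matching their category's majority label."""
    groups: Dict[str, List[str]] = {}
    for obj, label in training:
        groups.setdefault(rule(obj), []).append(label)
    agree = sum(Counter(labels).most_common(1)[0][1] for labels in groups.values())
    return agree / len(training)


def train(rules: Sequence[Rule], training: Sequence[Tuple[Obj, str]], prop: str,
          epochs: int = 2000, lr: float = 0.5) -> List[float]:
    """Algorithm training engine: fit weights so the score tracks rule purity."""
    # The learned weights see only statistics, never the rules themselves.
    objects = [obj for obj, _ in training]
    data = [(gather_statistics(apply_rule(r, objects), prop), purity(r, training))
            for r in rules]
    weights = [0.0] * len(data[0][0])
    for _ in range(epochs):
        for x, target in data:
            err = selectivity_score(weights, x) - target  # logistic-loss gradient factor
            weights = [w - lr * err * xi for w, xi in zip(weights, x)]
    return weights


def assess(rule: Rule, objects: Sequence[Obj], prop: str,
           weights: Sequence[float], threshold: float = 0.8) -> Tuple[float, bool]:
    """Score a rule; a score above the threshold counts as highly selective."""
    score = selectivity_score(weights, gather_statistics(apply_rule(rule, objects), prop))
    return score, score > threshold


def by_size(obj: Obj) -> str:  # splits where the pre-assigned labels split
    return "A" if obj["size"] < 50 else "B"


def by_parity(obj: Obj) -> str:  # splits on something unrelated to the labels
    return "A" if int(obj["size"]) % 2 == 0 else "B"


training = [({"size": float(s)}, "small" if s < 50 else "large")
            for s in (10, 12, 15, 20, 80, 90, 95, 100)]
weights = train([by_size, by_parity], training, "size")
objects = [obj for obj, _ in training]  # same objects, labels stripped
print(assess(by_size, objects, "size", weights))
print(assess(by_parity, objects, "size", weights))
```

With only two training rules, the learned weights do little more than separate those two rules, which is why scoring is shown on the same objects with their labels stripped. A usable selectivity model along these lines would need many training rules and held-out data before its scores could be trusted on new rules.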

  • 2 Assignments