
Decision tree refinement

  • US 8,417,654 B1
  • Filed: 07/18/2012
  • Issued: 04/09/2013
  • Est. Priority Date: 09/22/2009
  • Status: Active Grant
First Claim

1. A method, comprising:

    identifying, by at least one data processing device, split-rules and an initial training set of data records used to generate the split rules, the initial training set of data records including negative training pairs that each include at least two data records that have not been identified as duplicate data records, each training pair having match scores specifying a measure of similarity for attributes of training pairs;

    removing, by at least one data processing device, at least one clause from the split rules to generate initial trimmed rules, the removing being based at least in part on a threshold match score specifying a match score at which the initial training set is segmented;

    classifying, by at least one data processing device, the negative training pairs in the initial training set based on the match scores for the negative training pairs and the initial trimmed rules;

    removing, by at least one data processing device and based on the classification, negative training pairs that are classified as duplicate pairs from the initial training set to create a filtered training set;

    generating, by at least one data processing device, an intermediate decision tree with the filtered training set, the intermediate decision tree defining intermediate split-rules; and

    generating, by at least one data processing device, final split rules based on the intermediate split rules, the final split rules including at least one final split rule that differs from each of the intermediate split rules.
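The steps recited above can be illustrated with a minimal Python sketch. It is not the patented implementation, and every name in it (`Pair`, `trim_rules`, `filter_negatives`, `fit_stump`) is hypothetical. It assumes that a clause is an (attribute, cut) test on a match score, that a rule marks a pair as a duplicate when all of its clauses hold, that trimming drops clauses whose cut lies below the threshold match score, and that the training set also contains positive pairs. A single-split stump stands in for decision tree generation. The last step, deriving final split rules that differ from the intermediate ones, is left out because the claim does not say how they differ.

```python
from dataclasses import dataclass


@dataclass
class Pair:
    scores: dict        # attribute name -> match score in [0, 1]
    is_duplicate: bool  # False marks a negative training pair


def rule_fires(rule, scores):
    # A rule is a tuple of (attribute, cut) clauses; all must hold.
    return all(scores[attr] >= cut for attr, cut in rule)


def classify(rules, scores):
    # A pair is classified as a duplicate if any rule fires.
    return any(rule_fires(rule, scores) for rule in rules)


def trim_rules(rules, threshold):
    # Assumption: drop clauses whose cut is below the threshold match
    # score, but always keep the strictest clause so no rule is empty.
    trimmed = []
    for rule in rules:
        kept = tuple(c for c in rule if c[1] >= threshold)
        trimmed.append(kept or (max(rule, key=lambda c: c[1]),))
    return trimmed


def filter_negatives(pairs, trimmed_rules):
    # Remove negative pairs that the trimmed rules classify as duplicates.
    return [p for p in pairs
            if p.is_duplicate or not classify(trimmed_rules, p.scores)]


def fit_stump(pairs):
    # Stand-in for tree induction: pick the single (attribute, cut)
    # split with the fewest misclassified pairs.
    best = None
    for attr in pairs[0].scores:
        for cut in sorted({p.scores[attr] for p in pairs}):
            errors = sum((p.scores[attr] >= cut) != p.is_duplicate
                         for p in pairs)
            if best is None or errors < best[0]:
                best = (errors, attr, cut)
    return [((best[1], best[2]),)]


pairs = [
    Pair({"name": 0.95, "address": 0.90}, True),
    Pair({"name": 0.90, "address": 0.80}, True),
    Pair({"name": 0.92, "address": 0.20}, False),  # suspicious negative
    Pair({"name": 0.30, "address": 0.10}, False),
    Pair({"name": 0.40, "address": 0.85}, False),
]
split_rules = [(("name", 0.8), ("address", 0.5))]

trimmed = trim_rules(split_rules, threshold=0.6)
filtered = filter_negatives(pairs, trimmed)
intermediate_rules = fit_stump(filtered)
```

In this toy data the third pair is labeled negative but scores high on `name`, so the trimmed rule classifies it as a duplicate and it is dropped before the stump is fitted on the remaining four pairs.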
