Method and system for large scale data curation

  • US 9,542,412 B2
  • Filed: 03/28/2014
  • Issued: 01/10/2017
  • Est. Priority Date: 03/28/2014
  • Status: Active Grant
First Claim

1. A computer implemented method for performing object linkage in computer memory on object pairs from one or more database storage sources, in order to separate said object pairs into linked object pairs and non-linked object pairs, comprising:

  • applying rules represented as a Boolean formula in disjunctive normal form (DNF) as shown in FORMULA 1 to said object pairs, wherein said FORMULA 1 is constructed with attribute similarity predicates, and wherein said FORMULA 1 is constructed such that most of said linked object pairs satisfy said FORMULA 1 while a minimal number of said non-linked object pairs satisfy said FORMULA 1; and

  • generating initial rules in disjunctive normal form (DNF) based on collected statistics from said database storage sources, and based on hints from data experts, wherein said rules guarantee high recall and moderate precision, and wherein said hints consist of keys and anti-keys.
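
As an illustration only, the first step of the claim (applying a DNF formula built from attribute similarity predicates to object pairs) might be sketched as follows. FORMULA 1 itself is not reproduced in this excerpt, so the rule set, the attribute names (`name`, `city`, `email`), the similarity functions, and the thresholds below are all hypothetical assumptions, not the patented formula.

```python
from difflib import SequenceMatcher


def string_sim(a, b):
    """Case-insensitive string similarity in [0, 1] (an illustrative choice)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def exact_sim(a, b):
    """1.0 when the two values are identical, otherwise 0.0."""
    return 1.0 if a == b else 0.0


# Hypothetical DNF rule set: an OR over clauses, each clause an AND over
# predicates. A predicate is (attribute, similarity function, threshold).
DNF_RULES = [
    [("name", string_sim, 0.8), ("city", exact_sim, 1.0)],
    [("email", exact_sim, 1.0)],
]


def predicate_holds(pair, attribute, sim, threshold):
    """True when the pair's values for `attribute` are similar enough."""
    left, right = pair
    return sim(left[attribute], right[attribute]) >= threshold


def satisfies_dnf(pair, rules):
    """True when at least one clause has all of its predicates satisfied."""
    return any(
        all(predicate_holds(pair, *pred) for pred in clause)
        for clause in rules
    )


def split_pairs(pairs, rules):
    """Separate object pairs into (linked, non_linked) lists."""
    linked, non_linked = [], []
    for pair in pairs:
        (linked if satisfies_dnf(pair, rules) else non_linked).append(pair)
    return linked, non_linked
```

In this sketch a pair is linked as soon as any one clause holds. That matches the recall-oriented goal stated in the claim, where most truly linked pairs should satisfy the formula while few non-linked pairs do.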

  • 5 Assignments