Declarative framework for deduplication

  • US 8,200,640 B2
  • Filed: 06/15/2009
  • Issued: 06/12/2012
  • Est. Priority Date: 06/15/2009
  • Status: Active Grant
First Claim

1. A method for collective deduplication of entity references in data records stored in a database, the method comprising:

  • executing, on a processor, an execution unit that implements a declarative deduplication language using a clustering algorithm and by accessing the database through a database server, wherein the declarative deduplication language is not a Structured Query Language, the execution unit receiving and executing arbitrary programs in the declarative deduplication language;

    accessing one or more relational tables of the database containing data records, where the data records contain references to varying real-world entities, and where the references include a plurality of sets of two or more entity references that are duplicates, wherein duplicates comprise references that have different respective textual representations of a same real-world entity;

    receiving entity-reference declarative program code of the declarative deduplication language that specifies entity references in the relational tables that are to be deduplicated;

    receiving constraint-specifying declarative program code of the declarative deduplication language that specifies one or more constraints that a deduplication of the entity references should satisfy; and

    generating output by the execution unit executing the entity-reference declarative program code and the constraint-specifying declarative program code, the output comprising one or more deduplication relations that identify whether or not two entity references are duplicates, and which satisfy the one or more constraints specified in the constraint-specifying declarative program code, wherein each output deduplication relation is an equivalence relation, wherein each equivalence relation partitions the output into corresponding disjoint subsets.
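
The final step of the claim requires each output deduplication relation to be an equivalence relation that partitions the references into disjoint subsets while satisfying the declared constraints. The following is a minimal illustrative sketch of that idea in plain Python, not the patented declarative language or its execution unit. It assumes a union-find structure as the clustering mechanism, normalized string equality as the similarity test, and a single "cannot-link" constraint type. All names (`normalize`, `DisjointSets`, `merge_allowed`, `deduplicate`, `REFERENCES`) and the sample data are hypothetical.

```python
from itertools import combinations


def normalize(text):
    """Hypothetical similarity key: lowercase, keep only letters and digits."""
    return "".join(ch for ch in text.lower() if ch.isalnum())


class DisjointSets:
    """Union-find; its sets always form a partition, i.e. an equivalence relation."""

    def __init__(self, items):
        self.parent = {item: item for item in items}

    def find(self, item):
        # Walk to the root, halving the path along the way.
        while self.parent[item] != item:
            self.parent[item] = self.parent[self.parent[item]]
            item = self.parent[item]
        return item

    def union(self, a, b):
        root_a, root_b = self.find(a), self.find(b)
        if root_a != root_b:
            self.parent[root_b] = root_a


def merge_allowed(sets, a, b, cannot_link):
    """Return False if merging the clusters of a and b would join a forbidden pair."""
    root_a, root_b = sets.find(a), sets.find(b)
    if root_a == root_b:
        return True  # already in the same cluster, nothing changes
    for x, y in cannot_link:
        if {sets.find(x), sets.find(y)} == {root_a, root_b}:
            return False
    return True


def deduplicate(references, cannot_link=frozenset()):
    """Cluster reference ids whose text normalizes identically, honoring cannot-link pairs.

    `references` maps a reference id to its textual representation.
    Returns a sorted list of sorted, disjoint clusters of ids.
    """
    ids = sorted(references)
    sets = DisjointSets(ids)
    for a, b in combinations(ids, 2):
        if normalize(references[a]) != normalize(references[b]):
            continue
        if merge_allowed(sets, a, b, cannot_link):
            sets.union(a, b)
    clusters = {}
    for ref_id in ids:
        clusters.setdefault(sets.find(ref_id), []).append(ref_id)
    return sorted(clusters.values())


# Hypothetical sample data: different textual representations of the same entities.
REFERENCES = {
    1: "J. Smith",
    2: "j smith",
    3: "John Smith",
    4: "Acme Corp.",
    5: "ACME Corp",
}
```

With the sample data, `deduplicate(REFERENCES)` groups references 1 and 2 together and references 4 and 5 together, leaving 3 on its own. Passing `cannot_link={(4, 5)}` keeps 4 and 5 in separate clusters. Because every reference id ends up in exactly one cluster, the result is a partition regardless of which constraints are supplied.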
