
DECLARATIVE FRAMEWORK FOR DEDUPLICATION

  • US 20100318499A1
  • Filed: 06/15/2009
  • Published: 12/16/2010
  • Est. Priority Date: 06/15/2009
  • Status: Active Grant
First Claim

1. A method for collective deduplication of entity references in data records stored in a database, the method comprising:

  • accessing one or more relational tables of the database containing data records, where the data records contain references to varying real-world entities, and where the references include a plurality of sets of two or more entity references that are duplicates, wherein duplicates comprise references that have different respective textual representations of a same real-world entity;

  • receiving entity-reference declarative program code that declaratively specifies entity references in the relational tables that are to be deduplicated;

  • receiving constraint-specifying declarative program code that declaratively specifies one or more constraints that a deduplication of the entity references should satisfy; and

  • generating output by executing on a processor the entity-reference declarative program code and the constraint-specifying declarative program code, the output comprising one or more deduplication relations that identify whether or not two entity references are duplicates, and which satisfy the one or more constraints specified in the constraint-specifying declarative program code, wherein each output deduplication relation is an equivalence relation, wherein each equivalence relation partitions the output into corresponding disjoint subsets.
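The claim does not specify a concrete declarative language or clustering algorithm. As a rough illustration only, the Python sketch below shows why an output that "is an equivalence relation" amounts to a partition of the references into disjoint subsets. It makes several assumptions that are not in the patent. The entity references are a plain list. The constraints are hard pairwise must-link and cannot-link pairs. Candidate duplicates come from a hypothetical `similar` predicate. `UnionFind`, `deduplicate`, `are_duplicates`, and the sample data are all hypothetical names. This is a greedy union-find sketch, not the patented method.

```python
from itertools import combinations


class UnionFind:
    """Disjoint-set forest; its classes are the blocks of an equivalence relation."""

    def __init__(self, items):
        self.parent = {x: x for x in items}

    def find(self, x):
        # Path halving keeps the trees shallow.
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra


def deduplicate(refs, similar, must_link=(), cannot_link=()):
    """Hypothetical greedy clustering of `refs` into disjoint duplicate classes.

    `similar(a, b)` proposes candidate duplicate pairs (an assumption);
    `must_link` / `cannot_link` are assumed hard pairwise constraints.
    """
    uf = UnionFind(refs)
    for a, b in must_link:  # hard constraint: these pairs are always merged
        uf.union(a, b)
    for a, b in cannot_link:
        if uf.find(a) == uf.find(b):
            raise ValueError(f"constraints conflict on {a!r}, {b!r}")

    def would_violate(ra, rb):
        # Merging classes ra and rb is illegal if some cannot-link pair
        # has one member in each class.
        return any({uf.find(x), uf.find(y)} == {ra, rb} for x, y in cannot_link)

    for a, b in combinations(refs, 2):
        ra, rb = uf.find(a), uf.find(b)
        if ra != rb and similar(a, b) and not would_violate(ra, rb):
            uf.union(a, b)

    # Group references by root: each group is one disjoint subset.
    classes = {}
    for r in refs:
        classes.setdefault(uf.find(r), set()).add(r)
    return [frozenset(c) for c in classes.values()]


def are_duplicates(partition, a, b):
    """The equivalence relation induced by the partition (hypothetical helper)."""
    return any(a in block and b in block for block in partition)


# Hypothetical sample data and similarity rule: same first word, ignoring case.
refs = ["Microsoft Corp.", "Microsoft Corporation", "MSFT",
        "Microsoft Research", "Apple Inc."]
first_word = lambda s: s.split()[0].strip(".,").lower()
result = deduplicate(
    refs,
    similar=lambda a, b: first_word(a) == first_word(b),
    must_link=[("MSFT", "Microsoft Corp.")],
    cannot_link=[("Microsoft Corporation", "Microsoft Research")],
)
```

With these inputs, "MSFT" is forced into the same class as "Microsoft Corp." even though the similarity rule alone would not match them. "Microsoft Research" stays in its own class because merging it would violate the cannot-link constraint. Because the output is built as a partition, reflexivity, symmetry, and transitivity of `are_duplicates` hold by construction.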

  • 2 Assignments