Example-driven design of efficient record matching queries

  • US 8,046,339 B2
  • Filed: 06/05/2007
  • Issued: 10/25/2011
  • Est. Priority Date: 06/05/2007
  • Status: Expired due to Fees
First Claim

1. A computer-implemented query system, comprising:

  • an input component configured to receive an example set of records from two input relations, the example set comprising:

    pairs of matching records that are labeled as examples of records that are considered a match between the two input relations; and

    pairs of non-matching records that are labeled as examples of records that are not considered a match between the two input relations; and

  • a modeling component configured to:

    generate an operator tree based on the example set of records from the two input relations, wherein, to generate the operator tree, the modeling component is further configured to:

      map the pairs of matching records to positive points in a similarity space based on a similarity function;

      map the pairs of non-matching records to negative points in the similarity space based on the similarity function;

      generate one or more similarity joins of the operator tree, based on the positive points and the negative points in the similarity space;

      limit the operator tree to a maximum number of the similarity joins; and

      limit individual similarity joins of the operator tree to a maximum number of similarity function predicates; and

    generate a query based on the operator tree, the query being configured to identify individual matching records between the two input relations; and

  • one or more processors configured to execute the input component or the modeling component.
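
The steps recited in claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation: the greedy learner, the similarity functions (`jaccard`, `edit_sim`), the dimension list `DIMS`, the relation aliases `R` and `S`, the sample records, and the SQL-like output are all assumptions chosen for illustration. Labeled pairs are mapped to points in a similarity space, a bounded number of similarity joins is learned (each a conjunction of at most `max_preds` threshold predicates), and the union of those joins is rendered as a query.

```python
from difflib import SequenceMatcher

def jaccard(a, b):
    """Token-set Jaccard similarity in [0, 1]."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb) if (ta | tb) else 1.0

def edit_sim(a, b):
    """Character-level similarity in [0, 1] computed by difflib."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

# Hypothetical similarity space: one dimension per (column, function).
DIMS = [("name", jaccard), ("name", edit_sim), ("city", edit_sim)]

def to_point(r, s):
    """Map a record pair to a point in the similarity space."""
    return tuple(f(r[col], s[col]) for col, f in DIMS)

def covers(join, point):
    """A join is a list of (dimension, threshold) predicates, ANDed."""
    return all(point[d] >= t for d, t in join)

def learn_join(pos, neg, max_preds):
    """Greedily add predicates until no negative point is admitted."""
    join = []
    for _ in range(max_preds):
        live_pos = [p for p in pos if covers(join, p)]
        live_neg = [n for n in neg if covers(join, n)]
        if not live_neg:
            break
        best, best_score = None, None
        used = {d for d, _ in join}
        for d in range(len(DIMS)):
            if d in used:
                continue
            for t in sorted({p[d] for p in live_pos}):
                # Score: positives kept minus negatives still admitted.
                score = (sum(p[d] >= t for p in live_pos)
                         - sum(n[d] >= t for n in live_neg))
                if best_score is None or score > best_score:
                    best, best_score = (d, t), score
        if best is None:
            break
        join.append(best)
    return join

def learn_tree(pos, neg, max_joins, max_preds):
    """Union of at most max_joins similarity joins (greedy cover)."""
    tree, uncovered = [], list(pos)
    while uncovered and len(tree) < max_joins:
        join = learn_join(uncovered, neg, max_preds)
        if not join:
            break
        tree.append(join)
        uncovered = [p for p in uncovered if not covers(join, p)]
    return tree

def to_query(tree):
    """Render the operator tree as an SQL-like union of joins."""
    parts = []
    for join in tree:
        preds = " AND ".join(
            f"{DIMS[d][1].__name__}(R.{DIMS[d][0]}, S.{DIMS[d][0]}) >= {t}"
            for d, t in join)
        parts.append(f"SELECT R.id, S.id FROM R, S WHERE {preds}")
    return "\nUNION\n".join(parts)

# Made-up labeled examples from two input relations.
matches = [
    ({"name": "Acme Corp", "city": "Seattle"},
     {"name": "ACME Corp", "city": "Seattle"}),
    ({"name": "Globex Inc", "city": "Boston"},
     {"name": "Globex Incorporated", "city": "Boston"}),
]
non_matches = [
    ({"name": "Acme Corp", "city": "Seattle"},
     {"name": "Apex Corp", "city": "Denver"}),
    ({"name": "Globex Inc", "city": "Boston"},
     {"name": "Initech Inc", "city": "Austin"}),
]
pos = [to_point(r, s) for r, s in matches]
neg = [to_point(r, s) for r, s in non_matches]
tree = learn_tree(pos, neg, max_joins=2, max_preds=2)
query = to_query(tree)
print(query)
```

The two limits in the claim appear here as the `max_joins` and `max_preds` arguments. A greedy learner of this kind does not guarantee an optimal operator tree, and a production system would evaluate the similarity predicates with dedicated similarity-join operators rather than a cross product.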
