EXAMPLE-DRIVEN DESIGN OF EFFICIENT RECORD MATCHING QUERIES
Abstract
The disclosed architecture supports example-driven creation of record matching queries. It exploits the availability of positive (matching) and negative (non-matching) examples to search the space of candidate record matching queries and suggest an initial query. The record matching task is modeled as the design of an operator tree obtained by composing a few primitive operators, which ensures that record matching programs can be executed efficiently and scalably over large input relations. The architecture joins records across multiple (e.g., two) relations (e.g., R and S). It exploits the monotonicity property of similarity functions for record matching: any pair of matching records has a higher similarity value than non-matching record pairs on at least one similarity function.
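As a rough illustration of the operator-tree idea in the abstract, the following Python sketch models a record matching query as a small tree whose leaves are threshold predicates on a similarity function and whose inner nodes combine them. Everything concrete here is an assumption made for illustration: the abstract does not name its primitive operators, so the `Leaf`, `And`, and `Or` nodes, the `jaccard` function, the field names, and the thresholds are hypothetical, and the nested-loop evaluation is not the efficient execution the abstract refers to.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Record = Dict[str, str]
Similarity = Callable[[str, str], float]


def jaccard(a: str, b: str) -> float:
    """Jaccard similarity over lowercase word tokens (illustrative choice)."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)


@dataclass(frozen=True)
class Leaf:
    """Hypothetical leaf: sim(r[field], s[field]) >= threshold."""
    field: str
    sim: Similarity
    threshold: float

    def matches(self, r: Record, s: Record) -> bool:
        return self.sim(r[self.field], s[self.field]) >= self.threshold


@dataclass(frozen=True)
class And:
    """Hypothetical inner node: all children must match."""
    children: Tuple

    def matches(self, r: Record, s: Record) -> bool:
        return all(c.matches(r, s) for c in self.children)


@dataclass(frozen=True)
class Or:
    """Hypothetical inner node: at least one child must match."""
    children: Tuple

    def matches(self, r: Record, s: Record) -> bool:
        return any(c.matches(r, s) for c in self.children)


def run_query(tree, R: List[Record], S: List[Record]) -> List[Tuple[int, int]]:
    """Naive nested-loop evaluation; returns (index in R, index in S) pairs."""
    return [(i, j) for i, r in enumerate(R)
            for j, s in enumerate(S) if tree.matches(r, s)]


# Made-up input relations R and S.
R = [{"name": "John A Smith", "city": "Seattle"},
     {"name": "Mary Jones", "city": "Portland"}]
S = [{"name": "John Smith", "city": "Seattle"},
     {"name": "Mark Jones", "city": "Boston"}]

# (name similar AND city equal) OR (name nearly identical)
tree = Or((
    And((Leaf("name", jaccard, 0.6), Leaf("city", jaccard, 1.0))),
    Leaf("name", jaccard, 0.9),
))

print(run_query(tree, R, S))  # [(0, 0)]
```

Each leaf is monotone in its similarity score: lowering a threshold can only add record pairs to the result, never remove them. That is the property a search over thresholds can rely on.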
28 Citations
20 Claims
1. A computer-implemented query system, comprising:
- an input component for receiving an example set of matching records and non-matching records between two input relations; and
- a modeling component for generating a query based on an operator tree that maximizes identification of the matching records between the two input relations.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10
11. A computer-implemented method of providing a query, comprising:
- receiving an example set of matching records and non-matching records of multiple relations;
- generating a bounded operator tree that represents a query which maximizes identification of the matching records between the relations; and
- quantifying quality of the operator tree based on monotonicity of the matching and non-matching records.
Dependent claims: 12, 13, 14, 15, 16, 17, 18, 19
20. A computer-implemented system, comprising:
- computer-implemented means for generating a bounded operator tree based on an example set of matching records and non-matching records of multiple relations;
- computer-implemented means for quantifying quality of the operator tree based on monotonicity of the matching and non-matching records; and
- computer-implemented means for creating a query that maximizes identification of the matching records between the relations.
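Claims 11 and 20 recite quantifying the quality of an operator tree using the example records. The claims shown here do not define the quality measure, so the sketch below is only one plausible reading, under stated assumptions: a single threshold predicate `score >= threshold` is scored by how many matching examples it admits, subject to a budget on admitted non-matching examples. The function name `best_threshold`, the `max_false_positives` parameter, and the example scores are hypothetical.

```python
from typing import List, Tuple


def best_threshold(pos_scores: List[float],
                   neg_scores: List[float],
                   max_false_positives: int = 0) -> Tuple[float, int]:
    """Return (threshold, positives_covered) for the predicate score >= threshold.

    The predicate is monotone in the score, so only scores observed on
    matching examples need to be tried as candidate thresholds.
    """
    best = (float("inf"), 0)  # admitting nothing is always within budget
    for t in sorted(set(pos_scores), reverse=True):
        false_pos = sum(1 for s in neg_scores if s >= t)
        if false_pos > max_false_positives:
            break  # lowering t further can only admit more negatives
        covered = sum(1 for s in pos_scores if s >= t)
        best = (t, covered)  # coverage never shrinks as t decreases
    return best


# Made-up similarity scores for labeled example pairs.
pos = [0.9, 0.8, 0.55, 0.3]   # matching pairs
neg = [0.5, 0.2, 0.1]         # non-matching pairs

print(best_threshold(pos, neg))      # (0.55, 3)
print(best_threshold(pos, neg, 1))   # (0.3, 4)
```

Scanning thresholds from high to low lets the loop stop at the first candidate that exceeds the budget, because both the admitted positives and the admitted negatives can only grow as the threshold drops.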
Specification