System and method for accessing heterogeneous databases
Abstract
A system and method are provided for answering queries concerning information stored in a set of collections. Each collection includes a structured entity, and each structured entity includes a field. A query is received that specifies a subset of the set of collections and a logical constraint between fields that includes a requirement that a first field match a second field. The probability that the first field matches the second field is determined automatically based upon the contents of the fields. A collection of lists is generated in response to the query, where each list includes members of the subset of collections specified in the query, and where each list has an estimate of the probability that the members of the list satisfy the logical constraint specified in the query.
28 Claims
1. A method for answering a query containing a join operation, the method comprising:
representing each entry in a column of a first relation by a vector;
representing each entry in a column of a second relation by a vector;
selecting a subset of rows of the first relation;
for each of the entries in the column of the first relation that is part of a row from the subset of rows;
determining the value of a similarity metric function that is based upon the vector representing the entry in the column of the first relation and a vector representing an entry from the column of the second relation, for each entry in the column of the second relation;
joining the first relation with the second relation based upon the set of similarity metric function values determined; and
outputting the result of the joining.
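The steps recited in claim 1 can be illustrated with a short sketch. The claim does not fix the vector encoding, the similarity metric, or the joining rule, so the term-frequency vectors, cosine similarity, and fixed threshold below are assumptions chosen for illustration, as are all function names, column names, and sample rows.

```python
import math
from collections import Counter

def to_vector(text):
    """Represent an entry as a term-frequency vector (assumed encoding)."""
    return Counter(text.lower().split())

def cosine(v, w):
    """Cosine similarity of two sparse vectors (assumed similarity metric)."""
    dot = sum(v[t] * w[t] for t in v.keys() & w.keys())
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    norm_w = math.sqrt(sum(x * x for x in w.values()))
    return dot / (norm_v * norm_w) if norm_v and norm_w else 0.0

def similarity_join(first, second, col_a, col_b, row_filter, threshold=0.5):
    """Join rows whose column entries are similar; the threshold rule is hypothetical."""
    # Represent each entry in the column of the second relation by a vector.
    second_vecs = [to_vector(row[col_b]) for row in second]
    # Select a subset of rows of the first relation.
    subset = [row for row in first if row_filter(row)]
    joined = []
    for row_a in subset:
        v = to_vector(row_a[col_a])
        # Score this entry against every entry in the second relation's column.
        scores = [cosine(v, w) for w in second_vecs]
        for row_b, score in zip(second, scores):
            if score >= threshold:
                joined.append(({**row_a, **row_b}, score))
    # Output the joined rows, best matches first.
    return sorted(joined, key=lambda pair: pair[1], reverse=True)

# Made-up sample relations; names differ in case and suffix, so an exact join fails.
first = [{"company": "Acme Corp", "year": 1999},
         {"company": "Globex Inc", "year": 1985}]
second = [{"firm": "ACME Corp Ltd", "sector": "tools"},
          {"firm": "Initech", "sector": "software"}]

result = similarity_join(first, second, "company", "firm",
                         lambda row: row["year"] > 1990)
```

Here only the "Acme Corp" row survives the row filter, and it pairs with "ACME Corp Ltd" because the two entries share two of three terms.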
2. A method for answering a query containing a join operation, the method comprising:
representing each entry in a field of a first relation by a vector;
representing each entry in a field of a second relation by a vector;
selecting a subset of tuples of the first relation;
for each of the entries in the field of the first relation that is part of a tuple from the subset of tuples;
determining the value of a similarity metric function that is based upon the vector representing the entry in the field of the first relation and a vector representing an entry from the field of the second relation, for each entry in the field of the second relation;
joining the first relation with the second relation based upon the set of similarity metric function values determined; and
outputting the result of the joining.
wherein v_i is the i-th component of a vector v corresponding to an entry in the field from the first relation, and w_i is the i-th component of vector w corresponding to an entry in the field from the second relation.
13. The method of claim 2 wherein the query is probabilistically answered using the A* search algorithm.
14. A medium storing instructions for answering a query containing a join operation, the instructions adapted to be executed by a processor, the instructions including:
representing each entry in a field of a first relation by a vector;
representing each entry in a field of a second relation by a vector;
selecting a subset of tuples of the first relation;
for each of the entries in the field of the first relation that is part of a tuple from the subset of tuples;
determining the value of a similarity metric function that is based upon the vector representing the entry in the field of the first relation and a vector representing an entry from the field of the second relation, for each entry in the field of the second relation;
joining the first relation with the second relation based upon the set of similarity metric function values determined; and
outputting the result of the joining.
wherein v_i is the i-th component of a vector v corresponding to an entry in the field from the first relation, and w_i is the i-th component of vector w corresponding to an entry in the field from the second relation.
25. The medium of claim 14 wherein the query is probabilistically answered using the A* search algorithm.
15. The medium of claim 14 wherein the joining is based on the N highest values in a subset of the set of similarity metric function values determined, where N is an integer.
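Joining on the N highest similarity values, as recited in claim 15, might be sketched as follows. The triple layout `(row_a, row_b, score)`, the function name `top_n_join`, and the sample scores are hypothetical; the claim does not say how the values are stored or how the subset is chosen.

```python
import heapq

def top_n_join(scored_pairs, n):
    """Keep the n highest-scoring (row_a, row_b, score) triples, best first."""
    return heapq.nlargest(n, scored_pairs, key=lambda triple: triple[2])

# Made-up similarity values for every pairing of two rows with two rows.
scored = [("a1", "b1", 0.91), ("a1", "b2", 0.10),
          ("a2", "b1", 0.35), ("a2", "b2", 0.78)]

best = top_n_join(scored, 2)
```

Using a heap-based selection avoids sorting the full list of scores when N is small relative to the number of candidate pairs.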
26. An apparatus for answering a query containing a join operation, the apparatus comprising:
a processor; and
a memory that stores instructions adapted to be executed by a processor, the instructions including:
representing each entry in a field of a first relation by a vector;
representing each entry in a field of a second relation by a vector;
selecting a subset of tuples of the first relation;
for each of the entries in the field of the first relation that is part of a tuple from the subset of tuples;
determining the value of a similarity metric function that is based upon the vector representing the entry in the field of the first relation and a vector representing an entry from the field of the second relation, for each entry in the field of the second relation;
joining the first relation with the second relation based upon the set of similarity metric function values determined; and
outputting the result of the joining.
27. A method for performing a join operation, the method comprising the steps of:
representing each entry in a field of a first relation and a field of a second relation as a vector;
evaluating a similarity metric function on a plurality of pairs of vectors, wherein each pair from the plurality of pairs of vectors includes a vector from the field of the first relation and a vector from the field of the second relation;
determining whether to join a tuple from the first relation and a tuple from the second relation based on i) the value of the similarity function evaluated on the vector corresponding to the entry that is in the tuple from the first relation and the field of the first relation, and the vector corresponding to the entry that is in the tuple from the second relation and the field of the second relation; and
ii) the values for the similarity metric function obtained in the evaluation step; and
outputting the tuple obtained by joining the tuple from the first relation and the tuple from the second relation.
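Claim 27 makes the join decision for a pair depend both on (i) that pair's own similarity value and (ii) the values obtained for the other pairs. One way this could look is sketched below. The "mutual best match above a floor" rule, the function name `decide_joins`, the `floor` parameter, and the sample matrix are all assumptions for illustration; the claim does not prescribe a specific decision rule.

```python
def decide_joins(sim, floor=0.3):
    """Decide which tuple pairs to join from a similarity matrix.

    sim[i][j] is the similarity between tuple i of the first relation and
    tuple j of the second. Hypothetical rule: join (i, j) when its own score
    clears `floor` (criterion i) and is the highest score in both its row
    and its column of the matrix (criterion ii).
    """
    joins = []
    for i, row in enumerate(sim):
        best_in_row = max(row)
        for j, score in enumerate(row):
            best_in_col = max(other[j] for other in sim)
            if score >= floor and score == best_in_row and score == best_in_col:
                joins.append((i, j))
    return joins

# Made-up similarity values: three tuples in the first relation, two in the second.
sim = [[0.90, 0.20],
       [0.40, 0.70],
       [0.35, 0.10]]

pairs = decide_joins(sim)
```

In this sample the third tuple of the first relation is left unjoined: its best score, 0.35, clears the floor but loses to 0.90 in the same column, which shows the decision depending on values beyond the pair's own.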
Specification