MULTISOURCE SEMANTIC PARTITIONING
Abstract
Methods, systems, and computer program products for processing a query to determine query results. The query may be analyzed to determine a column constant pair corresponding to the query. The column constant pair may be analyzed with respect to a column constant pair associated with a partitioned data set in order to route the query to a subset of the data set. Data sets may be partitioned into subsets by analyzing historical queries to determine a partitioning column constant pair with respect to the data set that is used to partition the data of the data set into subsets. The query processing may include both query routing and data set partitioning.
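As an illustration of how a column constant pair might be determined from a query, the following is a minimal sketch and not the patented implementation. It assumes SQL-like queries whose WHERE clause contains simple numeric predicates; the names `ColumnConstantPair` and `extract_pairs`, and the choice to record the comparison operator alongside the column and constant, are hypothetical.

```python
import re
from typing import List, NamedTuple


class ColumnConstantPair(NamedTuple):
    """Hypothetical representation: a column, a comparison operator, a constant."""
    column: str
    operator: str
    constant: float


# Assumption: predicates look like "price >= 100"; real query grammars are richer.
_PREDICATE = re.compile(r"(\w+)\s*(<=|>=|<>|!=|=|<|>)\s*(-?\d+(?:\.\d+)?)")


def extract_pairs(query: str) -> List[ColumnConstantPair]:
    """Return the column constant pairs found in the WHERE clause of a query."""
    parts = re.split(r"\bwhere\b", query, maxsplit=1, flags=re.IGNORECASE)
    if len(parts) < 2:
        return []  # no WHERE clause, so no pairs to report
    return [
        ColumnConstantPair(column, op, float(value))
        for column, op, value in _PREDICATE.findall(parts[1])
    ]


pairs = extract_pairs("SELECT * FROM orders WHERE price >= 100 AND qty < 5")
```

Each stored historical query could be passed through a function of this kind to build the collection of pairs from which a partitioning pair is later chosen.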
20 Claims
1. A computer-implemented method for federated query processing comprising:
receiving one or more source queries associated with a data set;
storing the one or more source queries as one or more historical queries;
determining one or more column constant pairs associated with the one or more historical queries;
based on the one or more column constant pairs, determining a partitioning column constant pair;
determining a first subset of the one or more column constant pairs that has a first pre-defined relation to the partitioning column constant pair;
determining a second subset of the one or more column constant pairs that has a second pre-defined relation to the partitioning column constant pair;
based on the partitioning column constant pair, partitioning the data set into a first subset of the data set and a second subset of the data set;
receiving a source query;
determining a source column constant pair associated with the source query;
comparing the source column constant pair to the partitioning column constant pair; and
based on the comparing, generating a result of the source query from at least one of the following:
a view, the first subset of the data set, and the second subset of the data set.
Dependent claims: 2, 3, 4, 5, 6, 7, 8
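The partitioning steps recited in claim 1 can be sketched as follows. This is an illustrative reading under stated assumptions, not the claimed method itself: the partitioning pair is taken to be the most frequently queried column together with the low median of its constants, the "first pre-defined relation" is taken to be "less than", and the "second pre-defined relation" is taken to be "greater than or equal to". All function names are hypothetical.

```python
from collections import Counter
from statistics import median_low
from typing import Dict, List, Tuple

Pair = Tuple[str, float]  # (column identifier, constant)
Row = Dict[str, float]


def choose_partitioning_pair(historical: List[Pair]) -> Pair:
    """Assumed heuristic: most frequent column, low median of its constants."""
    column, _count = Counter(col for col, _ in historical).most_common(1)[0]
    constants = [const for col, const in historical if col == column]
    return column, median_low(constants)


def split_pairs(historical: List[Pair], partitioning: Pair) -> Tuple[List[Pair], List[Pair]]:
    """Split pairs on the partitioning column by the two assumed relations."""
    p_col, p_const = partitioning
    same_column = [pair for pair in historical if pair[0] == p_col]
    first = [pair for pair in same_column if pair[1] < p_const]    # assumed first relation
    second = [pair for pair in same_column if pair[1] >= p_const]  # assumed second relation
    return first, second  # pairs on other columns fall in neither subset


def partition_rows(rows: List[Row], partitioning: Pair) -> Tuple[List[Row], List[Row]]:
    """Partition the data set into two subsets around the partitioning pair."""
    p_col, p_const = partitioning
    first = [row for row in rows if row[p_col] < p_const]
    second = [row for row in rows if row[p_col] >= p_const]
    return first, second


historical = [("price", 100), ("price", 50), ("qty", 5), ("price", 200)]
partitioning = choose_partitioning_pair(historical)
first_pairs, second_pairs = split_pairs(historical, partitioning)
low_rows, high_rows = partition_rows(
    [{"price": 30}, {"price": 100}, {"price": 250}], partitioning
)
```

Choosing a median keeps the two subsets of historical pairs roughly balanced; other selection rules would fit the claim language equally well.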
9. A non-transitory computer-readable medium for query processing comprising computer-readable instructions, the computer-readable instructions executable by a processor to cause the processor to:
receive a source query;
determine a source column constant pair associated with the source query;
determine a partitioning column constant pair associated with a data set;
compare the source column constant pair to the partitioning column constant pair; and
based on the comparison, determine a result of the source query from at least one of the following:
a view, a first subset of the data set, and a second subset of the data set.
Dependent claims: 10, 12, 13, 14
11. The medium of claim 9, wherein the partitioning column constant pair is a tuple comprising one or more column identifiers and one or more constants corresponding to the column identifiers.
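The tuple described in claim 11 could be modeled as a small immutable record. The sketch below is one possible representation, assuming a one-to-one correspondence between column identifiers and constants; the class name `PartitioningPair` and the method `as_mapping` are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Tuple, Union

Constant = Union[int, float, str]


@dataclass(frozen=True)
class PartitioningPair:
    """One or more column identifiers with their corresponding constants."""
    columns: Tuple[str, ...]
    constants: Tuple[Constant, ...]

    def __post_init__(self) -> None:
        if not self.columns:
            raise ValueError("at least one column identifier is required")
        # Assumption: exactly one constant per column identifier.
        if len(self.columns) != len(self.constants):
            raise ValueError("each column identifier needs a corresponding constant")

    def as_mapping(self) -> Dict[str, Constant]:
        """Pair each column identifier with its constant."""
        return dict(zip(self.columns, self.constants))


pair = PartitioningPair(columns=("region", "year"), constants=("EU", 2020))
mapping = pair.as_mapping()
```

Freezing the dataclass makes instances hashable, so they can serve as dictionary keys when looking up which data source holds a given partition.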
15. A federated system for query processing, comprising:
at least one processor in communication with a memory;
a source router communicatively coupled to one or more data sources;
the source router executable by the at least one processor to:
receive a source query;
determine a source column constant pair associated with the source query;
compare the source column constant pair to a partitioning column constant pair that is associated with a data set; and
based on the comparison, determine a result of the source query from at least one of the following:
a view, a first subset of the data set that is stored on a first data source of the one or more data sources, and a second subset of the data set that is stored on a second data source of the one or more data sources.
Dependent claims: 16, 17, 18, 19, 20
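A source router of the kind recited in claim 15 might look like the sketch below. It is a simplified model under several assumptions: the two data sources are in-memory lists, the first subset holds rows whose partitioning column is below the partitioning constant and the second subset holds the rest, and the "view" is taken to be the union of both subsets. The class `SourceRouter` and its methods are hypothetical names.

```python
import operator
from typing import Dict, List, Tuple

Row = Dict[str, float]

OPS = {"<": operator.lt, "<=": operator.le, "=": operator.eq,
       ">": operator.gt, ">=": operator.ge}


class SourceRouter:
    def __init__(self, partitioning: Tuple[str, float],
                 first_source: List[Row], second_source: List[Row]) -> None:
        self.p_col, self.p_const = partitioning
        self.first_source = first_source    # rows with p_col < p_const (assumed)
        self.second_source = second_source  # rows with p_col >= p_const (assumed)

    def choose_target(self, column: str, op: str, constant: float) -> str:
        """Compare the source pair to the partitioning pair and pick a target."""
        if column != self.p_col:
            return "view"  # predicate says nothing about the partitioning column
        if (op == "<" and constant <= self.p_const) or \
           (op in ("<=", "=") and constant < self.p_const):
            return "first"   # every matching row lies below the partition constant
        if op in (">", ">=", "=") and constant >= self.p_const:
            return "second"  # every matching row lies at or above the constant
        return "view"        # predicate may match rows in both subsets

    def run(self, column: str, op: str, constant: float) -> List[Row]:
        """Evaluate the predicate against the chosen target only."""
        target = self.choose_target(column, op, constant)
        if target == "first":
            rows = self.first_source
        elif target == "second":
            rows = self.second_source
        else:
            rows = self.first_source + self.second_source  # view over both subsets
        return [row for row in rows if OPS[op](row[column], constant)]


router = SourceRouter(
    ("price", 100),
    first_source=[{"price": 30, "qty": 1}, {"price": 80, "qty": 4}],
    second_source=[{"price": 100, "qty": 2}, {"price": 250, "qty": 9}],
)
cheap = router.run("price", "<", 50)
```

Queries that fall entirely on one side of the partitioning constant touch a single data source; only predicates that straddle the constant, or that concern another column, fall back to the view.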
Specification