Multisource semantic partitioning
Abstract
Methods, systems, and computer program products for processing a query to determine query results. The query may be analyzed to determine a column constant pair corresponding to the query. The column constant pair may be analyzed with respect to a column constant pair associated with a partitioned data set in order to route the query to a subset of the data set. Data sets may be partitioned into subsets by analyzing historical queries to determine a partitioning column constant pair with respect to the data set that is used to partition the data of the data set into subsets. The query processing may include both query routing and data set partitioning.
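As an illustration of what a column constant pair might look like in practice, the following minimal sketch pulls (column, operator, constant) triples out of the WHERE clause of a simple SQL string. The regular expression, the function name `column_constant_pairs`, and the restriction to numeric and single-quoted constants are assumptions made for this example; the patent does not prescribe a parsing technique.

```python
import re
from typing import List, Tuple, Union

Constant = Union[int, float, str]

# Hypothetical pattern: a column name, a comparison operator, and a constant
# that is either a number or a single-quoted string.
PREDICATE = re.compile(r"(\w+)\s*(<=|>=|<>|!=|=|<|>)\s*('[^']*'|\d+(?:\.\d+)?)")


def column_constant_pairs(sql: str) -> List[Tuple[str, str, Constant]]:
    """Return (column, operator, constant) triples found after WHERE."""
    parts = re.split(r"\bwhere\b", sql, maxsplit=1, flags=re.IGNORECASE)
    if len(parts) < 2:
        return []  # no WHERE clause, so no column constant pairs
    pairs: List[Tuple[str, str, Constant]] = []
    for column, operator, raw in PREDICATE.findall(parts[1]):
        if raw.startswith("'"):
            value: Constant = raw[1:-1]   # strip the surrounding quotes
        elif raw.isdigit():
            value = int(raw)
        else:
            value = float(raw)
        pairs.append((column, operator, value))
    return pairs
```

For a query such as `SELECT * FROM orders WHERE amount > 100 AND region = 'EU'`, this sketch yields one triple for `amount` and one for `region`. A production system would more likely use a real SQL parser than a regular expression.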
0 Citations
17 Claims
1. A computer-implemented method for federated query processing comprising:
- receiving one or more source queries associated with a data set;
- storing the one or more source queries as one or more historical queries;
- storing at least one statistic for each of the one or more historical queries, the at least one statistic including a size for each of the one or more historical queries;
- determining one or more column constant pairs associated with the one or more historical queries, each column constant pair identifying a column and a corresponding value;
- based on the one or more column constant pairs, determining a partitioning column constant pair,
  - wherein the one or more column constant pairs corresponding to a first subset of the one or more historical queries have a first pre-defined relation to the partitioning column constant pair, the first pre-defined relation having a size difference between +10% or −10%,
  - wherein the one or more column constant pairs corresponding to a second subset of the one or more historical queries have a second pre-defined relation to the partitioning column constant pair, the second pre-defined relation having a size difference of less than 10%, and
  - wherein the first subset of the one or more historical queries is within a pre-determined size corresponding to the second subset of the one or more historical queries;
- based on the determined partitioning column constant pair, partitioning the data set into a first subset of the data set and a second subset of the data set;
- storing, in a data store, associations between the determined partitioning column constant pair and both of the first subset of the data set and the second subset of the data set;
- after the partitioning, determining a source column constant pair associated with a received source query;
- comparing the source column constant pair to the partitioning column constant pair;
- based on the comparing, generating a result corresponding to the received source query from at least one of the following: a view, the first subset of the data set, and the second subset of the data set; and
- joining the first subset of the data set and the second subset of the data set when the one or more historical queries including an “or” operator are determined to have a greater size than the one or more historical queries including an “and” operator.
Dependent claims: 2, 3, 4, 5, 6, 7
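The selection of a partitioning column constant pair in claim 1 can be sketched as follows. This is one possible reading, not the patented procedure: it assumes the first pre-defined relation is "constant less than the candidate value", the second is "constant greater than or equal to the candidate value", and that the two groups of historical queries count as balanced when their total sizes differ by at most 10% of the combined size. `HistoricalQuery`, `choose_partitioning_pair`, and `partition` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class HistoricalQuery:
    column: str      # column named in the query predicate
    constant: float  # constant the column is compared against
    size: int        # stored statistic, e.g. number of rows returned


def choose_partitioning_pair(
    history: List[HistoricalQuery], tolerance: float = 0.10
) -> Optional[Tuple[str, float]]:
    """Pick the (column, value) pair that best balances historical query sizes."""
    best: Optional[Tuple[str, float]] = None
    best_gap: Optional[float] = None
    for candidate in history:
        same_column = [q for q in history if q.column == candidate.column]
        # Assumed first relation: constant below the candidate value.
        below = sum(q.size for q in same_column if q.constant < candidate.constant)
        # Assumed second relation: constant at or above the candidate value.
        at_or_above = sum(q.size for q in same_column if q.constant >= candidate.constant)
        total = below + at_or_above
        if total == 0:
            continue
        gap = abs(below - at_or_above) / total
        if best_gap is None or gap < best_gap:
            best, best_gap = (candidate.column, candidate.constant), gap
    # Accept the best candidate only if the two groups are within tolerance.
    if best is not None and best_gap is not None and best_gap <= tolerance:
        return best
    return None


def partition(rows: List[Dict[str, float]], pair: Tuple[str, float]):
    """Split rows into a first subset (below the value) and a second subset."""
    column, value = pair
    first = [row for row in rows if row[column] < value]
    second = [row for row in rows if row[column] >= value]
    return first, second
```

With four historical queries on `age` at constants 20, 30, 40, and 50 and sizes 50, 50, 55, and 45, the candidate 40 splits the total size 100 to 100, so the sketch selects `("age", 40)` and `partition` then divides the rows at that value.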
8. A non-transitory computer-readable medium for query processing comprising computer-readable instructions, the computer-readable instructions executable by one or more processors to perform operations comprising:
- receiving one or more source queries associated with the data set;
- storing the one or more source queries as the one or more historical queries;
- storing at least one statistic for each of the one or more historical queries, the at least one statistic including a size for each of the one or more historical queries;
- determining a partitioning column constant pair associated with a data set, the partitioning column constant pair identifying a column and a corresponding value, the determining including
  - identifying a first set of one or more column constant pairs having a first pre-defined relation to the partitioning column constant pair, wherein the first set corresponds to a first subset of one or more historical queries, the first pre-defined relation having a size difference between +10% or −10%;
  - identifying a second set of one or more column constant pairs having a second pre-defined relation to the partitioning column constant pair, wherein the second set corresponds to a second subset of the one or more historical queries, the second pre-defined relation having a size difference of less than 10%, and wherein the first subset of the one or more historical queries is within a pre-determined size of the second subset of the one or more historical queries;
- based on the partitioning column constant pair, partitioning the data set into a first subset of the data set and a second subset of the data set;
- storing, in a data store, associations between the determined partitioning column constant pair and both of the first subset of the data set and the second subset of the data set;
- after determining the partitioning column constant pair, determining a source column constant pair associated with a received source query;
- comparing the source column constant pair to the partitioning column constant pair;
- based on the comparing, determining a result of the received source query from at least one of the following: a view, a first subset of the data set, and a second subset of the data set; and
- joining the first subset of the data set and the second subset of the data set when the one or more historical queries including an “or” operator are determined to have a greater size than the one or more historical queries including an “and” operator.
Dependent claims: 9, 10, 11, 12
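The final limitation, which joins the two subsets when historical “or” queries outweigh historical “and” queries, might be sketched like this. The sketch assumes that "size" is summed per operator, that a query containing both operators counts toward both totals, and that "joining" means concatenating the two subsets into a combined view; none of these details are stated in the claim. `should_join` and `combined_view` are hypothetical names.

```python
import re
from typing import List, Optional, Tuple

OR_OPERATOR = re.compile(r"\bor\b", re.IGNORECASE)
AND_OPERATOR = re.compile(r"\band\b", re.IGNORECASE)


def should_join(history: List[Tuple[str, int]]) -> bool:
    """history holds (query text, size) entries for the historical queries."""
    # Note: this naive match also counts "or"/"and" inside string constants.
    or_size = sum(size for text, size in history if OR_OPERATOR.search(text))
    and_size = sum(size for text, size in history if AND_OPERATOR.search(text))
    return or_size > and_size


def combined_view(
    first: List[dict], second: List[dict], history: List[Tuple[str, int]]
) -> Optional[List[dict]]:
    """Return the joined subsets when "or" queries dominate, else None."""
    if should_join(history):
        return first + second  # assumed meaning of "joining": concatenation
    return None
```

The word-boundary match keeps keywords such as `ORDER` from being counted as an “or” operator.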
13. A federated system for query processing, comprising:
- at least one processor in communication with a memory;
- a multisource partitioner executable by the at least one processor to perform operations comprising:
  - storing at least one statistic for each of one or more historical queries, the at least one statistic including a size for each of the one or more historical queries;
  - determining one or more column constant pairs associated with the one or more historical queries, each column constant pair identifying a column and a corresponding value; and
  - based on the one or more column constant pairs, determining a partitioning column constant pair, wherein the one or more column constant pairs corresponding to a first subset of the one or more historical queries have a first pre-defined relation to the partitioning column constant pair, the first pre-defined relation having a size difference between +10% or −10%, wherein the one or more column constant pairs corresponding to a second subset of the one or more historical queries have a second pre-defined relation to the partitioning column constant pair, the second pre-defined relation having a size difference of less than 10%, and wherein the first subset of the one or more historical queries is within a pre-determined size corresponding to the second subset of the one or more historical queries;
  - based on the partitioning column constant pair, partitioning the data set into the first subset of the data set and the second subset of the data set;
  - storing, in a data store, associations between the determined partitioning column constant pair and both of the first subset of the data set and the second subset of the data set; and
- a source router communicatively coupled to one or more data sources, the source router executable by the at least one processor to perform operations comprising:
  - determining a source column constant pair associated with a received source query;
  - comparing the source column constant pair to the partitioning column constant pair;
  - based on the comparing, determining a result of the source query from at least one of the following: a view, a first subset of the data set that is stored on a first data source of the one or more data sources, and a second subset of the data set that is stored on a second data source of the one or more data sources; and
  - joining the first subset of the data set and the second subset of the data set when the one or more historical queries including an “or” operator are determined to have a greater size than the one or more historical queries including an “and” operator.
Dependent claims: 14, 15, 16, 17
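The source router's comparison step can be illustrated with a small sketch. It assumes the data set was partitioned so that the first data source holds rows where the partitioning column is below the partitioning value and the second holds the rest, and that falling back to both sources stands in for the "view" option. The `SourceRouter` class, its method names, and the source labels are hypothetical.

```python
from typing import List, Tuple


class SourceRouter:
    """Routes a query to the data source(s) that can contain matching rows."""

    def __init__(self, partition_pair: Tuple[str, float],
                 first_source: str, second_source: str) -> None:
        self.column, self.value = partition_pair
        # Assumed layout: first source holds column < value, second holds the rest.
        self.first_source = first_source
        self.second_source = second_source

    def route(self, source_pair: Tuple[str, str, object]) -> List[str]:
        column, operator, constant = source_pair
        both = [self.first_source, self.second_source]
        if column != self.column:
            return both  # the predicate says nothing about the partition column
        if operator == "=":
            return [self.first_source] if constant < self.value else [self.second_source]
        if operator == "<" and constant <= self.value:
            return [self.first_source]
        if operator == "<=" and constant < self.value:
            return [self.first_source]
        if operator in (">", ">=") and constant >= self.value:
            return [self.second_source]
        return both  # the range straddles the partitioning value
```

With a partitioning pair of `("age", 40)`, a query on `age < 30` is sent only to the first source, `age >= 40` only to the second, and `age > 10` to both because its range crosses the partitioning value.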
Specification