Multisource semantic partitioning

US 11,157,473 B2
Filed: 11/21/2014
Issued: 10/26/2021
Est. Priority Date: 11/21/2014
Status: Active Grant

Abstract

Methods, systems, and computer program products for processing a query to determine query results. The query may be analyzed to determine a column constant pair corresponding to the query. The column constant pair may be analyzed with respect to a column constant pair associated with a partitioned data set in order to route the query to a subset of the data set. Data sets may be partitioned into subsets by analyzing historical queries to determine a partitioning column constant pair with respect to the data set that is used to partition the data of the data set into subsets. The query processing may include both query routing and data set partitioning.
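
As an illustration of what a column constant pair might look like in practice, the following Python sketch pulls simple `column <op> constant` predicates out of a query string. It is not taken from the patent: the function name `extract_column_constant_pairs`, the regular expression, and the decision to keep the comparison operator alongside the pair are all assumptions made for this example, and only numeric constants in a `WHERE` clause are handled.

```python
import re
from typing import List, NamedTuple


class ColumnConstantPair(NamedTuple):
    """A column, the comparison used, and the constant it is compared to."""
    column: str
    operator: str   # kept for later routing; an addition of this sketch
    constant: float


# Simple numeric predicates only, e.g. "price <= 100" or "year > 2010".
_PREDICATE = re.compile(r"(\w+)\s*(<=|>=|=|<|>)\s*(-?\d+(?:\.\d+)?)")


def extract_column_constant_pairs(sql: str) -> List[ColumnConstantPair]:
    """Return the pairs found after the first WHERE keyword, if any."""
    parts = re.split(r"\bwhere\b", sql, maxsplit=1, flags=re.IGNORECASE)
    if len(parts) < 2:
        return []  # no WHERE clause, so no pairs to report
    return [
        ColumnConstantPair(col, op, float(val))
        for col, op, val in _PREDICATE.findall(parts[1])
    ]


if __name__ == "__main__":
    query = "SELECT * FROM orders WHERE price <= 100 AND year > 2010"
    for pair in extract_column_constant_pairs(query):
        print(pair)
```

A production system would more likely walk the parsed query tree than apply a regular expression; the regex is used here only to keep the sketch self-contained.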

0 Citations

17 Claims

1. A computer-implemented method for federated query processing comprising:
- receiving one or more source queries associated with a data set;
  
  storing the one or more source queries as one or more historical queries;
  
  storing at least one statistic for each of the one or more historical queries, the at least one statistic including a size for each of the one or more historical queries;
  
  determining one or more column constant pairs associated with the one or more historical queries, each column constant pair identifying a column and a corresponding value;
  
  based on the one or more column constant pairs, determining a partitioning column constant pair, wherein the one or more column constant pairs corresponding to a first subset of the one or more historical queries have a first pre-defined relation to the partitioning column constant pair, the first pre-defined relation having a size difference between +10% or −10%, wherein the one or more column constant pairs corresponding to a second subset of the one or more historical queries have a second pre-defined relation to the partitioning column constant pair, the second pre-defined relation having a size difference of less than 10%, and wherein the first subset of the one or more historical queries is within a pre-determined size corresponding to the second subset of the one or more historical queries;
  
  based on the determined partitioning column constant pair, partitioning the data set into a first subset of the data set and a second subset of the data set;
  
  storing, in a data store, associations between the determined partitioning column constant pair and both of the first subset of the data set and the second subset of the data set;
  
  after the partitioning, determining a source column constant pair associated with a received source query;
  
  comparing the source column constant pair to the partitioning column constant pair;
  
  based on the comparing, generating a result corresponding to the received source query from at least one of the following: a view, the first subset of the data set, and the second subset of the data set; and
  
  joining the first subset of the data set and the second subset of the data set when the one or more historical queries including an “or” operator are determined to have a greater size than the one or more historical queries including an “and” operator.
- Dependent Claims (2, 3, 4, 5, 6, 7)
  - 2. The method of claim 1 wherein (i) the first pre-defined relation is a “less than or equal to” relation and the second pre-defined relation is a “greater than” relation or (ii) the first pre-defined relation is a “less than” relation and the second pre-defined relation is a “greater than or equal to” relation.
  - 3. The method of claim 1, further comprising at least one of:
    - routing the received source query to the view if the source column constant pair is determined to be incomparable to the partitioning column constant pair;
      
      routing the received source query to the first subset of the data set if the source column constant pair is determined to have the first pre-defined relation to the partitioning column constant pair;
      
      or routing the received source query to the second subset of the data set if the source column constant pair is determined to have the second pre-defined relation to the partitioning column constant pair.
  - 4. The method of claim 1, wherein the first subset of the data set is located on a first data source and the second subset of the data set is located on a second data source that is different than the first data source.
  - 5. The method of claim 1, wherein the partitioning column constant pair includes a tuple comprising at least one column identifier and at least one constant corresponding to the at least one column identifier.
  - 6. The method of claim 1 further comprising:
    - determining a third subset of the one or more column constant pairs, wherein the third subset is determined to be incomparable to the partitioning column constant pair.
  - 7. The method of claim 1, wherein the first subset of the data set has the first pre-defined relation with respect to the partitioning column constant pair, and wherein the second subset of the data set has the second pre-defined relation with respect to the partitioning column constant pair.
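
The selection step in claim 1 can be pictured as a search for a pivot constant that splits the historical queries into two groups of similar size. The Python sketch below is one possible reading, not the patented procedure: it assumes each historical query is reduced to a `(column, constant, size)` tuple, uses the "less than or equal to" / "greater than" relations of claim 2(i), and treats the 10% figure as a bound on the relative difference between the summed sizes of the two groups. The function name `choose_partitioning_pair` and the `tolerance` parameter are hypothetical.

```python
from typing import List, Optional, Tuple

# (column, constant, recorded size of the historical query) -- assumed shape
HistoricalQuery = Tuple[str, float, int]


def choose_partitioning_pair(
    history: List[HistoricalQuery], tolerance: float = 0.10
) -> Optional[Tuple[str, float]]:
    """Return the (column, constant) giving the most balanced split of
    history, or None if no candidate is balanced within `tolerance`."""
    best: Optional[Tuple[str, float]] = None
    best_gap: Optional[float] = None
    # Every pair seen in history is a candidate pivot.
    for column, constant in sorted({(c, v) for c, v, _ in history}):
        same_column = [(v, size) for c, v, size in history if c == column]
        low = sum(size for v, size in same_column if v <= constant)
        high = sum(size for v, size in same_column if v > constant)
        total = low + high
        if total == 0:
            continue
        gap = abs(low - high) / total  # relative size difference of the groups
        if gap <= tolerance and (best_gap is None or gap < best_gap):
            best, best_gap = (column, constant), gap
    return best


if __name__ == "__main__":
    history = [("price", 10, 100), ("price", 20, 95),
               ("price", 30, 50), ("price", 40, 150)]
    print(choose_partitioning_pair(history))
```

With the sample history, splitting at `("price", 20)` places a total size of 195 on the "less than or equal to" side and 200 on the "greater than" side, which is the only candidate within the assumed 10% tolerance.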

8. A non-transitory computer-readable medium for query processing comprising computer-readable instructions, the computer-readable instructions executable by one or more processors to perform operations comprising:
- receiving one or more source queries associated with the data set;
  
  storing the one or more source queries as the one or more historical queries;
  
  storing at least one statistic for each of the one or more historical queries, the at least one statistic including a size for each of the one or more historical queries;
  
  determining a partitioning column constant pair associated with a data set, the partitioning column constant pair identifying a column and a corresponding value, the determining including:
  
  identifying a first set of one or more column constant pairs having a first pre-defined relation to the partitioning column constant pair, wherein the first set corresponds to a first subset of one or more historical queries, the first pre-defined relation having a size difference between +10% or −10%;
  
  identifying a second set of one or more column constant pairs having a second pre-defined relation to the partitioning column constant pair, wherein the second set corresponds to a second subset of the one or more historical queries, the second pre-defined relation having a size difference of less than 10%, and wherein the first subset of the one or more historical queries is within a pre-determined size of the second subset of the one or more historical queries;
  
  based on the partitioning column constant pair, partitioning the data set into a first subset of the data set and a second subset of the data set;
  
  storing, in a data store, associations between the determined partitioning column constant pair and both of the first subset of the data set and the second subset of the data set;
  
  after determining the partitioning column constant pair, determining a source column constant pair associated with a received source query;
  
  comparing the source column constant pair to the partitioning column constant pair;
  
  based on the comparing, determining a result of the received source query from at least one of the following: a view, a first subset of the data set, and a second subset of the data set; and
  
  joining the first subset of the data set and the second subset of the data set when the one or more historical queries including an “or” operator are determined to have a greater size than the one or more historical queries including an “and” operator.
- Dependent Claims (9, 10, 11, 12)
  - 9. The non-transitory computer-readable medium of claim 8, wherein the partitioning column constant pair includes a tuple comprising one or more column identifiers and one or more constants corresponding to the column identifiers.
  - 10. The non-transitory computer-readable medium of claim 8, wherein (i) the first pre-defined relation is a “less than or equal to” relation and the second pre-defined relation is a “greater than” relation or (ii) the first pre-defined relation is a “less than” relation and the second pre-defined relation is a “greater than or equal to” relation.
  - 11. The non-transitory computer-readable medium of claim 8, the operations further comprising at least one of:
    - routing the received source query to the view if the source column constant pair is determined to be incomparable to the partitioning column constant pair;
      
      routing the received source query to the first subset of the data set if the source column constant pair is determined to have the first pre-defined relation to the partitioning column constant pair;
      
      or routing the received source query to the second subset of the data set if the source column constant pair is determined to have the second pre-defined relation to the partitioning column constant pair.
  - 12. The medium of claim 8, the operations further comprising:
    - determining a third subset of the one or more column constant pairs, wherein the third subset is incomparable to the partitioning column constant pair.
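
The routing behavior recited in claim 11 can be sketched as a three-way decision. The Python below is an illustrative assumption rather than the claimed implementation: pairs are modeled as `(column, value)` tuples, "incomparable" is taken to mean either a different column or values whose types cannot be ordered, and the first and second pre-defined relations are the "less than or equal to" and "greater than" relations of claim 10(i). The names `route_query` and `ASSOCIATIONS`, and the data source labels, are hypothetical.

```python
from typing import Any, Dict, Tuple

Pair = Tuple[str, Any]  # (column identifier, constant)

# Stand-in for the stored associations between routing targets and
# where each one lives; the source names are made up for this sketch.
ASSOCIATIONS: Dict[str, str] = {
    "first_subset": "data_source_a",
    "second_subset": "data_source_b",
    "view": "federated_view",
}


def route_query(source_pair: Pair, partitioning_pair: Pair) -> str:
    """Return 'first_subset', 'second_subset', or 'view' for a source query."""
    source_column, source_value = source_pair
    part_column, part_value = partitioning_pair
    if source_column != part_column:
        return "view"  # different column: treated as incomparable
    try:
        in_first = source_value <= part_value
    except TypeError:
        return "view"  # values cannot be ordered: treated as incomparable
    return "first_subset" if in_first else "second_subset"


if __name__ == "__main__":
    pivot = ("price", 20)
    for pair in [("price", 15), ("price", 25), ("year", 2010)]:
        target = route_query(pair, pivot)
        print(pair, "->", target, "on", ASSOCIATIONS[target])
```

Falling back to the view for incomparable pairs keeps the result correct at the cost of touching both subsets, which matches the role the view plays in the claims.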

13. A federated system for query processing, comprising:
- at least one processor in communication with a memory;
  
  a multisource partitioner executable by the at least one processor to perform operations comprising:
  
  storing at least one statistic for each of one or more historical queries, the at least one statistic including a size for each of the one or more historical queries;
  
  determining one or more column constant pairs associated with the one or more historical queries, each column constant pair identifying a column and a corresponding value; and
  
  based on the one or more column constant pairs, determining a partitioning column constant pair, wherein the one or more column constant pairs corresponding to a first subset of the one or more historical queries have a first pre-defined relation to the partitioning column constant pair, the first pre-defined relation having a size difference between +10% or −10%, wherein the one or more column constant pairs corresponding to a second subset of the one or more historical queries have a second pre-defined relation to the partitioning column constant pair, the second pre-defined relation having a size difference of less than 10%, and wherein the first subset of the one or more historical queries is within a pre-determined size corresponding to the second subset of the one or more historical queries;
  
  based on the partitioning column constant pair, partitioning the data set into the first subset of the data set and the second subset of the data set;
  
  storing, in a data store, associations between the determined partitioning column constant pair and both of the first subset of the data set and the second subset of the data set; and
  
  a source router communicatively coupled to one or more data sources, the source router executable by the at least one processor to perform operations comprising:
  
  determining a source column constant pair associated with a received source query;
  
  comparing the source column constant pair to the partitioning column constant pair;
  
  based on the comparing, determining a result of the source query from at least one of the following: a view, a first subset of the data set that is stored on a first data source of the one or more data sources, and a second subset of the data set that is stored on a second data source of the one or more data sources; and
  
  joining the first subset of the data set and the second subset of the data set when the one or more historical queries including an “or” operator are determined to have a greater size than the one or more historical queries including an “and” operator.
- Dependent Claims (14, 15, 16, 17)
  - 14. The federated system of claim 13, wherein the partitioning column constant pair includes a tuple comprising one or more column identifiers and one or more constants corresponding to the column identifiers.
  - 15. The federated system of claim 13, wherein (i) the first pre-defined relation is a “less than or equal to” relation and the second pre-defined relation is a “greater than” relation or (ii) the first pre-defined relation is a “less than” relation and the second pre-defined relation is a “greater than or equal to” relation.
  - 16. The federated system of claim 13, wherein the source router further performs operations comprising at least one of:
    - routing the received source query to the view if the source column constant pair is determined to be incomparable to the partitioning column constant pair;
      
      routing the received source query to the first subset of the data set if the source column constant pair is determined to have the first pre-defined relation to the partitioning column constant pair;
      
      or routing the received source query to the second subset of the data set if the source column constant pair associated with a source query is determined to have the second pre-defined relation to the partitioning column constant pair.
  - 17. The federated system of claim 13, the multisource partitioner further performing operations comprising:
    - determining a third subset of the one or more column constant pairs, wherein the third subset is determined to be incomparable to the partitioning column constant pair.
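
Each independent claim ends with two data-side steps: splitting the data set at the partitioning column constant pair, and joining the two subsets again when historical queries containing an “or” operator outweigh those containing an “and” operator. The Python sketch below shows one way to read those steps. It assumes rows are dictionaries, that the split uses "less than or equal to" versus "greater than", and that "greater size" means a larger sum of recorded query sizes; the function names and the `(operator, size)` history shape are hypothetical.

```python
from typing import Any, Dict, List, Tuple

Row = Dict[str, Any]


def partition_rows(
    rows: List[Row], column: str, constant: Any
) -> Tuple[List[Row], List[Row]]:
    """Split rows into (value <= constant, value > constant) subsets."""
    first = [row for row in rows if row[column] <= constant]
    second = [row for row in rows if row[column] > constant]
    return first, second


def should_join_subsets(history: List[Tuple[str, int]]) -> bool:
    """True when 'or' queries have a larger total size than 'and' queries.

    `history` holds (operator, size) entries, an assumed summary of the
    stored per-query statistics.
    """
    or_size = sum(size for op, size in history if op == "or")
    and_size = sum(size for op, size in history if op == "and")
    return or_size > and_size


if __name__ == "__main__":
    rows = [{"price": 5}, {"price": 20}, {"price": 35}]
    first, second = partition_rows(rows, "price", 20)
    if should_join_subsets([("or", 300), ("and", 120)]):
        joined = first + second  # simple union of the two disjoint subsets
        print(len(joined), "rows after joining")
```

Because the two subsets are disjoint by construction in this sketch, concatenation is enough to rebuild the full data set; a federated deployment would instead issue the join across the two data sources.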

Current Assignee: Red Hat, Inc. (International Business Machines Corporation)
Original Assignee: Red Hat, Inc. (International Business Machines Corporation)
Inventors: Nguyen, Filip; Elias, Filip
Primary Examiner(s): Mofiz, Apu M
Assistant Examiner(s): Samara, Husam Turki
Application Number: US14/550,166
Publication Number: US 20160147837A1
Time in Patent Office: 2,531 Days
CPC Class Codes:
G06F 16/22 Indexing; Data structures t...
G06F 16/245 Query processing
