System and method for batch query processing

US 9,262,476 B2
Filed: 01/10/2014
Issued: 02/16/2016
Est. Priority Date: 01/10/2014
Status: Active Grant

Abstract

A system and method of batch query processing includes accumulating data queries in a query holding area of a query assistant running in a computer server, separating the accumulated data queries into a plurality of partitions, ordering the partitions, ordering the accumulated data queries within each of the partitions, and processing the accumulated data queries in an order based on the ordering of the partitions and the ordering of the data queries within each of the partitions. Each of the partitions includes data queries with a respective from-type. Each respective from-type is associated with a combination of storage tables accessed by each of the data queries in a corresponding partition. In some examples, ordering the accumulated data queries within each of the partitions includes processing the data queries in each partition against a test data set and ordering the data queries based on results of the processing.
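The flow described in the abstract can be illustrated with a minimal sketch. It assumes a from-type can be modelled as the set of tables a query accesses. The names `Query`, `partition_by_from_type`, and `batch_order`, and both ranking callbacks, are hypothetical and do not come from the patent.

```python
from collections import defaultdict
from typing import Callable, Dict, FrozenSet, Iterable, List, NamedTuple


class Query(NamedTuple):
    text: str               # query text, kept opaque in this sketch
    tables: FrozenSet[str]  # storage tables the query accesses


def partition_by_from_type(queries: Iterable[Query]) -> Dict[FrozenSet[str], List[Query]]:
    """Group queries by from-type (assumed here: the set of tables accessed)."""
    partitions: Dict[FrozenSet[str], List[Query]] = defaultdict(list)
    for query in queries:
        partitions[query.tables].append(query)
    return dict(partitions)


def batch_order(
    queries: Iterable[Query],
    partition_key: Callable[[FrozenSet[str], List[Query]], object],
    query_key: Callable[[Query], object],
) -> List[Query]:
    """Order the partitions, then order the queries inside each partition."""
    partitions = partition_by_from_type(queries)
    ordered: List[Query] = []
    for from_type in sorted(partitions, key=lambda ft: partition_key(ft, partitions[ft])):
        ordered.extend(sorted(partitions[from_type], key=query_key))
    return ordered


# Usage: larger partitions first (an assumed direction), then by query text.
held = [
    Query("q1", frozenset({"a"})),
    Query("q2", frozenset({"a", "b"})),
    Query("q3", frozenset({"a"})),
]
processing_order = batch_order(
    held,
    partition_key=lambda ft, qs: (-len(qs), sorted(ft)),
    query_key=lambda q: q.text,
)
```

The abstract leaves both ranking criteria open, so the sketch takes them as parameters instead of fixing a policy.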

18 Claims

1. A method of processing data source queries, the method comprising:
- accumulating the data source queries in a query holding area of a query assistant running in a computer server;
  
  separating the accumulated data source queries into a plurality of partitions, each of the partitions including data source queries with a respective from-type, each respective from-type being associated with a combination of storage tables accessed by each of the data source queries in a corresponding partition;
  
  ordering the partitions;
  
  ordering the accumulated data source queries within each of the partitions; and
  
  processing the accumulated data source queries in an order based on the ordering of the partitions and the ordering of the data source queries within each of the partitions;
  
  wherein ordering the accumulated data source queries within each of the partitions comprises:
  
  processing a first data source query in a first partition selected from the partitions against a first test data set to determine a first result;
  
  processing a second data source query in the first partition against the first test data set to determine a second result;
  
  determining a first ordering metric based on the first result;
  
  determining a second ordering metric based on the second result; and
  
  ordering the first data source query and the second data source query based on the first ordering metric and the second ordering metric.
- Dependent claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12)
  - 2. The method of claim 1, further comprising inserting the accumulated data source queries in a first-in first-out queue based on the ordering of the partitions and the ordering of the data source queries within each of the partitions.
  - 3. The method of claim 1 wherein accumulating the data source queries continues until an accumulation threshold is reached, the accumulation threshold being based on a combination of one or more criteria selected from a group consisting of a number of data source queries that are accumulated in the query holding area, a predetermined period of time, and whether a queue containing previously accumulated data source queries is empty.
  - 4. The method of claim 1 wherein ordering the partitions comprises ordering the partitions based on at least a number of queries in each partition.
  - 5. The method of claim 1 wherein ordering the partitions comprises ordering the partitions based on at least a number of entries stored in storage tables associated with each respective from-type associated with each of the partitions.
  - 6. The method of claim 1 wherein ordering the partitions comprises ordering the partitions based on at least a size of results from recent queries with a same from-type as the respective from-type associated with each of the partitions.
  - 7. The method of claim 1 wherein:
    - each respective from-type is a subset of a set of storage tables accessed by the accumulated data source queries; and
      
      ordering the partitions comprises ordering the partitions based on at least a subset ordering among from-types associated with the partitions.
  - 8. The method of claim 1 wherein the first ordering metric is based on a number of entries in the first result and the second ordering metric is based on a number of entries in the second result.
  - 9. The method of claim 1 wherein the first ordering metric is based on a number of entries included in the first result that are also included in the second result.
  - 10. The method of claim 1, further comprising generating the first test data set pseudo-randomly based on a first from-type associated with the first partition.
  - 11. The method of claim 1 wherein ordering the accumulated data source queries within each of the partitions further comprises:
    - processing the first data source query against a second test data set to determine a third result;
      
      processing the second data source query against the second test data set to determine a fourth result;
      
      determining a third ordering metric based on the third result;
      
      determining a fourth ordering metric based on the fourth result; and
      
      ordering the first data source query and the second data source query based on an aggregation of the first and third ordering metrics and an aggregation of the second and fourth ordering metrics.
  - 12. The method of claim 1 wherein ordering the accumulated data source queries within each of the partitions further comprises:
    - simplifying the first data source query before processing the first data source query against the first test data set; and
      
      simplifying the second data source query before processing the second data source query against the first test data set.
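The within-partition ordering recited in claims 1, 8, 10, and 11 can be sketched as follows. This is an illustration under stated assumptions, not the patented implementation: queries are modelled as row predicates, the ordering metric is the number of entries in the result, metrics from several test data sets are aggregated by summation, and smaller totals are ordered first. All function names are hypothetical.

```python
import random
from typing import Callable, Dict, List, Sequence

Row = Dict[str, int]
Predicate = Callable[[Row], bool]  # assumption: a query is a filter over rows


def make_test_data(columns: Sequence[str], n_rows: int, seed: int) -> List[Row]:
    """Generate a pseudo-random test data set for the given columns."""
    rng = random.Random(seed)  # seeded so the data set is reproducible
    return [{c: rng.randrange(100) for c in columns} for _ in range(n_rows)]


def result_size(query: Predicate, data: List[Row]) -> int:
    """Ordering metric: number of entries in the query's result."""
    return sum(1 for row in data if query(row))


def order_queries(queries: Dict[str, Predicate], test_sets: List[List[Row]]) -> List[str]:
    """Order query names by their metric summed over all test data sets."""
    totals = {
        name: sum(result_size(query, data) for data in test_sets)
        for name, query in queries.items()
    }
    # Assumed direction: smallest aggregated result first; names break ties.
    return sorted(totals, key=lambda name: (totals[name], name))


# Usage with two small hand-written test data sets.
queries = {
    "all": lambda row: True,
    "none": lambda row: False,
    "half": lambda row: row["x"] < 50,
}
test_sets = [
    [{"x": 1}, {"x": 60}],
    [{"x": 70}, {"x": 80}, {"x": 10}],
]
ordering = order_queries(queries, test_sets)
```

The overlap-based metric of claim 9 (entries in the first result that also appear in the second) is not shown here.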

13. A query assistant hosted in an application server executing on a server comprising a processor and memory, the query assistant comprising:
- a query manager;
  
  a data set generator coupled to the query manager;
  
  a query holding area coupled to the query manager;
  
  a query queue coupled to the query manager; and
  
  a query engine coupled to the query manager and the query queue;
  
  wherein:
  
  the query holding area is configured to accumulate queries received by the query assistant;
  
  the query manager is configured to:
  
  separate the queries in the query holding area into a plurality of disjoint sets, each of the disjoint sets including queries with a respective from-type, each respective from-type being associated with a combination of data tables accessed by each of the queries in a corresponding disjoint set;
  
  sort the disjoint sets;
  
  sort the queries within each of the disjoint sets; and
  
  insert the queries into the query queue based on the sorting of the disjoint sets and the sorting of the queries within each of the disjoint sets;
  
  the query engine is configured to:
  
  remove the queries in order from the query queue; and
  
  process the queries;
  
  the data set generator is configured to generate a dummy data set based on a first from-type associated with a first disjoint set selected from the disjoint sets; and
  
  the query manager is further configured to:
  
  determine a first result set by having a first query in the first disjoint set processed against the dummy data set;
  
  determine a second result set by having a second query in the first disjoint set processed against the dummy data set;
  
  determine a first sorting metric based on the first result set;
  
  determine a second sorting metric based on the second result set; and
  
  sort the first query and the second query based on the first sorting metric and the second sorting metric.
- Dependent claims (14, 15)
  - 14. The query assistant of claim 13 wherein the first sorting metric is based on one or more criteria selected from a group consisting of a number of entries in the first result set and a number of entries in an intersection of the first and second result sets.
  - 15. The query assistant of claim 13 wherein the query engine is a federated query engine.
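One way to picture how the components of claim 13 fit together is the sketch below. It is a simplified model under assumptions: the class `QueryAssistant`, its method names, and the two ranking callbacks are hypothetical. The data set generator is omitted, with sorting policy passed in as callbacks in its place.

```python
from collections import deque
from typing import Callable, Deque, Dict, FrozenSet, Iterable, List, Tuple

Query = Tuple[str, FrozenSet[str]]  # (query text, data tables accessed)


class QueryAssistant:
    """Holding area + query manager logic + FIFO query queue + query engine."""

    def __init__(
        self,
        engine: Callable[[str], object],
        set_rank: Callable[[FrozenSet[str], List[Query]], object],
        query_rank: Callable[[Query], object],
    ) -> None:
        self.holding_area: List[Query] = []
        self.queue: Deque[Query] = deque()
        self._engine = engine          # stands in for the query engine
        self._set_rank = set_rank      # how to sort the disjoint sets
        self._query_rank = query_rank  # how to sort queries within a set

    def submit(self, text: str, tables: Iterable[str]) -> None:
        """Accumulate a received query in the holding area."""
        self.holding_area.append((text, frozenset(tables)))

    def flush(self) -> None:
        """Query manager step: split into disjoint sets, sort, and enqueue."""
        sets: Dict[FrozenSet[str], List[Query]] = {}
        for query in self.holding_area:
            sets.setdefault(query[1], []).append(query)
        for from_type in sorted(sets, key=lambda ft: self._set_rank(ft, sets[ft])):
            self.queue.extend(sorted(sets[from_type], key=self._query_rank))
        self.holding_area.clear()

    def drain(self) -> List[object]:
        """Query engine step: remove queries in order and process them."""
        results = []
        while self.queue:
            text, _tables = self.queue.popleft()
            results.append(self._engine(text))
        return results


# Usage: str.upper stands in for a real engine.
assistant = QueryAssistant(
    engine=str.upper,
    set_rank=lambda ft, qs: sorted(ft),
    query_rank=lambda q: q[0],
)
assistant.submit("b", ["t2"])
assistant.submit("c", ["t1"])
assistant.submit("a", ["t2"])
assistant.flush()
processed = assistant.drain()
```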

16. A non-transitory machine-readable medium comprising a plurality of machine-readable instructions which when executed by one or more processors associated with an application server are adapted to cause the one or more processors to perform a method comprising:
- accumulating data queries in a query holding area until an accumulation threshold is reached, the accumulation threshold being based on a combination of one or more criteria selected from a group consisting of a number of data queries that are accumulated in the query holding area, a predetermined period of time, and whether a first-in first-out (FIFO) queue containing previously accumulated data queries is empty;
  
  separating the data queries into a plurality of partitions, each of the partitions including data queries with a respective from-type, each respective from-type being associated with a combination of tables accessed by each of the data source queries in a corresponding partition;
  
  ordering the partitions;
  
  ordering the data queries within each of the partitions;
  
  inserting the accumulated data queries into the FIFO queue based on the ordering of the partitions and the ordering of the data queries within each of the partitions;
  
  removing the data queries in order from the FIFO queue; and
  
  processing the data queries;
  
  wherein ordering the data queries within each of the partitions comprises:
  
  generating a data set pseudo-randomly based on a first from-type associated with a first partition;
  
  generating a first query result by processing a first data query in the first partition against the data set;
  
  generating a second query result by processing a second data query in the first partition against the data set;
  
  determining a first ordering metric based on the first query result;
  
  determining a second ordering metric based on the second query result; and
  
  ordering the first data query and the second data query based on the first ordering metric and the second ordering metric.
- Dependent claims (17, 18)
  - 17. The non-transitory machine-readable medium of claim 16, wherein the first ordering metric is based on one or more criteria selected from a group consisting of a number of entries in the first query result and a number of entries included in the first query result that are also included in the second query result.
  - 18. The non-transitory machine-readable medium of claim 16 wherein ordering the partitions comprises ordering the partitions based on one or more criteria selected from a group consisting of:
    - a number of queries in each partition;
      
      a number of entries stored in tables associated with each respective from-type associated with each of the partitions;
      
      a size of query results from recent queries with a same from-type as the respective from-type associated with each of the partitions; and
      
      an ordering among from-types associated with the partitions.
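The accumulation threshold recited in claims 3 and 16 could be checked along the following lines. The claims list three criteria and say only that the threshold is based on a combination of one or more of them. Combining them with a logical OR and never flushing an empty holding area are assumptions of this sketch, as are the class and field names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class AccumulationThreshold:
    max_queries: Optional[int] = None         # number of accumulated queries
    max_wait_seconds: Optional[float] = None  # predetermined period of time
    flush_when_queue_empty: bool = False      # FIFO queue has run empty

    def reached(self, held: int, waited_seconds: float, queue_empty: bool) -> bool:
        """Return True when any enabled criterion is met (assumed OR logic)."""
        if held == 0:
            return False  # assumption: nothing to flush, keep accumulating
        if self.max_queries is not None and held >= self.max_queries:
            return True
        if self.max_wait_seconds is not None and waited_seconds >= self.max_wait_seconds:
            return True
        return self.flush_when_queue_empty and queue_empty


# Usage: flush after 100 queries, after 0.5 s, or as soon as the queue is idle.
threshold = AccumulationThreshold(
    max_queries=100, max_wait_seconds=0.5, flush_when_queue_empty=True
)
should_flush = threshold.reached(held=3, waited_seconds=0.1, queue_empty=True)
```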

Current Assignee: Red Hat, Inc. (International Business Machines Corporation)
Original Assignee: Red Hat, Inc. (International Business Machines Corporation)
Inventors: Eli, Filip; Nguyen, Filip
Primary Examiner(s): Casanova, Jorge A
Application Number: US14/152,419
Publication Number: US 20150199404A1
Time in Patent Office: 767 Days
US Class Current: 1/1
CPC Class Codes:
G06F 16/2453 Query optimisation
G06F 16/24535 of sub-queries or views
