Workflow driven database partitioning

US 10,614,069 B2
Filed: 01/15/2018
Issued: 04/07/2020
Est. Priority Date: 12/01/2017
Status: Active Grant

Abstract

A database is configured to analyze user queries to dynamically partition the database according to a partition scheme. User queries can be rewritten based on the partition scheme so that, in response to queries, partitions including relevant data are read while partitions including irrelevant data can be skipped, reducing latency. Files can be named according to the partition scheme and stored on respective partitions so that low-level partition management can be implemented by underlying systems. Blocks within files can be sorted and statistics can be determined. The statistics can be used to find and read relevant blocks and skip irrelevant blocks.
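The rewrite-then-prune flow described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the `region` partition variable, the `CITY_TO_REGION` mapping, the dictionary form of a query, and the in-memory `PARTITIONS` layout are all hypothetical names introduced here.

```python
from typing import Dict, List, Optional

# Hypothetical mapping from a user-facing variable (city) to a
# lower-cardinality partition variable (region). Assumed for illustration.
CITY_TO_REGION: Dict[str, str] = {
    "Oslo": "EU",
    "Paris": "EU",
    "Denver": "US",
}

# Hypothetical data store: one entry per partition, keyed by the value of
# the partition variable shared by every record in that partition.
PARTITIONS: Dict[str, List[dict]] = {
    "EU": [{"city": "Oslo", "amount": 10}, {"city": "Paris", "amount": 7}],
    "US": [{"city": "Denver", "amount": 5}],
}


def rewrite_query(query: Dict[str, object]) -> Dict[str, object]:
    """Add the partition variable when the user-written query omits it."""
    rewritten = dict(query)
    if "region" not in rewritten:
        region: Optional[str] = CITY_TO_REGION.get(str(rewritten.get("city")))
        if region is not None:
            rewritten["region"] = region
    return rewritten


def execute(query: Dict[str, object]) -> List[dict]:
    """Read only the partition whose value matches the rewritten query."""
    rewritten = rewrite_query(query)
    region = rewritten.get("region")
    # Without a partition value, fall back to scanning every partition.
    selected = [region] if region is not None else list(PARTITIONS)
    filters = {k: v for k, v in rewritten.items() if k != "region"}
    rows = []
    for name in selected:
        for row in PARTITIONS.get(name, []):
            if all(row.get(k) == v for k, v in filters.items()):
                rows.append(row)
    return rows
```

With this sketch, a query such as `{"city": "Denver"}` is rewritten to carry `region = "US"`, so the `EU` partition is never touched. A query that cannot be mapped to a partition value falls back to scanning all partitions.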

19 Claims

1. A computer system comprising:
- a data store comprising:
  
  a plurality of first data files, each having a first value for a first variable, stored in a first partition; and
  
  a plurality of second data files, each having a second value for the first variable, stored in a second partition, wherein the first partition and the second partition are recognizable as separate storage partitions by an operating system, and wherein at least the first variable is part of a nested partition scheme;
  
  one or more hardware computer processors configured to execute computer executable instructions to cause the computer system to:
  
  receive a user-written query that does not specify the first variable;
  
  rewrite the user-written query to include the first value for the first variable in a rewritten query;
  
  select the first partition for access based at least in part on the first value of the first variable in the rewritten query matching the first value of the first variable in each of the plurality of first data files stored in the first partition; and
  
  execute the rewritten query on the first plurality of data files stored in the first partition according to the nested partition scheme; and
  
  wherein variables at higher levels of the nested partition scheme more frequently reduce a search space than other variables at lower levels of the nested partition scheme.
  - 2. The computer system of claim 1, wherein the one or more hardware computer processors are configured to execute computer executable instructions to further cause the computer system to execute the rewritten query according to the nested partition scheme without reading the plurality of second data files from the second partition in response to a mismatch between the first value for the first variable in the rewritten query and the second value.
  - 3. The computer system of claim 1, wherein data files in the first partition are configured to be fetched as a plurality of blocks of a common block size, and wherein the first partition includes statistics about variables included in the plurality of blocks.
  - 4. The computer system of claim 3, wherein the statistics include data about values of the first variable.
  - 5. The computer system of claim 4, wherein the one or more hardware computer processors are configured to execute computer executable instructions to further cause the computer system to:
    - based at least in part on the statistics indicating that values of the first variable in a first block include the value for the first variable, read the first block in the first partition; and
      
      based at least in part on the statistics indicating that values of the first variable in a second block do not include the value for the first variable, reading the second block in the first partition.
  - 6. The computer system of claim 1, wherein:
    - a first data file has a first filename and is stored in the first partition, the first filename indicating the first value of the first variable; and
      
      a second data file has a second filename and is stored in the second partition, the second filename indicating the second value of the first variable.
  - 7. The computer system of claim 1, wherein the one or more hardware computer processors are configured to execute computer executable instructions to further cause the computer system to:
    - determine a frequency of a presence of the first variable in user-written queries.
  - 8. The computer system of claim 1, wherein the one or more hardware computer processors are configured to execute computer executable instructions to further cause the computer system to:
    - determine a frequency of values of the first variable in query results.
  - 9. The computer system of claim 1, wherein the one or more hardware computer processors are configured to execute computer executable instructions to further cause the computer system to:
    - determine how frequently the first variable separates query results from data not included in the query results.
  - 10. The computer system of claim 1, wherein the one or more hardware computer processors are configured to execute computer executable instructions to further cause the computer system to:
    - determine the nested partition scheme based at least in part on a frequency of the first variable, wherein the nested partition scheme includes storing data files having the first value for the first variable on one or more partitions that include the first partition and storing data files having the second value for the first variable on one or more different partitions that include the second partition.
  - 11. The computer system of claim 10, wherein the one or more hardware computer processors are configured to execute computer executable instructions to further cause the computer system to:
    - rewrite the user-written query to include the first value for the first variable in a rewritten query based at least in part on the first variable being included in the nested partition scheme.
  - 12. The computer system of claim 10, wherein the nested partition scheme comprises at least two levels of variables used to separate partitions.
  - 13. The computer system of claim 12, wherein the variables at the higher levels of the nested partition scheme more frequently reduce a search space of the data store by larger amounts than the other variables at the lower levels of the nested partition scheme.
  - 14. The computer system of claim 1, wherein the first variable is a lower cardinality superset of a user-written variable in the user-written query.
  - 15. The computer system of claim 1, wherein the first variable is a Boolean variable.
  - 16. The computer system of claim 1, wherein the first variable is a lower cardinality hash of a user-written variable in the user-written query.
  - 17. The computer system of claim 1, wherein the one or more hardware computer processors are configured to execute computer executable instructions in order to cause the computer system to:
    - determine a first partition scheme based at least in part on queries in a first workflow; and
      
      determine a second partition scheme based at least in part on queries in a second workflow; and
      
      wherein the data store includes data files redundantly stored under both the first partition scheme and under the second partition scheme.
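Claims 7 through 13 tie the nested partition scheme to how often variables appear in a workflow's queries. One way to picture this is the sketch below. It makes a simplifying assumption: the number of logged queries that filter on a variable stands in for how often that variable reduces the search space. The function names, the `var=value` path layout, and the sample log are hypothetical and do not come from the patent.

```python
from collections import Counter
from typing import Iterable, List, Set


def rank_partition_variables(queries: Iterable[Set[str]], max_levels: int = 2) -> List[str]:
    """Order variables by how many logged queries filter on them."""
    counts: Counter = Counter()
    for filtered_vars in queries:
        counts.update(filtered_vars)
    # Sort by descending frequency, then by name so ties are deterministic.
    ranked = sorted(counts, key=lambda v: (-counts[v], v))
    return ranked[:max_levels]


def partition_path(record: dict, scheme: List[str]) -> str:
    """Build a nested path such as 'region=EU/year=2017' for a record."""
    return "/".join(f"{var}={record[var]}" for var in scheme)


# Hypothetical workflow log: the set of variables each query filtered on.
workflow_log = [
    {"region", "year"},
    {"region"},
    {"region", "customer"},
    {"year"},
]

# The most frequently filtered variable becomes the top partition level.
scheme = rank_partition_variables(workflow_log)
```

Here `region` appears in three queries and `year` in two, so `region` sits at the top level and `year` nests beneath it. Running the same ranking over a second workflow's log would yield a second scheme, which is the situation claim 17 addresses by storing data redundantly under both.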

18. A system comprising:
- one or more hardware computer processors configured to execute computer executable instructions to cause the system to:
  
  receive a user-written query that does not specify a first variable, wherein the first variable is part of a nested partition scheme;
  
  rewrite the user-written query to include a first value for the first variable in a rewritten query; and
  
  transmit the rewritten query to be executed according to the nested partition scheme on a database, wherein the database includes:
  
  a plurality of first data files, each having a first value for the first variable, stored in a first partition; and
  
  a plurality of second data files, each having a second value for the first variable, stored in a second partition, wherein the first partition and the second partition are recognizable as separate storage partitions by an operating system;
  
  wherein, to execute the query according to the nested partition scheme, the first partition is selected for access based at least in part in response to the rewritten query including the first value for the first variable that matches the first value of the first variable in each of the plurality of first data files stored in the first partition; and
  
  wherein higher levels of the nested partition scheme more frequently reduce a search space of the database than lower levels of the nested partition scheme.
  - 19. The system of claim 18, wherein:
    - a first data file has a first filename and is stored in the first partition, the first filename indicating the first value of the first variable; and
      
      a second data file has a second filename and is stored in the second partition, the second filename indicating the second value of the first variable.
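Claims 6 and 19 recite filenames that indicate the value of the partition variable. The sketch below shows one way such names could be used to choose files without opening them. The naming convention (`var=value` tokens joined by underscores) is an assumption made for this example only; the claims do not specify a format.

```python
from typing import Dict, List


def parse_partition_values(filename: str) -> Dict[str, str]:
    """Extract 'var=value' pairs from a name like 'region=EU_year=2017_0001.dat'."""
    stem = filename.rsplit(".", 1)[0]  # drop the extension
    values: Dict[str, str] = {}
    for token in stem.split("_"):
        if "=" in token:
            var, value = token.split("=", 1)
            values[var] = value
    return values


def select_files(filenames: List[str], var: str, value: str) -> List[str]:
    """Keep files whose name indicates the requested value; others stay unopened."""
    return [f for f in filenames if parse_partition_values(f).get(var) == value]
```

Because the decision uses only the name, files in non-matching partitions are never read. This toy format breaks if a value itself contains an underscore, so a real layout would need an escaping rule or a different separator.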

Current Assignee: Palantir Technologies Incorporated
Original Assignee: Palantir Technologies Incorporated
Inventors: Ding, James
Primary Examiner(s): Hu, Jensen
Application Number: US15/871,608
Publication Number: US 20190171743A1
Time in Patent Office: 813 Days
CPC Class Codes

G06F 16/213   with details for schema evo...

G06F 16/217   Database tuning G06F16/2282...

G06F 16/221   Column-oriented storage; Ma...

G06F 16/242   Query formulation

G06F 16/2423   Interactive query statement...

G06F 16/24554   Unary operations; Data part...
