Workflow driven database partitioning
First Claim
1. A computer system comprising:
a data store comprising:
a plurality of first data files, each having a first value for a first variable, stored in a first partition; and
a plurality of second data files, each having a second value for the first variable, stored in a second partition, wherein the first partition and the second partition are recognizable as separate storage partitions by an operating system, and wherein at least the first variable is part of a nested partition scheme;
one or more hardware computer processors configured to execute computer executable instructions to cause the computer system to:
receive a user-written query that does not specify the first variable;
rewrite the user-written query to include the first value for the first variable in a rewritten query;
select the first partition for access based at least in part on the first value of the first variable in the rewritten query matching the first value of the first variable in each of the plurality of first data files stored in the first partition; and
execute the rewritten query on the first plurality of data files stored in the first partition according to the nested partition scheme; and
wherein variables at higher levels of the nested partition scheme more frequently reduce a search space than other variables at lower levels of the nested partition scheme.
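The rewrite-then-select flow recited in the claim can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration and does not come from the patent: the two-level scheme (`tenant` above `day`), the `name=value` directory layout, the `context` dictionary that supplies the missing value, and the helper functions themselves.

```python
# Hypothetical nested partition scheme: "tenant" is the top level, "day" is below it.
PARTITION_SCHEME = ["tenant", "day"]

# Hypothetical file layout: each directory level encodes one partition variable.
FILES = [
    "tenant=acme/day=2020-01-01/part-0.dat",
    "tenant=acme/day=2020-01-02/part-0.dat",
    "tenant=globex/day=2020-01-01/part-0.dat",
]


def partition_values(path):
    """Parse the 'name=value' components of a file path into a dict."""
    values = {}
    for component in path.split("/"):
        if "=" in component:
            name, value = component.split("=", 1)
            values[name] = value
    return values


def rewrite_query(query, context):
    """Return a copy of the query with unspecified partition variables
    filled in from the (assumed) context of the user's workflow."""
    rewritten = dict(query)
    for variable in PARTITION_SCHEME:
        if variable not in rewritten and variable in context:
            rewritten[variable] = context[variable]
    return rewritten


def select_files(paths, query):
    """Keep only files whose partition values match every partition
    variable that the query specifies; all other partitions are skipped."""
    selected = []
    for path in paths:
        values = partition_values(path)
        if all(values.get(v) == query[v] for v in PARTITION_SCHEME if v in query):
            selected.append(path)
    return selected


user_query = {"status": "error"}  # does not mention "tenant"
rewritten = rewrite_query(user_query, {"tenant": "acme"})
files_to_read = select_files(FILES, rewritten)
```

In this sketch the user's query names no partition variable, so without the rewrite all three files would be candidates; after the rewrite only the two files under `tenant=acme` are read.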
Abstract
A database is configured to analyze user queries to dynamically partition the database according to a partition scheme. User queries can be rewritten based on the partition scheme so that, in response to queries, partitions including relevant data are read while partitions including irrelevant data can be skipped, reducing latency. Files can be named according to the partition scheme and stored on respective partitions so that low-level partition management can be implemented by underlying systems. Blocks within files can be sorted and statistics can be determined. The statistics can be used to find and read relevant blocks and skip irrelevant blocks.
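The block-level skipping mentioned in the abstract can be sketched as follows. This is an illustrative assumption, not the patented implementation: the abstract does not say which statistics are kept, so the sketch uses per-block minimum and maximum values, and the function names, the `ts` field, and the block size are all hypothetical.

```python
def build_block_stats(rows, key, block_size):
    """Sort rows by `key`, split them into fixed-size blocks, and record
    the (min, max) value of `key` for each block."""
    ordered = sorted(rows, key=lambda row: row[key])
    blocks = [ordered[i:i + block_size] for i in range(0, len(ordered), block_size)]
    # Rows are sorted, so the first and last row of a block hold its min and max.
    stats = [(block[0][key], block[-1][key]) for block in blocks]
    return blocks, stats


def read_matching(blocks, stats, key, low, high):
    """Return rows with low <= row[key] <= high, reading only blocks whose
    (min, max) range overlaps the requested range."""
    hits = []
    blocks_read = 0
    for block, (block_min, block_max) in zip(blocks, stats):
        if block_max < low or block_min > high:
            continue  # statistics show this block cannot contain a match
        blocks_read += 1
        hits.extend(row for row in block if low <= row[key] <= high)
    return hits, blocks_read


rows = [{"ts": t} for t in [5, 1, 9, 3, 7, 2, 8, 4, 6, 0]]
blocks, stats = build_block_stats(rows, "ts", block_size=3)
hits, blocks_read = read_matching(blocks, stats, "ts", low=4, high=5)
```

Because the rows are sorted before they are split, each block covers a narrow, non-overlapping range, which is what lets the statistics rule out most blocks for a selective query.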
19 Claims
18. A system comprising:
one or more hardware computer processors configured to execute computer executable instructions to cause the system to:
receive a user-written query that does not specify a first variable, wherein the first variable is part of a nested partition scheme;
rewrite the user-written query to include a first value for the first variable in a rewritten query; and
transmit the rewritten query to be executed according to the nested partition scheme by on a database, wherein the database includes:
a plurality of first data files, each having a first value for the first variable, stored in a first partition; and
a plurality of second data files, each having a second value for the first variable, stored in a second partition, wherein the first partition and the second partition are recognizable as separate storage partitions by an operating system;
wherein, to execute the query according to the nested partition scheme, the first partition is selected for access based at least in part in response to the rewritten query including the first value for the first variable that matches the first value of the first variable in each of the plurality of first data files stored in the first partition; and
wherein higher levels of the nested partition scheme more frequently reduce a search space of the database than lower levels of the nested partition scheme.
Specification