System and method for dynamic database split generation in a massively parallel or distributed database environment
First Claim
1. A method for dynamic database split generation in a massively parallel or other distributed database environment including a plurality of databases and a data warehouse layer providing querying of the plurality of databases and data summarization of the plurality of databases in a table, the method comprising:
obtaining by a database table accessor executing on one or more microprocessors, from an associated client application, a query for data in the table of the data warehouse layer, the query comprising query data representative of a user query and user splitter kind preference data representative of a user split preference specifying how an associated user would prefer the table to be split for performing the query for data;
obtaining table data representative of one or more properties of the table, the table data comprising table size data representative of a total size of the table;
selecting a splits generator from among an enumeration of splitter kinds in accordance with:
the user split preference when it is determined by the database table accessor that splitting the table using the user split preference would improve a performance of the query for data relative to splitting the table based on the one or more properties of the table, or
the one or more properties of the table when it is determined by the database table accessor that splitting the table based on the one or more properties of the table would improve the performance of the query for data relative to splitting the table based on the user split preference;
generating, by the selected splits generator, table splits dividing the user query into a plurality of query splits; and
outputting the plurality of query splits to an associated plurality of mappers for execution by the associated plurality of mappers of each of the plurality of query splits as tasks of a selected data processing framework against the table.
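The selecting step of the claim can be illustrated with a minimal sketch. The claim does not name the splitter kinds or say how the accessor decides which choice would perform better, so everything specific here is an assumption made for illustration: the `SplitterKind` members, the `TableData` fields, the 64 MiB threshold, the fixed count of four row ranges, and the toy cost model (bytes the busiest mapper would scan) are all hypothetical.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional

MAX_SPLIT_BYTES = 64 * 1024 * 1024   # hypothetical target size of one split
ROW_RANGE_COUNT = 4                  # hypothetical fixed number of row ranges


class SplitterKind(Enum):
    # Hypothetical members; the claim only recites "an enumeration of splitter kinds".
    SINGLE = "single"          # one split covering the whole table
    ROW_RANGE = "row_range"    # a fixed number of contiguous row ranges
    SIZE_BASED = "size_based"  # as many splits as needed to cap the split size


@dataclass
class TableData:
    total_size_bytes: int      # the "table size data" recited in the claim


def kind_from_properties(table: TableData) -> SplitterKind:
    """Pick a kind from table properties alone (assumed rule: small tables stay whole)."""
    if table.total_size_bytes <= MAX_SPLIT_BYTES:
        return SplitterKind.SINGLE
    return SplitterKind.SIZE_BASED


def estimated_cost(kind: SplitterKind, table: TableData) -> int:
    """Toy cost model: bytes the busiest mapper would have to scan."""
    if kind is SplitterKind.SINGLE:
        return table.total_size_bytes
    if kind is SplitterKind.ROW_RANGE:
        return -(-table.total_size_bytes // ROW_RANGE_COUNT)  # ceiling division
    return min(table.total_size_bytes, MAX_SPLIT_BYTES)


def select_splitter_kind(user_preference: Optional[SplitterKind],
                         table: TableData) -> SplitterKind:
    """Honor the user preference only when it is estimated to beat the property-based choice."""
    property_kind = kind_from_properties(table)
    if user_preference is None:
        return property_kind
    if estimated_cost(user_preference, table) < estimated_cost(property_kind, table):
        return user_preference
    return property_kind
```

Under these assumptions, a user who asks for `SINGLE` on a 1 GiB table is overridden by the property-based `SIZE_BASED` choice, while a user who asks for `ROW_RANGE` on a 10 MiB table gets that preference because it lowers the estimated per-mapper work.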
1 Assignment
0 Petitions
Abstract
A system and method are described for database split generation in a massively parallel or other distributed database environment including a plurality of databases and a data warehouse layer providing data summarization and querying functionality. A database table accessor of the system obtains, from an associated client application, a query for data in a table of the data warehouse layer, wherein the query includes a user preference. The system obtains table data representative of properties of the table, and selects a splits generator in accordance with one or more of the user preference or the properties of the table. The system generates, by the selected splits generator, table splits dividing the user query into a plurality of query splits, and outputs the plurality of query splits to an associated plurality of mappers for execution by the associated plurality of mappers of each of the plurality of query splits against the table.
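The generating and outputting steps summarized above can be sketched as follows. This is an illustration under stated assumptions, not the patented implementation: the table is modeled as an in-memory list of rows, a query split is modeled as a half-open row range, a thread pool stands in for the plurality of mappers, and the names `generate_query_splits`, `run_split`, and `run_query` are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def generate_query_splits(row_count, num_splits):
    """Divide [0, row_count) into contiguous, non-overlapping half-open ranges."""
    # Never create more splits than rows, and always create at least one.
    num_splits = max(1, min(num_splits, row_count))
    base, extra = divmod(row_count, num_splits)
    splits, start = [], 0
    for i in range(num_splits):
        # The first `extra` splits take one additional row each.
        end = start + base + (1 if i < extra else 0)
        splits.append((start, end))
        start = end
    return splits


def run_split(table, predicate, split):
    """Work done by one mapper: apply the query to its own row range only."""
    start, end = split
    return [row for row in table[start:end] if predicate(row)]


def run_query(table, predicate, num_mappers):
    """Generate the query splits, hand one to each mapper task, and merge the results."""
    splits = generate_query_splits(len(table), num_mappers)
    with ThreadPoolExecutor(max_workers=num_mappers) as pool:
        partials = pool.map(lambda s: run_split(table, predicate, s), splits)
        # pool.map yields results in split order, so the merged rows keep table order.
        return [row for part in partials for row in part]
```

For example, `run_query(list(range(10)), lambda r: r % 2 == 0, 3)` divides the ten rows into the ranges `(0, 4)`, `(4, 7)`, and `(7, 10)`, filters each range in its own task, and returns the even rows in their original order.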
19 Claims
1. A method for dynamic database split generation in a massively parallel or other distributed database environment including a plurality of databases and a data warehouse layer providing querying of the plurality of databases and data summarization of the plurality of databases in a table, the method comprising:
obtaining by a database table accessor executing on one or more microprocessors, from an associated client application, a query for data in the table of the data warehouse layer, the query comprising query data representative of a user query and user splitter kind preference data representative of a user split preference specifying how an associated user would prefer the table to be split for performing the query for data;
obtaining table data representative of one or more properties of the table, the table data comprising table size data representative of a total size of the table;
selecting a splits generator from among an enumeration of splitter kinds in accordance with:
the user split preference when it is determined by the database table accessor that splitting the table using the user split preference would improve a performance of the query for data relative to splitting the table based on the one or more properties of the table, or
the one or more properties of the table when it is determined by the database table accessor that splitting the table based on the one or more properties of the table would improve the performance of the query for data relative to splitting the table based on the user split preference;
generating, by the selected splits generator, table splits dividing the user query into a plurality of query splits; and
outputting the plurality of query splits to an associated plurality of mappers for execution by the associated plurality of mappers of each of the plurality of query splits as tasks of a selected data processing framework against the table.
Dependent claims: 2, 3, 4, 5, 6, 19
7. A system for dynamic database split generation in a massively parallel or other distributed database environment including a plurality of databases and a data warehouse layer providing querying of the plurality of databases and data summarization of the plurality of databases in a table, the system comprising:
one or more microprocessors;
a database table accessor running on the one or more microprocessors, wherein the database table accessor operates to perform a method comprising:
obtaining, from an associated client application, a query for data in the table of the data warehouse layer, the query comprising query data representative of a user query and user splitter kind preference data representative of a user split preference specifying how an associated user would prefer the table to be split for performing the query for data;
obtaining table data representative of one or more properties of the table, the table data comprising table size data representative of a total size of the table;
selecting a splits generator from among an enumeration of splitter kinds in accordance with:
the user split preference when it is determined by the database table accessor that splitting the table using the user split preference would improve a performance of the query for data relative to splitting the table based on the one or more properties of the table, or
the one or more properties of the table when it is determined by the database table accessor that splitting the table based on the one or more properties of the table would improve the performance of the query for data relative to splitting the table based on the user split preference;
generating, by the selected splits generator, table splits dividing the user query into a plurality of query splits; and
outputting the plurality of query splits to an associated plurality of mappers for execution by the associated plurality of mappers of each of the plurality of query splits as tasks of a selected data processing framework against the table.
Dependent claims: 8, 9, 10, 11, 12
13. A non-transitory computer readable storage medium, including instructions stored thereon which when read and executed by one or more computers of a database table accessor in a massively parallel or other distributed database environment including a plurality of databases and a data warehouse layer providing querying of the plurality of databases and data summarization of the plurality of databases in a table, cause the one or more computers of the database table accessor to perform a method of dynamic database split generation comprising:
obtaining, from an associated client application, a query for data in the table of the data warehouse layer, the query comprising query data representative of a user query and user splitter kind preference data representative of a user split preference specifying how an associated user would prefer the table to be split for performing the query for data;
obtaining table data representative of one or more properties of the table, the table data comprising table size data representative of a total size of the table;
selecting a splits generator from among an enumeration of splitter kinds in accordance with:
the user split preference when it is determined by the database table accessor that splitting the table using the user split preference would improve a performance of the query for data relative to splitting the table based on the one or more properties of the table, or
the one or more properties of the table when it is determined by the database table accessor that splitting the table based on the one or more properties of the table would improve the performance of the query for data relative to splitting the table based on the user split preference;
generating, by the selected splits generator, table splits dividing the user query into a plurality of query splits; and
outputting the plurality of query splits to an associated plurality of mappers for execution by the associated plurality of mappers of each of the plurality of query splits as tasks of a selected data processing framework against the table.
Dependent claims: 14, 15, 16, 17, 18
Specification