Data management systems and methods
First Claim
1. A method comprising:
receiving a query directed to a database;
identifying a plurality of files within the database to process in order to generate a response to the query;
identifying a plurality of execution nodes available to process the plurality of files;
creating a plurality of scansets and assigning each scanset thereof to a different node of the plurality of execution nodes based on a file assignment model, wherein each scanset of the plurality of scansets includes a different portion of the plurality of files and each file of the plurality of files is found somewhere within the plurality of scansets;
processing, by the plurality of execution nodes, the multiple scansets in parallel;
determining, during the processing, that a first execution node has finished processing all files in its assigned scanset of the plurality of scansets;
responding to the determining by identifying an unprocessed file within a scanset of the plurality scansets that was assigned to a second execution node, and assigning the unprocessed file to the first execution node to be processed thereby; and
generating, based on the processing, the response to the query.
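The scanset-creation step of the claim can be illustrated with a minimal Python sketch. The claim requires only "a file assignment model" and does not say which one, so the hash-based rule below is an assumption chosen for illustration, as are the function name build_scansets and the file and node names. The sketch shows one property the claim does state: every file lands in some scanset, and no file appears in two.

```python
import hashlib
from typing import Dict, List


def build_scansets(files: List[str], nodes: List[str]) -> Dict[str, List[str]]:
    """Partition `files` into one scanset per execution node.

    Hypothetical assignment model: hash the file name and take it modulo
    the node count. The source does not specify the actual model.
    """
    if not nodes:
        raise ValueError("at least one execution node is required")
    scansets: Dict[str, List[str]] = {node: [] for node in nodes}
    for name in files:
        digest = hashlib.sha256(name.encode("utf-8")).digest()
        index = int.from_bytes(digest[:8], "big") % len(nodes)
        # Each file is appended to exactly one scanset. With very few
        # files, some scansets may stay empty under this simple rule.
        scansets[nodes[index]].append(name)
    return scansets
```

Because the rule depends only on the file name and the node list, repeating the call with the same inputs yields the same scansets.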
Abstract
Example data management systems and methods are described. In one implementation, a method identifies multiple files to process based on a received query and identifies multiple execution nodes available to process the multiple files. The method initially creates multiple scansets, each including a portion of the multiple files, and assigns each scanset to one of the execution nodes based on a file assignment model. The multiple scansets are processed by the multiple execution nodes. If the method determines that a particular execution node has finished processing all files in its assigned scanset, an unprocessed file is reassigned from another execution node to the particular execution node.
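The reassignment behavior described in the abstract can be sketched as a small single-threaded simulation in Python. This is an illustration under stated assumptions, not the patented implementation: the function name run_with_stealing, the discrete-tick timing, the per-file cost table, the choice of the node with the most unprocessed files as the donor, and taking the file from the tail of the donor's scanset are all hypothetical. Only files that have not started processing are reassigned.

```python
from collections import deque
from typing import Dict, List, Tuple


def run_with_stealing(
    scansets: Dict[str, List[str]], cost: Dict[str, int]
) -> Tuple[Dict[str, List[str]], List[Tuple[str, str, str]]]:
    """Simulate nodes processing their scansets in parallel, tick by tick.

    Returns (files processed by each node, list of reassignments as
    (file, from_node, to_node)). `cost` gives ticks needed per file.
    """
    queues = {node: deque(files) for node, files in scansets.items()}
    current: Dict[str, list] = {}  # node -> [file, remaining ticks]
    processed: Dict[str, List[str]] = {node: [] for node in queues}
    moves: List[Tuple[str, str, str]] = []

    while True:
        for node in queues:
            if node in current:
                continue  # still busy with a file
            if not queues[node]:
                # Node finished its scanset: look for an unprocessed file
                # assigned to another node (assumed policy: largest backlog).
                donor = max(queues, key=lambda n: len(queues[n]))
                if queues[donor]:
                    stolen = queues[donor].pop()
                    queues[node].append(stolen)
                    moves.append((stolen, donor, node))
            if queues[node]:
                name = queues[node].popleft()
                current[node] = [name, max(1, cost[name])]
        if not current:
            break  # nothing running and nothing left to assign
        for node in list(current):
            current[node][1] -= 1
            if current[node][1] <= 0:
                processed[node].append(current[node][0])
                del current[node]
    return processed, moves
```

For example, with scansets {"a": ["f1"], "b": ["f2", "f3", "f4"]} and a cost of one tick per file, node "a" finishes first and takes "f4" from node "b", so each node ends up processing two files.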
24 Claims
1. A method comprising:
receiving a query directed to a database;
identifying a plurality of files within the database to process in order to generate a response to the query;
identifying a plurality of execution nodes available to process the plurality of files;
creating a plurality of scansets and assigning each scanset thereof to a different node of the plurality of execution nodes based on a file assignment model, wherein each scanset of the plurality of scansets includes a different portion of the plurality of files and each file of the plurality of files is found somewhere within the plurality of scansets;
processing, by the plurality of execution nodes, the multiple scansets in parallel;
determining, during the processing, that a first execution node has finished processing all files in its assigned scanset of the plurality of scansets;
responding to the determining by identifying an unprocessed file within a scanset of the plurality scansets that was assigned to a second execution node, and assigning the unprocessed file to the first execution node to be processed thereby; and
generating, based on the processing, the response to the query.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14
15. An apparatus comprising:
at least one processor;
memory operably connected to the at least one processor; and
the memory storing
a request processing module configured to receive a query directed to a database and identify a plurality of files within the database to process in order to generate a response to the query,
a virtual warehouse manager configured to identify a plurality of execution nodes available to process the plurality of files,
a transaction management module configured to create a plurality of scansets and assign each scanset thereof to a different node of the plurality of execution nodes based on a file assignment model, wherein each scanset of the plurality of scansets includes a different subset of the plurality of files and each file of the plurality of files is found somewhere within the plurality of scansets,
the transaction management module further configured to determine when a first execution node has finished processing all files in its assigned scanset of the plurality of scansets and respond by identifying an unprocessed file within a scanset of the plurality scansets that was assigned to a second node and assigning the unprocessed file to the first execution node to be processed thereby, and
a resource manager module configured to respond to the query based on the processing of the plurality of files performed by the plurality of execution nodes.
Dependent claims: 16, 17, 18, 19, 20, 21, 22
23. An apparatus comprising:
means for receiving a query directed to a database and identifying a plurality of files within the database to process in order to generate a response to the query;
means for identifying a plurality of execution nodes available to process the plurality of files;
means for creating a plurality of scansets and assigning each scanset thereof to a different node of the plurality of execution nodes based on a file assignment model, wherein each scanset of the plurality of scansets includes a different subset of the plurality of files and each file of the plurality of files is found somewhere within the plurality of scansets;
means for determining when a first execution node has finished processing all files in its assigned scanset of the plurality of scansets and responding to the determining by identifying an unprocessed file within a scanset of the plurality scansets that was assigned to a second node and assigning the unprocessed file to the first execution node to be processed thereby; and
means for responding to the query based on the processing of the plurality of files performed by the plurality of execution nodes.
Dependent claims: 24
Specification