Method for failure-resilient data placement in a distributed query processing system
First Claim
1. A computerized distributed query processing system comprising:
- a plurality of computing devices, each computing device being configured with a data store; and
a supervisor computing device communicatively connected to the plurality of computing devices;
wherein the supervisor computing device is configured to;
identify a particular computing device of the plurality of computing devices as a destination computing device of a particular unit of data;
wherein the particular unit of data is uniquely identified, among all units of data stored on the computerized distributed query processing system, by a particular data identifier;
to identify said particular computing device, said supervisor computing device is configured to;
perform a placement function, comprising two or more hash functions, based, at least in part, on the particular data identifier,wherein said supervisor computing device being configured to perform the placement function comprises said supervisor computing device being configured to;
combine results of the two or more hash functions to produce combined results, andidentify the particular computing device to be the destination computing device of the particular unit of data based on the combined results; and
to cause the particular unit of data to be stored on the data store of the particular computing device as the destination computing device of the particular unit of data.
1 Assignment
0 Petitions
Accused Products
Abstract
Herein is described a data placement scheme for a distributed query processing systems that achieves load balance amongst the nodes of the system. To identify a node on which to place particular data, a supervisor node performs a placement algorithm over the particular data'"'"'s identifier, where the placement algorithm utilizes two or more hash functions. The supervisor node runs the placement algorithm until a destination node is identified that is available to store the data, or the supervisor node has run the placement algorithm an established number of times. If no available node is identified using the placement algorithm, then an available destination node is identified for the particular data and information identifying the data and the selected destination node is included in an exception map. Most data may be located by any node in the system based on the node performing the placement algorithm for the required data.
61 Citations
20 Claims
-
1. A computerized distributed query processing system comprising:
-
a plurality of computing devices, each computing device being configured with a data store; and a supervisor computing device communicatively connected to the plurality of computing devices; wherein the supervisor computing device is configured to; identify a particular computing device of the plurality of computing devices as a destination computing device of a particular unit of data; wherein the particular unit of data is uniquely identified, among all units of data stored on the computerized distributed query processing system, by a particular data identifier; to identify said particular computing device, said supervisor computing device is configured to; perform a placement function, comprising two or more hash functions, based, at least in part, on the particular data identifier, wherein said supervisor computing device being configured to perform the placement function comprises said supervisor computing device being configured to; combine results of the two or more hash functions to produce combined results, and identify the particular computing device to be the destination computing device of the particular unit of data based on the combined results; and to cause the particular unit of data to be stored on the data store of the particular computing device as the destination computing device of the particular unit of data. - View Dependent Claims (2, 3, 4, 5, 6, 7)
-
-
8. A computer-implemented method comprising:
-
identifying a particular computing device, of a plurality of computing devices, as a destination computing device of a particular unit of data; wherein a distributed query processing system comprises the plurality of computing devices; wherein each of the plurality of computing devices is configured with a data store; wherein the particular unit of data is uniquely identified, among all units of data stored on the distributed query processing system, by a particular data identifier; wherein identifying the particular computing device comprises; performing a placement function, comprising two or more hash functions, based, at least in part, on the particular data identifier, wherein performance of the placement function comprises combining results of the two or more hash functions to produce combined results, and identifying the particular computing device to be the destination computing device of the particular unit of data based on the combined results; and based on identifying the particular computing device, causing the particular unit of data to be stored on the data store of the particular computing device. - View Dependent Claims (9, 10, 11, 12, 13, 14)
-
-
15. One or more non-transitory computer-readable media storing one or more sequences of instructions which, when executed by one or more processors, cause:
-
identifying a particular computing device, of a plurality of computing devices, as a destination computing device of a particular unit of data; wherein a distributed query processing system comprises the plurality of computing devices; wherein each of the plurality of computing devices is configured with a data store; wherein the particular unit of data is uniquely identified, among all units of data stored on the distributed query processing system, by a particular data identifier; wherein identifying the particular computing device comprises; performing a placement function, comprising two or more hash functions, based, at least in part, on the particular data identifier, wherein performance of the placement function comprises combining results of the two or more hash functions to produce combined results, and identifying the particular computing device to be the destination computing device of the particular unit of data based on the combined results; and based on identifying the particular computing device, causing the particular unit of data to be stored on the data store of the particular computing device. - View Dependent Claims (16, 17, 18, 19, 20)
-
Specification