Method for failure-resilient data placement in a distributed query processing system

US 9,842,148 B2
Filed: 05/05/2015
Issued: 12/12/2017
Est. Priority Date: 05/05/2015
Status: Active Grant

Abstract

Herein is described a data placement scheme for a distributed query processing system that achieves load balance amongst the nodes of the system. To identify a node on which to place particular data, a supervisor node performs a placement algorithm over the particular data's identifier, where the placement algorithm utilizes two or more hash functions. The supervisor node runs the placement algorithm until a destination node is identified that is available to store the data, or the supervisor node has run the placement algorithm an established number of times. If no available node is identified using the placement algorithm, then an available destination node is identified for the particular data and information identifying the data and the selected destination node is included in an exception map. Most data may be located by any node in the system based on the node performing the placement algorithm for the required data.
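
As a minimal illustration of the placement step described in the abstract, the sketch below hashes a data identifier with two hash functions, combines the results, and maps the combination to a node index. The choice of SHA-256 and BLAKE2b, the XOR combining step, the modulo reduction, and the names `_hash64` and `placement_function` are all assumptions made for illustration; the abstract does not specify them.

```python
import hashlib
from collections import Counter


def _hash64(algorithm: str, data_id: str) -> int:
    """Hash the identifier with a named hashlib algorithm and keep 64 bits."""
    digest = hashlib.new(algorithm, data_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def placement_function(data_id: str, num_nodes: int) -> int:
    """Return a node index in [0, num_nodes) for the given data identifier.

    Assumption: the two hash results are combined by XOR and reduced modulo
    the node count. The source text only says the results are "combined".
    """
    h1 = _hash64("sha256", data_id)
    h2 = _hash64("blake2b", data_id)
    return (h1 ^ h2) % num_nodes


if __name__ == "__main__":
    # Count how many of 10,000 hypothetical identifiers land on each of 4 nodes.
    counts = Counter(placement_function(f"chunk-{i}", 4) for i in range(10_000))
    print(sorted(counts.items()))
```

Because the function depends only on the identifier and the node count, any node that knows both can recompute the destination without consulting the supervisor, which is the property the last sentence of the abstract relies on.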

20 Claims

1. A computerized distributed query processing system comprising:
- a plurality of computing devices, each computing device being configured with a data store; and
  
  a supervisor computing device communicatively connected to the plurality of computing devices;
  
  wherein the supervisor computing device is configured to:
  
  identify a particular computing device of the plurality of computing devices as a destination computing device of a particular unit of data;
  
  wherein the particular unit of data is uniquely identified, among all units of data stored on the computerized distributed query processing system, by a particular data identifier;
  
  to identify said particular computing device, said supervisor computing device is configured to:
  
  perform a placement function, comprising two or more hash functions, based, at least in part, on the particular data identifier,
  
  wherein said supervisor computing device being configured to perform the placement function comprises said supervisor computing device being configured to:
  
  combine results of the two or more hash functions to produce combined results, and
  
  identify the particular computing device to be the destination computing device of the particular unit of data based on the combined results; and
  
  to cause the particular unit of data to be stored on the data store of the particular computing device as the destination computing device of the particular unit of data.
  - 2. The computerized distributed query processing system of claim 1, wherein the supervisor computing device is further configured to:
    - prior to performance of the placement function, determine whether the placement function has been run, to identify the destination computing device for the particular unit of data, less than a particular number of times; and
      
      in response to a determination that the placement function has been run, to identify the destination computing device for the particular unit of data, less than the particular number of times, perform the placement function on the particular data identifier.
  - 3. The computerized distributed query processing system of claim 1, wherein the supervisor computing device is further configured to:
    - identify a certain computing device of the plurality of computing devices as a destination computing device of a second unit of data;
      
      wherein the second unit of data is uniquely identified, among all units of data stored on the computerized distributed query processing system, by a second data identifier;
      
      to identify said certain computing device, said supervisor computing device is configured to:
      
      determine whether the placement function has been run, to identify the destination computing device for the second unit of data, less than or equal to a particular number of times; and
      
      in response to a determination that the placement function has been run, to identify the destination computing device for the second unit of data, the particular number of times:
      
      identify the certain computing device to be the destination computing device of the second unit of data based, at least in part, on information that indicates that the certain computing device is available to store the second unit of data, and
      
      include information mapping the second unit of data to the certain computing device in an exception map stored at the supervisor computing device; and
      
      based on identification of the certain computing device, cause the second unit of data to be stored on the certain computing device.
  - 4. The computerized distributed query processing system of claim 1, wherein the supervisor computing device is further configured to:
    - identify a certain computing device of the plurality of computing devices as a destination computing device of a second unit of data;
      
      wherein the second unit of data is uniquely identified, among all units of data stored on the computerized distributed query processing system, by a second data identifier;
      
      to identify said certain computing device, said supervisor computing device is configured to:
      
      perform a first performance of the placement function that is based, at least in part, on the second data identifier,
      
      wherein the first performance of the placement function comprises combining results of the two or more hash functions to produce second combined results,
      
      identify a first unavailable computing device based on the second combined results,
      
      determine that the first unavailable computing device is not available to store the second unit of data,
      
      in response to a determination that the first unavailable computing device is not available to store the second unit of data, perform a second performance of the placement function that is based, at least in part, on the second data identifier,
      
      wherein the second performance of the placement function comprises combining results of the two or more hash functions to produce third combined results,
      
      wherein the third combined results are different than the second combined results,
      
      identify the certain computing device based on the third combined results, and
      
      determine that the certain computing device is available to store the second unit of data; and
      
      in response to a determination that the certain computing device is available to store the second unit of data, cause the second unit of data to be stored on the certain computing device.
  - 5. The computerized distributed query processing system of claim 1, wherein the supervisor computing device is further configured to retrieve a certain unit of data from a certain computing device of the plurality of computing devices by being configured to:
    - retrieve information mapping an identifier of the certain unit of data to the certain computing device in an exception map stored at the supervisor computing device; and
      
      retrieve the certain unit of data from the certain computing device based on the retrieved information mapping the identifier of the certain unit of data to the certain computing device.
  - 6. The computerized distributed query processing system of claim 1, wherein the supervisor computing device is further configured to retrieve a certain unit of data from a certain computing device of the plurality of computing devices by being configured to:
    - determine that an exception map stored at the supervisor computing device does not include information mapping a certain data identifier that identifies the certain unit of data to any of the plurality of computing devices;
      
      in response to a determination that the exception map does not include information mapping the certain data identifier to any of the plurality of computing devices:
      
      perform a second performance of the placement function based, at least in part, on the certain data identifier,
      
      wherein the second performance of the placement function comprises combining results of the two or more hash functions to produce second combined results, and
      
      identify the certain computing device to be the destination computing device of the certain unit of data based on the second combined results; and
      
      based on an identification of the certain computing device to be the destination computing device of the certain unit of data, retrieving the certain unit of data from the certain computing device.
  - 7. The computerized distributed query processing system of claim 1, wherein one or more of the plurality of computing devices are independently configured to retrieve a certain unit of data from a certain computing device of the plurality of computing devices by being configured to:
    - perform a second performance of the placement function based, at least in part, on a certain data identifier that identifies the certain unit of data;
      
      wherein the second performance of the placement function comprises combining results of the two or more hash functions to produce second combined results;
      
      identify the certain computing device to be the destination computing device of the certain unit of data based on the second combined results; and
      
      based on an identification of the certain computing device to be the destination computing device of the certain unit of data, retrieve the certain unit of data from the certain computing device.
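
The bounded retry and exception-map fallback recited in claims 2 to 4 can be sketched roughly as follows. This is an illustrative sketch only: the retry limit `MAX_ATTEMPTS`, the use of the attempt number as a salt so that each run yields different combined results, the XOR combining step, and the rule of falling back to the lowest-numbered available node are all assumptions, and the names `combined_hash` and `choose_destination` are hypothetical.

```python
import hashlib
from typing import Dict, Set

MAX_ATTEMPTS = 3  # hypothetical value for the "particular number of times"


def combined_hash(data_id: str, attempt: int) -> int:
    """Combine two hashes of the identifier.

    Assumption: the attempt number salts the input, so a repeated run of the
    placement function almost certainly produces different combined results.
    """
    salted = f"{data_id}#{attempt}".encode("utf-8")
    h1 = int.from_bytes(hashlib.sha256(salted).digest()[:8], "big")
    h2 = int.from_bytes(hashlib.blake2b(salted).digest()[:8], "big")
    return h1 ^ h2


def choose_destination(
    data_id: str,
    num_nodes: int,
    available: Set[int],
    exception_map: Dict[str, int],
) -> int:
    """Pick a destination node, retrying the placement function a bounded
    number of times before falling back to the exception map."""
    for attempt in range(MAX_ATTEMPTS):
        node = combined_hash(data_id, attempt) % num_nodes
        if node in available:
            return node  # the placement function found an available node
    if not available:
        raise RuntimeError("no node is available to store the data")
    # Fallback: pick any available node (here the lowest index, an arbitrary
    # choice) and record the mapping so the data can still be located later.
    node = min(available)
    exception_map[data_id] = node
    return node
```

Only identifiers that exhaust the retry budget end up in `exception_map`, so under this sketch the map stays small as long as most nodes are available.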

8. A computer-implemented method comprising:
- identifying a particular computing device, of a plurality of computing devices, as a destination computing device of a particular unit of data;
  
  wherein a distributed query processing system comprises the plurality of computing devices;
  
  wherein each of the plurality of computing devices is configured with a data store;
  
  wherein the particular unit of data is uniquely identified, among all units of data stored on the distributed query processing system, by a particular data identifier;
  
  wherein identifying the particular computing device comprises:
  
  performing a placement function, comprising two or more hash functions, based, at least in part, on the particular data identifier,
  
  wherein performance of the placement function comprises combining results of the two or more hash functions to produce combined results, and
  
  identifying the particular computing device to be the destination computing device of the particular unit of data based on the combined results; and
  
  based on identifying the particular computing device, causing the particular unit of data to be stored on the data store of the particular computing device.
  - 9. The method of claim 8, further comprising:
    - prior to performing the placement function, determining whether the placement function has been run, to identify the destination computing device for the particular unit of data, less than a particular number of times; and
      
      in response to determining that the placement function has been run, to identify the destination computing device for the particular unit of data, less than the particular number of times, performing the placement function on the particular data identifier.
  - 10. The method of claim 8, further comprising:
    - identifying a certain computing device of the plurality of computing devices as a destination computing device of a second unit of data;
      
      wherein the second unit of data is uniquely identified, among all units of data stored on the distributed query processing system, by a second data identifier;
      
      wherein identifying the certain computing device comprises:
      
      determining whether the placement function has been run, to identify the destination computing device for the second unit of data, less than or equal to a particular number of times; and
      
      in response to determining that the placement function has been run, to identify the destination computing device for the second unit of data, the particular number of times:
      
      identifying the certain computing device to be the destination computing device of the second unit of data based, at least in part, on information that indicates that the certain computing device is available to store the second unit of data, and
      
      including information mapping the second unit of data to the certain computing device in an exception map stored at a supervisor computing device; and
      
      based on identifying the certain computing device, causing the second unit of data to be stored on the certain computing device.
  - 11. The method of claim 8, further comprising:
    - identifying a certain computing device of the plurality of computing devices as a destination computing device of a second unit of data;
      
      wherein the second unit of data is uniquely identified, among all units of data stored on the distributed query processing system, by a second data identifier;
      
      wherein identifying the certain computing device comprises:
      
      performing a first performance of the placement function that is based, at least in part, on the second data identifier,
      
      wherein the first performance of the placement function comprises combining results of the two or more hash functions to produce second combined results,
      
      identifying a first unavailable computing device based on the second combined results,
      
      determining that the first unavailable computing device is not available to store the second unit of data,
      
      in response to determining that the first unavailable computing device is not available to store the second unit of data, performing a second performance of the placement function that is based, at least in part, on the second data identifier,
      
      wherein the second performance of the placement function comprises combining results of the two or more hash functions to produce third combined results,
      
      wherein the third combined results are different than the second combined results,
      
      identifying the certain computing device based on the third combined results, and
      
      determining that the certain computing device is available to store the second unit of data; and
      
      in response to determining that the certain computing device is available to store the second unit of data, causing the second unit of data to be stored on the certain computing device.
  - 12. The method of claim 8, further comprising:
    - a supervisor computing device retrieving a certain unit of data from a certain computing device of the plurality of computing devices by:
      
      retrieving information mapping an identifier of the certain unit of data to the certain computing device in an exception map stored at the supervisor computing device; and
      
      retrieving the certain unit of data from the certain computing device based on the retrieved information mapping the identifier of the certain unit of data to the certain computing device.
  - 13. The method of claim 8, further comprising:
    - a supervisor computing device retrieving a certain unit of data from a certain computing device of the plurality of computing devices by:
      
      determining that an exception map stored at the supervisor computing device does not include information mapping a certain data identifier that identifies the certain unit of data to any of the plurality of computing devices;
      
      in response to determining that the exception map does not include information mapping the certain data identifier to any of the plurality of computing devices:
      
      performing a second performance of the placement function based, at least in part, on the certain data identifier,
      
      wherein the second performance of the placement function comprises combining results of the two or more hash functions to produce second combined results, and
      
      identifying the certain computing device to be the destination computing device of the certain unit of data based on the second combined results; and
      
      based on identifying the certain computing device to be the destination computing device of the certain unit of data, retrieving the certain unit of data from the certain computing device.
  - 14. The method of claim 8, further comprising:
    - the particular computing device retrieving a certain unit of data from a certain computing device of the plurality of computing devices by:
      
      performing a second performance of the placement function based, at least in part, on a certain data identifier that identifies the certain unit of data;
      
      wherein the second performance of the placement function comprises combining results of the two or more hash functions to produce second combined results;
      
      identifying the certain computing device to be the destination computing device of the certain unit of data based on the second combined results; and
      
      based on identifying the certain computing device to be the destination computing device of the certain unit of data, retrieving the certain unit of data from the certain computing device.
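
The retrieval paths in claims 12 to 14 (consult the exception map first, otherwise re-run the placement function) might look like the sketch below. The in-memory `stores` dictionary standing in for the per-node data stores, the probing of successive attempts up to a hypothetical `MAX_ATTEMPTS`, and the names `placement` and `locate` are assumptions for illustration; the claims do not say how a reader learns which run of the placement function produced the final destination.

```python
import hashlib
from typing import Dict, Optional

MAX_ATTEMPTS = 3  # hypothetical retry limit, assumed to match the writer's


def placement(data_id: str, num_nodes: int, attempt: int = 0) -> int:
    """Combine two hashes of the salted identifier into a node index.

    Assumption: XOR combining and an attempt-number salt, as a stand-in for
    the unspecified combining step.
    """
    salted = f"{data_id}#{attempt}".encode("utf-8")
    h1 = int.from_bytes(hashlib.sha256(salted).digest()[:8], "big")
    h2 = int.from_bytes(hashlib.blake2b(salted).digest()[:8], "big")
    return (h1 ^ h2) % num_nodes


def locate(
    data_id: str,
    num_nodes: int,
    exception_map: Dict[str, int],
    stores: Dict[int, Dict[str, bytes]],
) -> Optional[int]:
    """Return the node holding data_id, or None if it cannot be found."""
    # Supervisor path: an exception-map entry overrides the placement function.
    if data_id in exception_map:
        return exception_map[data_id]
    # Otherwise recompute candidate nodes in the same order a writer would
    # have tried them, and return the first one that actually holds the data.
    for attempt in range(MAX_ATTEMPTS):
        node = placement(data_id, num_nodes, attempt)
        if data_id in stores.get(node, {}):
            return node
    return None
```

A non-supervisor node would call `locate` with an empty `exception_map`, which corresponds to the independent lookup of claim 14; under this sketch, data recorded only in the exception map is reachable through the supervisor alone.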

15. One or more non-transitory computer-readable media storing one or more sequences of instructions which, when executed by one or more processors, cause:
- identifying a particular computing device, of a plurality of computing devices, as a destination computing device of a particular unit of data;
  
  wherein a distributed query processing system comprises the plurality of computing devices;
  
  wherein each of the plurality of computing devices is configured with a data store;
  
  wherein the particular unit of data is uniquely identified, among all units of data stored on the distributed query processing system, by a particular data identifier;
  
  wherein identifying the particular computing device comprises:
  
  performing a placement function, comprising two or more hash functions, based, at least in part, on the particular data identifier,
  
  wherein performance of the placement function comprises combining results of the two or more hash functions to produce combined results, and
  
  identifying the particular computing device to be the destination computing device of the particular unit of data based on the combined results; and
  
  based on identifying the particular computing device, causing the particular unit of data to be stored on the data store of the particular computing device.
  - 16. The one or more non-transitory computer-readable media of claim 15, wherein the one or more sequences of instructions further comprise instructions, which, when executed by one or more processors, cause:
    - prior to performing the placement function, determining whether the placement function has been run, to identify the destination computing device for the particular unit of data, less than a particular number of times; and
      
      in response to determining that the placement function has been run, to identify the destination computing device for the particular unit of data, less than the particular number of times, performing the placement function on the particular data identifier.
  - 17. The one or more non-transitory computer-readable media of claim 15, wherein the one or more sequences of instructions further comprise instructions, which, when executed by one or more processors, cause:
    - identifying a certain computing device of the plurality of computing devices as a destination computing device of a second unit of data;
      
      wherein the second unit of data is uniquely identified, among all units of data stored on the distributed query processing system, by a second data identifier;
      
      wherein identifying the certain computing device comprises:
      
      determining whether the placement function has been run, to identify the destination computing device for the second unit of data, less than or equal to a particular number of times; and
      
      in response to determining that the placement function has been run, to identify the destination computing device for the second unit of data, the particular number of times:
      
      identifying the certain computing device to be the destination computing device of the second unit of data based, at least in part, on information that indicates that the certain computing device is available to store the second unit of data, and
      
      including information mapping the second unit of data to the certain computing device in an exception map stored at a supervisor computing device; and
      
      based on identifying the certain computing device, causing the second unit of data to be stored on the certain computing device.
  - 18. The one or more non-transitory computer-readable media of claim 15, wherein the one or more sequences of instructions further comprise instructions, which, when executed by one or more processors, cause:
    - identifying a certain computing device of the plurality of computing devices as a destination computing device of a second unit of data;
      
      wherein the second unit of data is uniquely identified, among all units of data stored on the distributed query processing system, by a second data identifier;
      
      wherein identifying the certain computing device comprises:
      
      performing a first performance of the placement function that is based, at least in part, on the second data identifier,
      
      wherein the first performance of the placement function comprises combining results of the two or more hash functions to produce second combined results,
      
      identifying a first unavailable computing device based on the second combined results,
      
      determining that the first unavailable computing device is not available to store the second unit of data,
      
      in response to determining that the first unavailable computing device is not available to store the second unit of data, performing a second performance of the placement function that is based, at least in part, on the second data identifier,
      
      wherein the second performance of the placement function comprises combining results of the two or more hash functions to produce third combined results,
      
      wherein the third combined results are different than the second combined results,
      
      identifying the certain computing device based on the third combined results, and
      
      determining that the certain computing device is available to store the second unit of data; and
      
      in response to determining that the certain computing device is available to store the second unit of data, causing the second unit of data to be stored on the certain computing device.
  - 19. The one or more non-transitory computer-readable media of claim 15, wherein the one or more sequences of instructions further comprise instructions, which, when executed by one or more processors, cause:
    - a supervisor computing device retrieving a certain unit of data from a certain computing device of the plurality of computing devices by:
      
      retrieving information mapping an identifier of the certain unit of data to the certain computing device in an exception map stored at the supervisor computing device; and
      
      retrieving the certain unit of data from the certain computing device based on the retrieved information mapping the identifier of the certain unit of data to the certain computing device.
  - 20. The one or more non-transitory computer-readable media of claim 15, wherein the one or more sequences of instructions further comprise instructions, which, when executed by one or more processors, cause:
    - a supervisor computing device retrieving a certain unit of data from a certain computing device of the plurality of computing devices by:
      
      determining that an exception map stored at the supervisor computing device does not include information mapping a certain data identifier that identifies the certain unit of data to any of the plurality of computing devices;
      
      in response to determining that the exception map does not include information mapping the certain data identifier to any of the plurality of computing devices:
      
      performing a second performance of the placement function based, at least in part, on the certain data identifier,
      
      wherein the second performance of the placement function comprises combining results of the two or more hash functions to produce second combined results, and
      
      identifying the certain computing device to be the destination computing device of the certain unit of data based on the second combined results; and
      
      based on identifying the certain computing device to be the destination computing device of the certain unit of data, retrieving the certain unit of data from the certain computing device.

Current Assignee: Oracle International Corporation (Oracle Corporation)
Original Assignee: Oracle International Corporation (Oracle Corporation)
Inventors: Zhang, Gong; Petride, Sabina; Klots, Boris; Idicula, Sam; Agarwal, Nipun
Primary Examiner(s): Aspinwall, Evan
Application Number: US14/704,825
Publication Number: US 20160328456A1
Time in Patent Office: 952 Days
Field of Search: 707747
CPC Class Codes: G06F 16/2471 Distributed queries
