Data placement control for distributed computing environment

  • US 10,055,458 B2
  • Filed: 07/30/2015
  • Issued: 08/21/2018
  • Est. Priority Date: 07/30/2015
  • Status: Active Grant
First Claim

1. A method comprising:

    dividing a first dataset including a first plurality of elements into partitions by hashing a key for each of the first plurality of elements to generate a hash value for the key of the element, wherein each element of the first plurality of elements is stored in a partition corresponding to the hash value for the key of the element;

    selecting a set of distributed storage system nodes as a first primary node group for storage of the partitions of the first dataset;

    causing a primary copy of the partitions of the first dataset to be stored on the first primary node group by a distributed storage system file server based on the respective hash values such that the storage system node on which each element of the first plurality of elements is stored is associated with the hash value for the key of the element;

    dividing at least one additional dataset into partitions by hashing a key for each element of the at least one additional dataset to generate a hash value for the key of the element, wherein the datasets comprise tables; and

    causing a primary copy of the partitions of each additional dataset to be stored on corresponding primary node groups by the distributed storage system file server as a function of hash values such that the storage system node of each partition in the corresponding primary node group is known by hashing of the key, wherein a number of partitions that store each of the tables is a power of two, and wherein at least one partition is striped across multiple nodes of the primary node group.
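The placement steps recited in claim 1 can be illustrated with a minimal Python sketch. This is not the patented implementation: the hash function (SHA-256), the round-robin partition-to-node mapping, and the names `hash_key`, `partition_of`, `nodes_for_partition`, `partition_table`, and `stripe_width` are all assumptions made for illustration. The claim itself only requires that placement be derived from a hash of the key, that the partition count be a power of two, and that at least one partition be striped across multiple nodes.

```python
import hashlib


def hash_key(key: str) -> int:
    """Stable 64-bit hash of a key. SHA-256 is an assumption; the claim names no hash."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def partition_of(key: str, num_partitions: int) -> int:
    """Map a key to a partition index; the partition count must be a power of two."""
    if num_partitions <= 0 or num_partitions & (num_partitions - 1):
        raise ValueError("num_partitions must be a power of two")
    # With a power-of-two count, masking the low bits equals hash % num_partitions.
    return hash_key(key) & (num_partitions - 1)


def nodes_for_partition(partition: int, node_group: list, stripe_width: int = 1) -> list:
    """Hypothetical mapping: round-robin start node, striped over consecutive nodes."""
    if not 1 <= stripe_width <= len(node_group):
        raise ValueError("stripe_width must be between 1 and the node group size")
    start = partition % len(node_group)
    return [node_group[(start + i) % len(node_group)] for i in range(stripe_width)]


def partition_table(rows: list, key_field: str, num_partitions: int) -> dict:
    """Divide a table (a list of dict rows) into partitions by hashing each row's key."""
    partitions = {p: [] for p in range(num_partitions)}
    for row in rows:
        partitions[partition_of(str(row[key_field]), num_partitions)].append(row)
    return partitions


if __name__ == "__main__":
    primary_node_group = ["node0", "node1", "node2", "node3"]
    orders = [{"customer_id": i, "amount": 10 * i} for i in range(8)]
    for p, rows in partition_table(orders, "customer_id", 4).items():
        print(p, nodes_for_partition(p, primary_node_group, stripe_width=2), rows)
```

Because the partition index depends only on the key and the partition count, any component can recompute where an element is stored by hashing its key again, without consulting a lookup table.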
