DATA PLACEMENT CONTROL FOR DISTRIBUTED COMPUTING ENVIRONMENT
Abstract
A method includes dividing a dataset into partitions by hashing a specified key, selecting a set of distributed file system nodes as a primary node group for storage of the partitions, and causing a primary copy of the partitions to be stored on the primary node group by a distributed storage system file server such that the location of each partition is known by hashing of the specified key.
20 Claims
1. A method comprising:
- dividing a dataset into partitions by hashing a specified key;
- selecting a set of distributed storage system nodes as a primary node group for storage of the partitions; and
- causing a primary copy of the partitions to be stored on the primary node group by a distributed storage system file server such that the location of each partition is known by hashing of the specified key.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9
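The three steps of claim 1 can be illustrated with a minimal sketch. It is not taken from the patent: the hash function (MD5), the round-robin assignment of partitions to nodes, and every name below (`partition_of`, `divide`, `place_primary`, `locate`, the node names) are assumptions chosen for illustration only.

```python
import hashlib


def partition_of(key, num_partitions):
    """Map a key to a partition index with a stable hash (MD5 is an assumption)."""
    digest = hashlib.md5(str(key).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions


def divide(rows, key_field, num_partitions):
    """Divide a dataset (a list of dicts) into partitions by hashing key_field."""
    partitions = [[] for _ in range(num_partitions)]
    for row in rows:
        partitions[partition_of(row[key_field], num_partitions)].append(row)
    return partitions


def place_primary(num_partitions, primary_node_group):
    """Assign each partition to a node of the primary group (round-robin is an assumption)."""
    return {
        i: primary_node_group[i % len(primary_node_group)]
        for i in range(num_partitions)
    }


def locate(key, num_partitions, placement):
    """Find the node holding a key's primary copy by hashing the key alone."""
    return placement[partition_of(key, num_partitions)]


# Hypothetical dataset and node group.
rows = [{"customer_id": i, "amount": 10 * i} for i in range(100)]
nodes = ["node-a", "node-b", "node-c"]
parts = divide(rows, "customer_id", 8)
placement = place_primary(8, nodes)
```

In this sketch, `locate(42, 8, placement)` returns the node for customer 42 without consulting any per-row directory, which is the property the claim describes as the location being "known by hashing of the specified key".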
10. A system comprising:
- a processor; and
- a memory device coupled to the processor and having code stored therein to cause the processor to perform a method, the method comprising:
  - dividing each of multiple datasets into partitions by hashing a specified key;
  - selecting sets of distributed storage system nodes as primary node groups for storage of the partitions; and
  - causing a primary copy of the partitions of each dataset to be stored on corresponding primary node groups by a distributed storage system file server such that the location of each partition is known by hashing of the specified key.
Dependent claims: 11, 12, 13, 14, 15, 16, 17
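One consequence of handling multiple datasets this way can be shown with a small sketch: two datasets hashed on the same key, with the same partition count and the same primary node group, place matching keys on the same node. The `Catalog` class, the CRC32 hash, and the dataset and node names are hypothetical and are not described in the claim.

```python
import zlib


def partition_of(key, num_partitions):
    """Stable hash to a partition index (CRC32 is a stand-in, an assumption)."""
    return zlib.crc32(str(key).encode("utf-8")) % num_partitions


class Catalog:
    """Hypothetical registry mapping each dataset to its primary node group."""

    def __init__(self):
        self.datasets = {}

    def register(self, name, key_field, num_partitions, node_group):
        self.datasets[name] = (key_field, num_partitions, list(node_group))

    def node_for(self, name, key):
        """Return the node holding the primary copy of the partition for key."""
        _, num_partitions, group = self.datasets[name]
        return group[partition_of(key, num_partitions) % len(group)]


catalog = Catalog()
# Same key, same partition count, same primary node group: collocated.
catalog.register("orders", "customer_id", 4, ["n1", "n2"])
catalog.register("customers", "customer_id", 4, ["n1", "n2"])
# A third dataset stored on a different primary node group.
catalog.register("events", "device_id", 6, ["n3", "n4", "n5"])
```

Here `catalog.node_for("orders", k)` and `catalog.node_for("customers", k)` agree for every key `k`, while `events` lives on its own group.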
18. A method comprising:
- receiving a query;
- inspecting the query to identify tables being joined and join conditions;
- identifying partition groups organized using distributed hash tables on a partition key and their sizes associated with multiple tables corresponding to the join conditions;
- performing a collocated join plan if an equi-join predicate is on partition keys;
- constructing a collocated join plan if the partition groups are the same;
- constructing a partition-wise join plan for bracketed exchange between nodes if the partition groups are the same; and
- if the size of the partition groups is not the same:
  - constructing one or more logic partitions to match partitions in the groups;
  - mapping partitions in a larger group to partitions in a smaller group; and
  - constructing a partition-wise join plan for bracketed exchange between the nodes in the large and small groups based on the maps.
Dependent claims: 19, 20
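The plan selection in claim 18 can be sketched roughly as follows. This is a simplified reading, not the patented procedure: it treats two partition groups as "the same" when their partition counts match, it maps a larger group onto a smaller one with a modulo (valid only when the larger count is a multiple of the smaller and both use the same hash), and it adds a `repartition` fallback that the claim does not mention. All names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PartitionGroup:
    partition_key: str   # column the table is hash-partitioned on
    num_partitions: int  # size of the partition group


def map_larger_to_smaller(larger, smaller):
    """Map each partition of the larger group to one in the smaller group.

    Assumes partition index = hash % count with a shared hash, so that
    hash % larger == i implies hash % smaller == i % smaller when
    smaller divides larger.
    """
    if larger % smaller != 0:
        raise ValueError("this sketch needs the larger count to be a multiple of the smaller")
    return {i: i % smaller for i in range(larger)}


def plan_join(left, right, left_join_col, right_join_col):
    """Pick a join plan from the partitioning of the two joined tables."""
    on_partition_keys = (
        left_join_col == left.partition_key
        and right_join_col == right.partition_key
    )
    if not on_partition_keys:
        # Fallback outside the claim (an assumption of this sketch).
        return {"plan": "repartition"}
    if left.num_partitions == right.num_partitions:
        return {"plan": "collocated"}
    larger = max(left.num_partitions, right.num_partitions)
    smaller = min(left.num_partitions, right.num_partitions)
    return {
        "plan": "partition-wise",
        "partition_map": map_larger_to_smaller(larger, smaller),
    }


same = plan_join(PartitionGroup("id", 4), PartitionGroup("id", 4), "id", "id")
mixed = plan_join(PartitionGroup("id", 8), PartitionGroup("id", 4), "id", "id")
other = plan_join(PartitionGroup("id", 8), PartitionGroup("id", 4), "name", "id")
```

With these inputs, `same` selects the collocated plan, `mixed` selects the partition-wise plan with partitions 1 and 5 of the larger group both mapped to partition 1 of the smaller group, and `other` falls back because the join column is not the partition key.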
Specification