Intelligent selection of replication node for file data blocks in GPFS-SNC
Abstract
A mechanism is provided in a data processing system for replicating writing of a file with striping. The mechanism writes a file at an owner node within a plurality of nodes in a data processing system. The mechanism divides the file into a plurality of file chunks. The mechanism identifies at least one replication node within the plurality of nodes having a duplicate copy of a respective file chunk within the plurality of file chunks. The mechanism selects a plurality of replication nodes for the plurality of file chunks based on identification of the at least one replication node within the plurality of nodes having a duplicate copy of a respective file chunk, and replicates the file at the plurality of replication nodes based on the selection of the plurality of replication nodes.
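As an illustration only, the chunking and duplicate-identification steps summarized in the abstract might be sketched as follows. The fixed chunk size, the function names, the node names, and the in-memory `node_contents` map are all assumptions made for this sketch; the abstract does not state how chunks are sized or how node contents are tracked.

```python
from typing import Dict, List, Set


def split_into_chunks(data: bytes, chunk_size: int) -> List[bytes]:
    """Divide file contents into fixed-size chunks (the last may be shorter)."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]


def nodes_with_duplicates(
    chunks: List[bytes], node_contents: Dict[str, Set[bytes]]
) -> Dict[int, List[str]]:
    """Map each chunk index to the nodes that already store an identical chunk."""
    found: Dict[int, List[str]] = {}
    for index, chunk in enumerate(chunks):
        # Hypothetical bookkeeping: each node exposes the set of chunks it stores.
        holders = sorted(n for n, stored in node_contents.items() if chunk in stored)
        if holders:
            found[index] = holders
    return found


chunks = split_into_chunks(b"aaaabbbbcc", 4)
holders = nodes_with_duplicates(
    chunks, {"node2": {b"bbbb"}, "node3": {b"aaaa", b"bbbb"}}
)
```

In this example the first chunk is already held by node3 and the second by node2 and node3, so a selection policy could reasonably favor node3 as a replication target.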
25 Claims
1. A method, in a data processing system, for replicating writing of a file with striping, the method comprising:
writing a file at an owner node within a plurality of nodes in a data processing system;
dividing the file into a plurality of file chunks;
performing a hash function on each file chunk to form a hash value;
looking up each hash value in a hash repository;
identifying at least one replication node within the plurality of nodes having a duplicate copy of a respective file chunk within the plurality of file chunks responsive to a hash value of the respective file chunk having an entry in the hash repository;
adding an entry to a duplicate set for each file chunk for which a duplicate copy is already stored at a replication node and adding an entry to a no-duplicate set for each file chunk for which no duplicate copy is stored at a replication node;
selecting a plurality of replication nodes, including the at least one replication node, for the plurality of file chunks based on the duplicate set and the no-duplicate set; and
replicating the file at the plurality of replication nodes based on the selection of the plurality of replication nodes, wherein replicating the file at the plurality of replication nodes comprises transferring pointer information for each file chunk having an entry in the duplication list to a corresponding selected replication node, wherein the pointer information points to the copy of the file chunk at the corresponding selected replication node.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9
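The hashing, repository lookup, and set-building steps recited in claim 1 could look roughly like the sketch below. It is not the claimed implementation: the use of SHA-256, the repository layout (hash value mapped to the set of nodes holding that chunk), and the "fewest chunks left to transfer" selection policy are assumptions chosen for illustration, and all function names are hypothetical.

```python
import hashlib
from typing import Dict, List, Set, Tuple


def chunk_hash(chunk: bytes) -> str:
    """Hash one chunk; SHA-256 is an assumed choice of hash function."""
    return hashlib.sha256(chunk).hexdigest()


def classify_chunks(
    chunks: List[bytes], hash_repository: Dict[str, Set[str]]
) -> Tuple[Dict[int, List[str]], List[int]]:
    """Build the duplicate set and the no-duplicate set from repository lookups."""
    duplicate_set: Dict[int, List[str]] = {}
    no_duplicate_set: List[int] = []
    for index, chunk in enumerate(chunks):
        holders = hash_repository.get(chunk_hash(chunk))
        if holders:
            duplicate_set[index] = sorted(holders)
        else:
            no_duplicate_set.append(index)
    return duplicate_set, no_duplicate_set


def select_replication_nodes(
    duplicate_set: Dict[int, List[str]],
    no_duplicate_set: List[int],
    candidates: List[str],
    replica_count: int,
) -> List[str]:
    """Assumed policy: prefer nodes that need the fewest chunks transferred."""

    def transfer_cost(node: str) -> int:
        # Chunks with no duplicate anywhere must always be sent in full.
        missing = sum(1 for holders in duplicate_set.values() if node not in holders)
        return len(no_duplicate_set) + missing

    return sorted(candidates, key=lambda n: (transfer_cost(n), n))[:replica_count]


chunks = [b"aaaa", b"bbbb", b"cc"]
repository = {
    chunk_hash(b"aaaa"): {"node3"},
    chunk_hash(b"bbbb"): {"node2", "node3"},
}
duplicates, no_duplicates = classify_chunks(chunks, repository)
selected = select_replication_nodes(
    duplicates, no_duplicates, ["node2", "node3", "node4"], replica_count=2
)
```

With these inputs, node3 already holds two of the three chunks and node2 holds one, so the assumed policy picks node3 first and node2 second; node4 holds nothing and is passed over.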
10. A computer program product comprising a non-transitory computer readable storage medium having a computer readable program stored therein, wherein the computer readable program, when executed on a computing device, causes the computing device to:
write a file at an owner node within a plurality of nodes in a data processing system;
divide the file into a plurality of file chunks;
perform a hash function on each file chunk to form a hash value;
look up each hash value in a hash repository;
identify at least one replication node within the plurality of nodes having a duplicate copy of a respective file chunk within the plurality of file chunks responsive to a hash value of the respective file chunk having an entry in the hash repository;
add an entry to a duplicate set for each file chunk for which a duplicate copy is already stored at a replication node and add an entry to a no-duplicate set for each file chunk for which no duplicate copy is stored at a replication node;
select a plurality of replication nodes, including the at least one replication node, for the plurality of file chunks based on the duplicate set and the no-duplicate set; and
replicate the file at the plurality of replication nodes based on the selection of the plurality of replication nodes, wherein replicating the file at the plurality of replication nodes comprises transferring pointer information for each file chunk having an entry in the duplication list to a corresponding selected replication node, wherein the pointer information points to the copy of the file chunk at the corresponding selected replication node.
Dependent claims: 11, 12, 13, 14, 15, 16, 17, 18
19. An apparatus, comprising:
a processor; and
a memory coupled to the processor, wherein the memory comprises instructions which, when executed by the processor, cause the processor to:
write a file at an owner node within a plurality of nodes in a data processing system;
divide the file into a plurality of file chunks;
perform a hash function on each file chunk to form a hash value;
look up each hash value in a hash repository;
identify at least one replication node within the plurality of nodes having a duplicate copy of a respective file chunk within the plurality of file chunks responsive to a hash value of the respective file chunk having an entry in the hash repository;
add an entry to a duplicate set for each file chunk for which a duplicate copy is already stored at a replication node and add an entry to a no-duplicate set for each file chunk for which no duplicate copy is stored at a replication node;
select a plurality of replication nodes, including the at least one replication node, for the plurality of file chunks based on the duplicate set and the no-duplicate set; and
replicate the file at the plurality of replication nodes based on the selection of the plurality of replication nodes, wherein replicating the file at the plurality of replication nodes comprises transferring pointer information for each file chunk having an entry in the duplication list to a corresponding selected replication node, wherein the pointer information points to the copy of the file chunk at the corresponding selected replication node.
Dependent claims: 20, 21, 22, 23, 24, 25
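The final replication step, in which pointer information is transferred in place of chunk data where a selected node already holds a copy, might be planned along the following lines. The `Transfer` record, the `chunk_locations` map keyed by (chunk index, node), and the location strings such as "blk:17" are hypothetical names introduced for this sketch and do not come from the claims.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple, Union


@dataclass(frozen=True)
class Transfer:
    chunk_index: int
    node: str
    kind: str                    # "pointer" or "data"
    payload: Union[bytes, str]   # chunk bytes, or location of the existing copy


def plan_replication(
    chunks: List[bytes],
    selected_nodes: List[str],
    chunk_locations: Dict[Tuple[int, str], str],
) -> List[Transfer]:
    """Send a pointer where the node already stores the chunk, else the data."""
    plan: List[Transfer] = []
    for node in selected_nodes:
        for index, chunk in enumerate(chunks):
            location = chunk_locations.get((index, node))
            if location is not None:
                # Duplicate exists on this node: transfer pointer information only.
                plan.append(Transfer(index, node, "pointer", location))
            else:
                # No copy on this node: the chunk itself must be transferred.
                plan.append(Transfer(index, node, "data", chunk))
    return plan


plan = plan_replication(
    [b"aaaa", b"bbbb", b"cc"],
    ["node3", "node2"],
    {(0, "node3"): "blk:17", (1, "node3"): "blk:42", (1, "node2"): "blk:5"},
)
data_bytes = sum(len(t.payload) for t in plan if t.kind == "data")
```

In this example only 8 bytes of chunk data are planned for transfer, compared with 20 bytes if both replicas received every chunk in full; the other three transfers carry pointer information alone.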
Specification