System and method for distributing and accessing files in a distributed storage system
First Claim
1. A system configured for distributed storage, the system comprising:
- a plurality of storage nodes configured to store a plurality of files, wherein each storage node is configured to store at least a portion of one of the files; and
one or more client machines, wherein each client machine is configured to access the plurality of files stored across the plurality of storage nodes,wherein each storage node and each client machine executes a software application to independently compute a mapping of the plurality of files to the plurality of storage nodes,wherein, based on the mapping, any storage node and any client machine is able to access any portion of any file stored on any storage node, andwherein, in response to a change to the plurality of storage nodes, each storage node and each client machine re-executes the software application to independently compute a new mapping of the plurality of files to the plurality of storage nodes, the change comprising;
a new storage node being added to the plurality of storage nodes, in which case at least a portion of one of the files is transferred to the new storage node for storage;
a storage node being removed from the plurality of storage nodes, in which case all portions of all files stored on the storage node being removed are transferred to one or more remaining storage nodes for storage; and
a new storage node being swapped for one storage node in the plurality of storage nodes in which case all portions of all files stored on the one storage node are transferred to the new storage node for storage.
1 Assignment
0 Petitions
Accused Products
Abstract
A system and method for distributing and accessing files in a distributed storage system uses an ordered list of the storage nodes in the system to determine the storage node on which a file is stored. The distributed storage system includes a cluster of storage nodes and may also include one or more client nodes that participate in the system as storage resources. Each node (client and storage) stores an ordered list of the storage nodes in the system, allowing any of the nodes to access the file. The list is updated whenever a new storage node is added to the system, an existing storage node is removed from the system, or a new storage node is swapped with an existing storage node. Each one of the nodes may independently compute a new mapping of files to the storage nodes when the ordered list is changed.
-
Citations
19 Claims
-
1. A system configured for distributed storage, the system comprising:
-
a plurality of storage nodes configured to store a plurality of files, wherein each storage node is configured to store at least a portion of one of the files; and one or more client machines, wherein each client machine is configured to access the plurality of files stored across the plurality of storage nodes, wherein each storage node and each client machine executes a software application to independently compute a mapping of the plurality of files to the plurality of storage nodes, wherein, based on the mapping, any storage node and any client machine is able to access any portion of any file stored on any storage node, and wherein, in response to a change to the plurality of storage nodes, each storage node and each client machine re-executes the software application to independently compute a new mapping of the plurality of files to the plurality of storage nodes, the change comprising; a new storage node being added to the plurality of storage nodes, in which case at least a portion of one of the files is transferred to the new storage node for storage; a storage node being removed from the plurality of storage nodes, in which case all portions of all files stored on the storage node being removed are transferred to one or more remaining storage nodes for storage; and a new storage node being swapped for one storage node in the plurality of storage nodes in which case all portions of all files stored on the one storage node are transferred to the new storage node for storage. - View Dependent Claims (2, 3, 4, 5)
-
-
6. A computer-implemented method for computing a mapping of plurality of files to a plurality of storage nodes in a distributed storage system, the method comprising:
-
placing a plurality of buckets in a first partition, wherein each bucket stores at least a portion of at least one file in the plurality of files; creating at least one new partition; for each new partition; computing a number of buckets to place in the new partition; for the first partition; determining a number of buckets to remove from the first partition and add to the new partition, removing the number of buckets from the first partition, and adding the number of buckets removed from the first partition to the new partition, and for each other existing partition; determining a number of buckets to remove from the existing partition and add to the new partition, removing the number of buckets from the existing partition, and adding the number of buckets removed from the existing partition to the new partition, wherein the number of buckets removed from the first partition and each of the other existing partitions equals the number of buckets being placed in the new partition, and wherein each partition is associated with at least one storage node and the mapping of the plurality of files to the plurality of storage nodes comprises a first mapping of the first partition and the at least one new partition to the plurality of storage nodes and a second mapping of the plurality of buckets to the first partition and the at least one new partition. - View Dependent Claims (7, 8, 9, 10, 11, 18)
-
-
12. A computer-readable storage medium storing instructions that, when executed by a processing unit, cause the processing unit to map of plurality of files to a plurality of storage nodes in a distributed storage system, by performing the steps of:
-
placing a plurality of buckets in a first partition, wherein each bucket stores at least a portion of at least one file in the plurality of files; creating at least one new partition; for each new partition; computing a number of buckets to place in the new partition; for the first partition; determining a number of buckets to remove from the first partition and add to the new partition, removing the number of buckets from the first partition, and adding the number of buckets removed from the first partition to the new partition, and for each other existing partition; determining a number of buckets to remove from the existing partition and add to the new partition, removing the number of buckets from the existing partition, and adding the number of buckets removed from the existing partition to the new partition, wherein the number of buckets removed from the first partition and each of the other existing partitions equals the number of buckets being placed in the new partition, and wherein each partition is associated with at least one storage node and the mapping of the plurality of files to the plurality of storage nodes comprises a first mapping of the first partition and the at least one new partition to the plurality of storage nodes and a second mapping of the plurality of buckets to the first partition and the at least one new partition. - View Dependent Claims (13, 14, 15, 16, 17, 19)
-
Specification