System and method for distributing and accessing files in a distributed storage system
First Claim
1. A computer-implemented method for accessing files that are distributed across multiple storage nodes in a distributed storage system, the method comprising:
- receiving an access request identifying a first distributed file having an associated filename;
- determining, based on the filename, a bucket identifier corresponding to a first one of a plurality of buckets that contains the first distributed file, comprising:
  - upon determining that the filename is not associated with a predefined hash code:
    - computing a hash code using the filename; and
    - extracting the bucket identifier from a bit field of the computed hash code,
  - wherein each of the plurality of buckets maps to a respective one or more of a plurality of partitions containing the distributed files, and
  - wherein each of the plurality of partitions is stored across a respective one or more of the multiple storage nodes in the distributed storage system;
- determining a first one or more partitions from the plurality of partitions that the first bucket maps to, wherein at least one of the first one or more partitions contains the first distributed file;
- determining, based on a mapping of the first one or more partitions to the one or more storage nodes, a first storage node that stores the first distributed file; and
- dispatching a request to the first storage node to access the first distributed file.
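The lookup chain recited in claim 1 (filename, hash code, bucket identifier, partitions, storage node, dispatch) can be illustrated with a minimal Python sketch. Everything concrete here is an assumption made for illustration and is not taken from the claim: the function names, the use of SHA-1 truncated to 32 bits as the hash, the position and width of the bit field (`shift`, `bits`), the dictionary-based mappings, the choice of the first listed partition and node, and the `send` callback standing in for the dispatch step. The claim also does not say what happens when a filename does have a predefined hash code; the sketch assumes that code is used directly.

```python
import hashlib
from typing import Callable, Dict, List


def hash_code_for(filename: str, predefined: Dict[str, int]) -> int:
    """Return a hash code for the filename."""
    if filename in predefined:
        # Assumption: a predefined hash code, when present, is used as-is.
        return predefined[filename]
    # Assumption: SHA-1 truncated to 32 bits; the claim names no hash function.
    digest = hashlib.sha1(filename.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big")


def bucket_id_from(code: int, shift: int = 0, bits: int = 10) -> int:
    """Extract the bucket identifier from a bit field of the hash code."""
    return (code >> shift) & ((1 << bits) - 1)


def locate_node(
    filename: str,
    predefined: Dict[str, int],
    bucket_to_partitions: Dict[int, List[str]],
    partition_to_nodes: Dict[str, List[str]],
    shift: int = 0,
    bits: int = 10,
) -> str:
    """Walk bucket -> partitions -> storage node for the given filename."""
    bucket = bucket_id_from(hash_code_for(filename, predefined), shift, bits)
    partitions = bucket_to_partitions[bucket]
    # Assumption: take the first listed partition and its first node.
    return partition_to_nodes[partitions[0]][0]


def access(
    filename: str,
    predefined: Dict[str, int],
    bucket_to_partitions: Dict[int, List[str]],
    partition_to_nodes: Dict[str, List[str]],
    send: Callable[[str, str], object],
    shift: int = 0,
    bits: int = 10,
) -> object:
    """Dispatch an access request to the node that stores the file."""
    node = locate_node(
        filename, predefined, bucket_to_partitions, partition_to_nodes, shift, bits
    )
    return send(node, filename)
```

In this sketch the bucket identifier is simply a masked slice of the hash code, so any party holding the same two mappings resolves a filename to the same node without consulting a central directory.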
Abstract
A system and method for distributing and accessing files in a distributed storage system uses an ordered list of the storage nodes in the system to determine the storage node on which a file is stored. The distributed storage system includes a cluster of storage nodes and may also include one or more client nodes that participate in the system as storage resources. Each node (client and storage) stores an ordered list of the storage nodes in the system, allowing any of the nodes to access the file. The list is updated whenever a new storage node is added to the system, an existing storage node is removed from the system, or a new storage node is swapped with an existing storage node. Each one of the nodes may independently compute a new mapping of files to the storage nodes when the ordered list is changed.
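As a rough illustration of the abstract's idea that every node holds the same ordered list and can recompute the mapping on its own, the following Python sketch models the list and its three update operations (add, remove, swap). The class name `NodeList`, its method names, and in particular the placement rule (partition index modulo list length) are assumptions chosen for brevity; the abstract does not state how the mapping is derived from the list.

```python
from typing import Iterable, List


class NodeList:
    """Ordered list of storage nodes held independently by every node."""

    def __init__(self, nodes: Iterable[str]) -> None:
        self.nodes: List[str] = list(nodes)

    def add(self, node: str) -> None:
        # A new storage node is added to the system.
        self.nodes.append(node)

    def remove(self, node: str) -> None:
        # An existing storage node is removed from the system.
        self.nodes.remove(node)

    def swap(self, old: str, new: str) -> None:
        # A new storage node takes the list position of an existing one.
        self.nodes[self.nodes.index(old)] = new

    def node_for(self, partition: int) -> str:
        # Assumed placement rule: partition index modulo list length.
        return self.nodes[partition % len(self.nodes)]
```

Because the rule depends only on the list contents and order, two nodes holding identical lists arrive at identical mappings without exchanging any further messages.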
18 Claims
1. A computer-implemented method for accessing files that are distributed across multiple storage nodes in a distributed storage system, the method comprising:
- receiving an access request identifying a first distributed file having an associated filename;
- determining, based on the filename, a bucket identifier corresponding to a first one of a plurality of buckets that contains the first distributed file, comprising:
  - upon determining that the filename is not associated with a predefined hash code:
    - computing a hash code using the filename; and
    - extracting the bucket identifier from a bit field of the computed hash code,
  - wherein each of the plurality of buckets maps to a respective one or more of a plurality of partitions containing the distributed files, and
  - wherein each of the plurality of partitions is stored across a respective one or more of the multiple storage nodes in the distributed storage system;
- determining a first one or more partitions from the plurality of partitions that the first bucket maps to, wherein at least one of the first one or more partitions contains the first distributed file;
- determining, based on a mapping of the first one or more partitions to the one or more storage nodes, a first storage node that stores the first distributed file; and
- dispatching a request to the first storage node to access the first distributed file.
Dependent claims: 2, 3, 4, 5, 6, 7, 8
9. A system for accessing files that are distributed across multiple storage nodes in a distributed storage system, the system comprising:
- a first storage node of the multiple storage nodes, the first storage node storing a first distributed file; and
- a client node that is coupled to the multiple storage nodes and configured to:
  - receive an access request identifying a first distributed file having an associated filename;
  - determine, based on the filename, a bucket identifier corresponding to a first one of a plurality of buckets that contains the first distributed file, comprising:
    - upon determining that the filename is not associated with a predefined hash code:
      - compute a hash code using the filename; and
      - extract the bucket identifier from a bit field of the computed hash code,
    - wherein each of the plurality of buckets maps to a respective one or more of a plurality of partitions containing the distributed files, and
    - wherein each of the plurality of partitions is stored across a respective one or more of the multiple storage nodes in the distributed storage system;
  - determine a first one or more partitions from the plurality of partitions that the first bucket maps to, wherein at least one of the first one or more partitions contains the first distributed file;
  - determine, based on a mapping of the first one or more partitions to the one or more storage nodes, a first storage node that stores the first distributed file; and
  - dispatch a request to the first storage node to access the first distributed file.
Dependent claims: 10, 11, 12, 13, 14, 15
16. A computer-readable storage medium storing instructions that, when executed by a processor, cause a computer system to access files that are distributed across multiple storage nodes in a distributed storage system, by performing the steps of:
- receiving an access request identifying a first distributed file having an associated filename;
- determining, based on the filename, a bucket identifier corresponding to a first one of a plurality of buckets that contains the first distributed file, comprising:
  - upon determining that the filename is not associated with a predefined hash code:
    - computing a hash code using the filename; and
    - extracting the bucket identifier from a bit field of the computed hash code,
  - wherein each of the plurality of buckets maps to a respective one or more of a plurality of partitions containing the distributed files, and
  - wherein each of the plurality of partitions is stored across a respective one or more of the multiple storage nodes in the distributed storage system;
- determining a first one or more partitions from the plurality of partitions that the first bucket maps to, wherein at least one of the first one or more partitions contains the first distributed file;
- determining, based on a mapping of the first one or more partitions to the one or more storage nodes, a first storage node that stores the first distributed file; and
- dispatching a request to the first storage node to access the first distributed file.
Dependent claims: 17, 18
Specification