System and methods for providing a distributed file system utilizing metadata to track information about data stored throughout the system
First Claim
1. A distributed file storage system comprising:
a plurality of storage units configured to communicate with each other;
said plurality of storage units including;
a first storage unit including a storage disk and a processor;
a second storage unit including a storage disk and a processor;
a third storage unit including a storage disk and a processor; and
a fourth storage unit including a storage disk and a processor;
a file stored on the distributed file storage system;
a first file portion of the file comprising a first set of file data stored in the first storage unit;
a second file portion of the file comprising a second set of file data stored in the second storage unit, wherein the second set of file data is different from the first set of file data;
a first metadata to identify in part the location of the file, the first metadata stored on the first storage unit, the second storage unit, the third storage unit, and the fourth storage unit;
a second metadata, different at least in part from the first metadata, to supplement the first metadata in identifying the location of the file, the second metadata stored on at least one, but not all, of the first storage unit, the second storage unit, the third storage unit, and the fourth storage unit;
a switch in communication with the plurality of storage units, the switch configured to receive a read request for the file stored on the distributed file storage system and to send the read request to one of the plurality of storage units wherein each of the plurality of storage units is operable to monitor in real time a pattern of access to the file, a latency to access each copy of the file, and content included in the file, such that a block cache module will perform K packet read aheads, where K is calculated using a current read rate and a current latency of an access link; and
each of the plurality of storage units is configured to use the first metadata to process a read request on behalf of the distributed file storage system, wherein the distributed file storage system is arranged for dynamically determining at least one copy of the file to be replicated and dynamically determining a quantity of the plurality of storage units to store each replicated copy of the file based at least in part on the real time monitoring of the pattern of access to the file, the latency to access each copy of the file, and content included in the file.
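The claim states that K is calculated from the current read rate and the current latency of the access link, but it gives no formula. The sketch below is one plausible reading, not the patented method: it assumes K is the number of packets needed to cover one link latency at the current read rate. The function name, the packet size, and the upper cap are all hypothetical.

```python
import math


def read_ahead_packets(read_rate_bytes_per_s: float,
                       link_latency_s: float,
                       packet_size_bytes: int = 8192,
                       max_packets: int = 64) -> int:
    """Return K, the number of packets to read ahead.

    Assumed formula (not taken from the claim): the bytes a reader
    consumes during one link latency, rounded up to whole packets
    and capped at max_packets.
    """
    if read_rate_bytes_per_s <= 0 or link_latency_s <= 0:
        return 0  # nothing is being read, or the link has no delay to hide
    bytes_in_flight = read_rate_bytes_per_s * link_latency_s
    k = math.ceil(bytes_in_flight / packet_size_bytes)
    return min(k, max_packets)
```

Under this assumption a faster reader or a slower link both raise K, which matches the claim's dependence on the two monitored quantities.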
Abstract
The intelligent distributed file system enables the storing of file data among a plurality of smart storage units which are accessed as a single file system. The intelligent distributed file system utilizes a metadata data structure to track and manage detailed information about each file, including, for example, the device and block locations of the file's data blocks, to permit different levels of replication and/or redundancy within a single file system, to facilitate the change of redundancy parameters, to provide high-level protection for metadata, to replicate and move data in real-time, and so forth.
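As an illustration of the kind of metadata structure the abstract describes, the sketch below records the device and block location of every copy of every data block, with a replication level set per file. The class and field names are hypothetical, and the layout is an assumption rather than the structure used in the patent.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class BlockLocation:
    device: int  # id of the smart storage unit holding this copy
    block: int   # block address on that unit's disk


@dataclass
class FileMetadata:
    name: str
    copies: int  # replication level chosen for this file alone
    # blocks[i] lists every stored copy of the file's i-th data block
    blocks: List[List[BlockLocation]] = field(default_factory=list)

    def add_block(self, locations: List[BlockLocation]) -> None:
        """Append the next data block, given one location per copy."""
        if len(locations) != self.copies:
            raise ValueError("expected one location per copy")
        self.blocks.append(list(locations))

    def locate(self, index: int) -> List[BlockLocation]:
        """Return all copies of the data block at the given index."""
        return self.blocks[index]

    def devices(self) -> List[int]:
        """Return the sorted ids of all units holding part of the file."""
        return sorted({loc.device for reps in self.blocks for loc in reps})
```

Because the replication level lives in each file's own metadata, two files in the same file system can carry different numbers of copies.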
40 Claims
18. A file based distributed storage system comprising:
multiple storage units configured to communicate with each other, each comprising a storage device, a processor, and executable software stored on the storage device, the executable software configured to process file read and write requests on behalf of the distributed storage system;
a switch in communication with the multiple storage units, the switch configured to receive a read request for a file stored on the distributed storage system and to send the read request to one of the multiple storage units wherein each of multiple storage units is operable to monitor in real time a pattern of access to the file, a latency to access each copy of the file, and content included in the file, such that a block cache module will perform K packet read aheads, where K is calculated using a current read rate and a current latency of an access link; and
location metadata necessary to identify the location of a plurality of files stored on the storage system, wherein the location metadata is distributed across a subset of the multiple storage units, each storage unit in the subset storing a portion of the location metadata that is different at least in part from portions stored on other storage units in the subset.
Dependent claims: 19, 20, 21, 22, 23, 24, 25, 26
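One way to picture location metadata that is distributed across a subset of the storage units, with each unit holding a different portion, is to assign every file path to exactly one unit in that subset. The sketch below does this by hashing the path. The hashing scheme and the names `MetadataShards`, `owner`, `record`, and `lookup` are assumptions for illustration only; the claim does not say how the portions are assigned.

```python
import hashlib
from typing import Dict, List, Optional, Tuple

Location = Tuple[int, int]  # (storage unit id, block address)


class MetadataShards:
    """Location metadata split across a subset of the storage units."""

    def __init__(self, metadata_units: List[int]) -> None:
        self.units = list(metadata_units)
        # One shard per unit in the subset; the shards never overlap.
        self.shards: Dict[int, Dict[str, List[Location]]] = {
            unit: {} for unit in self.units
        }

    def owner(self, path: str) -> int:
        """Pick the unit whose shard holds this path's metadata."""
        digest = hashlib.sha256(path.encode("utf-8")).digest()
        index = int.from_bytes(digest[:8], "big") % len(self.units)
        return self.units[index]

    def record(self, path: str, locations: List[Location]) -> None:
        self.shards[self.owner(path)][path] = list(locations)

    def lookup(self, path: str) -> Optional[List[Location]]:
        return self.shards[self.owner(path)].get(path)
```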
27. A distributed file storage system comprising:
a first storage unit comprising;
a first storage device configured to store data;
a first processor in communication with the first storage device configured to execute at least the following software modules stored on the first storage device;
a first data allocation manager module configured to locate, in response to read requests, storage locations in the distributed file storage system corresponding to user files, wherein for each user file there is a separate address file comprising storage locations for file content data, the separate address file stored on at least one, but not all, storage units in the distributed file storage system;
a first data cache module configured to cache data associated with read requests;
a first local data manager module configured to manage data at storage locations local to the first storage unit;
a first remote data manager module configured to communicate with other storage units that store data at storage locations remote to the first storage unit; and
a first storage device module configured to operate the first storage device;
a second storage unit comprising;
a second storage device configured to store data;
a second processor in communication with the second storage device configured to execute at least the following software modules stored on the first storage device;
a second data allocation manager module configured to locate, in response to read requests, storage locations in the distributed file storage system corresponding to user files, wherein for each user file there is a separate address file comprising storage locations for file content data, the separate address file stored on at least one, but not all, storage units in the distributed file storage system;
a second data cache module configured to cache data associated with read requests;
a second local data manager module configured to manage data at storage locations local to the second storage unit;
a second remote data manager module configured to communicate with other storage units that store data at storage locations remote to the second storage unit; and
a second storage device module configured to operate the second storage device;
a third storage unit comprising;
a third storage device configured to store data;
a third processor in communication with the third storage device configured to execute at least the following software modules stored on the first storage device;
a third data allocation manager module configured to locate, in response to read requests, storage locations in the distributed file storage system corresponding to user files, wherein for each user file there is a separate address file comprising storage locations for file content data, the separate address file stored on at least one, but not all, storage units in the distributed file storage system;
a third data cache module configured to cache data associated with read requests;
a third local data manager module configured to manage data at storage locations local to the third storage unit;
a third remote data manager module configured to communicate with other storage units that store data at storage locations remote to the third storage unit; and
a third storage device module configured to operate the third storage device;
wherein each of first storage unit, the second storage unit, and the third storage unit are configured to respond to read requests on behalf of the system; and
a switch in communication with at least the first, second and third storage units, the switch configured to receive a read request for a file stored on the distributed file storage system and to send the read request to at least one of the first, second and third storage units wherein each of storage units is operable to monitor in real time a pattern of access to the file, a latency to access each copy of the file, and content included in the file, such that a block cache module will perform K packet read aheads, where K is calculated using a current read rate and a current latency of an access link.
28. A distributed file storage system comprising:
multiple storage units in communication with each other, each configured to process file read requests on behalf of the distributed file storage system, and each comprising;
a storage device configured to store data;
a processor configured to execute at least the following software modules stored on the storage device;
a data allocation manager module configured to locate, in response to read requests, storage locations in the distributed file storage system corresponding to user files, wherein for each user file there is a separate address file comprising storage locations for file content data, the separate address file stored on at least one, but not all, of the multiple storage units;
a data cache module configured to cache data associated with read requests;
a local data manager module configured to manage data at local storage locations;
a remote data manager module configured to communicate with other storage units that store data at storage locations remote to the storage unit; and
a storage device module configured to operate the storage device; and
a switch in communication with the multiple storage units, the switch configured to receive a read request for a file stored on the distributed file storage system and to send the read request to one of the multiple storage units wherein each of multiple storage units is operable to monitor in real time a pattern of access to the file, a latency to access each copy of the file, and content included in the file, such that a block cache module will perform K packet read aheads, where K is calculated using a current read rate and a current latency of an access link.
Dependent claims: 29
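The modules recited in this claim can be pictured as cooperating parts of a single read path. The sketch below is a simplified illustration under several assumptions: a dictionary stands in for the storage device, another for the data cache, direct method calls stand in for the remote data manager, and the unit handling the request happens to hold the file's address file. All class, attribute, and method names are hypothetical.

```python
from typing import Dict, List, Tuple


class StorageUnit:
    def __init__(self, unit_id: int) -> None:
        self.unit_id = unit_id
        self.disk: Dict[int, bytes] = {}           # stand-in for the storage device
        self.cache: Dict[str, bytes] = {}          # stand-in for the data cache module
        self.peers: Dict[int, "StorageUnit"] = {}  # reachable remote units
        # Address files held on this unit: file name -> ordered (unit id, block) pairs.
        self.address_files: Dict[str, List[Tuple[int, int]]] = {}

    def read_local_block(self, block: int) -> bytes:
        """Local data manager: fetch a block from this unit's own device."""
        return self.disk[block]

    def read_remote_block(self, unit_id: int, block: int) -> bytes:
        """Remote data manager: ask another unit for one of its blocks."""
        return self.peers[unit_id].read_local_block(block)

    def read(self, name: str) -> bytes:
        """Data allocation manager: resolve the address file and gather blocks."""
        if name in self.cache:
            return self.cache[name]
        parts = []
        for unit_id, block in self.address_files[name]:
            if unit_id == self.unit_id:
                parts.append(self.read_local_block(block))
            else:
                parts.append(self.read_remote_block(unit_id, block))
        data = b"".join(parts)
        self.cache[name] = data  # later reads of this file are served from cache
        return data
```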
30. A distributed file storage system comprising:
multiple storage units configured to communicate with each other, each storage unit comprising a storage device, a processor, and at least one executable software module stored on the storage device;
the software module configured to;
write files in a distributed file system;
in response to a request to write a file, initiate the storage of the file's content data into the distributed storage system and to track data location information for locating each portion of the file's content data in a data location file; and
initiate the storage of at least one portion of the file on a different storage unit than other portions of the file; and
wherein the data location information for the file comprises data locations on different storage units; and
a switch in communication with the multiple storage units, the switch configured to receive a read request for a file stored on the distributed file storage system and to send the read request to one of the multiple storage units wherein each of multiple storage units is operable to monitor in real time a pattern of access to the file, a latency to access each copy of the file, and content included in the file, such that a block cache module will perform K packet read aheads, where K is calculated using a current read rate and a current latency of an access link.
Dependent claims: 31, 32, 33, 34, 35, 36, 37, 38, 39, 40
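The write path in this claim, in which portions of a file land on different storage units and their locations are tracked in a data location file, can be sketched as follows. The round-robin placement, the portion size, and the function names are assumptions made for illustration; the claim only requires that at least one portion be stored on a different unit than the others.

```python
from typing import Dict, List, Tuple

PortionKey = Tuple[str, int]        # (file name, portion index)
Location = Tuple[int, PortionKey]   # (storage unit id, key on that unit)


def write_file(name: str, data: bytes,
               units: Dict[int, Dict[PortionKey, bytes]],
               portion_size: int = 4) -> List[Location]:
    """Spread the file's portions over the units in round-robin order.

    Returns the data location information: one entry per portion,
    in file order.
    """
    unit_ids = sorted(units)
    if len(unit_ids) < 2:
        raise ValueError("need at least two storage units")
    locations: List[Location] = []
    for i, start in enumerate(range(0, len(data), portion_size)):
        unit_id = unit_ids[i % len(unit_ids)]
        key = (name, i)
        units[unit_id][key] = data[start:start + portion_size]
        locations.append((unit_id, key))
    return locations


def read_file(locations: List[Location],
              units: Dict[int, Dict[PortionKey, bytes]]) -> bytes:
    """Reassemble a file from its data location information."""
    return b"".join(units[unit_id][key] for unit_id, key in locations)
```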
Specification