Computer system and process for transferring streams of data between multiple storage units and multiple applications in a scalable and reliable manner
Abstract
Multiple applications request data from multiple storage units over a computer network. The data is divided into segments and each segment is distributed randomly on one of several storage units, independent of the storage units on which other segments of the media data are stored. Redundancy information corresponding to each segment also is distributed randomly over the storage units. The redundancy information for a segment may be a copy of the segment, such that each segment is stored on at least two storage units. The redundancy information also may be based on two or more segments. This random distribution of segments of data and corresponding redundancy information improves both scalability and reliability. When a storage unit fails, its load is distributed evenly over the remaining storage units and its lost data may be recovered because of the redundancy information. When an application requests a selected segment of data, the request may be processed by the storage unit with the shortest queue of requests. Random fluctuations in the load applied by multiple applications on multiple storage units are balanced nearly equally over all of the storage units. Small data files also may be stored on storage units that combine small files into larger segments of data using a log-structured file system. This combination of techniques results in a system that can transfer both multiple, independent high-bandwidth streams of data and small data files in a scalable manner in both directions between multiple applications and multiple storage units.
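The placement and read-selection ideas in the abstract can be illustrated with a minimal sketch. It is not taken from the patent: the function names (`place_segments`, `pick_unit_for_read`, `load_after_failure`), the use of Python's `random` module, and the representation of queue lengths as a plain dictionary are all assumptions made for illustration. The sketch covers only the case where the redundancy information is a full copy of the segment.

```python
import random
from collections import defaultdict


def place_segments(num_segments, unit_ids, copies=2, rng=None):
    """Pick `copies` distinct storage units at random for every segment.

    Each segment's choice is made independently of every other segment's.
    (Hypothetical helper; names and signature are illustrative only.)
    """
    rng = rng or random.Random()
    return {seg: rng.sample(unit_ids, copies) for seg in range(num_segments)}


def pick_unit_for_read(candidates, queue_lengths):
    """Choose the candidate unit whose request queue is currently shortest."""
    return min(candidates, key=lambda unit: queue_lengths[unit])


def load_after_failure(placement, failed_unit):
    """Count, per surviving unit, the segments it must now serve alone
    because their other copy lived on the failed unit."""
    load = defaultdict(int)
    for units in placement.values():
        if failed_unit in units:
            for unit in units:
                if unit != failed_unit:
                    load[unit] += 1
    return dict(load)
```

Because each segment's pair of units is drawn independently, the segments that shared a failed unit have their surviving copies scattered across all remaining units rather than concentrated on a single mirror, which is the property the abstract relies on for even redistribution of load.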
3 Claims
1. A data storage system, comprising:
a plurality of storage units;
a plurality of client systems, wherein each client system has a file system through which an application executed on the client system accesses data;
a network interconnecting the plurality of storage units and the plurality of client systems;
wherein the file system of each client system accesses data in one or more files using the plurality of storage units, wherein a file has a name and includes segments of data and redundancy information for each segment, wherein the redundancy information for a segment is one or more copies of the segment;
wherein the application executed on the client system accesses data in a file using a request to the file system indicating the name of the file;
wherein client code accessed by the file system in each client system includes means for writing data to a file comprising:
means for selecting, for each segment of the data, at least two of the storage units for storing the segment;
means for initiating a request to store each segment of the data to each of the at least two storage units selected for the segment, wherein the request includes an identifier of the segment; and
means for locally accessing information indicative of the at least two storage units on which each segment of a file is stored;
wherein each storage unit identifies a location on the storage unit for storing a received segment of data, stores the received segment of data at the identified location and maintains information associating the identifier of the segment of data with a location of each segment of data on the storage unit;
wherein client code accessed by the file system in each client system includes means for reading data from a file comprising:
means for selecting, for each segment of the requested data, one of the storage units on which the segment is stored using the locally accessed information indicative of the at least two storage units on which each segment of a file is stored; and
means for reading each segment of the requested data from the selected storage unit for the segment, including sending a request, for each segment, to the storage unit selected for the segment including the identifier of the segment; and
means for providing the read data to the application; and
wherein each storage unit retrieves a requested segment of data from the storage unit using the information associating the identifier of the segment of data with a location of each segment of data on the storage unit to obtain the location of the segment of data on the storage unit.
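As a reading aid, the division of responsibilities recited in claim 1 can be sketched in a few lines of Python. This is an illustrative model under stated assumptions, not the patented implementation: the class names `StorageUnit` and `Client`, the `name#index` segment identifier format, the in-memory list standing in for disk storage, the fixed segment size, and the random choice among holders on read are all hypothetical. The sketch shows the client keeping a local table of which units hold each segment, and each storage unit keeping its own table from segment identifier to location.

```python
import random


class StorageUnit:
    """Stores segments and keeps its own identifier -> location table."""

    def __init__(self, name):
        self.name = name
        self._blocks = []    # stand-in for on-disk storage (assumption)
        self._location = {}  # segment identifier -> index into _blocks

    def store(self, segment_id, data):
        # The unit, not the client, decides where the segment is placed.
        self._location[segment_id] = len(self._blocks)
        self._blocks.append(data)

    def retrieve(self, segment_id):
        # Look up the location from the identifier, then fetch the data.
        return self._blocks[self._location[segment_id]]


class Client:
    """Client-side file system code with a locally held segment table."""

    def __init__(self, units, segment_size=4, rng=None):
        self.units = units
        self.segment_size = segment_size
        self.rng = rng or random.Random()
        # file name -> list of (segment identifier, [holding units])
        self.segment_table = {}

    def write(self, name, data):
        entries = []
        for index, offset in enumerate(range(0, len(data), self.segment_size)):
            segment_id = f"{name}#{index}"           # hypothetical id format
            chosen = self.rng.sample(self.units, 2)  # two distinct units
            for unit in chosen:
                unit.store(segment_id, data[offset:offset + self.segment_size])
            entries.append((segment_id, chosen))
        self.segment_table[name] = entries

    def read(self, name):
        parts = []
        for segment_id, holders in self.segment_table[name]:
            unit = self.rng.choice(holders)  # selection policy is an assumption
            parts.append(unit.retrieve(segment_id))
        return b"".join(parts)
```

A short usage example: after `client = Client([StorageUnit(n) for n in "ABCD"])` and `client.write("clip", b"hello world!")`, calling `client.read("clip")` reassembles the original bytes from whichever holder is selected for each segment. Note that the client never learns where on a unit a segment is stored; it sends only the segment identifier.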
Specification