Computer system and process for transferring multiple high bandwidth streams of data between multiple storage units and multiple applications in a scalable and reliable manner
Abstract
Multiple applications request data from multiple storage units over a computer network. The data is divided into segments and each segment is distributed randomly on one of several storage units, independent of the storage units on which other segments of the media data are stored. At least one additional copy of each segment also is distributed randomly over the storage units, such that each segment is stored on at least two storage units. This random distribution of multiple copies of segments of data improves both scalability and reliability. When an application requests a selected segment of data, the request is processed by the storage unit with the shortest queue of requests. Random fluctuations in the load applied by multiple applications on multiple storage units are balanced nearly equally over all of the storage units. This combination of techniques results in a system which can transfer multiple, independent high-bandwidth streams of data in a scalable manner in both directions between multiple applications and multiple storage units.
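The placement and selection policy described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the function names `place_segments` and `pick_unit`, the unit names, the use of `random.sample` for placement, and the in-memory table of queue lengths are all hypothetical.

```python
import random


def place_segments(num_segments, unit_ids, copies=2, seed=None):
    """Assign each segment to `copies` distinct storage units chosen at random.

    Hypothetical helper: every segment's placement is drawn independently
    of the placement of every other segment.
    """
    if copies > len(unit_ids):
        raise ValueError("need at least as many storage units as copies")
    rng = random.Random(seed)
    return {seg: rng.sample(unit_ids, copies) for seg in range(num_segments)}


def pick_unit(candidates, queue_lengths):
    """Return the candidate storage unit with the shortest request queue."""
    return min(candidates, key=lambda unit: queue_lengths[unit])


# Illustrative data only: four units and made-up queue lengths.
units = ["su0", "su1", "su2", "su3"]
placement = place_segments(num_segments=8, unit_ids=units, seed=1)
queues = {"su0": 3, "su1": 0, "su2": 5, "su3": 1}
chosen = pick_unit(placement[0], queues)
```

Because `random.sample` draws without replacement, the two copies of a segment always land on different storage units, which is what lets a request fall back on whichever holder is less loaded.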
8 Claims
1. In a distributed data storage system comprising a plurality of storage units for storing data and interconnected by a computer network, wherein stored data from a file is divided into segments that are distributed among the plurality of storage units, wherein each segment has an identifier, and wherein two or more copies of each segment are distributed among the plurality of storage units, such that each segment is stored on at least two of the storage units, and wherein the segments of data are distributed nonsequentially among the plurality of storage units, a method for allowing one or more client systems to access segments of data from a file, comprising:
each client system:
accessing a catalog manager that associates, for each segment of a file, the identifier of the segment with an indication of each of the storage units on which a copy of the segment is stored;
for each segment to be read, selecting one of the at least two storage units storing the segment;
sending over the network a request for the segment directly to the selected storage unit, wherein the request includes the identifier of the requested segment; and
each storage unit:
maintaining data defining information that associates, for each segment stored on the storage unit, the identifier of the segment with the location of the segment in storage;
receiving a request over the computer network from one of the client systems for a segment, wherein the received request includes the identifier of the requested segment;
determining the location of the segment in the storage using the information that associates the identifier of the requested segment with the location of the segment in the storage;
retrieving the requested segment from the storage; and
sending the retrieved segment over the computer network to the client system that requested the segment.
Dependent claims: 2, 3
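The steps recited in claim 1 can be illustrated with a small in-memory sketch. Everything here is an assumption made for illustration: the class names `StorageUnit` and `CatalogManager`, the function `client_read`, the byte-array standing in for a disk, and the direct method call standing in for a network request. It is not presented as the claimed implementation.

```python
import random


class StorageUnit:
    """Hypothetical storage unit with a segment-identifier -> location table."""

    def __init__(self):
        self._storage = bytearray()   # stands in for the unit's disk
        self._locations = {}          # segment id -> (offset, length)

    def store(self, segment_id, data):
        # Record where the segment lands, then append it to storage.
        self._locations[segment_id] = (len(self._storage), len(data))
        self._storage.extend(data)

    def handle_request(self, segment_id):
        # Determine the location from the identifier, then retrieve and return.
        offset, length = self._locations[segment_id]
        return bytes(self._storage[offset:offset + length])


class CatalogManager:
    """Hypothetical catalog: segment id -> names of units holding a copy."""

    def __init__(self):
        self._units_by_segment = {}

    def register(self, segment_id, unit_names):
        self._units_by_segment[segment_id] = list(unit_names)

    def units_for(self, segment_id):
        return list(self._units_by_segment[segment_id])


def client_read(segment_id, catalog, units, select=random.choice):
    holders = catalog.units_for(segment_id)            # access the catalog
    chosen = select(holders)                           # select one holder
    return units[chosen].handle_request(segment_id)    # request it directly


units = {"su0": StorageUnit(), "su1": StorageUnit(), "su2": StorageUnit()}
catalog = CatalogManager()
for seg_id, data, holders in [("seg-7", b"frame-7", ["su0", "su2"]),
                              ("seg-3", b"frame-3", ["su1", "su2"])]:
    for name in holders:
        units[name].store(seg_id, data)
    catalog.register(seg_id, holders)

result = client_read("seg-3", catalog, units)
```

The `select` parameter is left pluggable so that a load-aware policy could replace the random choice used by default in this sketch.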
4. A data storage system, comprising:
a plurality of client systems, each client system having a file system through which applications executed on the client system access data;
a plurality of storage servers coupled to the plurality of client systems via a computer network, each storage server storing data from files of the file system;
wherein the stored data from each of the files is divided into segments that are stored across the plurality of storage servers, with two or more copies of each segment being distributed among the plurality of storage servers, such that each segment is stored on at least two of the storage servers and wherein the segments of data are distributed nonsequentially among the plurality of storage servers; and
a catalog manager having storage that stores information indicating the storage servers on which the segments of files are stored;
at least one of the client systems being configured to:
access, before reading data from a file, the catalog manager to obtain the information indicating the storage servers on which the segments of the file are stored, and
communicate directly with the storage servers to request the segments of the file using the accessed information; and
each storage server being configured to:
maintain data defining information that associates, for each segment stored on the storage server, the identifier of the segment with the location of the segment in storage;
in response to a request from one of the client systems for a segment, wherein the received request includes the identifier of the requested segment, determine the location of the segment in the storage using the information that associates the identifier of the requested segment with the location of the segment in the storage;
retrieve the requested segment from the storage; and
send the retrieved segment over the computer network to the client system that requested the segment.
Dependent claims: 5, 6
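The client-side configuration in claim 4 (consult the catalog before reading, then talk to the storage servers directly) might look roughly like the sketch below. The names `StorageServer`, `Catalog`, and `read_file`, the segment identifiers, and the choice of always using the first listed holder are assumptions for this example only.

```python
from dataclasses import dataclass, field


@dataclass
class StorageServer:
    """Hypothetical server holding segments keyed by identifier."""
    segments: dict = field(default_factory=dict)

    def get(self, segment_id):
        return self.segments[segment_id]


@dataclass
class Catalog:
    """Hypothetical catalog: file name -> ordered [(segment id, [server names])]."""
    files: dict = field(default_factory=dict)

    def lookup(self, file_name):
        return list(self.files[file_name])


def read_file(file_name, catalog, servers):
    # Consult the catalog once, before any data is read.
    layout = catalog.lookup(file_name)
    parts = []
    for segment_id, holders in layout:
        # Request the segment from a storage server directly; the catalog
        # is not on the data path. Using the first holder is arbitrary here.
        parts.append(servers[holders[0]].get(segment_id))
    return b"".join(parts)


# Illustrative layout: three segments, two copies each, spread nonsequentially.
servers = {"a": StorageServer(), "b": StorageServer(), "c": StorageServer()}
servers["a"].segments.update({"f1-s0": b"hel", "f1-s1": b"lo "})
servers["b"].segments.update({"f1-s1": b"lo ", "f1-s2": b"wor"})
servers["c"].segments.update({"f1-s0": b"hel", "f1-s2": b"wor"})
catalog = Catalog({"f1": [("f1-s0", ["c", "a"]),
                          ("f1-s1", ["a", "b"]),
                          ("f1-s2", ["b", "c"])]})
data = read_file("f1", catalog, servers)
```

Keeping the catalog lookup separate from the per-segment requests mirrors the claim's split between obtaining location information and transferring data.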
7. A data storage system, comprising:
a client system having a file system through which applications executed on the client system access data;
a plurality of storage servers coupled to the client system via a computer network, each storage server storing data from files of the file system, wherein the stored data from each of the files is divided into segments that are stored across the plurality of storage servers, with two or more copies of each segment being distributed among the plurality of storage servers, such that each segment is stored on at least two of the storage servers and wherein the segments of data are distributed nonsequentially among the plurality of storage servers; and
a catalog manager configured to maintain information indicating the storage servers on which the segments of files are stored;
the client system being configured to:
access, before reading data from a file, the catalog manager to obtain the information indicating the storage servers on which the segments of the file are stored, and
communicate directly with the storage servers to request the segments of the file using the accessed information; and
each storage server being configured to:
maintain data defining information that associates, for each segment stored on the storage server, the identifier of the segment with the location of the segment in storage;
in response to a request over the computer network from the client system for a segment, wherein the received request includes the identifier of the requested segment, determine the location of the segment in the storage using the information that associates the identifier of the requested segment with the location of the segment in the storage;
retrieve the requested segment from the storage; and
send the retrieved segment over the computer network to the client system.
Dependent claims: 8
Specification