Computer system and process for transferring multiple high bandwidth streams of data between multiple storage units and multiple applications in a scalable and reliable manner
Abstract
Multiple applications request data from multiple storage units over a computer network. The data is divided into segments and each segment is distributed randomly on one of several storage units, independent of the storage units on which other segments of the media data are stored. At least one additional copy of each segment also is distributed randomly over the storage units, such that each segment is stored on at least two storage units. This random distribution of multiple copies of segments of data improves both scalability and reliability. When an application requests a selected segment of data, the request is processed by the storage unit with the shortest queue of requests. Random fluctuations in the load applied by multiple applications on multiple storage units are balanced nearly equally over all of the storage units. This combination of techniques results in a system which can transfer multiple, independent high-bandwidth streams of data in a scalable manner in both directions between multiple applications and multiple storage units.
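The placement and request-routing ideas in the abstract can be illustrated with a minimal sketch. It assumes two copies per segment placed on distinct units and an in-memory table of queue lengths; the names `place_segments`, `pick_unit`, and `queue_lengths` are hypothetical and do not come from the patent.

```python
import random

def place_segments(num_segments, unit_ids, copies=2, rng=None):
    """Pick `copies` distinct storage units at random for each segment.

    Each segment is placed independently of every other segment.
    """
    rng = rng or random.Random()
    return {seg: rng.sample(unit_ids, copies) for seg in range(num_segments)}

def pick_unit(segment_id, placement, queue_lengths):
    """Among the units holding the segment, choose the one with the shortest queue."""
    return min(placement[segment_id], key=lambda u: queue_lengths[u])

# Hypothetical example data: four units and their current request queue lengths.
units = ["A", "B", "C", "D"]
placement = place_segments(8, units, copies=2, rng=random.Random(0))
queues = {"A": 3, "B": 0, "C": 5, "D": 1}
chosen = pick_unit(0, placement, queues)
```

Because every segment has at least two holders, a request can always be steered away from the busier of them, which is how the random placement and the shortest-queue rule work together in this sketch.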
4 Claims
1. A distributed data storage system for allowing one or more client systems to access data, comprising:
a plurality of independent storage units for storing the data;
wherein the data is stored on the plurality of storage units in files, wherein each file includes segments of data and redundancy information for each segment, wherein each segment has an identifier, and wherein the redundancy information for each segment includes at least one copy of the segment, and wherein, for each file, the segments and the redundancy information for each segment are distributed among the plurality of storage units;
wherein each storage unit comprises means for maintaining information associating the identifier of each segment stored on the storage unit with the location of each segment on the storage unit;
wherein the distributed data storage system includes means for maintaining information associating the identifier of each segment with indications of the storage units from the plurality of storage units on which each segment and the redundancy information for the segment is stored;
wherein the distributed data storage system includes means for identifying one of the storage units to be removed; and
wherein the distributed data storage system includes means, operative in response to an identification of one of the storage units to be removed, for redistributing data on the identified storage unit to other storage units, including means for determining, for each segment of data stored on the identified storage unit, another storage unit on which the segment is stored;
means for sending, for each segment of data stored on the identified storage unit, a request to the other storage unit on which the segment is stored to send a copy of the segment to a different storage unit, wherein each request includes the identifier of the segment.
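The redistribution recited in claim 1 can be sketched as follows. This is an illustration under stated assumptions, not the claimed implementation: `StorageUnit`, `handle_copy_request`, `remove_unit`, and `catalog` are hypothetical names, segment data stands in for on-disk locations, and the target unit is chosen at random, which the claim does not require.

```python
import random

class StorageUnit:
    """Hypothetical storage unit keeping its own segment-id -> data table."""

    def __init__(self, name):
        self.name = name
        self.segments = {}  # segment id -> data (stands in for an on-disk location)

    def handle_copy_request(self, segment_id, target):
        """Serve a request that names a segment id: send that segment to `target`."""
        target.segments[segment_id] = self.segments[segment_id]

def remove_unit(removed, units, catalog, rng=None):
    """Re-copy every segment held by `removed` from another holder to a new unit.

    `catalog` maps each segment id to the set of unit names holding a copy.
    Assumes at least one unit exists that does not already hold the segment.
    """
    rng = rng or random.Random()
    for seg_id in list(units[removed].segments):
        holders = catalog[seg_id]
        # Another storage unit on which the segment is stored.
        source = next(u for u in sorted(holders) if u != removed)
        # A different unit that holds no copy yet (random choice is an assumption).
        target = rng.choice([u for u in sorted(units) if u not in holders])
        units[source].handle_copy_request(seg_id, units[target])
        catalog[seg_id] = (holders - {removed}) | {target}
    del units[removed]

# Hypothetical example: four units, three segments, two copies each.
units = {n: StorageUnit(n) for n in "ABCD"}
catalog = {1: {"A", "B"}, 2: {"A", "C"}, 3: {"B", "D"}}
for seg_id, holders in catalog.items():
    for n in holders:
        units[n].segments[seg_id] = f"segment-{seg_id}".encode()

remove_unit("A", units, catalog, rng=random.Random(0))
```

Note that the unit being removed never has to serve the data itself: each copy is pulled from the surviving holder, so the same routine also works if the identified unit is already slow or partly unavailable.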
3. A process for recovering data in a distributed data storage system comprising a plurality of storage units for storing the data, wherein copies of segments of the data stored on the storage units are randomly distributed among the plurality of storage units, the process being performed when failure of one of the storage units is detected, comprising the steps of:
identifying segments of which copies were stored on the failed storage unit;
identifying storage units on which another copy of the identified segments was stored; and
randomly distributing a copy of the identified copies among the plurality of storage units.
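The three steps of claim 3 can be sketched as a planning function. It assumes a system-wide table mapping segment identifiers to the units holding copies; `plan_recovery`, `catalog`, and the tuple layout of the plan are hypothetical, and the sketch only decides where copies should go rather than moving any data.

```python
import random

def plan_recovery(failed, catalog, all_units, rng=None):
    """Return (segment_id, source_unit, target_unit) tuples for a failed unit.

    `catalog` maps each segment id to the set of unit names holding a copy.
    """
    rng = rng or random.Random()
    plan = []
    for seg_id, holders in sorted(catalog.items()):
        # Step 1: identify segments that had a copy on the failed unit.
        if failed not in holders:
            continue
        # Step 2: identify a unit on which another copy is stored.
        source = sorted(holders - {failed})[0]
        # Step 3: pick a random surviving unit that has no copy yet.
        candidates = [u for u in all_units if u != failed and u not in holders]
        plan.append((seg_id, source, rng.choice(candidates)))
    return plan

# Hypothetical example: unit "A" fails; segments 1 and 3 lose a copy.
catalog = {1: {"A", "B"}, 2: {"B", "C"}, 3: {"A", "D"}}
plan = plan_recovery("A", catalog, ["A", "B", "C", "D"], rng=random.Random(0))
```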
4. A process for combining streams of video data to produce composited video data for storing the composited video data in a distributed system comprising a plurality of storage units for storing video data, wherein copies of segments of the video data stored on the storage units are randomly distributed among the plurality of storage units, comprising the steps of:
reading the streams of video data from the plurality of storage units;
combining the streams of video data to produce the composited video data;
dividing the composited video data into segments; and
randomly distributing copies of the segments of the composited video data among the plurality of storage units.
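The combine, divide, and distribute steps of claim 4 can be sketched as follows. The sketch makes several assumptions that are not in the claim: the input streams are already read into memory as equal-length byte strings, compositing is approximated by a per-byte average, segments have a fixed size, and two copies are kept. The function names are hypothetical.

```python
import random

def composite(stream_a, stream_b):
    """Blend two equal-length byte streams by averaging (a stand-in for real compositing)."""
    return bytes((a + b) // 2 for a, b in zip(stream_a, stream_b))

def split_segments(data, size):
    """Divide the composited data into segments of at most `size` bytes."""
    return [data[i:i + size] for i in range(0, len(data), size)]

def distribute(segments, unit_ids, copies=2, rng=None):
    """Store `copies` copies of each segment on randomly chosen distinct units."""
    rng = rng or random.Random()
    stores = {u: {} for u in unit_ids}
    for seg_id, seg in enumerate(segments):
        for u in rng.sample(unit_ids, copies):
            stores[u][seg_id] = seg
    return stores

# Hypothetical example data standing in for two video streams.
a = bytes([10, 20, 30, 40, 50, 60])
b = bytes([30, 40, 50, 60, 70, 80])
mixed = composite(a, b)
segments = split_segments(mixed, 4)
stores = distribute(segments, ["A", "B", "C"], rng=random.Random(1))
```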
Specification