Distributed data storage
First Claim
1. A method for writing data to a data storage system comprising a plurality of data storage nodes, the method being employed in a server running an application which accesses data in the data storage system via a communication network, and comprising:
the server sending a multicast storage query to a plurality of said storage nodes via the communication network, the multicast storage query comprising a data identifier for data to be stored;
the server receiving a plurality of responses from a number of said storage nodes via the communication network, the responses including storage node information respectively relating to each storage node;
the server selecting a subset of the responding storage nodes that satisfy a primary criteria based on geographical separation for further evaluation, wherein at least one responding storage node is removed from the selection process as lacking a requisite level of geographical separation;
for each storage node in the subset, the server determining a respective probability factor, wherein each respective probability factor is determined based at least in part on the respective storage node information included in a respective response;
the server randomly selecting at least two storage nodes from the subset of storage nodes that satisfied the primary criteria, wherein the probability of a respective storage node being randomly selected depends on its respective probability factor; and
the server sending the data to the at least two selected storage nodes via the communication network.
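The selection steps recited above (filter by a primary criterion, compute a probability factor per remaining node, then draw at least two nodes at random with those factors as weights) can be illustrated with a minimal sketch. The claim does not say what the storage node information contains or how geographical separation is tested, so the `NodeResponse` fields, the free-space-based `probability_factor`, and the caller-supplied `is_separated` predicate are all hypothetical and chosen only for illustration.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class NodeResponse:
    node_id: str
    site: str             # hypothetical location label carried in the response
    free_space_gb: float  # hypothetical "storage node information"


def probability_factor(resp):
    # Assumption: a node with more free space gets a larger factor.
    return max(resp.free_space_gb, 0.0)


def select_nodes(responses, is_separated, count=2, rng=None):
    """Pick `count` distinct nodes, weighted by their probability factors."""
    rng = rng or random.Random()
    # Primary criterion: drop nodes that lack the required separation.
    subset = [r for r in responses if is_separated(r)]
    # Keep only nodes whose factor gives them a chance of being drawn.
    candidates = [(r, probability_factor(r)) for r in subset]
    candidates = [(r, w) for r, w in candidates if w > 0]
    if len(candidates) < count:
        raise ValueError("not enough eligible storage nodes")
    chosen = []
    for _ in range(count):
        nodes, weights = zip(*candidates)
        pick = rng.choices(nodes, weights=weights, k=1)[0]
        chosen.append(pick)
        # Sample without replacement: remove the picked node.
        candidates = [(r, w) for r, w in candidates if r is not pick]
    return chosen
```

A caller might pass `is_separated=lambda r: r.site != "local"` to exclude nodes co-located with the server. That particular test is one possible reading of the separation criterion, not something the claim specifies.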
Abstract
The present invention relates to a distributed data storage system comprising a plurality of storage nodes. Using unicast and multicast transmission, a server application may write data in the storage system. When writing data, at least two storage nodes are selected based in part on a randomized function, which ensures that data is sufficiently spread to provide efficient and reliable replication of data in case a storage node malfunctions.
18 Claims
1. A method for writing data to a data storage system comprising a plurality of data storage nodes, the method being employed in a server running an application which accesses data in the data storage system via a communication network, and comprising:
the server sending a multicast storage query to a plurality of said storage nodes via the communication network, the multicast storage query comprising a data identifier for data to be stored;
the server receiving a plurality of responses from a number of said storage nodes via the communication network, the responses including storage node information respectively relating to each storage node;
the server selecting a subset of the responding storage nodes that satisfy a primary criteria based on geographical separation for further evaluation, wherein at least one responding storage node is removed from the selection process as lacking a requisite level of geographical separation;
for each storage node in the subset, the server determining a respective probability factor, wherein each respective probability factor is determined based at least in part on the respective storage node information included in a respective response;
the server randomly selecting at least two storage nodes from the subset of storage nodes that satisfied the primary criteria, wherein the probability of a respective storage node being randomly selected depends on its respective probability factor; and
the server sending the data to the at least two selected storage nodes via the communication network.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10
11. A server adapted for writing data to a data storage system comprising a plurality of data storage nodes by sending and receiving messages over a communication network, the server comprising at least memory and a processor configured to:
send a multicast storage query to a plurality of said storage nodes via the communication network, the multicast storage query comprising a data identifier for data to be stored;
receive a plurality of responses from a subset of said storage nodes, the responses including storage node information respectively relating to each storage node;
identify a number of storage groups that have a requisite geographic diversity and select a given storage node in the subset from each identified storage group for further evaluation;
determine a respective probability factor for each storage node selected from one of the storage groups, wherein each respective probability factor is determined based at least in part on the respective storage node information included in a respective response;
randomly select at least two storage nodes from the storage nodes selected from the storage groups, wherein the probability of a respective storage node being randomly selected depends on its respective probability factor; and
send the data to the at least two selected storage nodes via the communication network.
Dependent claims: 15, 16
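The storage-group variant of claim 11 can be sketched in the same spirit: responding nodes are bucketed by group, one node per group goes forward, and at least two of those are drawn at random with probability-factor weights. Everything concrete here is an assumption made for illustration: the `group` and `load` fields, the rule that the least loaded node represents its group, the premise that every group already has the requisite geographic diversity, and the use of exponent-keyed weighted sampling (each node gets the key `u ** (1 / w)` for a uniform random `u`, and the largest keys win) as the random selection.

```python
import random
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class NodeResponse:
    node_id: str
    group: str   # hypothetical storage-group label, e.g. a data centre
    load: float  # hypothetical system load in [0, 1), taken from the response


def probability_factor(resp):
    # Assumption: lightly loaded nodes should be favoured.
    return max(1.0 - resp.load, 1e-9)


def one_node_per_group(responses):
    """Select one node from each storage group for further evaluation."""
    groups = defaultdict(list)
    for resp in responses:
        groups[resp.group].append(resp)
    # Assumption: the least loaded node represents its group.
    return [min(members, key=lambda r: r.load) for members in groups.values()]


def weighted_sample(candidates, count, rng):
    """Draw `count` distinct nodes; larger factors tend to yield larger keys."""
    if len(candidates) < count:
        raise ValueError("not enough storage groups responded")
    keyed = [(rng.random() ** (1.0 / probability_factor(c)), c)
             for c in candidates]
    keyed.sort(key=lambda pair: pair[0], reverse=True)
    return [c for _, c in keyed[:count]]
```

Because only one node per group reaches the random draw, the two selected nodes always come from different groups in this sketch.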
12. A method for writing data to a data storage system comprising a plurality of data storage nodes, the method being employed in a server running an application which accesses data in the data storage system via a communication network, the server comprising:
the server sending a multicast storage query to a plurality of said storage nodes via the communication network, the multicast storage query comprising a data identifier for data to be stored;
the server receiving a plurality of responses from a subset of said storage nodes via the communication network, the responses including storage node information respectively relating to each storage node;
the server removing at least one storage node in the subset from the selection process based on the at least one storage node lacking a desired level of geographic diversity as compared to remaining storage nodes in the subset;
the server determining, based on an algorithm, for each of the remaining storage nodes in the subset, a corresponding probability factor based at least in part on the corresponding storage node information included the corresponding response;
the server randomly selecting at least two storage nodes from the remaining storage nodes with the desired levels of geographic diversity, wherein the probability of a storage node being randomly selected depends on its respective probability factor; and
the server sending the data to the at least two selected storage nodes via the communication network.
Dependent claims: 13, 14, 17, 18
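Claim 12 removes nodes that lack geographic diversity "as compared to remaining storage nodes". One way to picture such a comparison, purely as an illustrative assumption, is a minimum great-circle distance between any two retained nodes. The sketch below assumes each response carries hypothetical `lat` and `lon` coordinates and uses the haversine formula; the claim itself does not say how diversity is measured, and the greedy first-come order is likewise an arbitrary choice.

```python
import math
from dataclasses import dataclass

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


@dataclass(frozen=True)
class NodeResponse:
    node_id: str
    lat: float  # hypothetical latitude in degrees
    lon: float  # hypothetical longitude in degrees


def distance_km(a, b):
    """Great-circle distance between two nodes (haversine formula)."""
    phi1, phi2 = math.radians(a.lat), math.radians(b.lat)
    dphi = phi2 - phi1
    dlam = math.radians(b.lon - a.lon)
    h = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    # Clamp to guard against rounding slightly above 1.
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, h)))


def remove_undiverse(responses, min_km):
    """Greedily keep nodes at least `min_km` from every node already kept."""
    kept, removed = [], []
    for resp in responses:
        if all(distance_km(resp, k) >= min_km for k in kept):
            kept.append(resp)
        else:
            removed.append(resp)
    return kept, removed
```

The `kept` list would then feed the probability-factor and random-selection steps, while `removed` holds the nodes dropped from the selection process.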
Specification