Distributed data storage
First Claim
1. A method for a device to write data in a data storage system, the method comprising:
sending a multicast storage query, the multicast storage query indicating a request to store first data in the data storage system;
receiving a plurality of responses to the multicast storage query, wherein each of the plurality of responses is received from a respective storage node of a plurality of storage nodes, and each of the plurality of responses indicates storage node information regarding the respective storage node that sent the response;
determining a respective probability factor for each storage node that sent one of the plurality of responses, wherein each respective probability factor is determined based at least in part on the storage node information included in the response to the multicast storage query that is received from the respective storage node;
selecting a subset of storage nodes from the plurality of storage nodes that sent the plurality of responses, wherein the subset is selected based on the determined probability factors, and at least one storage node with a lowest determined probability factor of the determined probability factors is excluded from the subset;
performing a probabilistic based selection that results in at least two storage nodes from the subset of storage nodes being selected to store the first data, wherein when performing the probabilistic based selection a probability of selecting a given storage node from the subset of storage nodes is determined based on the probability factor determined for the given storage node; and
sending the first data to the at least two storage nodes.
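The selection steps recited in claim 1 can be illustrated with a minimal sketch. It is not the patentee's implementation: the `NodeResponse` fields, the `probability_factor` scoring rule, and the function names are all hypothetical, since the claim only requires that each factor be based on the storage node information in the response. The sketch drops the node or nodes with the lowest factor and then draws the required number of distinct nodes, weighting each draw by the node's factor.

```python
import random
from dataclasses import dataclass


@dataclass
class NodeResponse:
    """Hypothetical shape of one response to the multicast storage query."""
    node_id: str
    free_space_gb: float  # assumed field, not named in the claim
    load: float           # assumed field: 0.0 (idle) to 1.0 (saturated)


def probability_factor(resp):
    # Assumed scoring rule: more free space and lower load give a higher factor.
    return resp.free_space_gb * (1.0 - resp.load)


def select_storage_nodes(responses, replicas=2, rng=None):
    """Pick `replicas` distinct nodes, weighted by probability factor."""
    if rng is None:
        rng = random.Random()
    factors = {r.node_id: probability_factor(r) for r in responses}

    # Subset selection: exclude every node tied for the lowest factor.
    lowest = min(factors.values())
    pool = {n: f for n, f in factors.items() if f > lowest}
    if len(pool) < replicas:
        raise ValueError("not enough candidate storage nodes")

    # Probabilistic selection without replacement, weighted by factor.
    chosen = []
    for _ in range(replicas):
        ids = list(pool)
        pick = rng.choices(ids, weights=[pool[n] for n in ids], k=1)[0]
        chosen.append(pick)
        del pool[pick]
    return chosen
```

A caller would build one `NodeResponse` per reply received, call `select_storage_nodes(responses)`, and then send the data to the returned nodes. Because selection is weighted rather than strictly ranked, lower-scoring nodes in the subset still receive some writes, which spreads data across the cluster.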
Abstract
The present invention relates to a distributed data storage system comprising a plurality of storage nodes. Using unicast and multicast transmission, a server application may write data in the storage system. When writing data, at least two storage nodes are selected based in part on a randomized function, which ensures that data is sufficiently spread to provide efficient and reliable replication of data in case a storage node malfunctions.
147 Citations
19 Claims
1. A method for a device to write data in a data storage system, as recited in full under First Claim above. Dependent claims: 2, 3, 4, 5, 6, 7
8. A server for writing data in a data storage system, the server comprising at least a processor configured to:
send a multicast storage query, the multicast storage query indicating a request to store first data in the data storage system;
receive a plurality of responses to the multicast storage query, wherein each of the plurality of responses is received from a respective storage node of a plurality of storage nodes, and each of the plurality of responses indicates storage node information regarding the respective storage node that sent the response;
determine a respective probability factor for each storage node that sent one of the plurality of responses, wherein each respective probability factor is determined based at least in part on the storage node information included in the response to the multicast storage query that is received from the respective storage node;
select a subset of storage nodes from the plurality of storage nodes that sent the plurality of responses, wherein the subset is selected based on the determined probability factors, and at least one storage node with a lowest determined probability factor of the determined probability factors is excluded from the subset;
perform a probabilistic based selection that results in at least two storage nodes from the subset of storage nodes being selected to store the first data, wherein when performing the probabilistic based selection a probability of selecting a given storage node from the subset of storage nodes is determined based on the probability factor determined for the given storage node; and
send the first data to the at least two storage nodes.
Dependent claims: 9, 10, 11, 12, 13, 14
15. A device for writing data in a data storage system, the device comprising at least a processor configured to:
send a multicast storage query, the multicast storage query indicating a request to store first data in the data storage system;
receive a plurality of responses to the multicast storage query, wherein each of the plurality of responses is received from a respective storage node of a plurality of storage nodes, and each of the plurality of responses indicates storage node information regarding the respective storage node that sent the response;
determine a respective probability factor for each storage node that sent one of the plurality of responses, wherein each respective probability factor is determined after transmitting the multicast storage query based at least in part on the storage node information included in the response to the multicast storage query that is received from the respective storage node;
perform a probabilistic based selection that results in at least two storage nodes from the plurality of responsive storage nodes being selected to store the first data, wherein when performing the probabilistic based selection a probability of selecting a given storage node is determined based on the probability factor determined for the given storage node;
send the first data to the at least two storage nodes; and
perform a subsequent selection of storage nodes for writing second data by: after selecting two or more storage nodes from a second subset of storage nodes for storing the second data based on probability factors of storage nodes in the second subset, determining that at least two of the two or more selected storage nodes lack a requisite level of geographic diversity, removing at least one of the at least two selected storage nodes that lack the requisite level of geographic diversity from the second subset, and re-performing the selection from the second subset with the at least one of the at least two storage nodes removed.
Dependent claims: 16, 17, 18, 19
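The re-selection loop at the end of claim 15 can be sketched as follows. This is an illustration under stated assumptions, not the claimed implementation: the claim does not define the "requisite level of geographic diversity", so the sketch assumes it means that no two selected nodes share a site. The `site_of` mapping and both function names are hypothetical.

```python
import random


def weighted_pick(factors, k, rng):
    """Draw k distinct node ids, each draw weighted by probability factor."""
    pool = dict(factors)
    chosen = []
    for _ in range(k):
        ids = list(pool)
        pick = rng.choices(ids, weights=[pool[n] for n in ids], k=1)[0]
        chosen.append(pick)
        del pool[pick]
    return chosen


def select_with_geo_diversity(factors, site_of, replicas=2, rng=None):
    """Re-run the weighted selection until the chosen nodes are diverse.

    Assumption: diversity is satisfied when every chosen node is at a
    different site according to the hypothetical `site_of` mapping.
    """
    if rng is None:
        rng = random.Random()
    subset = dict(factors)
    while len(subset) >= replicas:
        chosen = weighted_pick(subset, replicas, rng)
        seen_sites = set()
        clash = None
        for node in chosen:
            if site_of[node] in seen_sites:
                clash = node  # this node shares a site with an earlier pick
                break
            seen_sites.add(site_of[node])
        if clash is None:
            return chosen
        # Remove one of the offending nodes and re-perform the selection.
        del subset[clash]
    raise ValueError("no selection satisfies the diversity requirement")
```

Each failed attempt removes one node from the subset, so the loop always terminates: it either returns a diverse selection or raises once too few candidates remain.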
Specification