Generating probalistic data structures in gossip protocols
First Claim
1. A system, comprising:
- a distributed computing system comprising a plurality of nodes, each implemented by one or more processors and associated memory, wherein;
each node stores a data set of data items and uses a gossip protocol to synchronize its data set with data sets of other nodes of the plurality of nodes; and
during each round of the gossip protocol, each node is configured to;
select another node of the plurality of nodes to gossip with;
determine a set of hash functions to be used in the round to generate a space-efficient probabilistic data structure (SEPDS) that reflects a content of its data set, wherein the set of hash functions changes from round to round;
generate the SEPDS from its data set using the set of hash functions; and
perform a synchronization operation with the other node using the SEPDS.
1 Assignment
0 Petitions
Accused Products
Abstract
Systems and methods are disclosed to implement a gossip protocol to synchronize data in among nodes of a distributed computing system. During a round of the gossip protocol, a first node generates a space-efficient probabilistic data structure (SEPDS) from its data set. The SEPDS is generated using a set of hash functions that changes from round to round. The set of hash functions may be derived using two base hash functions without reliance on the use of any randomizing operations, and the result of each hash function may be assigned to modify a different portion of the SEPDS. The generated SEPDS is sent to a second node, which performs probabilistic queries on the SEPDS to compare the contents of its own data set with the SEPDS. Any data items that are missing from the SEPDS are sent back to the first node, which updates its data set accordingly.
-
Citations
20 Claims
-
1. A system, comprising:
a distributed computing system comprising a plurality of nodes, each implemented by one or more processors and associated memory, wherein; each node stores a data set of data items and uses a gossip protocol to synchronize its data set with data sets of other nodes of the plurality of nodes; and during each round of the gossip protocol, each node is configured to; select another node of the plurality of nodes to gossip with; determine a set of hash functions to be used in the round to generate a space-efficient probabilistic data structure (SEPDS) that reflects a content of its data set, wherein the set of hash functions changes from round to round; generate the SEPDS from its data set using the set of hash functions; and perform a synchronization operation with the other node using the SEPDS. - View Dependent Claims (2, 3, 4, 5)
-
6. A method comprising:
performing a synchronization of data sets in a distributed computing system using a gossip protocol, wherein the distributed computing system comprises a plurality of nodes implemented by one or more processors and associated memory, and each node performs, during each round of the gossip protocol; selecting another node in the plurality of nodes to gossip with; determining a set of hash functions to be used for the round to generate a space-efficient probabilistic data structure (SEPDS) that reflects a content of its data set, wherein the set of hash functions changes from round to round; generating the SEPDS from its data set using the set of hash functions; and performing a synchronization operation with the other node using the SEPDS. - View Dependent Claims (7, 8, 9, 10, 11, 12, 13, 14, 15, 16)
-
17. A non-transitory computer-accessible storage medium storing program instructions that when executed on one or more processors of a first node in a distributed computing system implemented by a plurality of nodes, cause the first node to:
exchange communications via a gossip protocol with other nodes in the plurality of nodes to synchronize a data set stored by the first node with data sets stored by the other nodes, wherein, during a given round of the gossip protocol, the program instructions when executed cause the first node to; determine a set of hash functions for the round, wherein the set of hash functions changes from round to round; generate a space-efficient probabilistic data structure (SEPDS) that reflects a content of the data set using the set of hash functions; send the SEPDS to a second node of the plurality of nodes; receive from the second node one or more data items that are missing from the SEPDS; and update the data set with the received data items. - View Dependent Claims (18, 19, 20)
Specification