Reliable map-reduce communications in a decentralized, self-organizing communication orbit of a distributed network
Abstract
Method and system for providing message communications with failure detection and recovery are disclosed. At a respective node of a non-static collection of nodes forming a linear communication orbit: the node identifies, from among the non-static collection of nodes, a set of forward contacts distributed in a forward direction along the linear communication orbit; the node monitors a propagation state of a first query that has departed from the respective node to travel in the forward direction along the linear communication orbit; and upon detecting a propagation failure of the first query based on the monitoring, the node sends the first query directly to a first forward contact among the set of forward contacts to initiate a failure recovery process within at least part of a segment of the linear communication orbit between the respective node and the first forward contact of the respective node.
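The recovery flow summarized in the abstract can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the disclosed implementation: the orbit is modeled as an ordered list of machine ids, "propagation failure" is reduced to the next machine being absent from a hypothetical `alive` set, forward contacts are assumed to sit 2, 4, 8, ... positions ahead (the source only says they are distributed in the forward direction), and segment recovery is assumed to mean the contact reaching back to any live machines that were skipped. The names `forward_contacts` and `propagate` are illustrative.

```python
def forward_contacts(orbit, index, max_contacts=3):
    """Positions 2, 4, 8, ... hops ahead of `index` (assumed spacing, not from the source)."""
    contacts = []
    step = 2
    while len(contacts) < max_contacts and index + step < len(orbit):
        contacts.append(index + step)
        step *= 2
    return contacts


def propagate(orbit, alive, start=0, max_contacts=3):
    """Return machine ids in the order the query reaches them.

    Assumes the machine at `start` is itself alive.
    """
    delivered = [orbit[start]]
    sender = start
    while sender < len(orbit) - 1:
        nxt = sender + 1
        if orbit[nxt] in alive:
            # Normal hop-by-hop travel in the forward direction.
            delivered.append(orbit[nxt])
            sender = nxt
            continue
        # Propagation failure: pick the nearest live forward contact.
        contact = next(
            (c for c in forward_contacts(orbit, sender, max_contacts)
             if orbit[c] in alive),
            None,
        )
        if contact is None:
            break  # no reachable contact; the query stops here
        # Send the query directly to the contact, skipping the broken hop.
        delivered.append(orbit[contact])
        # Assumed recovery step: the contact covers live machines in the
        # segment between the sender and itself, working backward.
        for i in range(contact - 1, sender, -1):
            if orbit[i] in alive:
                delivered.append(orbit[i])
        sender = contact
    return delivered


orbit = list("ABCDEFGH")
alive = set(orbit) - {"C", "D"}   # C and D have left the orbit
print(propagate(orbit, alive))
```

In this run, B cannot hand the query to C, its nearest contact D is also gone, so it sends directly to F, which then reaches back to E before the query continues forward to G and H.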
102 Citations
21 Claims
1. A method of providing message communications with failure detection and recovery in a linear communication orbit formed by a non-static collection of machines, the method comprising:
at a first machine of the non-static collection of machines forming the linear communication orbit, wherein the first machine has a set of direct contacts comprising a set of machines distributed along the linear communication orbit:
monitoring a current propagation state of a first query that has departed from the first machine to travel in a forward direction along the linear communication orbit;
detecting a propagation failure of the first query corresponding to the current propagation state of the first query, including detecting a failure to receive an acknowledgement of the first query from a first direct contact among the set of direct contacts within a predetermined timeout corresponding to the first direct contact, wherein the first direct contact is located in an unanswered range of the first query downstream of the first machine along the linear communication orbit, and wherein there is at least one machine located between the first machine and the first direct contact of the first machine along the linear communication orbit; and
in accordance with detection of the propagation failure of the first query corresponding to the current propagation state of the first query, sending the first query directly to the first direct contact of the first machine, wherein the first direct contact of the first machine is configured to initiate failure recovery within at least part of a respective segment of the linear communication orbit between the first machine and the first direct contact of the first machine.
Dependent claims: 2, 3, 4, 5, 6, 7
8. A non-transitory computer-readable medium, having instructions stored thereon, which when executed by one or more processors cause the processors to perform operations comprising:
at a first machine of a non-static collection of machines forming a linear communication orbit, wherein the first machine has a set of direct contacts comprising a set of machines distributed along the linear communication orbit:
monitoring a current propagation state of a first query that has departed from the first machine to travel in a forward direction along the linear communication orbit;
detecting a propagation failure of the first query corresponding to the current propagation state of the first query, including detecting a failure to receive an acknowledgement of the first query from a first direct contact among the set of direct contacts within a predetermined timeout corresponding to the first direct contact, wherein the first direct contact is located in an unanswered range of the first query downstream of the first machine along the linear communication orbit, and wherein there is at least one machine located between the first machine and the first direct contact of the first machine along the linear communication orbit; and
in accordance with detection of the propagation failure of the first query corresponding to the current propagation state of the first query, sending the first query directly to the first direct contact of the first machine, wherein the first direct contact of the first machine is configured to initiate failure recovery within at least part of a respective segment of the linear communication orbit between the first machine and the first direct contact of the first machine.
Dependent claims: 9, 10, 11, 12, 13, 14
15. A first machine, comprising:
one or more processors; and
memory having instructions stored thereon, which when executed by the one or more processors cause the processors to perform operations, wherein:
the first machine is among a non-static collection of machines forming a linear communication orbit,
the first machine has a set of direct contacts comprising a set of machines distributed along the linear communication orbit, and
the operations include:
monitoring a current propagation state of a first query that has departed from the first machine to travel in a forward direction along the linear communication orbit;
detecting a propagation failure of the first query corresponding to the current propagation state of the first query, including detecting a failure to receive an acknowledgement of the first query from a first direct contact among the set of direct contacts within a predetermined timeout corresponding to the first direct contact, wherein the first direct contact is located in an unanswered range of the first query downstream of the first machine along the linear communication orbit, and wherein there is at least one machine located between the first machine and the first direct contact of the first machine along the linear communication orbit; and
in accordance with detection of the propagation failure of the first query corresponding to the current propagation state of the first query, sending the first query directly to the first direct contact of the first machine, wherein the first direct contact of the first machine is configured to initiate failure recovery within at least part of a respective segment of the linear communication orbit between the first machine and the first direct contact of the first machine.
Dependent claims: 16, 17, 18, 19, 20, 21
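The detection step recited in the independent claims (no acknowledgement from a direct contact within a timeout specific to that contact) might be tracked along the following lines. This is an illustrative sketch only and forms no part of the claims. The class names `DirectContact` and `QueryMonitor`, the `hops_ahead` field, and the use of plain floating-point seconds are all assumptions. As a simplification, a contact is treated as lying in the unanswered range whenever no acknowledgement from it has been recorded.

```python
from dataclasses import dataclass, field


@dataclass
class DirectContact:
    machine_id: str
    hops_ahead: int    # distance downstream along the orbit
    timeout_s: float   # acknowledgement timeout for this contact


@dataclass
class QueryMonitor:
    sent_at: float                       # time the query departed
    contacts: list                       # DirectContact entries
    acked: set = field(default_factory=set)

    def record_ack(self, machine_id):
        """Note that an acknowledgement arrived from this contact."""
        self.acked.add(machine_id)

    def first_overdue_contact(self, now):
        """Return the nearest unacknowledged contact whose timeout has
        elapsed, or None if no propagation failure is detected yet."""
        for c in sorted(self.contacts, key=lambda c: c.hops_ahead):
            if c.machine_id in self.acked:
                continue  # simplified test for "outside the unanswered range"
            if c.hops_ahead < 2:
                continue  # claims require at least one machine in between
            if now - self.sent_at >= c.timeout_s:
                return c
        return None


monitor = QueryMonitor(
    sent_at=0.0,
    contacts=[DirectContact("far", 4, 2.0), DirectContact("near", 2, 1.0)],
)
overdue = monitor.first_overdue_contact(now=1.5)
if overdue is not None:
    # A real system would now send the query directly to this contact.
    print("resend query directly to", overdue.machine_id)
```

Giving each contact its own timeout lets farther contacts be allowed more time before a failure is declared, which matches the claim language of a timeout "corresponding to the first direct contact"; the specific values here are placeholders.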
Specification