Method and system for approximate, monotonic time synchronization for a multiple node NUMA system
Abstract
In a multi-node non-uniform memory access (NUMA) multi-processor system, one designated synchronization processor on each node is synchronized across nodes. Each node synchronizes its remaining processors internally using well-known techniques, so it is sufficient to synchronize one processor per node. Node zero, a designated system node that acts as synchronization manager, estimates the time it takes to transmit information in packet form to a particular remote node in the system. As part of this exchange, the remote node transmits a time value to node zero. Node zero projects the current time on the remote node from that value and the transmission time estimate, and compares the projection with its own time. It then either advances its own clock to catch up with a leading remote node, or sends a new time value to the remote node, requiring it to advance its time to catch up with node zero. Code on the remaining nodes is mostly passive: it responds to packets from node zero and sets the time base value when requested. Monotonicity of the time bases is maintained by always advancing the earlier of the two time bases to catch up with the later one.
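As an illustration only, the comparison described in the abstract can be sketched in Python. The function names, the use of integer ticks, the choice to add the transit estimate to the value sent to the target, and the guard on the target side are all assumptions made for this sketch; the abstract does not specify them.

```python
from typing import Optional, Tuple


def reconcile(node_zero_now: int,
              target_reported: int,
              transit_estimate: int) -> Tuple[int, Optional[int]]:
    """Return (new node zero time, value to send to the target or None).

    target_reported is the time value received from the remote node.
    It is assumed to be transit_estimate ticks old when it arrives,
    so node zero projects the remote node's current time forward.
    """
    projected_target = target_reported + transit_estimate
    if projected_target > node_zero_now:
        # Remote node leads: node zero advances its own clock.
        return projected_target, None
    if node_zero_now > projected_target:
        # Node zero leads: send a value for the remote node to adopt.
        # Adding the transit estimate is an assumption of this sketch.
        return node_zero_now, node_zero_now + transit_estimate
    # Clocks agree within the estimate: nothing to do.
    return node_zero_now, None


def apply_on_target(target_now: int, received: int) -> int:
    """Hypothetical target-side handler: only ever move the clock forward."""
    return max(target_now, received)
```

In both branches a clock is only advanced, never set back, which is the monotonicity property the abstract describes.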
19 Claims
1. A method for monotonic time synchronization in a multi-node data processing system, comprising the steps of:
designating one of n number of nodes as node zero;
initiating re-synchronization;
estimating a period of time necessary for transmitting data from said node zero to a target node in said multi-node data processing system;
updating said node zero time if said target node time is leading said node zero time;
updating said target node time if said node zero time is leading said target node time; and
repeating previous said steps for all other nodes in said multi-node data processing system.
2. The method in claim 1, further comprising:
noting time at said node zero and sending a data packet from said node zero to said target node;
noting, at said node zero, the arrival time of said data packet at said target node;
calculating packet transit time by subtracting the time said packet was sent from the time said packet arrived at said target node;
determining validity of said transit time;
iterating transit time calculations a predetermined number of times; and
averaging results of said transit time calculations.
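The estimation steps of claim 2 can be sketched as follows. This is a minimal illustration, not the claimed implementation: `send_probe` and `is_valid` are hypothetical callables supplied by the caller, and dropping invalid samples (rather than retrying them) is an assumption.

```python
from statistics import mean
from typing import Callable, List, Optional, Tuple


def estimate_transit(send_probe: Callable[[], Tuple[int, int]],
                     is_valid: Callable[[int], bool],
                     iterations: int) -> Optional[float]:
    """Average the valid transit times over a fixed number of probes.

    send_probe() is assumed to return (time sent, time arrived) for one
    packet, both as known to node zero.
    """
    samples: List[int] = []
    for _ in range(iterations):
        sent_at, arrived_at = send_probe()
        transit = arrived_at - sent_at  # arrival time minus send time
        if is_valid(transit):
            samples.append(transit)
    # With no valid sample there is nothing to average.
    return mean(samples) if samples else None
```

For example, with four probes of which one is implausibly slow, only the three valid samples contribute to the average:

```python
probes = iter([(0, 10), (100, 112), (200, 500), (300, 311)])
estimate = estimate_transit(lambda: next(probes), lambda t: 5 <= t <= 20, 4)
```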
3. The method in claim 2, wherein determining validity of transit time calculations, further comprises:
comparing each round trip time estimate with a known minimum value, an expected value less three times the standard deviation, and an expected value plus three times the standard deviation, wherein said minimum value, standard deviation and expected value are known from design-time information.
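One way to read the comparison in claim 3 is as an acceptance window. The claim only says the estimate is compared with the three quantities; treating the result as an accept/reject test, and the function and parameter names below, are assumptions of this sketch.

```python
def is_valid_round_trip(sample: float,
                        minimum: float,
                        expected: float,
                        std_dev: float) -> bool:
    """Accept a round trip time only if it is physically possible
    (not below the known minimum) and within three standard
    deviations of the expected value."""
    lower = max(minimum, expected - 3 * std_dev)
    upper = expected + 3 * std_dev
    return lower <= sample <= upper
```

With a minimum of 80, an expected value of 100, and a standard deviation of 5, the window is 85 to 115 inclusive.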
4. The method in claim 2, further comprising:
receiving a time base value from said target node;
discarding all response packets except final said response packet; and
adjusting said time base value on said node zero or said target node based on a comparison of the time value with said time value received from said target node.
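The handling of response packets in claim 4 might look like the sketch below. It assumes responses arrive as a sequence of time base values in arrival order; the function names and the returned labels are hypothetical.

```python
from typing import Iterable, Optional, Tuple


def last_response(responses: Iterable[int]) -> Optional[int]:
    """Keep only the final response; earlier ones are discarded."""
    final = None
    for value in responses:
        final = value  # overwrite, so only the last value survives
    return final


def side_to_adjust(node_zero_time: int,
                   responses: Iterable[int]) -> Tuple[str, Optional[int]]:
    """Return which time base should be advanced, and to what value."""
    target_time = last_response(responses)
    if target_time is None:
        return ("none", None)
    if target_time > node_zero_time:
        return ("node_zero", target_time)
    if node_zero_time > target_time:
        return ("target", node_zero_time)
    return ("none", node_zero_time)
```

This sketch leaves out the transit time correction so that the discard-then-compare step stays visible on its own.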
5. The method in claim 4, further comprising:
adjusting a re-synchronization interval based on a determination of the speed with which said target node time base is drifting away from said node zero time base.
6. The method in claim 5, further comprising:
re-synchronizing nodes more frequently when said target node time base drifts rapidly with respect to said node zero time base; and
re-synchronizing nodes less frequently when said target node time base drifts slowly with respect to said node zero time base.
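Claims 5 and 6 describe an interval that shrinks when drift is fast and grows when drift is slow. A minimal sketch of one such policy follows; the formula (tolerated offset divided by observed drift rate), the clamping bounds, and all parameter names are assumptions, since the claims do not give a formula.

```python
def next_interval(observed_offset: float,
                  elapsed: float,
                  tolerance: float,
                  min_interval: float,
                  max_interval: float) -> float:
    """Pick the next re-synchronization interval from observed drift.

    observed_offset: how far the target time base moved away from
                     node zero since the last synchronization.
    elapsed:         time since that last synchronization.
    tolerance:       largest offset the system is willing to accept.
    """
    if elapsed <= 0:
        raise ValueError("elapsed must be positive")
    drift_rate = abs(observed_offset) / elapsed
    if drift_rate == 0:
        # No measurable drift: wait as long as allowed.
        return max_interval
    # Time until the offset would reach the tolerance at this rate.
    wanted = tolerance / drift_rate
    return min(max_interval, max(min_interval, wanted))
```

A rapidly drifting node (offset 500 over 1000 ticks, tolerance 10) gets an interval of 20 ticks, while a slowly drifting one is clamped to the maximum interval.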
7. A computer program product within a computer readable medium having instructions for monotonic time synchronization in a multi-node data processing system, comprising the steps of:
instructions within said computer program product for designating one of n number of nodes as node zero;
instructions within said computer program product for initiating re-synchronization;
instructions within said computer program product for estimating a period of time necessary for transmitting data from said node zero to a target node in said multi-node data processing system;
instructions within said computer program product for updating said node zero time if said target node time is leading said node zero time;
instructions within said computer program product for updating said target node time if said node zero time is leading said target node time; and
instructions within said computer program product for repeating said steps on all other nodes in said multi-node data processing system.
8. The computer program product in claim 7, further comprising:
instructions within said computer program product for noting time at said node zero and sending a data packet from said node zero to said target node;
instructions within said computer program product for noting, at said node zero, the arrival time of said data packet at said target node;
instructions within said computer program product for calculating packet transit time by subtracting the time said packet was sent from the time said packet arrived at said target node;
instructions within said computer program product for determining validity of said transit time;
instructions within said computer program product for iterating transit time calculations a predetermined number of times; and
instructions within said computer program product for averaging results of said transit time calculations.
9. The computer program product in claim 8, wherein instructions for determining validity further comprises:
instructions within said computer program product for comparing each round trip time estimate with a known minimum value, an expected value less three times the standard deviation and an expected value plus three times the standard deviation, wherein said minimum value, standard deviation and expected value are known from design-time information.
10. The computer program product in claim 8, further comprising:
instructions within said computer program product for receiving a time base value from said target node;
instructions within said computer program product for disregarding all response packets except final said response packet; and
instructions within said computer program product for adjusting said time base value on said node zero or said target node based on a comparison of the time value with said time value received from said target node.
11. The computer program product in claim 8, further comprising:
instructions within said computer program product for adjusting a re-synchronization interval based on a determination of the speed with which said target node time base is drifting away from said node zero time base.
12. The computer program product in claim 11, further comprising:
instructions within said computer program product for re-synchronizing nodes more frequently when said target node time base drifts rapidly with respect to said node zero time base; and
instructions within said computer program product for re-synchronizing nodes less frequently when said target node time base drifts slowly with respect to said node zero time base.
13. A multi-node data processing system, comprising:
a system interconnect for transmitting data;
n number of nodes wherein each node comprises;
a plurality of processors connected to a local bus;
at least one memory connected to said local bus for storing said data; and
a controller connected to said local bus for controlling said memory, wherein said controller is also connected to said system interconnect;
a register within each of said plurality of processors for recording and reporting time values in each of said plurality of nodes;
a designated node zero; and
logic means for synchronizing time values between said n number of nodes.
14. The multi-node data processing system in claim 13, further comprising:
discrimination means for designating one of n number of nodes as node zero;
means for initiating synchronization;
calculation means for estimating a period of time necessary for transmitting data from said node zero to a target node in said multi-node data processing system;
validation means for determining validity of said transit time;
update means for updating said node zero time if said target node time is leading said node zero time;
means for updating said target node time if said node zero time is leading said target node time; and
logic means for setting a time base value on all other nodes in said multi-node data processing system.
15. The multi-node data processing system in claim 14, wherein said validation means for determining validity, further comprises:
discrimination means for comparing each round trip time estimate with a calculation wherein said calculation comprises;
a known minimum value, an expected value less three times a standard deviation and an expected value plus three times said standard deviation, wherein said minimum value, said standard deviation and said expected value are known from design-time information.
16. The multi-node data processing system in claim 14, wherein said estimating means for estimating said period of time necessary for transmitting data from said node zero to a target node in said multi-node data processing system, further comprises:
transmission means for sending a data packet from said node zero to said target node;
means for receiving a time value from said target node at said node zero; and
discrimination means for comparing said target node time to said node zero time.
17. The multi-node data processing system in claim 14, further comprising:
logic means for adjusting time base value on said node zero or said target node based on said time value received from said target node.
18. The multi-node data processing system in claim 17, further comprising:
logic means for adjusting a re-synchronization interval based on a determination of the speed with which said target node time base is drifting away from said node zero time base.
19. The multi-node data processing system in claim 18, further comprising:
synchronization means for re-synchronizing nodes more frequently when said target node time base drifts rapidly with respect to said node zero time base; and
synchronization means for re-synchronizing nodes less frequently when said target node time base drifts slowly with respect to said node zero time base.
Specification