Reliable distributed shared memory
Abstract
In implementing a reliable distributed shared memory, a weak consistency model is modified to ensure that all vital data structures are properly replicated at all times. Write notices and their corresponding diffs are stored on a parameterizable number of nodes. Whenever a node (say, one acting in the primary role) releases a lock, it sends its current vector timestamp, the write notices generated while the lock was held, and their corresponding diffs to a secondary node. The secondary node keeps this information separate from its own private data structures. If a node fails while holding a lock (the failure being detected by all nodes simultaneously through a group membership protocol), then all nodes complete a lock release method and enter a recovery operation. During this recovery operation, all nodes exchange all write notices and corresponding diffs, including backup write notices and diffs held by nodes on behalf of the failed node. After the information has been exchanged, diffs are applied and all nodes may start fresh.
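
The release-time replication described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the `Node` and `ReleaseMessage` classes, the twin-based diffing, the 8-byte page size, and the direct method call standing in for a network send are all assumptions made for illustration.

```python
from dataclasses import dataclass, field

PAGE_SIZE = 8  # hypothetical, tiny page size chosen for illustration


@dataclass
class ReleaseMessage:
    sender: int
    vector_timestamp: tuple  # sender's vector timestamp at release
    write_notices: list      # numbers of pages modified while the lock was held
    diffs: dict              # page number -> list of (offset, new_byte)


def make_diff(twin, page):
    """List every byte that differs between the pristine twin and the page."""
    return [(i, page[i]) for i in range(len(page)) if page[i] != twin[i]]


@dataclass
class Node:
    node_id: int
    n_nodes: int
    pages: dict = field(default_factory=dict)    # page number -> bytearray
    twins: dict = field(default_factory=dict)    # page number -> copy taken before first write
    backups: dict = field(default_factory=dict)  # sender id -> messages, kept apart from own state
    clock: list = field(default_factory=list)

    def __post_init__(self):
        self.clock = [0] * self.n_nodes

    def write(self, page_no, offset, value):
        page = self.pages.setdefault(page_no, bytearray(PAGE_SIZE))
        # setdefault keeps the first twin, so later writes do not overwrite it.
        self.twins.setdefault(page_no, bytes(page))
        page[offset] = value

    def release(self, secondary):
        """On lock release, ship timestamp, write notices and diffs to the secondary."""
        self.clock[self.node_id] += 1
        diffs = {p: make_diff(t, self.pages[p]) for p, t in self.twins.items()}
        msg = ReleaseMessage(self.node_id, tuple(self.clock), sorted(diffs), diffs)
        self.twins.clear()
        secondary.store_backup(msg)  # stands in for a network send
        return msg

    def store_backup(self, msg):
        # Backup data is stored separately and never applied to self.pages here.
        self.backups.setdefault(msg.sender, []).append(msg)
```

In this sketch the secondary only files the message under the sender's id; its own pages are left untouched, mirroring the statement that backup information is kept separate from the secondary's private data structures.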
55 Citations
14 Claims
1. At a first node in a distributed shared memory system, said system implemented using a weak consistency protocol, a method of replicating state comprising:
  completing access to a synchronization variable;
  after completing said access, sending a message to a second node, said message comprising:
    an indication of a global ordering of access to said synchronization variable;
    an indication that a page of shared memory has undergone a modification, said page of shared memory including memory referenced by said synchronization variable; and
    a record of said modification.
Dependent claims: 2, 3.

4. At a first node in a distributed shared memory system, said system implemented using a weak consistency protocol, a method of replicating state comprising:
  releasing a lock on a unit of shared memory;
  after releasing said lock, sending a message to a second node, said message comprising:
    a vector timestamp;
    a write notice indicating that a page of shared memory underwent a modification while said lock was held; and
    a record of said modification.
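
Claim 4 recites a vector timestamp as part of the message. As a hedged illustration of how such timestamps can order lock releases, the sketch below applies the standard vector-clock comparison rules. The function names are hypothetical, the rules are the textbook ones rather than anything stated in the claims, and both timestamps are assumed to have one entry per node.

```python
def happens_before(a, b):
    """True if timestamp a precedes b: no entry larger, at least one smaller."""
    return all(x <= y for x, y in zip(a, b)) and tuple(a) != tuple(b)


def concurrent(a, b):
    """True if neither timestamp precedes the other and they are not equal."""
    return (tuple(a) != tuple(b)
            and not happens_before(a, b)
            and not happens_before(b, a))


def merge(a, b):
    """Entry-wise maximum, as a node would compute after learning of b."""
    return tuple(max(x, y) for x, y in zip(a, b))
```

Two releases whose timestamps are concurrent under this test were not ordered by the lock, so a receiver cannot infer from the timestamps alone which modification came first.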

5. At a first node in a distributed shared memory system, said system implemented using a weak consistency protocol, a processor operable to:
  complete access to a synchronization variable;
  after completing said access, send a message to a second node, said message comprising:
    an indication of a global ordering of access to said synchronization variable;
    an indication that a page of shared memory has undergone a modification, said page of shared memory including memory referenced by said synchronization variable; and
    a record of said modification.

6. A computer readable medium for providing program control to a processor, said processor included in a node in a distributed shared memory system, said system implemented using a weak consistency protocol, said computer readable medium adapting said processor to be operable to:
  complete access to a synchronization variable;
  after completing said access, send a message to a second node, said message comprising:
    an indication of a global ordering of access to said synchronization variable;
    an indication that a page of shared memory has undergone a modification, said page of shared memory including memory referenced by said synchronization variable; and
    a record of said modification.

7. A computer data signal embodied in a carrier wave comprising:
  an indication of a global ordering of access to said synchronization variable;
  an indication that a page of shared memory has undergone a modification, said page of shared memory including memory referenced by said synchronization variable; and
  a record of said modification.

8. A method for synchronization variable managing in a distributed shared memory system, said system implemented using a weak consistency protocol, said method comprising:
  receiving an access request related to a synchronization variable, where said synchronization variable is for a unit of shared memory;
  determining a most recent node to have held said synchronization variable;
  if said most recent node to have held said synchronization variable has failed, and said failure has occurred subsequent to sending a replication message, determining a node possessed of said replication message, said replication message including an indication of a global ordering of access to said synchronization variable, an indication that a page has undergone a modification while said synchronization variable was held, said page of shared memory including memory referenced by said synchronization variable, and a record of said modification; and
  forwarding said access request to said node determined to be possessed of said replication message.
Dependent claims: 9, 10, 11.
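
The routing decision in claim 8 can be sketched as follows. This is an illustrative Python model only: the `LockManager` class, its method names, and the bookkeeping of which node holds the replication message for each lock are assumptions, and the case where the holder failed before sending any replication message is deliberately left outside the sketch.

```python
from dataclasses import dataclass, field


@dataclass
class LockManager:
    last_holder: dict = field(default_factory=dict)  # lock id -> most recent holder
    backup_of: dict = field(default_factory=dict)    # lock id -> node holding the replication message
    failed: set = field(default_factory=set)         # node ids reported as failed

    def record_release(self, lock_id, holder, backup_node):
        """Note who last held the lock and where its replication message went."""
        self.last_holder[lock_id] = holder
        self.backup_of[lock_id] = backup_node

    def mark_failed(self, node_id):
        self.failed.add(node_id)

    def route_request(self, lock_id):
        """Return the node an access request should be forwarded to.

        Returns None when the lock has never been held, in which case the
        manager in this sketch would grant it directly.
        """
        holder = self.last_holder.get(lock_id)
        if holder is None:
            return None
        if holder not in self.failed:
            return holder
        if lock_id not in self.backup_of:
            # Failure before any replication message was sent: not covered here.
            raise LookupError("no replication message recorded for this lock")
        return self.backup_of[lock_id]
```

Forwarding to the backup node lets the requester obtain the ordering information, write notices and diffs that the failed holder would otherwise have supplied.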

12. At a node acting as a synchronization variable manager in a distributed shared memory system, said system implemented using a weak consistency protocol, a processor operable to:
  receive an access request related to a synchronization variable, where said synchronization variable is for a unit of shared memory;
  determine a most recent node to have held said synchronization variable;
  if said most recent node to have held said synchronization variable has failed, and said failure has occurred subsequent to sending a replication message, determine a node possessed of said replication message, said replication message including an indication of a global ordering of access to said synchronization variable, an indication that a page has undergone a modification while said synchronization variable was held, said page of shared memory including memory referenced by said synchronization variable, and a record of said modification; and
  forward said access request to said node determined to be possessed of said replication message.

13. A computer readable medium for providing program control to a processor, said processor included in a node acting as a synchronization variable manager in a distributed shared memory system, said system implemented using a weak consistency protocol, said computer readable medium adapting said processor to be operable to:
  receive an access request related to a synchronization variable, where said synchronization variable is for a unit of shared memory;
  determine a most recent node to have held said synchronization variable;
  if said most recent node to have held said synchronization variable has failed, and said failure has occurred subsequent to sending a replication message, determine a node possessed of said replication message, said replication message including an indication of a global ordering of access to said synchronization variable, an indication that a page has undergone a modification while said synchronization variable was held, said page of shared memory including memory referenced by said synchronization variable, and a record of said modification; and
  forward said access request to said node determined to be possessed of said replication message.

14. At a first node in a group of nodes in a distributed shared memory system, said system implemented using a weak consistency protocol, a method of recovering from a failure of a second node in said group, said method comprising:
  detecting, via a group membership protocol, said failure in said second node;
  releasing each currently held synchronization variable;
  waiting for each currently held synchronization variable to be released or expire;
  entering a recovery operation, wherein said recovery operation comprises:
    sending, to all nodes in said group, an indication of a global ordering of access to each said synchronization variable along with an indication of each page that has undergone a modification while one said synchronization variable was held, and a record of said modification;
    receiving from other nodes in said group a plurality of indications of global ordering of access to each said synchronization variable currently held by other nodes, each said indication of global ordering sent with an indication of each page that has undergone a modification while one said synchronization variable was held, and a record of said modification; and
    subsequent to completion of said sending and receiving, applying each said received record to a shared memory.
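
The exchange-then-apply portion of the recovery operation in claim 14 can be sketched in Python as below. Failure detection and lock release are not modelled. The record layout (a dict with `sender`, `vt` and `diffs` keys), the `recover` function, the page size, and the choice to apply records sorted by the sum of their vector timestamp entries are all assumptions for illustration. That sort key respects happens-before ordering, since a timestamp that precedes another always has a strictly smaller sum, and it breaks ties between concurrent records deterministically.

```python
PAGE_SIZE = 8  # hypothetical, tiny page size chosen for illustration


def recover(records_per_node, memory):
    """Merge the records exchanged by surviving nodes and apply their diffs.

    records_per_node: one list per surviving node, holding its own records
    plus any backup records it kept on behalf of a failed node.
    memory: dict mapping page number -> bytearray, updated in place.
    """
    # The same record may arrive from its sender and from a backup holder,
    # so records are de-duplicated by (sender, vector timestamp).
    unique = {}
    for records in records_per_node:
        for rec in records:
            unique[(rec["sender"], rec["vt"])] = rec

    # Apply in an order consistent with the vector timestamps.
    ordered = sorted(unique.values(),
                     key=lambda r: (sum(r["vt"]), r["vt"], r["sender"]))
    for rec in ordered:
        for page_no, diff in rec["diffs"].items():
            page = memory.setdefault(page_no, bytearray(PAGE_SIZE))
            for offset, value in diff:
                page[offset] = value
    return memory
```

Because every node in this sketch applies the same de-duplicated set of records in the same order, each one ends the recovery operation with identical page contents.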
Specification