Memory sharing across distributed nodes
Abstract
A method and apparatus are disclosed for enabling nodes in a distributed system to share one or more memory portions. A home node makes a portion of its main memory available for sharing, and each of one or more sharer nodes mirrors that shared portion of the home node's main memory in its own main memory. To maintain memory coherency, a memory coherence protocol is implemented. Under this protocol, load and store instructions that target the mirrored memory portion of a sharer node are trapped, and store instructions that target the shared memory portion of a home node are trapped. With this protocol, valid data is obtained from the home node and updates are propagated to the home node. Thus, no “dirty” data is transferred between sharer nodes. As a result, the failure of one node will not cause the failure of another node or the failure of the entire system.
48 Claims
1. In a distributed system comprising a first node and a second node, wherein the first node has a first main memory and the second node has a second main memory, wherein the first main memory and the second main memory comprise random access memory, a method performed by the first node, comprising:
generating and storing, in the first main memory, a mapping that maps one or more memory addresses in a first memory location in the first main memory to one or more virtual memory addresses corresponding to a second memory location in the second main memory;
executing, by a processor on the first node, a set of program instructions pertaining to a particular thread of execution, wherein the set of program instructions includes a load instruction to load data from the first memory location of the first main memory, and a store instruction to store updated data into the first memory location of the first main memory;
wherein executing the load instruction comprises:
determining, by the processor, whether the data in the first memory location is valid;
in response to a determination that the data in the first memory location is invalid, causing the load instruction to trap, which causes the processor to suspend execution of the set of program instructions and to begin execution of a first set of trap handling instructions;
wherein executing the first set of trap handling instructions causes:
based on the mapping, obtaining valid data from the second memory location of the second main memory, and storing the valid data into the first memory location of the first main memory; and
updating a validity indicator to indicate that the data in the first memory location is valid; and
resuming, by the processor, execution of the set of program instructions;
wherein executing the store instruction comprises:
causing the store instruction to trap, which causes the first processor to suspend execution of the set of program instructions and to begin execution of a second set of trap handling instructions;
wherein the second set of trap handling instructions causes, based on the mapping, propagating the updated data to the second node to be stored within the second memory location of the second main memory; and
resuming, by the first processor, execution of the set of program instructions.
Dependent claims: 2-19.

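The load and store steps recited in claim 1 can be illustrated with a minimal simulation. This is a sketch under stated assumptions, not the patented implementation: the names Trap, SharerNode, home_memory and log are hypothetical, a hardware trap is modeled as a Python exception, the second main memory is modeled as a plain dictionary, and the choice to also write the local mirror inside the store trap handler is an assumption the claim does not spell out.

```python
class Trap(Exception):
    """Hypothetical stand-in for a hardware trap raised by an instruction."""

    def __init__(self, kind, addr, value=None):
        super().__init__(kind)
        self.kind, self.addr, self.value = kind, addr, value


class SharerNode:
    """Models the first node, which mirrors part of the second node's memory."""

    def __init__(self, home_memory, mapping):
        self.home_memory = home_memory            # stands in for the second main memory
        self.mapping = mapping                    # local address -> home virtual address
        self.memory = {a: None for a in mapping}  # first memory location (the mirror)
        self.valid = {a: False for a in mapping}  # validity indicators, initially invalid
        self.log = []                             # records which handlers ran

    # "Instruction" level: trap instead of completing when the claim says so.
    def _load_instruction(self, addr):
        if not self.valid[addr]:
            raise Trap("load", addr)
        return self.memory[addr]

    def _store_instruction(self, addr, value):
        raise Trap("store", addr, value)          # stores to the mirror always trap

    # First set of trap handling instructions.
    def _load_trap_handler(self, addr):
        self.memory[addr] = self.home_memory[self.mapping[addr]]
        self.valid[addr] = True
        self.log.append(("load_trap", addr))

    # Second set of trap handling instructions.
    def _store_trap_handler(self, addr, value):
        self.memory[addr] = value                     # assumption: mirror is updated too
        self.home_memory[self.mapping[addr]] = value  # propagate to the second node
        self.log.append(("store_trap", addr))

    # Program level: execute, trap, run the handler, then resume.
    def load(self, addr):
        try:
            return self._load_instruction(addr)
        except Trap as trap:
            self._load_trap_handler(trap.addr)
            return self._load_instruction(addr)   # resume by retrying the load

    def store(self, addr, value):
        try:
            self._store_instruction(addr, value)
        except Trap as trap:
            self._store_trap_handler(trap.addr, trap.value)


home = {0x9000: 7}                       # second memory location holds 7
node = SharerNode(home, {0x100: 0x9000})
first = node.load(0x100)                 # invalid mirror: traps and fetches from home
second = node.load(0x100)                # mirror now valid: no trap
node.store(0x100, 8)                     # traps and propagates to home
```

In this sketch only the first load runs the load trap handler; the second load is served from the mirror because the validity indicator was set. The store always traps, so the home copy is updated before the program resumes.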
20. A first node for use in a distributed computing system, the first node comprising:
a first main memory, wherein the first main memory stores a mapping that maps one or more memory addresses in a first memory location in the first main memory to one or more virtual memory addresses corresponding to a second memory location in a second main memory on a second node of the distributed system, wherein the first main memory and the second main memory comprise random access memory;
a first set of trap handling instructions and a second set of trap handling instructions; and
one or more processors including a first processor, the first processor operable to execute a load instruction to load data from the first memory location of the first main memory, and a store instruction to store data into the first memory location of the first main memory, wherein the load instruction and the store instruction are part of a set of program instructions pertaining to a particular thread of execution;
wherein executing the load instruction includes determining whether data in the first memory location of the first main memory is valid, and in response to a determination that the data in the first memory location of the first main memory is invalid, to cause the load instruction to trap, which would cause the first processor to suspend execution of the set of program instructions and to begin execution of the first set of trap handling instructions; and
wherein the first set of trap handling instructions, when executed by the first processor, would cause the first processor to cause:
valid data to be obtained from the second memory location of the second main memory based on the mapping, and stored into the first memory location of the first main memory; and
a validity indicator to be updated to indicate that the data in the first memory location is valid; and
execution of the set of program instructions to be resumed;
wherein the first processor comprises circuitry operable to cause the store instruction to trap, which would cause the first processor to suspend execution of the set of program instructions and to begin execution of the second set of trap handling instructions; and
wherein the second set of trap handling instructions, when executed by the first processor, would cause the first processor to:
cause the updated data to be propagated to the second node to be stored within the second memory location of the second main memory based on the mapping; and
resume execution of the set of program instructions.
Dependent claims: 21-36.

37. In a distributed system comprising a first node and a second node, wherein the first node has a first main memory and the second node has a second main memory, wherein the first main memory and the second main memory comprise random access memory, a method performed by the second node, comprising:
generating and storing, in the second main memory, a mapping that maps one or more memory addresses in a first memory location in the first main memory to one or more virtual memory addresses corresponding to a second memory location in the second main memory;
executing, by a first processor on the second node, a store instruction to store updated data into the second memory location of the second main memory, wherein the store instruction is part of a set of program instructions pertaining to a particular thread of execution;
causing the store instruction to trap, which causes the first processor to suspend execution of the set of program instructions and to begin execution of a set of trap handling instructions;
executing, by the first processor, the set of trap handling instructions, wherein executing the set of trap handling instructions causes:
based on the mapping, storing the updated data into the second memory location of the second main memory; and
resuming, by the first processor, execution of the set of program instructions.
Dependent claims: 38-42.

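The home-node side recited in claim 37 can be sketched in the same spirit. This is only an illustration under assumptions: HomeNode, trap_count and stale_mirrors are hypothetical names, the trap is modeled as a direct call into a handler, and recording which mirrored addresses became stale is an assumed use of the mapping that the claim itself does not recite (the claim only requires that the handler store the updated data based on the mapping).

```python
class HomeNode:
    """Models the second node, which shares a portion of its main memory."""

    def __init__(self, mapping):
        # mapping: address on the first node -> virtual address on this node
        self.mapping = mapping
        self.shared = set(mapping.values())  # the shared (second) memory location
        self.memory = {}                     # second main memory
        self.trap_count = 0
        self.stale_mirrors = []              # hypothetical bookkeeping, not in the claim

    def _store_trap_handler(self, vaddr, value):
        """Set of trap handling instructions run when a shared store traps."""
        self.trap_count += 1
        self.memory[vaddr] = value           # store the updated data
        # Assumption: use the mapping to note mirrored addresses that are now stale.
        for mirror_addr, home_vaddr in self.mapping.items():
            if home_vaddr == vaddr:
                self.stale_mirrors.append(mirror_addr)

    def store(self, vaddr, value):
        if vaddr in self.shared:
            # Store targets the shared portion: trap, handle, then resume.
            self._store_trap_handler(vaddr, value)
        else:
            # Store to private memory completes without trapping.
            self.memory[vaddr] = value


home_node = HomeNode({0x100: 0x9000})
home_node.store(0x9000, 5)   # shared address: handled by the trap handler
home_node.store(0x5000, 1)   # private address: no trap
```

Only the store to the shared address goes through the handler, so ordinary stores to the rest of the home node's memory are unaffected in this model.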
43. A second node for use in a distributed computing system comprising a first node and the second node, the second node comprising:
a second main memory, wherein the second main memory stores a mapping that maps one or more virtual memory addresses corresponding to a second memory location in the second main memory to one or more memory addresses in a first memory location in a first main memory on the first node, wherein the first main memory and the second main memory comprise random access memory;
a set of trap handling instructions; and
one or more processors including a first processor, the first processor operable to execute a store instruction to store updated data into the second memory location of the second main memory, wherein the store instruction is part of a set of program instructions pertaining to a particular thread of execution, and wherein the first processor comprises circuitry operable to cause the store instruction to trap, which would cause the first processor to suspend execution of the set of program instructions and to begin execution of the set of trap handling instructions; and
wherein the set of trap handling instructions, when executed by the first processor, would cause the first processor to:
store the updated data into the second memory location of the second main memory based on the mapping; and
resume execution of the set of program instructions.
Dependent claims: 44-48.

Specification