Persistent memory device for backup process checkpoint states
First Claim
1. A system for storing checkpoint data comprising:
a network interface to an external network; and
a persistent memory unit coupled to the network interface, the persistent memory unit to:
receive the checkpoint data directly into a region of the persistent memory unit via a remote direct memory access (RDMA) write command issued by a first node executing a primary process, wherein the first node is connected to the network interface through the external network, and
provide direct access to the checkpoint data in the region of the persistent memory unit via a remote direct memory access (RDMA) read command issued by a second node executing a backup process, wherein the second node is connected to the network interface through the external network,
wherein the remote direct memory access (RDMA) write command issued by the first node is preceded by a create request for the region,
wherein the RDMA read command issued by the second node is preceded by an open request for the region, and
wherein the backup process performs a function of the primary process in response to a failure of the primary process.
Abstract
A system is described that includes a network interface attached to a persistent memory unit. The persistent memory unit is configured to receive checkpoint data from a primary process, and to provide access to the checkpoint data for use in a backup process, which provides recovery capability in the event of a failure of the primary process. The network interface is configured to provide address translation information between virtual and physical addresses in the persistent memory unit. In other embodiments, the persistent memory unit is capable of storing multiple updates to the checkpoint state. The checkpoint state and the updates to the checkpoint state, if any, can be retrieved by the backup process periodically, or all at once upon failure of the primary process.
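The lifecycle summarized in the abstract (a create request, RDMA writes by the primary, an open request, RDMA reads by the backup, then takeover on failure) can be sketched with a small in-memory simulation. This is only an illustration under stated assumptions: `PersistentMemoryUnit`, `primary_checkpoint`, `backup_recover`, and the 8-byte counter layout are hypothetical names and choices, and no real RDMA hardware or library is involved.

```python
import struct


class PersistentMemoryUnit:
    """Hypothetical in-memory stand-in for the persistent memory unit."""

    def __init__(self):
        self._regions = {}  # region name -> bytearray

    def create_region(self, name, size):
        # Create request: precedes the first write to a region.
        if name in self._regions:
            raise ValueError(f"region {name!r} already exists")
        self._regions[name] = bytearray(size)
        return name

    def open_region(self, name):
        # Open request: precedes reads of an existing region.
        if name not in self._regions:
            raise KeyError(f"region {name!r} does not exist")
        return name

    def rdma_write(self, handle, offset, data):
        # Simulated RDMA write: copies bytes straight into the region.
        region = self._regions[handle]
        if offset < 0 or offset + len(data) > len(region):
            raise IndexError("write outside region")
        region[offset:offset + len(data)] = data

    def rdma_read(self, handle, offset, length):
        # Simulated RDMA read: returns bytes straight from the region.
        region = self._regions[handle]
        if offset < 0 or offset + length > len(region):
            raise IndexError("read outside region")
        return bytes(region[offset:offset + length])


def primary_checkpoint(pmu, handle, counter):
    # The primary process stores its state (here, one 64-bit counter).
    pmu.rdma_write(handle, 0, struct.pack("<Q", counter))


def backup_recover(pmu, name):
    # The backup process opens the region and reads the last checkpoint.
    handle = pmu.open_region(name)
    (counter,) = struct.unpack("<Q", pmu.rdma_read(handle, 0, 8))
    return counter


pmu = PersistentMemoryUnit()
handle = pmu.create_region("ckpt", 8)
for counter in range(1, 4):
    primary_checkpoint(pmu, handle, counter)

# Assume the primary fails here; the backup resumes from the checkpoint.
resumed = backup_recover(pmu, "ckpt")
print(resumed)  # 3
```

In this sketch the backup reads the state only once, after the assumed failure; reading it periodically instead would simply mean calling `backup_recover` on a timer.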
37 Claims
1. A system for storing checkpoint data comprising:
a network interface to an external network; and
a persistent memory unit coupled to the network interface, the persistent memory unit to:
receive the checkpoint data directly into a region of the persistent memory unit via a remote direct memory access (RDMA) write command issued by a first node executing a primary process, wherein the first node is connected to the network interface through the external network, and
provide direct access to the checkpoint data in the region of the persistent memory unit via a remote direct memory access (RDMA) read command issued by a second node executing a backup process, wherein the second node is connected to the network interface through the external network,
wherein the remote direct memory access (RDMA) write command issued by the first node is preceded by a create request for the region,
wherein the RDMA read command issued by the second node is preceded by an open request for the region, and
wherein the backup process performs a function of the primary process in response to a failure of the primary process.
Dependent claims: 2-13.

14. A system for storing checkpoint data comprising:
a network interface to an external network; and
a persistent memory unit coupled to the network interface, the persistent memory unit to:
receive the checkpoint data into a region of the persistent memory unit via a remote direct memory access (RDMA) write command issued by a first node executing a primary process, wherein the first node is connected to the network interface through the external network, and
provide access to the checkpoint data in the region of the persistent memory unit via a remote direct memory access (RDMA) read command issued by a second node executing a backup process, wherein the second node is connected to the network interface through the external network,
wherein the remote direct memory access (RDMA) write command is preceded by a create request for the region,
wherein the RDMA read command is preceded by an open request for the region,
wherein the backup process is to perform a function of the primary process in response to a failure of the primary process,
wherein the primary process is to directly write the checkpoint data to the persistent memory via the remote direct memory access (RDMA) write command without notifying an operating system of the first node,
wherein the backup process is to directly read the checkpoint data from the persistent memory via the RDMA read command without notifying an operating system of the second node, and
wherein the primary process provides the checkpoint data to the persistent memory unit independently from the backup process.
Dependent claims: 15-19.

20. A system for storing checkpoint data comprising:
a network interface to an external network; and
a persistent memory unit coupled to the network interface, wherein the persistent memory unit is configured to:
receive the checkpoint data into a region of the persistent memory unit via a remote direct memory access (RDMA) write command from a first node executing a primary process, wherein the first node is connected to the network interface through the external network,
provide access to the checkpoint data in the region via a remote direct memory access (RDMA) read command from a second node executing a backup process, wherein the second node is connected to the network interface through the external network,
store meta-data regarding the contents and layout of memory regions within the persistent memory unit, and
keep the meta-data consistent with the checkpoint data stored on the persistent memory unit,
wherein the remote direct memory write command is preceded by a create request for the region and the read command is preceded by an open request for the region,
wherein the backup process is to perform a function of the primary process in response to a failure of the primary process,
wherein the primary process is to directly write the checkpoint data to the persistent memory via the remote direct memory access (RDMA) write command without notifying an operating system of the first node,
wherein the backup process is to directly read the checkpoint data from the persistent memory via the RDMA read command without notifying an operating system of the second node, and
wherein the primary process provides the checkpoint data to the persistent memory unit independently from the backup process.
Dependent claims: 21-25.
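Claim 20 requires that meta-data about region contents and layout stay consistent with the stored checkpoint data, but the claim text does not say how. One possible approach, shown purely as an assumption, is to update a per-region record (size, valid length, checksum) on every write and to verify it on demand. `RegionMeta`, `MetadataTrackingStore`, and the use of CRC-32 are hypothetical choices for this sketch, not taken from the patent.

```python
import zlib
from dataclasses import dataclass


@dataclass
class RegionMeta:
    """Hypothetical meta-data record: layout plus a content checksum."""
    size: int
    valid_length: int = 0
    crc32: int = 0


class MetadataTrackingStore:
    """In-memory stand-in that updates meta-data on every write."""

    def __init__(self):
        self._data = {}  # region name -> bytearray
        self._meta = {}  # region name -> RegionMeta

    def create_region(self, name, size):
        self._data[name] = bytearray(size)
        self._meta[name] = RegionMeta(size=size, crc32=zlib.crc32(b""))

    def write(self, name, offset, payload):
        region = self._data[name]
        meta = self._meta[name]
        if offset < 0 or offset + len(payload) > meta.size:
            raise IndexError("write outside region")
        region[offset:offset + len(payload)] = payload
        # Refresh layout and checksum so they describe the new contents.
        meta.valid_length = max(meta.valid_length, offset + len(payload))
        meta.crc32 = zlib.crc32(bytes(region[:meta.valid_length]))

    def is_consistent(self, name):
        # Recompute the checksum and compare it with the stored meta-data.
        meta = self._meta[name]
        current = zlib.crc32(bytes(self._data[name][:meta.valid_length]))
        return current == meta.crc32


store = MetadataTrackingStore()
store.create_region("ckpt", 16)
store.write("ckpt", 0, b"state-v1")
ok_after_write = store.is_consistent("ckpt")

# Simulate data changing behind the meta-data's back.
store._data["ckpt"][0] ^= 0xFF
ok_after_corruption = store.is_consistent("ckpt")
```

Here `ok_after_write` is true and `ok_after_corruption` is false, because the checksum stored in the meta-data no longer matches the altered bytes.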

26. A system for storing checkpoint data comprising:
a network interface to an external network; and
a persistent memory unit coupled to the network interface, wherein the persistent memory unit is configured to
receive the checkpoint data into a region of the persistent memory unit via a remote direct memory access (RDMA) write command issued by a first processor executing a primary process, wherein the first processor is connected to the network interface through the external network interface,
provide access to the checkpoint data in the region via a remote direct memory access (RDMA) read command issued by a second processor executing a backup process, wherein the second processor is connected to the network interface through the external network,
authenticate requests from remote processors with address protection and translation tables, and
provide access information to authenticated remote processors,
wherein the remote direct memory access (RDMA) write command is preceded by a create request for the region and the RDMA read command is preceded by an open request for the region,
wherein the backup process is to perform a function of the primary process in response to a failure of the primary process,
wherein the primary process is to directly write the checkpoint data to the persistent memory via the remote direct memory access (RDMA) write command,
wherein the backup process is to directly read the checkpoint data from the persistent memory via the RDMA read command, and
wherein the primary process provides the checkpoint data to the persistent memory unit independently from the backup process.
Dependent claims: 27-31.
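Claim 26 adds authentication of remote requests using address protection and translation tables, with access information returned to authenticated processors. The following sketch shows one way such tables might interact. The table layout, the `"r"`/`"w"` rights strings, and all names (`AccessController`, `Translation`, the example addresses) are assumptions made for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Translation:
    """Hypothetical translation-table entry for one region."""
    virt_base: int
    phys_base: int
    length: int


class AccessController:
    """Checks a protection table, then translates virtual addresses."""

    def __init__(self):
        self._translations = {}  # region -> Translation
        self._protection = {}    # (node_id, region) -> set of rights

    def register_region(self, region, virt_base, phys_base, length):
        self._translations[region] = Translation(virt_base, phys_base, length)

    def grant(self, node_id, region, rights):
        # rights is a string such as "r" or "rw".
        self._protection[(node_id, region)] = set(rights)

    def open(self, node_id, region):
        # Only nodes listed in the protection table get access information.
        if (node_id, region) not in self._protection:
            raise PermissionError(f"{node_id!r} may not open {region!r}")
        entry = self._translations[region]
        return {"virt_base": entry.virt_base, "length": entry.length}

    def translate(self, node_id, region, virt_addr, right):
        # Authenticate the request before exposing a physical address.
        if right not in self._protection.get((node_id, region), set()):
            raise PermissionError(f"{node_id!r} lacks {right!r} on {region!r}")
        entry = self._translations[region]
        offset = virt_addr - entry.virt_base
        if not 0 <= offset < entry.length:
            raise IndexError("address outside region")
        return entry.phys_base + offset


acl = AccessController()
acl.register_region("ckpt", virt_base=0x1000, phys_base=0x80000, length=4096)
acl.grant("primary", "ckpt", "rw")
acl.grant("backup", "ckpt", "r")

info = acl.open("backup", "ckpt")
phys = acl.translate("backup", "ckpt", 0x1010, "r")
print(hex(phys))  # 0x80010
```

In this sketch the backup node can read but not write: a call such as `acl.translate("backup", "ckpt", 0x1010, "w")` raises `PermissionError`.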

32. A system for storing checkpoint data comprising:
a network interface to an external network; and
a persistent memory unit coupled to the network interface, wherein the persistent memory unit is configured to
receive the checkpoint data into a region of the persistent memory unit via a remote direct memory access (RDMA) write command from a first processor node executing a primary process, wherein the first processor node is connected to the network interface through the external network interface,
provide access to the checkpoint data in the region of the persistent memory unit via a remote direct memory access (RDMA) read command from a second processor node executing a backup process, wherein the second processor node is connected to the network interface through the external network,
authenticate requests from remote processor nodes with address protection and translation tables, to provide access information to authenticated remote processor nodes,
store meta-data regarding the contents and layout of memory regions within the persistent memory unit, and
keep the meta-data consistent with the checkpoint data stored on the persistent memory unit,
wherein the remote direct memory access (RDMA) write command is preceded by a create request for the region and the RDMA read command is preceded by an open request for the region,
wherein the backup process is to perform a function of the primary process in response to a failure of the primary process,
wherein the primary process of the first processor node is to directly write the checkpoint data to the persistent memory via the remote direct memory access (RDMA) write command,
wherein the backup process of the second processor node is to directly read the checkpoint data from the persistent memory via the RDMA read command, and
wherein the primary process provides the checkpoint data to the persistent memory unit independently from the backup process.
Dependent claims: 33-37.
Specification