System and method for providing highly available data storage using globally addressable memory
First Claim
1. In a system for providing distributed control over data, a method for continuing operation after a node failure, the method comprising:
(a) providing a plurality of nodes inter-connected by a network which periodically exchange connectivity information;
(b) storing on each node an instance of a data control program for manipulating data to provide multiple, distributed instances of the data control program;
(c) interfacing each instance of the data control program to a distributed shared memory system that provides distributed storage across the inter-connected node and that provides addressable persistent storage of data;
(d) operating each instance of the data control program to employ the shared memory system as a memory device having data contained therein, whereby the shared memory system maintains multiple, persistent copies of data distributed among more than one network node;
(e) determining from the exchanged connectivity information the failure of a node;
(f) determining a portion of the data for which the failed node was responsible; and
(g) storing a copy of the portion of the data for which the failed node was responsible in persistent storage hosted by a surviving node.
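Steps (e) through (g) can be illustrated with a minimal Python sketch. The claim does not prescribe data structures, so everything named here is an assumption made for illustration: the `Node` class, the `directory` mapping from page identifier to responsible node, the timeout-based failure test, and the rule for choosing which surviving node receives the copy.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    # Hypothetical node model; field names are illustrative only.
    name: str
    pages: dict = field(default_factory=dict)      # page id -> persistent copy held on this node
    last_seen: dict = field(default_factory=dict)  # peer name -> time of last connectivity message


def failed_peers(node, now, timeout):
    """Step (e): peers whose connectivity messages have stopped arriving."""
    return {peer for peer, t in node.last_seen.items() if now - t > timeout}


def pages_owned_by(directory, failed):
    """Step (f): the portion of the data for which the failed node was responsible."""
    return [page for page, owner in directory.items() if owner == failed]


def recover(directory, nodes, failed):
    """Step (g): store a copy of each affected page on a surviving node."""
    survivors = [n for n in nodes.values() if n.name != failed]
    for page in pages_owned_by(directory, failed):
        holders = [n for n in survivors if page in n.pages]
        if not holders:
            continue  # no surviving copy exists; this sketch cannot restore the page
        # Assumed policy: prefer a survivor that does not yet hold the page,
        # so the number of persistent copies is restored.
        spares = [n for n in survivors if page not in n.pages]
        target = spares[0] if spares else holders[0]
        target.pages[page] = holders[0].pages[page]
        directory[page] = target.name
    return directory
```

The sketch relies on the property stated in step (d): because multiple persistent copies are distributed among more than one node, a surviving node normally still holds a copy from which the lost portion can be restored.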
Abstract
Computer nodes in a network interface to a globally addressable memory system that provides persistent storage of data, and the nodes periodically exchange connectivity information. The exchanged connectivity information informs the other nodes in the system of node failures, and the surviving nodes use it to determine which node, if any, has ceased functioning. Various processes are used to recover the portion of the global address space for which the failed node was responsible, including RAM directory, disk directory, or file system information. Additionally, nodes may be subdivided into groups, with connectivity information exchanged between the nodes belonging to a group. Each group then exchanges group-wise connectivity information with the other groups, and failures may be recovered.
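The group-wise exchange described in the abstract can be pictured as a two-level report: each group condenses its members' connectivity into one summary, and the summaries are then combined across groups. The following Python sketch is an illustration only. The function names, the dictionary layout of a summary, and the timeout test are assumptions and are not taken from the patent.

```python
def group_summary(group_name, members, last_seen, now, timeout):
    """Condense intra-group connectivity into a single group-wise report.

    last_seen maps member name -> time of its last connectivity message;
    a member with no entry is treated as never heard from.
    """
    alive = sorted(
        m for m in members
        if now - last_seen.get(m, float("-inf")) <= timeout
    )
    failed = sorted(set(members) - set(alive))
    return {"group": group_name, "alive": alive, "failed": failed}


def merge_summaries(summaries):
    """Combine the group-wise reports into a system-wide list of failed nodes."""
    failed = []
    for summary in summaries:
        failed.extend(summary["failed"])
    return sorted(failed)
```

Under this assumed scheme, each node exchanges messages only within its own group, and only one summary per group crosses group boundaries.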
652 Citations
14 Claims
1. In a system for providing distributed control over data, a method for continuing operation after a node failure, the method comprising:
(a) providing a plurality of nodes inter-connected by a network which periodically exchange connectivity information;
(b) storing on each node an instance of a data control program for manipulating data to provide multiple, distributed instances of the data control program;
(c) interfacing each instance of the data control program to a distributed shared memory system that provides distributed storage across the inter-connected node and that provides addressable persistent storage of data;
(d) operating each instance of the data control program to employ the shared memory system as a memory device having data contained therein, whereby the shared memory system maintains multiple, persistent copies of data distributed among more than one network node;
(e) determining from the exchanged connectivity information the failure of a node;
(f) determining a portion of the data for which the failed node was responsible; and
(g) storing a copy of the portion of the data for which the failed node was responsible in persistent storage hosted by a surviving node.
Dependent claims: 2, 3, 4, 5, 6, 7
8. In a system for providing distributed control over data, a method for continuing operation after a node failure, the method comprising:
(a) providing a plurality of nodes inter-connected by a network which periodically exchange connectivity information;
(b) storing on each node an instance of a data control program for manipulating data to provide multiple, distributed instances of the data control program;
(c) interfacing each instance of the data control program to a globally addressable data store that provides distributed storage across the inter-connected node and that provides addressable persistent storage of data;
(d) operating each instance of the data control program to employ the globally addressable data store as a memory device having data contained therein, whereby the globally addressable data store maintains multiple, persistent copies of data distributed among more than one network node;
(e) determining from the exchanged connectivity information the failure of a node;
(f) determining a portion of the data for which the failed node was responsible; and
(g) storing a copy of the portion of the data for which the failed node was responsible in persistent storage hosted by a surviving node.
Dependent claims: 9, 10, 11, 12, 13, 14
Specification