INFORMATION PROCESSING APPARATUS, CONTROL METHOD, AND COMPUTER-READABLE RECORDING MEDIUM
Abstract
At least one node of a plurality of nodes in an information processing apparatus executes the following processing for data that is held in a memory of the node or of another node and stored in a shared memory area accessed by the node and the other nodes. Specifically, the node detects an intermittent correctable error (ICE) that occurs more than a predetermined number of times within a predetermined time, or a permanent correctable error (PCE) that occurs at a single location in the shared memory area. When the error is detected, the node performs control to prevent the node and the other nodes from accessing the shared memory area. The node recovers the data in a memory area different from the shared memory area, notifies the other nodes of information about the different memory area, and performs control to resume access to the data from the node and the other nodes.
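The two detection criteria named in the abstract can be illustrated with a small sketch. This is only an illustration under stated assumptions, not the mechanism of the patent: the class `CorrectableErrorMonitor`, its parameters, and in particular the `repeat_threshold` rule for deciding that errors recur "at a single location" are hypothetical names and choices introduced here.

```python
from collections import defaultdict, deque


class CorrectableErrorMonitor:
    """Hypothetical tracker for the two correctable-error criteria."""

    def __init__(self, max_count, window_seconds, repeat_threshold=2):
        self.max_count = max_count              # the "predetermined number of times"
        self.window_seconds = window_seconds    # the "predetermined time"
        self.repeat_threshold = repeat_threshold  # assumption: hits that mark one location
        self._timestamps = deque()              # times of recent correctable errors
        self._hits = defaultdict(int)           # error count per address

    def record(self, address, now):
        """Record one corrected error; return the matched criterion or None."""
        self._timestamps.append(now)
        # Drop errors that fell out of the sliding time window.
        while now - self._timestamps[0] > self.window_seconds:
            self._timestamps.popleft()
        self._hits[address] += 1

        # Criterion (i): more than max_count errors inside the window.
        if len(self._timestamps) > self.max_count:
            return "frequent"
        # Criterion (ii): errors keep recurring at the same location.
        if self._hits[address] >= self.repeat_threshold:
            return "single-location"
        return None
```

A caller would feed each corrected error into `record` and start the relocation procedure whenever the return value is not `None`.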
9 Claims
1. An information processing apparatus comprising:
a plurality of nodes that each comprise a storage device; and
an interconnect that connects between the plurality of nodes, wherein at least one node of the plurality of nodes comprises:
a detecting unit that detects a correctable error in data stored in a shared memory area included in a storage device of the one node or other node, the shared memory area being an area to which the one node and the other node access, and the correctable error being (i) an error which occurs more than a predetermined number of times within a predetermined time period or (ii) an error which occurs at a single location in the shared memory area;
a prevention control unit that, when the detecting unit detects the correctable error, performs control to prevent the one node and the other node from accessing the shared memory area;
a recovering unit that recovers the data stored in the shared memory area in a memory area different from the shared memory area;
a notifying unit that notifies information about the different memory area to the other node; and
a resumption control unit that performs control to resume the access to the recovered data from the one node and the other node.
(Dependent claims: 2, 3, 4, 5, 6)
7. An information processing apparatus comprising:
a plurality of nodes that each comprise a storage device; and
an interconnect that connects between the plurality of nodes, wherein at least one node of the plurality of nodes comprises:
an access control unit that controls an access to the storage device of the one node, and comprises an error detecting unit that detects an error of data read from the storage device; and
a processing unit that performs a process comprising:
preventing an access to a shared memory area from the one node and other node when the error detecting unit detects a correctable error of data stored in the shared memory area, the shared memory area being included in the storage device of the one node and accessed by the one node and the other node, the correctable error being (i) an error which occurs more than a predetermined number of times within a predetermined time period or (ii) an error which occurs at a single location in the shared memory area;
recovering the data stored in the shared memory area in a memory area different from the shared memory area and included in the storage device of the one node;
notifying information about the different memory area to the other node; and
resuming the access to the recovered data from the one node and the other node.
8. A computer-readable recording medium having stored therein a control program for causing at least one node of a plurality of nodes in an information processing apparatus to execute a process, the information processing apparatus comprising the plurality of nodes each comprising a storage device, and an interconnect connecting between the plurality of nodes, the process comprising:
detecting a correctable error in data stored in a shared memory area included in a storage device of the one node or other node, the shared memory area being an area to which the one node and the other node access, and the correctable error being (i) an error which occurs more than a predetermined number of times within a predetermined time period or (ii) an error which occurs at a single location in the shared memory area;
when the correctable error is detected, performing control to prevent the one node and the other node from accessing the shared memory area;
recovering the data stored in the shared memory area in a memory area different from the shared memory area and included in the storage device of the one node;
notifying information about the different memory area to the other node; and
performing control to resume the access to the recovered data from the one node and the other node.
9. A control method performed by at least one node of a plurality of nodes in an information processing apparatus, the information processing apparatus comprising the plurality of nodes each comprising a storage device, and an interconnect connecting between the plurality of nodes, the control method comprising:
detecting a correctable error in data stored in a shared memory area included in a storage device of the one node or other node, the shared memory area being an area to which the one node and the other node access, and the correctable error being (i) an error which occurs more than a predetermined number of times within a predetermined time period or (ii) an error which occurs at a single location in the shared memory area;
when the correctable error is detected, performing control to prevent the one node and the other node from accessing the shared memory area;
recovering the data stored in the shared memory area in a memory area different from the shared memory area;
notifying information about the different memory area to the other node; and
performing control to resume the access to the recovered data from the one node and the other node.
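The order of steps recited in the control method (prevent access, recover the data elsewhere, notify, resume) can be sketched as a toy in-process simulation. Everything here is an assumption made for illustration: the `Node` class, its fields, and the function `relocate_shared_area` are hypothetical, the interconnect is replaced by direct attribute updates, and copying the buffer stands in for reading the data back through error correction.

```python
class Node:
    """Hypothetical node: local storage plus a view of where shared data lives."""

    def __init__(self, name):
        self.name = name
        self.memory = {}      # area id -> bytearray in this node's storage device
        self.area_of = {}     # shared-data key -> (owning node, area id)
        self.blocked = set()  # shared-data keys currently closed to access

    def read(self, key):
        if key in self.blocked:
            raise PermissionError(f"{self.name}: access to {key!r} is suspended")
        owner, area = self.area_of[key]
        return bytes(owner.memory[area])


def relocate_shared_area(home, others, key, new_area):
    """Move shared data to a different memory area after a correctable error."""
    nodes = [home, *others]

    # 1. Prevent the one node and the other nodes from accessing the area.
    for node in nodes:
        node.blocked.add(key)

    # 2. Recover the data in a memory area different from the shared area.
    #    The error is correctable, so the copy is assumed to hold good data.
    _, old_area = home.area_of[key]
    home.memory[new_area] = bytearray(home.memory[old_area])

    # 3. Notify every node of the information about the different area.
    for node in nodes:
        node.area_of[key] = (home, new_area)

    # 4. Resume access to the recovered data.
    for node in nodes:
        node.blocked.discard(key)
```

In this sketch a read attempted between steps 1 and 4 raises `PermissionError`; a real system would more likely stall or retry the access, which the claims do not specify.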
Specification