System and method to monitor and isolate faults in a storage area network
First Claim
1. A storage area network comprising:
a plurality of loosely-coupled storage controllers arranged in a redundant configuration to provide, to a plurality of servers, access to virtualized storage, wherein one of the storage controllers operates as a master storage controller and the other storage controller or controllers operate as slave storage controllers;
a respective monitoring application executing on each of the storage controllers configured to determine whether or not the storage controllers are operating properly; and
two or more communication channels coupling the storage controllers, and wherein:
the storage controllers are logically arranged in a binary tree having a root node and one or more child nodes such that the master storage controller is the root node of the tree and the slave storage controller or controllers are the child nodes, wherein the root node and each child node have, at most, two associated child nodes; and
each particular node is configured to periodically send, over at least one of the two or more communications channels, a respective inquiry message to each of its associated child nodes and, in response to an inquiry message, each associated child node is configured to send, over at least one of the two or more communications channels, an acknowledgement message to its parent node.
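The tree arrangement and the inquiry/acknowledgement exchange recited in claim 1 can be illustrated with a small simulation. This is a minimal sketch, not the patented implementation: the names `Controller`, `build_tree`, and `poll_round` are hypothetical, the heap-style placement of slaves under the master is an assumption (the claim only requires at most two children per node), and the choice among the two or more communication channels is omitted.

```python
from dataclasses import dataclass, field


@dataclass
class Controller:
    name: str
    healthy: bool = True  # simulated: an unhealthy node never answers
    children: list = field(default_factory=list)
    parent: "Controller | None" = None

    def handle_inquiry(self):
        """Return an acknowledgement for the parent, or None if not responding."""
        return ("ACK", self.name) if self.healthy else None

    def poll_children(self):
        """Send one inquiry to each child; return the children that did not acknowledge."""
        return [c.name for c in self.children if c.handle_inquiry() is None]


def build_tree(names):
    """Arrange controllers as a binary tree; names[0] is the master (root node)."""
    nodes = [Controller(n) for n in names]
    for i, node in enumerate(nodes):
        # Heap-style layout (an assumption): children of node i sit at 2i+1 and 2i+2.
        for j in (2 * i + 1, 2 * i + 2):
            if j < len(nodes):
                node.children.append(nodes[j])
                nodes[j].parent = node
    return nodes


def poll_round(nodes):
    """One periodic round: every healthy node with children polls them."""
    return {n.name: n.poll_children() for n in nodes if n.healthy and n.children}
```

With five controllers, the master polls two slaves and the first slave polls the remaining two; marking a leaf as unhealthy makes it show up in its parent's list of unacknowledged inquiries.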
Abstract
A Fibre Channel storage area network (SAN) provides a number of servers with virtualized storage space in the form of a number of virtual disks, which are implemented on various virtual redundant array of inexpensive disks (RAID) devices striped across a plurality of physical disk drives. The SAN includes plural controllers and communication paths to allow for fail-safe and fail-over operation. The plural controllers can be loosely coupled to provide n-way redundancy and have more than one independent channel for communicating with one another. In the event of a failure involving a controller or controller interface, the virtual disks that are accessed via the affected interfaces are re-mapped to another interface in order to continue to provide high data availability. In particular, deadman timers, heartbeat signals internal to each controller, and heartbeat signals between different controllers are used to detect controllers that are no longer communicating with other controllers, and thereby to identify controllers that are failing or have failed.
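Two mechanisms named in the abstract, the deadman timer and the re-mapping of virtual disks away from a failed interface, can be sketched as follows. This is an illustrative sketch only: `DeadmanTimer` and `remap_virtual_disks` are hypothetical names, time is passed in explicitly rather than read from a clock, and the round-robin choice of takeover interface is an assumption, since the abstract does not say how the replacement interface is chosen.

```python
class DeadmanTimer:
    """Expires unless it is reset (by a heartbeat) within timeout_s seconds."""

    def __init__(self, timeout_s):
        self.timeout_s = timeout_s
        self.last_reset = 0.0

    def reset(self, now):
        # Called whenever a heartbeat arrives.
        self.last_reset = now

    def expired(self, now):
        return now - self.last_reset > self.timeout_s


def remap_virtual_disks(mapping, failed_iface, healthy_ifaces):
    """Return a new mapping with disks on failed_iface moved to healthy interfaces.

    mapping maps a virtual disk name to the interface it is accessed through.
    Disks are spread round-robin over healthy_ifaces (an assumption).
    """
    if not healthy_ifaces:
        raise RuntimeError("no healthy interface left to take over")
    new_mapping = dict(mapping)  # leave the caller's mapping untouched
    moved = 0
    for vdisk, iface in mapping.items():
        if iface == failed_iface:
            new_mapping[vdisk] = healthy_ifaces[moved % len(healthy_ifaces)]
            moved += 1
    return new_mapping
```

In use, a monitor would reset the timer on every heartbeat and, once `expired` returns true for a controller interface, call `remap_virtual_disks` so that servers keep reaching their virtual disks through a surviving interface.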
16 Claims
1. A storage area network as recited under First Claim above. Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12.
13. A method, in a storage area network comprising plural, loosely-coupled redundant storage controllers, for monitoring the operational status of the storage controllers, said method comprising the steps of:
arranging the storage controllers logically into a binary tree structure having a root node and one or more child nodes such that a master controller from among the storage controllers is the root node of the tree and the other storage controllers, operating as slave controllers, are the child nodes, wherein the root node and each child node have, at most, two associated child nodes;
monitoring at each particular node an internal operating status of that particular node;
monitoring at each particular node an operating status of any immediate parent node and any immediate child nodes, wherein an immediate parent node is a node arranged in the binary tree above the particular node so as to have no intervening node, and wherein an immediate child node is a node arranged in the tree below the particular node so as to have no intervening node; and
determining, at each particular node, if a failure has occurred based on either monitoring step.
Dependent claims: 14, 15, 16.
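The per-node decision in claim 13 combines two inputs: the node's own internal status and the status of its immediate parent and immediate children. The sketch below shows one way that combination could look. It is hypothetical throughout: the function names, the heap-style node indexing, and the use of a "last heard" timestamp with a fixed timeout as the measure of a neighbour's operating status are all assumptions not stated in the claim.

```python
def neighbours(index, count):
    """Immediate parent and immediate children of a node in a heap-indexed binary tree."""
    parent = (index - 1) // 2 if index > 0 else None  # the root (index 0) has no parent
    children = [j for j in (2 * index + 1, 2 * index + 2) if j < count]
    return parent, children


def detect_failures(index, count, self_ok, last_heard, now, timeout_s):
    """Decide, at one node, whether a failure has occurred.

    self_ok    -- result of the node's internal status check
    last_heard -- dict mapping a neighbour's index to the time it was last heard from
    Returns a list of (role, node_index) pairs for everything judged failed.
    """
    failures = []
    if not self_ok:
        failures.append(("self", index))
    parent, children = neighbours(index, count)
    watched = [("parent", parent)] if parent is not None else []
    watched += [("child", c) for c in children]
    for role, peer in watched:
        # A neighbour that was never heard from is treated as silent.
        if now - last_heard.get(peer, float("-inf")) > timeout_s:
            failures.append((role, peer))
    return failures
```

Because each node watches only its immediate neighbours, the monitoring load on any one controller stays bounded at three peers regardless of how many controllers are in the network.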
Specification