Techniques for maintaining operation of data storage system during a failure
First Claim
1. A data storage system, comprising:
a first storage processor;
a second storage processor; and
a communications subsystem having (i) an interfacing portion interconnected between the first storage processor and the second storage processor, (ii) a clock circuit coupled to the interfacing portion, and (iii) a controller coupled to the interfacing portion and the clock circuit, the controller being configured to:
enable operation of the interfacing portion to provide communications between the first and second storage processors;
sense a failure within the clock circuit; and
reset the interfacing portion in response to the sensed failure to enable one of the first and second storage processors to continue operation;
wherein the controller of the communications subsystem includes:
a watchdog stage which is configured to generate an error signal in response to loss of a clock signal from the clock circuit within a predetermined timeout period;
wherein the interfacing portion of the communications subsystem includes a first interface device coupled to the first storage processor, a second interface device coupled to the second storage processor, and a communications bus connecting the first and second interface devices together;
wherein the controller of the communications subsystem further includes:
an output stage coupled to the watchdog stage, the output stage being configured to provide a reset signal to the first interface device in response to the error signal, the reset signal enabling the second storage processor to continue operation;
wherein (i) the first interface device is disposed at one end of the communications bus and (ii) the second interface device is disposed at another end of the communications bus to form a communications pathway between the first and second storage processors; and
wherein the controller, when enabling operation of the interfacing portion, is configured to:
direct the first interface device coupled to the first storage processor and the second interface device coupled to the second storage processor to concurrently operate as communications end points of the communications pathway formed between the first and second storage processors to exchange cached data between the first and second storage processors through the first interface device, the second interface device and the communications bus.
Abstract
A data storage system has a first storage processor, a second storage processor, and a communications subsystem. The communications subsystem has (i) an interfacing portion interconnected between the first storage processor and the second storage processor, (ii) a clock circuit coupled to the interfacing portion, and (iii) a controller coupled to the interfacing portion and the clock circuit. The controller is configured to enable operation of the interfacing portion to provide communications between the first and second storage processors, sense a failure within the clock circuit, and reset the interfacing portion in response to the sensed failure to enable one of the first and second storage processors to continue operation. Such resetting of the interfacing portion prevents the remaining storage processor from locking up, thus freeing that storage processor so that it is capable of continuing to operate even after the failure.
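The watchdog-and-reset behavior described above can be illustrated with a small software simulation. This is only a sketch under stated assumptions, not the patented circuit: the names `InterfaceDevice`, `Watchdog`, `run_controller`, and `timeout_ticks` are hypothetical, the clock is modeled as a list of booleans (one per tick, `True` meaning a clock edge was seen), and the reset is modeled as a counter.

```python
from dataclasses import dataclass


@dataclass
class InterfaceDevice:
    """Hypothetical stand-in for one interface device on the communications bus."""
    name: str
    reset_count: int = 0

    def reset(self) -> None:
        # In this model a reset is just counted; in hardware it would release
        # the bus so the peer storage processor is not left waiting.
        self.reset_count += 1


@dataclass
class Watchdog:
    """Latches an error once no clock edge has been seen for timeout_ticks ticks."""
    timeout_ticks: int
    ticks_since_edge: int = 0
    error: bool = False

    def step(self, clock_edge_seen: bool) -> bool:
        if clock_edge_seen:
            self.ticks_since_edge = 0
        else:
            self.ticks_since_edge += 1
        if self.ticks_since_edge >= self.timeout_ticks:
            self.error = True  # stays set once raised
        return self.error


def run_controller(clock_edges, watchdog, first_device):
    """Output stage (assumed behavior): reset the first device on the first error.

    Returns the tick at which the reset was issued, or None if the clock
    never went missing for a full timeout period.
    """
    for tick, edge in enumerate(clock_edges):
        was_error = watchdog.error
        if watchdog.step(edge) and not was_error:
            first_device.reset()
            return tick
    return None


# The clock runs for four ticks, then stops.
edges = [True, True, True, True, False, False, False, False]
device = InterfaceDevice("first")
failed_at = run_controller(edges, Watchdog(timeout_ticks=3), device)
```

With a three-tick timeout, the reset is issued on the third consecutive tick without a clock edge. A real controller would implement this in hardware with a timer rather than a polling loop.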
12 Claims
6. A communications subsystem for a data storage system having a first storage processor and a second storage processor, the communications subsystem comprising:
an interfacing portion configured to interconnect the first storage processor with the second storage processor;
a clock circuit coupled to the interfacing portion; and
a controller coupled to the interfacing portion and the clock circuit, the controller being configured to:
enable operation of the interfacing portion to provide communications between the first and second storage processors;
sense a failure within the clock circuit; and
reset the interfacing portion in response to the sensed failure to enable one of the first and second storage processors to continue operation;
wherein the controller includes:
a watchdog stage which is configured to generate an error signal in response to loss of a clock signal from the clock circuit within a predetermined timeout period;
wherein the interfacing portion includes a first interface device configured to couple to the first storage processor, a second interface device configured to couple to the second storage processor, and a communications bus connecting the first and second interface devices together;
wherein the controller further includes:
an output stage coupled to the watchdog stage, the output stage being configured to provide a reset signal to the first interface device in response to the error signal, the reset signal enabling the second storage processor to continue operation;
wherein (i) the first interface device is disposed at one end of the communications bus and (ii) the second interface device is disposed at another end of the communications bus to form a communications pathway between the first and second storage processors; and
wherein the controller, when enabling operation of the interfacing portion, is configured to:
direct the first interface device coupled to the first storage processor and the second interface device coupled to the second storage processor to concurrently operate as communications end points of the communications pathway formed between the first and second storage processors to exchange cached data between the first and second storage processors through the first interface device, the second interface device and the communications bus.
11. In a data storage system having (i) a first storage processor, (ii) a second storage processor and (iii) a communications subsystem coupled to the first and second storage processors, a method for operating the data storage system during a failure within the communications subsystem, the method comprising:
while the first and second storage processors perform data storage operations, enabling operation of the communications subsystem to provide communications between the first and second storage processors;
sensing a failure within a critical portion of the communications subsystem; and
resetting an interfacing portion of the communications subsystem in response to the sensed failure to enable one of the first and second storage processors to continue operation;
wherein the communications subsystem is configured to exchange cached data for cache coherency between the first and second storage processors; and
wherein sensing the failure within the critical portion of the communications subsystem includes detecting a malfunction within the communication subsystem which prevents the communications subsystem from exchanging cached data for cache coherency between the first and second storage processors.
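The enable, sense, and reset steps of this method can be sketched in software as cache mirroring over a link that may fail. This is a hypothetical model only: `Link`, `LinkDown`, `StorageProcessor`, and the `mirroring` flag are invented for illustration, and for brevity the surviving processor itself triggers the reset, whereas the claim does not say which component performs that step.

```python
class LinkDown(Exception):
    """Raised when the (hypothetical) inter-processor link cannot carry data."""


class Link:
    """Stand-in for the communications subsystem between two storage processors."""

    def __init__(self):
        self.healthy = True
        self.resets = 0

    def send(self, peer, key, value):
        if not self.healthy:
            raise LinkDown("cannot exchange cached data")
        peer.cache[key] = value  # mirror the entry into the peer's cache

    def reset(self):
        # Resetting the interfacing portion; this model only counts the event.
        self.resets += 1


class StorageProcessor:
    def __init__(self, name, link):
        self.name = name
        self.link = link
        self.cache = {}
        self.mirroring = True  # enabled: cached writes are exchanged with the peer

    def write(self, peer, key, value):
        self.cache[key] = value
        if not self.mirroring:
            return
        try:
            self.link.send(peer, key, value)
        except LinkDown:
            # Sensed failure: reset the link and keep serving writes alone.
            self.link.reset()
            self.mirroring = False


link = Link()
sp_a = StorageProcessor("A", link)
sp_b = StorageProcessor("B", link)

sp_b.write(sp_a, "blk1", b"old")    # mirrored while the link is healthy
link.healthy = False                # simulated malfunction
sp_b.write(sp_a, "blk2", b"new")    # failure sensed, link reset once
sp_b.write(sp_a, "blk3", b"newer")  # processor B continues without mirroring
```

After the simulated malfunction, processor B holds all three entries while processor A holds only the one mirrored before the failure, and the link has been reset exactly once.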
12. In a data storage system having (i) a first storage processor, (ii) a second storage processor and (iii) a communications subsystem coupled to the first and second storage processors, a method for operating the data storage system during a failure within the communications subsystem, the method comprising:
while the first and second storage processors perform data storage operations, enabling operation of the communications subsystem to provide communications between the first and second storage processors;
sensing a failure within a critical portion of the communications subsystem; and
resetting an interfacing portion of the communications subsystem in response to the sensed failure to enable one of the first and second storage processors to continue operation;
wherein the critical portion of the communications subsystem includes clock circuitry;
wherein sensing the failure includes:
generating an error signal in response to loss of a clock signal from the clock circuitry within a predetermined timeout period;
wherein the communications subsystem includes a first interface device coupled to the first storage processor, and a second interface device coupled to the second storage processor, the first and second interface devices being connected together through a communications bus;
wherein resetting the interfacing portion includes:
outputting a reset signal to the first interface device to enable the second storage processor to continue operation; and
wherein (i) the first interface device is disposed at one end of the communications bus and (ii) the second interface device is disposed at another end of the communications bus to form a communications pathway between the first and second storage processors; and
wherein enabling operation of the communications subsystem to provide the communications between the first and second storage processors includes:
directing the first interface device coupled to the first storage processor and the second interface device coupled to the second storage processor to concurrently operate as communications end points of the communications pathway formed between the first and second storage processors to exchange cached data between the first and second storage processors through the first interface device, the second interface device and the communications bus.
Specification