Storage system with automatic redundant code component failure detection, notification, and repair
Abstract
A RAID system includes a non-volatile memory storing a first program and first and second copies of a second program, and a processor executing the first program. The first program detects that the first copy of the second program has failed and repairs the failed first copy in the non-volatile memory using the second copy. Failures may be detected at boot time or during normal operation of the controller. In one embodiment, the failure is detected via a CRC check. In one embodiment, the controller repairs the failed copy by copying the good copy to the location of the failed copy. In one embodiment, the system includes multiple controllers, each having its own processor, non-volatile memory, and program that detects and repairs failed program copies. The programs include a loader, an application, FPGA code, CPLD code, and a program for execution by a power supply microcontroller.
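The detect-and-repair behavior summarized above can be sketched in a few lines of Python. This is a minimal simulation, not the patented implementation: it assumes each stored copy is kept alongside a CRC-32 computed when the copy was written (with `zlib.crc32` standing in for whatever CRC the controller actually uses), and the names `ImageSlot`, `write_image`, and `check_and_repair` are hypothetical.

```python
import zlib
from dataclasses import dataclass


@dataclass
class ImageSlot:
    """One stored copy of a program plus the CRC-32 recorded when it was written.

    Hypothetical layout: the source does not say how images or checksums
    are arranged in the non-volatile memory.
    """
    data: bytearray
    stored_crc: int

    def is_failed(self) -> bool:
        # A copy counts as failed when its contents no longer match the
        # CRC computed at programming time.
        return zlib.crc32(self.data) != self.stored_crc


def write_image(payload: bytes) -> ImageSlot:
    """Program a slot and record its CRC-32 (hypothetical helper)."""
    return ImageSlot(bytearray(payload), zlib.crc32(payload))


def check_and_repair(primary: ImageSlot, secondary: ImageSlot) -> str:
    """Detect a failed copy and overwrite it with the good copy.

    Returns a status string a caller could use for notification.
    """
    primary_bad = primary.is_failed()
    secondary_bad = secondary.is_failed()
    if primary_bad and secondary_bad:
        return "both copies failed; no repair possible"
    if primary_bad:
        # Copy the good image (and its CRC) over the failed location.
        primary.data[:] = secondary.data
        primary.stored_crc = secondary.stored_crc
        return "primary repaired from secondary"
    if secondary_bad:
        secondary.data[:] = primary.data
        secondary.stored_crc = primary.stored_crc
        return "secondary repaired from primary"
    return "both copies good"


if __name__ == "__main__":
    # The two copies may hold different versions of the program.
    primary = write_image(b"RAID application v2")
    secondary = write_image(b"RAID application v1")
    primary.data[0] ^= 0xFF  # simulate a corrupted byte in flash
    print(check_and_repair(primary, secondary))  # primary repaired from secondary
```

Because `check_and_repair` only reads and rewrites the stored images, the same routine could plausibly be invoked once at boot and again periodically during normal operation, the two detection points the abstract mentions. The returned string is a placeholder for whatever notification mechanism a real controller would use.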
55 Claims
1. A RAID system, comprising:
a non-volatile memory, configured to store a first program and first and second versions of a second program and a third program, wherein the first and second versions of the second program are different;
a volatile memory;
a watch dog timer, for detecting a failure during a boot process of the RAID system, the watch dog timer having a predetermined maximum timeout period, wherein the boot process is normally longer than said predetermined maximum timeout period of said watch dog timer; and
a processor, coupled to said non-volatile memory and to said volatile memory and to said watch dog timer, configured to execute said first program, wherein said first program is configured to:
detect said first version of said second program is failed; and
repair said failed first version of said second program in said non-volatile memory using said second version of said second program;
wherein said second program comprises an application program for performing RAID control functions;
wherein said third program is configured to decompress said first program stored in said non-volatile memory to a decompressed form and to write said decompressed form to said volatile memory during the boot process, wherein said third program is further configured to disable the watch dog timer after writing said decompressed form of said first program to said volatile memory, wherein said third program is further configured to re-enable the watch dog timer prior to said processor executing said first program.
Dependent claims: 2 through 24.
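One way to read the watchdog limitation in claim 1: because a normal boot takes longer than a single watchdog period, one countdown cannot cover the whole boot, so disabling and then re-enabling the timer restarts the countdown between boot phases. The Python sketch below simulates that reading only. It assumes the watchdog is already armed when the loader starts (for example, at reset), that re-enabling restarts the count, that time can be modeled as abstract ticks, and that `zlib` stands in for the actual compression format; `WatchdogTimer`, `loader`, and `run_first_program` are hypothetical names.

```python
import zlib


class WatchdogTimer:
    """Simulated watchdog that fires if it stays enabled for more than
    `timeout` ticks without a restart. Ticks are abstract time units."""

    def __init__(self, timeout: int) -> None:
        self.timeout = timeout
        self.enabled = False
        self.elapsed = 0

    def enable(self) -> None:
        # Assumption: (re-)enabling starts a fresh countdown.
        self.enabled = True
        self.elapsed = 0

    def disable(self) -> None:
        self.enabled = False

    def tick(self, n: int = 1) -> None:
        # Advance time; only an enabled watchdog counts toward expiry.
        if self.enabled:
            self.elapsed += n
            if self.elapsed > self.timeout:
                raise RuntimeError("watchdog expired: boot presumed hung")


def loader(compressed: bytes, wdt: WatchdogTimer, cost: int, log: list) -> bytes:
    """Hypothetical 'third program': decompress the first program into RAM,
    then disable and re-enable the watchdog before handing off."""
    ram_image = zlib.decompress(compressed)  # decompressed form in volatile memory
    wdt.tick(cost)                           # time spent decompressing and writing
    log.append("first program written to RAM")
    wdt.disable()                            # after the write completes
    log.append("watchdog disabled")
    wdt.enable()                             # before the first program executes
    log.append("watchdog re-enabled")
    return ram_image


def run_first_program(wdt: WatchdogTimer, cost: int, log: list) -> None:
    """Stand-in for the first program's own boot-time work (e.g. image checks)."""
    wdt.tick(cost)
    log.append("first program finished checks")


if __name__ == "__main__":
    wdt = WatchdogTimer(timeout=10)
    wdt.enable()  # assumption: armed at reset, before the loader runs
    log = []
    image = loader(zlib.compress(b"first program"), wdt, cost=7, log=log)
    run_first_program(wdt, cost=7, log=log)  # 14 ticks total, 7 per phase
    print(image, log)
```

With a 10-tick timeout, a 7-tick decompression followed by a 7-tick first-program phase totals 14 ticks, longer than one period, yet neither phase alone exceeds the timeout, so the simulated watchdog never fires. Without the disable/re-enable step, the same 14 ticks would trip it.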
25. A method for improving the data availability characteristics of a RAID system, comprising:
executing a first program on a processor of the RAID system;
detecting, by the first program, that a first version of a second program is failed, wherein said first version of said second program is stored in a non-volatile memory of the RAID system;
repairing, by the first program, said failed first version of said second program in said non-volatile memory using a second version of said second program stored in said non-volatile memory, wherein the first and second versions of the second program are different;
decompressing the first program to a decompressed form and writing the decompressed form to a volatile memory of the RAID system during a boot process of the RAID system; and
disabling a watch dog timer of the RAID system after writing the decompressed form of the first program to the volatile memory, wherein the watch dog timer is configured to detect a failure during the boot process, wherein the watch dog timer has a predetermined maximum timeout period, wherein the boot process is normally longer than the predetermined maximum timeout period of the watch dog timer; and
re-enabling the watch dog timer of the RAID system prior to executing the first program on the processor;
wherein said decompressing, said writing, and said disabling and then re-enabling are performed by a third program stored in the non-volatile memory;
wherein said second program comprises an application program for performing RAID control functions.
Dependent claims: 26 through 48.
49. A RAID system, comprising:
a first controller, comprising:
a first non-volatile memory, configured to store a first program and first and second versions of a second program and a fifth program, wherein the first and second versions of the second program are different;
a first volatile memory;
a first watch dog timer, for detecting a failure during a boot process of the first controller, the first watch dog timer having a predetermined maximum timeout period, wherein the boot process of the first controller is normally longer than said predetermined maximum timeout period of said first watch dog timer; and
a first processor, coupled to said first non-volatile memory and to said first volatile memory and to said first watch dog timer, configured to execute said first program, wherein said first program is configured to:
detect said first version of said second program is failed; and
repair said failed first version of said second program in said first non-volatile memory using said second version of said second program;
wherein said fifth program is configured to decompress said first program stored in said first non-volatile memory to a decompressed form and to write said decompressed form to said first volatile memory during the boot process of the first controller, wherein said fifth program is further configured to disable the first watch dog timer after writing said decompressed form of said first program to said first volatile memory, wherein said fifth program is further configured to re-enable the first watch dog timer prior to said first processor executing said first program; and
a second controller, coupled to said first controller, comprising:
a second non-volatile memory, configured to store a third program and first and second versions of a fourth program and a sixth program, wherein the first and second versions of the fourth program are different;
a second volatile memory;
a second watch dog timer, for detecting a failure during a boot process of the second controller, the second watch dog timer having a predetermined maximum timeout period, wherein the boot process of the second controller is normally longer than said predetermined maximum timeout period of said second watch dog timer; and
a second processor, coupled to said second non-volatile memory and to said second volatile memory and to said second watch dog timer, configured to execute said third program, wherein said third program is configured to:
detect said first version of said fourth program is failed; and
repair said failed first version of said fourth program in said second non-volatile memory using said second version of said fourth program;
wherein said sixth program is configured to decompress said first program stored in said second non-volatile memory to a decompressed form and to write said decompressed form to said second volatile memory during the boot process of the second controller, wherein said sixth program is further configured to disable the second watch dog timer after writing said decompressed form of said first program to said second volatile memory, wherein said sixth program is further configured to re-enable the second watch dog timer prior to said second processor executing said third program;
wherein said second program comprises an application program for performing RAID control functions.
Dependent claims: 50 through 55.
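Claim 49 duplicates the arrangement per controller: each processor checks and repairs the program copies in its own non-volatile memory. The Python sketch below models that per-controller independence under simplifying assumptions (CRC-32 via `zlib.crc32` as the failure check, images held as byte strings); the `Controller` class and its `self_check` method are hypothetical and not taken from the source.

```python
import zlib
from dataclasses import dataclass, field


@dataclass
class Controller:
    """Hypothetical model of one controller's non-volatile memory holding two
    versions of its application image, each with a CRC-32 recorded at write time."""
    name: str
    images: list
    crcs: list = field(init=False, default_factory=list)

    def __post_init__(self) -> None:
        # Record a CRC for each image as if it had just been programmed.
        self.crcs = [zlib.crc32(img) for img in self.images]

    def self_check(self) -> list:
        """Detect and repair failed copies in this controller's own memory only."""
        events = []
        ok = [zlib.crc32(img) == crc for img, crc in zip(self.images, self.crcs)]
        for bad, good in ((0, 1), (1, 0)):
            if not ok[bad] and ok[good]:
                # Overwrite the failed copy with the other, good copy.
                self.images[bad] = self.images[good]
                self.crcs[bad] = self.crcs[good]
                ok[bad] = True
                events.append(f"{self.name}: copy {bad} repaired from copy {good}")
        return events


if __name__ == "__main__":
    a = Controller("A", [b"app-v2", b"app-v1"])
    b = Controller("B", [b"app-v2", b"app-v1"])
    a.images[0] = b"Xpp-v2"  # corrupt controller A's primary copy only
    print(a.self_check())    # ['A: copy 0 repaired from copy 1']
    print(b.self_check())    # []
```

In this model, corrupting controller A's primary copy triggers a repair on A alone, and controller B's images are left untouched; whether real controllers also coordinate repairs or notifications is not something this sketch attempts to show.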
Specification