METHOD AND APPARATUS FOR DETECTING MULTIPLE ANOMALIES IN A CLUSTER OF COMPONENTS

Abstract
A system that detects multiple anomalies in a cluster of components is presented. During operation, the system monitors derivatives obtained from one or more inferential variables which are received from sensors in the cluster of components. The system then determines whether one or more components within the cluster have experienced an anomalous event based on the monitored derivatives. If so, the system performs one or more remedial actions.
29 Claims
 1. A method for detecting multiple anomalies in a cluster of components, comprising:
 monitoring derivatives obtained from one or more inferential variables which are received from sensors in the cluster of components;
determining whether one or more components within the cluster have experienced an anomalous event based on the monitored derivatives; and
if so, performing one or more remedial actions.
 10. A computer-readable storage medium storing instructions that when executed by a computer cause the computer to perform a method for detecting multiple anomalies in a cluster of components, wherein the method comprises:
 monitoring derivatives obtained from one or more inferential variables which are received from sensors in the cluster of components;
determining whether one or more components within the cluster have experienced an anomalous event based on the monitored derivatives; and
if so, performing one or more remedial actions.
 19. An apparatus that detects multiple anomalies in a cluster of components, comprising:
 a monitoring mechanism configured to monitor derivatives obtained from one or more inferential variables which are received from sensors in the cluster of components;
an analysis mechanism configured to determine whether one or more components within the cluster have experienced an anomalous event based on the monitored derivatives; and
a remedial action mechanism, wherein if the analysis mechanism determines that one or more components within the cluster has failed, the remedial action mechanism is configured to perform one or more remedial actions.
 20. A method for detecting multiple anomalies in a cluster of components, comprising:
 monitoring derivatives obtained from one or more inferential variables which are received from sensors in the cluster of components;
determining whether one or more specified events occurred within the cluster of components based on the monitored derivatives; and
if so, determining the probability that the cluster of components is in a specified state based on the one or more specified events that occurred.
 25. A computer-readable storage medium storing instructions that when executed by a computer cause the computer to perform a method for detecting multiple anomalies in a cluster of components, wherein the method comprises:
 monitoring derivatives obtained from one or more inferential variables which are received from sensors in the cluster of components;
determining whether one or more specified events occurred within the cluster of components based on the monitored derivatives; and
if so, determining the probability that the cluster of components is in a specified state based on the one or more specified events that occurred.
Specification
1. Field of the Invention
The present invention relates to techniques for monitoring the health of components within a cluster of components. More specifically, the present invention relates to a method and apparatus for detecting multiple anomalies in a cluster of components.
2. Related Art
For mission-critical systems, it is desirable to minimize downtime. One technique for minimizing downtime is to provide redundancy. For example, a failover mechanism can be used to automatically switch from a system that has failed to a healthy system. Unfortunately, failover mechanisms only take effect after a system has failed.
It is also desirable to be able to determine the reliability of components used in mission-critical systems so that only components with a mean time between failures (MTBF) that exceeds the system specification are used. One technique for determining the reliability of components is to perform accelerated-life studies in which components are placed in stress-test chambers. However, it is typically not possible to apply pass/fail tests to components (or systems) while they are being stressed in the stress-test chambers. In practice, the components under stress are periodically removed from the stress-test chambers and tested to determine the number of components that have failed. The components that have not failed are then returned to the stress-test chambers and subjected to the desired accelerated-stress conditions. At the end of the accelerated-life study, a history of failed versus healthy component counts at discrete time intervals (e.g., at 100 hours, 200 hours, 300 hours, etc.) is generated. This history can be used to predict the reliability of the components. Unfortunately, stopping an accelerated-life study to externally test the components is costly and time-consuming.
Even if a system is populated with components which are deemed reliable (e.g., having an MTBF greater than required by the system specification), these components can still fail prematurely. For example, operating a system in extreme heat can cause components in the system to fail prematurely. Hence, it is desirable to periodically monitor the components during operation of the system to determine whether the components are at the onset of degradation. If so, a remedial action (e.g., replacing a degrading component) can be performed preemptively to prevent an unexpected system failure.
One technique for monitoring the health of components is to use sensors within a system to detect the onset of degradation of components within the system. Unfortunately, as the number of components to be monitored increases, the number of sensors required to monitor these components increases, which increases the cost and computing resources required to process the sensor data.
Hence, what is needed is a method and an apparatus for detecting anomalies in a cluster of components without the problems described above.
SUMMARY
Some embodiments of the present invention provide a system that detects multiple anomalies in a cluster of components. During operation, the system monitors derivatives obtained from one or more inferential variables which are received from sensors in the cluster of components. The system then determines whether one or more components within the cluster have experienced an anomalous event based on the monitored derivatives. If so, the system performs one or more remedial actions.
Some embodiments of the present invention provide a system that detects multiple anomalies in a cluster of components. During operation, the system monitors derivatives obtained from one or more inferential variables which are received from sensors in the cluster of components. The system then determines whether one or more specified events occurred within the cluster of components based on the monitored derivatives. If so, the system determines the probability that the cluster of components is in a specified state based on the one or more specified events that occurred. If the determined probabilities meet specified criteria, the system performs one or more remedial actions.
BRIEF DESCRIPTION OF THE FIGURES
FIG. 1A presents a block diagram of a computer system in accordance with an embodiment of the present invention.
FIG. 1B presents a block diagram of an anomaly-detection module in accordance with an embodiment of the present invention.
FIG. 2 presents a flow chart illustrating a process of detecting multiple anomalies in a cluster of components in accordance with an embodiment of the present invention.
FIG. 3 presents a flow chart illustrating another process of detecting multiple anomalies in a cluster of components in accordance with an embodiment of the present invention.
FIG. 4 presents a flow chart illustrating a process of performing one or more remedial actions in accordance with an embodiment of the present invention.
FIG. 5 presents a flow chart illustrating a process of determining the probability that the cluster of components is in a specified state in accordance with an embodiment of the present invention.
FIG. 6 presents a flow chart illustrating an exemplary process of detecting multiple anomalies in a cluster of components in accordance with an embodiment of the present invention.
FIG. 7 presents exemplary graphs of a monitored telemetry signal and the derivative of the monitored telemetry signal in a system illustrating one anomalous event in accordance with an embodiment of the present invention.
FIG. 8 presents exemplary graphs of a monitored telemetry signal and the derivative of the monitored telemetry signal in a system illustrating two anomalous events in accordance with an embodiment of the present invention.
FIG. 9 presents exemplary graphs of a monitored telemetry signal and the derivative of the monitored telemetry signal in a system illustrating four anomalous events in accordance with an embodiment of the present invention.
FIG. 10 presents exemplary graphs of a monitored telemetry signal and the derivative of the monitored telemetry signal in a system illustrating six anomalous events in accordance with an embodiment of the present invention.
FIG. 11 presents exemplary graphs of a monitored telemetry signal and the derivative of the monitored telemetry signal in a system illustrating seven anomalous events in accordance with an embodiment of the present invention.
FIG. 12 illustrates a real-time telemetry system in accordance with an embodiment of the present invention.
DETAILED DESCRIPTION
The following description is presented to enable any person skilled in the art to make and use the invention, and is provided in the context of a particular application and its requirements. Various modifications to the disclosed embodiments will be readily apparent to those skilled in the art, and the general principles defined herein may be applied to other embodiments and applications without departing from the spirit and scope of the present invention. Thus, the present invention is not limited to the embodiments shown, but is to be accorded the widest scope consistent with the principles and features disclosed herein.
The data structures and code described in this detailed description are typically stored on a computer-readable storage medium, which may be any device or medium that can store code and/or data for use by a computer system. The computer-readable storage medium includes, but is not limited to, volatile memory, nonvolatile memory, magnetic and optical storage devices such as disk drives, magnetic tape, CDs (compact discs), DVDs (digital versatile discs or digital video discs), or other media capable of storing code and/or data now known or later developed.
The methods and processes described in the detailed description can be embodied as code, data structures, and/or data, which can be stored on a computer-readable storage medium as described above. When a computer system reads and executes the code and/or data stored on the computer-readable storage medium, the computer system performs the methods and processes embodied as code, data structures, and/or data that are stored within the computer-readable storage medium. Furthermore, the methods and processes described below can be included in hardware modules. For example, the hardware modules can include, but are not limited to, application-specific integrated circuit (ASIC) chips, field-programmable gate arrays (FPGAs), and other programmable-logic devices now known or later developed. When the hardware modules are activated, the hardware modules perform the methods and processes included within the hardware modules.
Computer System
FIG. 1A presents a block diagram illustrating a computer system 100 in accordance with an embodiment of the present invention. Computer system 100 includes processor 101, memory 102, storage device 103, real-time telemetry system 104, and anomaly-detection module 105.
Processor 101 can generally include any type of processor, including, but not limited to, a microprocessor, a mainframe computer, a digital signal processor, a personal organizer, a device controller and a computational engine within an appliance. Memory 102 can include any type of memory, including, but not limited to, dynamic random access memory (DRAM), static random access memory (SRAM), flash memory, read-only memory (ROM), and any other type of memory now known or later developed. Storage device 103 can include any type of nonvolatile storage device that can be coupled to a computer system. This includes, but is not limited to, magnetic, optical, and magneto-optical storage devices, as well as storage devices based on flash memory and/or battery-backed-up memory.
In one embodiment of the present invention, real-time telemetry system 104 is separate from computer system 100. Note that real-time telemetry system 104 is described in more detail below with reference to FIG. 12.
In some embodiments of the present invention, anomaly-detection module 105 is separate from computer system 100. Note that anomaly-detection module 105 is described in more detail below with reference to FIGS. 1B to 6. In some embodiments, real-time telemetry system 104 includes anomaly-detection module 105.
FIG. 1B presents a block diagram of anomaly-detection module 105 in accordance with an embodiment of the present invention. Anomaly-detection module 105 includes monitoring module 106, analysis module 107, and remedial-action module 108. Monitoring module 106 is configured to monitor derivatives obtained from one or more inferential variables which are received from sensors in the cluster of components. Analysis module 107 is configured to determine whether one or more components within the cluster have experienced an anomalous event based on the monitored derivatives. In some embodiments, the anomalous event can be a failure event, a fault event, or a recovery event. If analysis module 107 determines that one or more components within the cluster have experienced an anomalous event, remedial-action module 108 is configured to perform one or more remedial actions. In some embodiments, one or more of monitoring module 106, analysis module 107, and remedial-action module 108 are included in one or more integrated circuit (IC) chips. For example, these IC chips can include, but are not limited to, application-specific integrated circuit (ASIC) chips, field-programmable gate arrays (FPGAs), and other programmable-logic devices now known or later developed.
Overview
Some embodiments of the present invention detect multiple anomalies in clusters of components. In these embodiments, the anomalies can be detected regardless of whether the individual component anomalies evolve to completion, and regardless of whether repair actions are taken or whether component self-healing occurs.
Note that the discussion below refers to clusters of components, but it can also be applied to clusters of systems. Also note that the discussion below refers to accelerated-life studies of multiple components. However, the techniques below can be applied to any system in which multiple components are to be monitored. For example, complex computing systems (such as high-performance computing grids, clusters of servers, and groups of processors within a server system) and switching resources (such as routers and cross-connects) can be monitored using the techniques described below.
For components undergoing accelerated-life studies, it is often desirable to supply power to the devices under test while the components are in the stress-test chambers. In some embodiments, the current being applied to the devices is monitored to detect anomalies in the noise-signature time series of the current that indicate that a component is at the onset of degradation (or failure). In these embodiments, the current-noise time series is an “inferential variable” that is used to detect the onset of failure and/or the exact time of failure for a component.
In some embodiments, the information extracted from the current-noise time series can be used to adjust the accelerated-life studies. For example, the test can be stopped if overstress is detected, the test can be adjusted to increase monitoring on existing monitored inferential variables, or the test can be adjusted to add additional inferential variables to be monitored.
Note that although this specification is described in terms of a current time series, any inferential variable can be monitored. In some embodiments, these inferential variables are received from real-time telemetry system 104. In some embodiments, the inferential variables received from the sensors include one or more of: hardware variables; and software variables. In some embodiments, the hardware variables include one or more of: voltage; current; temperature; vibration; optical power; optical wavelength; air velocity; measures of signal integrity (e.g., signal-to-noise ratio, bit-error rate, the number of times an operation in the component is retried, the size of an eye-diagram opening, the height of the eye-diagram opening, the width of the eye-diagram opening, etc.); and fan speed. In some embodiments, the software variables include one or more of: throughput; transaction latencies; queue lengths; central processing unit load; memory load; cache load; I/O traffic; bus saturation metrics; FIFO overflow statistics; network traffic; and disk-related metrics.
Detecting Multiple Anomalies in Clusters of Components
In some embodiments, the N components being monitored are coupled in series, which reduces the number of sensors required to monitor a given inferential variable. For example, for components undergoing accelerated-life studies, a decrease in the current I(t) can indicate that one or more components are at the onset of failure. Similarly, for complex computer systems, an increase in the execution time for a given process can indicate that one or more computer systems are at the onset of failure. Moreover, for routing resources, an increase in latency can indicate that one or more routing resources are at the onset of degradation. Hence, by monitoring one such inferential variable for the cluster as a whole, the number of sensors can be reduced from N to 1.
FIG. 2 presents a flow chart illustrating the process of detecting multiple anomalies in a cluster of components in accordance with an embodiment of the present invention. The process begins when the system monitors one or more inferential variables (step 202) which are received from sensors in the cluster of components. Next, the system obtains derivatives from the one or more inferential variables (step 204). In some embodiments, while obtaining the derivatives from the one or more inferential variables, the system uses a moving-window numerical derivative technique which calculates the rate of change in the one or more inferential variables over a specified time interval. In some embodiments, for inferential variables whose derivatives can be measured directly, steps 202 and 204 are not performed. In these embodiments, the derivatives are measured directly using a sensor. For example, the derivative of current can be measured directly using capacitive or inductive detection techniques.
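A moving-window numerical derivative can be as simple as the change across a window of samples divided by the window duration. The sketch below assumes uniform sampling at interval `dt`; the function name `moving_window_derivative` and the choice of a two-point difference across the window (rather than, say, a least-squares slope) are illustrative assumptions, not the specification's exact technique.

```python
from typing import List, Sequence


def moving_window_derivative(samples: Sequence[float], dt: float,
                             window: int) -> List[float]:
    """Estimate the rate of change of a uniformly sampled signal.

    For each index t >= window, the derivative is approximated as
    (samples[t] - samples[t - window]) / (window * dt).
    Samples before the first full window produce no output.
    """
    if window < 1:
        raise ValueError("window must be at least one sample")
    if dt <= 0:
        raise ValueError("dt must be positive")
    span = window * dt  # duration of the moving window
    return [(samples[t] - samples[t - window]) / span
            for t in range(window, len(samples))]


if __name__ == "__main__":
    # Hypothetical current trace I(t), one sample per hour: flat, then a drop
    current = [1.0] * 5 + [0.8] * 5
    print(moving_window_derivative(current, dt=1.0, window=2))
```

A longer window suppresses more sensor noise but delays the response to an event; the text leaves the window length open ("a specified time interval").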
Returning to FIG. 2, the system then monitors derivatives obtained from the one or more inferential variables (step 206). Next, the system determines whether one or more components within the cluster have experienced an anomalous event based on the monitored derivatives (step 208). In some embodiments, the anomalous event can be a failure event, a fault event, or a recovery event. If so (step 210, yes), the system performs one or more remedial actions (step 212). In some embodiments, the one or more remedial actions include one or more of: generating warnings; replacing failed components; reporting the actual number of components that have failed; reporting the probability that a specified number of components have failed; monitoring additional variables measured by sensors in the cluster of components; adjusting the frequency at which the one or more variables are polled by the sensors; adjusting test conditions during an accelerated-life study of the components; scheduling the failed components to be replaced at the next scheduled maintenance interval; and estimating the remaining useful life of the components. Note that other remedial actions not listed above can also be performed.
If the system determines that one or more components within the cluster have not experienced an anomalous event (step 210, no) or after the system performs one or more remedial actions (step 212), the system determines whether to continue monitoring derivatives of the one or more inferential variables (step 214). If so (step 216, yes), the system returns to step 202. Otherwise (step 216, no), the process ends.
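The loop of FIG. 2 (steps 206 through 216) can be sketched as follows, starting from an already computed derivative series. The specification does not fix the detector used at step 208; this hypothetical version declares an anomalous event when the magnitude of the derivative rises above a threshold, and calls a caller-supplied remedial action once per event rather than once per sample. The function name, the threshold, and the warning callback are assumptions for illustration.

```python
from typing import Callable, Iterable, List


def detect_and_remediate(
    derivatives: Iterable[float],
    threshold: float,
    remedial_action: Callable[[int, float], None],
) -> List[int]:
    """Sketch of the FIG. 2 loop; returns the indices where events start.

    Assumption: an anomalous event starts when |derivative| rises above
    `threshold` after having been at or below it.
    """
    onsets: List[int] = []
    in_event = False
    for t, d in enumerate(derivatives):      # step 206: monitor derivative
        anomalous = abs(d) > threshold       # step 208: anomalous event?
        if anomalous and not in_event:       # step 210, "yes"
            onsets.append(t)
            remedial_action(t, d)            # step 212: remedial action
        in_event = anomalous
    return onsets                            # step 216, "no": data exhausted


if __name__ == "__main__":
    warnings: List[str] = []

    def warn(t: int, d: float) -> None:
        # Hypothetical remedial action: record a warning message
        warnings.append(f"t={t}: derivative {d:+.2f} exceeds threshold")

    trace = [0.0, 0.01, -0.3, -0.4, 0.0, 0.0, 0.5, 0.0]
    print(detect_and_remediate(trace, threshold=0.1, remedial_action=warn))
    print(warnings)
```

Note that a derivative that jumps straight from a large negative value to a large positive one without passing through the threshold band is treated here as a single event.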
FIG. 3 presents a flow chart illustrating another process of detecting multiple anomalies in a cluster of components in accordance with an embodiment of the present invention. The process begins when the system monitors one or more inferential variables (step 302) which are received from sensors in the cluster of components. Next, the system obtains derivatives from the one or more inferential variables (step 304). The system then monitors derivatives obtained from one or more inferential variables (step 306) which are received from sensors in the cluster of components.
Next, the system determines whether one or more specified events occurred within the cluster of components based on the monitored derivatives (step 308). If so (step 310, yes), the system determines the probability that the cluster of components is in a specified state based on the one or more specified events that occurred (step 312). Note that step 312 is described in more detail with reference to FIG. 5 below.
If the system determines that one or more specified events have not occurred within the cluster of components (step 310, no) or after the system determines the probability that the cluster of components is in a specified state based on the one or more specified events that occurred (step 312), the system determines whether to continue monitoring derivatives of the one or more inferential variables (step 314). If so (step 316, yes), the system returns to step 302. Otherwise (step 316, no), the process ends.
In some embodiments, steps 302 and 304 are not performed. In these embodiments, the derivatives of the one or more inferential variables are monitored directly from the sensors within the cluster of components.
FIG. 4 presents a flow chart illustrating a process of performing one or more remedial actions in accordance with an embodiment of the present invention. This process begins after the system determines the probability that the cluster of components is in a specified state (e.g., step 312 in FIG. 3). First, the system determines whether the probability that the cluster of components is in the specified state meets specified criteria (step 400). For example, the system can determine that the specified criteria are met when the probability that the cluster of components is in state 1 exceeds 10% but the probability that the cluster of components is in state 2 is below 50%. If so (step 402, yes), the system performs one or more remedial actions (step 404). For example, the system can perform one or more of the remedial actions described above. After the system performs one or more remedial actions (step 404), or if the system determines that the probability that the cluster of components is in the specified state does not meet the specified criteria (step 402, no), the system continues to step 314 in FIG. 3.
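The example criterion and steps 400 through 404 of FIG. 4 might be sketched as below. The labels "state 1" and "state 2" are placeholders taken from the example; the dictionary of state probabilities, the function names, and the callback are hypothetical.

```python
from typing import Callable, Dict

StateProbabilities = Dict[str, float]


def example_criterion(p: StateProbabilities) -> bool:
    """Example from the text: P(state 1) > 10% and P(state 2) < 50%."""
    return p.get("state 1", 0.0) > 0.10 and p.get("state 2", 0.0) < 0.50


def check_and_act(p: StateProbabilities,
                  criterion: Callable[[StateProbabilities], bool],
                  remedial_action: Callable[[], None]) -> bool:
    """Steps 400-404: perform the remedial action only if the criterion is met."""
    if criterion(p):          # steps 400/402
        remedial_action()     # step 404
        return True
    return False              # step 402, "no": continue at step 314 of FIG. 3


if __name__ == "__main__":
    probs = {"state 1": 0.15, "state 2": 0.40}   # hypothetical values
    print(check_and_act(probs, example_criterion, lambda: print("warning issued")))
```

Passing the criterion as a function keeps the decision rule separate from the remedial action, so other criteria can be swapped in without changing the loop.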
In some embodiments, the one or more specified events include one or more of: a first event wherein the value of the inferential variable is increasing; and a second event wherein the value of the inferential variable is decreasing.
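The counts of first and second events used in steps 500 and 502 of FIG. 5 can be taken from the sign of the monitored derivative. In the sketch below, the dead band that separates "quiescent" from "changing" is an assumed tuning parameter, and `count_events` is a hypothetical name; a new event of the same sign is counted only after the derivative has returned to the band or changed sign.

```python
from typing import Sequence, Tuple


def count_events(derivatives: Sequence[float], dead_band: float) -> Tuple[int, int]:
    """Return (x, y): the number of first events (inferential variable
    increasing) and second events (inferential variable decreasing).

    Assumption: an event starts when the derivative leaves the band
    [-dead_band, +dead_band].
    """
    x = y = 0
    previous = 0  # sign class of the previous sample: -1, 0 or +1
    for d in derivatives:
        current = 1 if d > dead_band else (-1 if d < -dead_band else 0)
        if current != previous:
            if current == 1:
                x += 1  # first event: value increasing
            elif current == -1:
                y += 1  # second event: value decreasing
        previous = current
    return x, y


if __name__ == "__main__":
    # Hypothetical derivative trace with two rises and two falls
    print(count_events([0.0, 0.5, 0.6, 0.0, -0.4, 0.0, 0.3, -0.3], dead_band=0.1))
```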
FIG. 5 presents a flow chart illustrating a process of determining the probability that the cluster of components is in a specified state in accordance with an embodiment of the present invention. The process begins when the system determines the number of first events $x$ that occurred (step 500). Next, the system determines the number of second events $y$ that occurred (step 502). The system then determines the probability $p_{i,j}$ that the cluster of components is in a specified state $|i,j\rangle$ (step 504) as:

$$p_{i,j} = \frac{\left|a'_{i,j}\right|^{2}}{\sum_{k,l} \left|a'_{k,l}\right|^{2}}$$

where $a'_{i,j}$ is the coefficient associated with the specified state $|i,j\rangle$, with $-y \le i \le x$, $-x \le j \le y$, $-y \le k \le x$, and $-x \le l \le y$, and

$$\sum_{i,j} p_{i,j} = 1.$$

Note that $i$, $j$, $k$, and $l$ are counting indices.
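The normalization in step 504 can be computed directly once the coefficients are known. How the coefficients $a'_{i,j}$ are obtained is not described in this part of the specification, so the hypothetical function `state_probabilities` below simply takes them as input, treats any state in the allowed index range without a supplied coefficient as having coefficient zero, and ignores coefficients outside that range. `abs()` is used so that real or complex coefficients both work.

```python
from typing import Dict, Tuple

State = Tuple[int, int]  # (i, j) index of the state |i,j>


def state_probabilities(coefficients: Dict[State, complex],
                        x: int, y: int) -> Dict[State, float]:
    """p_{i,j} = |a'_{i,j}|^2 / sum_{k,l} |a'_{k,l}|^2,
    over -y <= i <= x and -x <= j <= y."""
    grid = [(i, j) for i in range(-y, x + 1) for j in range(-x, y + 1)]
    weights = {s: abs(coefficients.get(s, 0)) ** 2 for s in grid}
    total = sum(weights.values())
    if total == 0:
        raise ValueError("all coefficients in the allowed range are zero")
    return {s: w / total for s, w in weights.items()}


if __name__ == "__main__":
    # Hypothetical coefficients after one first event (x=1) and no second events (y=0)
    print(state_probabilities({(1, 0): 3.0, (0, -1): 4j}, x=1, y=0))
```

By construction the returned probabilities sum to 1, matching the second equation above.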
In some embodiments, a Sequential Probability Ratio Test (SPRT) is used to monitor the derivative of the inferential-variable time series to identify the onset time and completion time of component degradation and/or failure. Note that if normal operation is defined in terms of the inferential variable itself (e.g., I(t)) instead of its derivative (e.g., I′(t)), the SPRT alarms corresponding to the signal (i.e., SPRT 1 and SPRT 2) must be reset after each degradation or fault event, or the training period for the inferential variable must be repeated after each degradation or fault event. Monitoring the derivative of the inferential variable avoids this extra processing: once a degradation or fault event is complete, the inferential variable settles at a new level, but its derivative returns to its quiescent value of zero, so the same definition of normal operation remains valid.
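The specification does not give the hypotheses or thresholds of its SPRT. The sketch below assumes the textbook Wald SPRT for a shift in the mean of Gaussian noise with known standard deviation `sigma`: one test for a positive shift of the derivative (the alarm the document labels SPRT 5) and one for a negative shift (SPRT 6). The shift size, `sigma`, the error rates `alpha` and `beta`, and the class and function names are hypothetical. Each test resets its log-likelihood ratio after a decision and keeps running, which is possible without retraining because the derivative's quiescent value is zero.

```python
import math
from typing import Iterable, List, Tuple


class GaussianMeanSPRT:
    """Wald SPRT of H0: mean = 0 against H1: mean = shift, for Gaussian
    noise with known standard deviation sigma (hypothetical parameters)."""

    def __init__(self, shift: float, sigma: float,
                 alpha: float = 0.01, beta: float = 0.01) -> None:
        self.shift = shift
        self.var = sigma ** 2
        self.upper = math.log((1 - beta) / alpha)  # at or above: accept H1
        self.lower = math.log(beta / (1 - alpha))  # at or below: accept H0
        self.llr = 0.0                             # running log-likelihood ratio

    def update(self, sample: float) -> bool:
        """Add one sample; return True if H1 is accepted (an alarm)."""
        # Log-likelihood-ratio increment for a Gaussian mean shift.
        self.llr += (self.shift / self.var) * (sample - self.shift / 2)
        if self.llr >= self.upper:
            self.llr = 0.0      # restart the test after an alarm
            return True
        if self.llr <= self.lower:
            self.llr = 0.0      # H0 accepted: restart the test
        return False


def sprt_alarms(derivatives: Iterable[float], shift: float, sigma: float,
                alpha: float = 0.01, beta: float = 0.01) -> List[Tuple[int, str]]:
    """Run a positive-shift and a negative-shift SPRT on a derivative series."""
    rising = GaussianMeanSPRT(+shift, sigma, alpha, beta)   # "SPRT 5"
    falling = GaussianMeanSPRT(-shift, sigma, alpha, beta)  # "SPRT 6"
    alarms: List[Tuple[int, str]] = []
    for t, d in enumerate(derivatives):
        if rising.update(d):
            alarms.append((t, "SPRT 5"))
        if falling.update(d):
            alarms.append((t, "SPRT 6"))
    return alarms


if __name__ == "__main__":
    # Hypothetical noise-free derivative: quiet, a rise, quiet, a fall
    print(sprt_alarms([0.0, 0.0, 0.5, 0.5, 0.0, -0.5], shift=0.5, sigma=0.1))
```

Because each test restarts after an alarm, a sustained excursion produces repeated alarms; grouping consecutive alarms into a single event is a separate step.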
In some embodiments, data gathered from this technique can be combined with post-stress-test physical characterization analyses to gain further insights into the physics of failure of the subsystems (or components) of interest, and/or to build a library which can be used to determine one or more remedial actions to be performed.
Embodiments of the present invention relax the tradeoff between the number of subsystems (or components) of interest being monitored and the number of detectors and signals being monitored, while yielding higher-resolution information on the dynamic evolution of the health and degradation processes of the components as a function of cumulative stress (or workload). These embodiments can:
 (1) detect the time of anomaly onset in any individual component under surveillance, even when the overall functionality of that component cannot be measured directly;
 (2) detect the time of anomaly completion in any individual component under surveillance, even when the overall functionality of that component cannot be measured directly;
 (3) account for available system resources (operational subsystems or components) at any time; and/or
 (4) alter the status of the system or test in various manners: (a) stop the test if overstress is indicated, (b) increase monitoring, or (c) add additional inferential variables to monitor.
Hence, the above-described technique is beneficial because it can detect the exact times of the onset of the degradation and/or failure for individual components undergoing reliability studies with substantially minimal telemetry resources. Furthermore, the above-described technique enables in situ (live) assessment of operational subsystems (or components) within a complex system (such as a cluster of computer systems or network switches).
Note that the following exemplary implementation is described in terms of the current I(t) and the derivative of the current I′(t). However, the exemplary implementation can be applied to any of the inferential variables described above.
Definitions of Defect Categories
In some embodiments, the alarms associated with changes in the signal derivative, SPRT5 and SPRT6, correspond to the occurrences of two types of anomalies. The first type of anomaly can be referred to as a Type 1 defect, d₁, and causes an increase of the inferential signal. This change can be captured by an SPRT5 alarm when the derivative of the inferential signal becomes positive, departing from its quiescent value of zero (as shown in FIG. 7A). The second type of anomaly can be classified as a Type 2 defect, d₂, and causes a decrease of the inferential signal (as shown in FIG. 7B). This change can be captured by an SPRT6 alarm when the derivative of the inferential signal becomes negative. In some embodiments, Type 1 and Type 2 defects are reversible, and recovery events (e.g., self-healing, self-repair, self-resets, etc.) are possible. A Type 1 recovery event can trigger an SPRT6 alarm (as shown in FIG. 7B), while a Type 2 recovery event can trigger an SPRT5 alarm (as shown in FIG. 7A).
System State and Method to Monitor System State
The flow chart in FIG. 6 illustrates an exemplary technique for monitoring the state of a complex system in terms of the current count of Type 1 and Type 2 defects. At step 600, the system assesses its initial state (the initialization step). The initial numbers m and n of Type 1 and Type 2 defects, respectively, are recorded, and the state of the system is denoted by the compact notation |m,n>. The count of Type 1 defects is listed first, in the left slot of the bracket, while the count of Type 2 defects is listed second, in the right slot. In some embodiments, the monitoring process is started when no defects are present, m=0 and n=0 (e.g., for a new or newly repaired system). The state containing no defects can be referred to as the system ground state, |0,0>. However, the monitoring process may be initiated at an arbitrary time in the life of a complex system, so the initial defect count may be nonzero (but finite); in general, m≧0 and n≧0.
Next, the system sets the counting indices i and j to zero (step 602). The system then monitors I′(t) (step 604). Next, the system determines whether an SPRT alarm is generated. If no SPRT alarm is generated (step 606, 0), the system determines whether i+j=0 (step 622). If so (step 622, Y), the system returns to step 604. Otherwise (step 622, N), the system returns to step 602.
If at step 606, the system determines that an SPRT 5 alarm is generated (step 606, 5), the system determines whether j>0 (step 608). If so (step 608, Y), the system sets j=0 (step 610). After the system sets j=0 (step 610) or after the system determines that j≦0 (step 608, N), the system determines whether i<1 (step 612). If so (step 612, Y), the system updates the system state (step 614).
The system then determines whether the system state is acceptable (step 616). If the system determines that the system state is acceptable (step 616, Y) or the system determines that i≧1 (step 612, N), the system increments i by 1 (step 618) and returns to step 604. If the system determines that the system state is not acceptable (step 616, N), the system performs one or more remedial actions (step 620). The system then returns to step 600.
If at step 606, the system determines that an SPRT 6 alarm is generated (step 606, 6), the system determines whether i>0 (step 624). If so (step 624, Y), the system sets i=0 (step 626). After the system sets i=0 (step 626) or after the system determines that i≦0 (step 624, N), the system determines whether j<1 (step 628). If so (step 628, Y), the system updates the system state (step 630).
The system then determines whether the system state is acceptable (step 632). If the system determines that the system state is acceptable (step 632, Y) or the system determines that j≧1 (step 628, N), the system increments j by 1 and returns to step 604. If the system determines that the system state is not acceptable (step 632, N), the system performs one or more remedial actions (step 620). The system then returns to step 600.
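The counter logic of FIG. 6 can be sketched as a filter that turns a per-sample alarm stream into state-update events, so that a run of consecutive alarms of the same kind updates the state only once. This is a simplified sketch under stated assumptions: alarms are encoded as the integers 5 and 6 (or `None` for no alarm), an SPRT 5 update is recorded as a "U" event and an SPRT 6 update as a "V" event, and the acceptability test (steps 616 and 632) and remedial action (step 620) are omitted, so every state update is treated as acceptable. The function name `alarms_to_events` is hypothetical.

```python
from typing import Iterable, List, Optional


def alarms_to_events(alarms: Iterable[Optional[int]]) -> List[str]:
    """Collapse an alarm stream into "U"/"V" state updates, following FIG. 6."""
    events: List[str] = []
    i = j = 0                      # step 602
    for alarm in alarms:           # step 604: monitor I'(t)
        if alarm is None:          # step 606, branch 0
            if i + j != 0:         # step 622, N: back to step 602
                i = j = 0
        elif alarm == 5:           # step 606, branch 5
            if j > 0:              # steps 608 and 610
                j = 0
            if i < 1:              # step 612, Y: update state (step 614)
                events.append("U")
            i += 1                 # step 618
        elif alarm == 6:           # step 606, branch 6
            if i > 0:              # steps 624 and 626
                i = 0
            if j < 1:              # step 628, Y: update state (step 630)
                events.append("V")
            j += 1
        else:
            raise ValueError(f"unknown alarm code: {alarm!r}")
    return events


# Two SPRT 5 alarms in a row count as one event; a quiet sample resets i and j.
print(alarms_to_events([5, 5, None, 6, 6, 5, None, None, 6]))  # ['U', 'V', 'U', 'V']
```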
Note that in steps 632 and 616, while determining whether the system state is acceptable, the system can determine whether the probability that the system is in a specified state meets specified criteria. For example, if the probability that the system is in state 1 is greater than 10% and the probability that the system is in state 2 is less than 50%, the system can perform a remedial action.
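The example acceptability criterion can be sketched as follows, assuming that the system state is stored as a mapping from defect combinations (m, n) to coefficients, that "state 1" and "state 2" are two specified defect combinations, and that probabilities are obtained by normalizing the squared coefficients. The names `probabilities`, `state_is_acceptable`, `state_1` and `state_2` are hypothetical; the 10% and 50% thresholds are those of the example above.

```python
from typing import Dict, Tuple

State = Tuple[int, int]  # (m, n): counts of Type 1 and Type 2 defects


def probabilities(amplitudes: Dict[State, float]) -> Dict[State, float]:
    """Turn coefficients into probabilities by normalizing their squares."""
    total = sum(a * a for a in amplitudes.values())
    return {k: a * a / total for k, a in amplitudes.items()}


def state_is_acceptable(
    amplitudes: Dict[State, float],
    state_1: State,
    state_2: State,
    p1_limit: float = 0.10,
    p2_limit: float = 0.50,
) -> bool:
    """Example criterion: remedial action is needed when P(state_1) > 10%
    and P(state_2) < 50%; otherwise the state is treated as acceptable."""
    p = probabilities(amplitudes)
    needs_action = p.get(state_1, 0.0) > p1_limit and p.get(state_2, 0.0) < p2_limit
    return not needs_action
```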
The operations illustrated in FIG. 6 are described in more detail below.
Method to Update System State
Initial State with Arbitrary (but Finite) Defect Count m>0 & n>0
Some embodiments of the present invention use an event operator to determine the probability that the system is in a specified state. For example, during the event illustrated in FIG. 7A (signaled by an SPRT5 alarm), the system transitions from an initial state with m Type 1 defects and n Type 2 defects, |m,n>, into either (i) a state containing one extra Type 1 defect, |m+1,n>, or (ii) a state containing one fewer Type 2 defect, |m,n−1>. This is equivalent to a Type 1 failure event or a Type 2 recovery event. For m≧0 and n>0, the operator U defined below describes the event in FIG. 7A mathematically.
$$U\,|m,n\rangle = c\,|m+1,n\rangle + s\,|m,n-1\rangle \tag{1}$$
After the application of operator U (occurrence of an SPRT5 alarm), the new system state is a linear combination of the possible defect counts |m+1,n> and |m,n−1>. The coefficients in Eq. 1 are assigned such that:
$$c^2 + s^2 = 1 \tag{2}$$
The physical meaning of Eqs. 1 and 2 is that once the event operator U acts on the initial state |m,n>, the probability that the system is in the final state |m+1,n> is c², and the probability that it is in the final state |m,n−1> is s². Moreover, the probability that the final system state is either |m+1,n> or |m,n−1> is equal to 1.
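Eq. 1 can be sketched in code by storing the system state as a mapping from defect combinations (m, n) to coefficients; this representation, the name `apply_u`, and the choice to raise an error at the n=0 boundary (where Eq. 1 is not stated) are assumptions made for illustration only.

```python
from collections import defaultdict
from typing import Dict, Tuple

State = Tuple[int, int]  # (m, n): Type 1 and Type 2 defect counts


def apply_u(state: Dict[State, float], c: float, s: float) -> Dict[State, float]:
    """Event operator U for an SPRT 5 alarm (Eq. 1).

    Each term a|m,n> becomes a*c|m+1,n> + a*s|m,n-1>; coefficients that
    land on the same defect combination are added.
    """
    if abs(c * c + s * s - 1.0) > 1e-9:
        raise ValueError("Eq. 2 requires c**2 + s**2 == 1")
    out: Dict[State, float] = defaultdict(float)
    for (m, n), amp in state.items():
        if n <= 0:
            raise ValueError("Eq. 1 is stated for n > 0")
        out[(m + 1, n)] += c * amp  # Type 1 failure
        out[(m, n - 1)] += s * amp  # Type 2 recovery
    return dict(out)


after = apply_u({(2, 3): 1.0}, c=0.8, s=0.6)
# after == {(3, 3): 0.8, (2, 2): 0.6}: probabilities c**2 = 0.64 and s**2 = 0.36
```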
Similarly, during the event illustrated in FIG. 7B (signaled by an SPRT6 alarm), the system transitions from an initial state with m Type 1 defects and n Type 2 defects, |m,n>, into either (i) a state containing one fewer Type 1 defect, |m−1,n>, or (ii) a state containing one extra Type 2 defect, |m,n+1>. Again, this is equivalent to a Type 1 recovery event or a Type 2 failure event. For m>0 and n≧0, the operator V defined below describes the event in FIG. 7B mathematically.
$$V\,|m,n\rangle = s'\,|m-1,n\rangle + c'\,|m,n+1\rangle \tag{3}$$
After the application of operator V (occurrence of an SPRT6 alarm), the new system state is a linear combination of the possible defect counts |m−1,n> and |m,n+1>. The coefficients in Eq. 3 are assigned such that:
$$c'^2 + s'^2 = 1 \tag{4}$$
The physical meaning of Eqs. 3 and 4 is that once the event operator V acts on the initial state |m,n>, the probability that the system is in the final state |m−1,n> is s′², and the probability that it is in the final state |m,n+1> is c′². Moreover, the probability that the final system state is either |m−1,n> or |m,n+1> is equal to 1.
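Eq. 3 can be sketched the same way. As before, the mapping from (m, n) to coefficients, the name `apply_v`, the parameter names `c_p` and `s_p` (standing for c′ and s′), and the error raised at the m=0 boundary are illustrative assumptions.

```python
from collections import defaultdict
from typing import Dict, Tuple

State = Tuple[int, int]  # (m, n): Type 1 and Type 2 defect counts


def apply_v(state: Dict[State, float], c_p: float, s_p: float) -> Dict[State, float]:
    """Event operator V for an SPRT 6 alarm (Eq. 3).

    Each term a|m,n> becomes a*s'|m-1,n> + a*c'|m,n+1>; coefficients that
    land on the same defect combination are added.
    """
    if abs(c_p * c_p + s_p * s_p - 1.0) > 1e-9:
        raise ValueError("Eq. 4 requires c'**2 + s'**2 == 1")
    out: Dict[State, float] = defaultdict(float)
    for (m, n), amp in state.items():
        if m <= 0:
            raise ValueError("Eq. 3 is stated for m > 0")
        out[(m - 1, n)] += s_p * amp  # Type 1 recovery
        out[(m, n + 1)] += c_p * amp  # Type 2 failure
    return dict(out)


after = apply_v({(2, 3): 1.0}, c_p=0.6, s_p=0.8)
# after == {(1, 3): 0.8, (2, 4): 0.6}: |m-1,n> has probability s'**2 = 0.64
```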
Note that one could determine the final system state deterministically (e.g., after the onset of an SPRT5 alarm) by stopping the monitoring process and inspecting every system component, thereby establishing with certainty which of the states |m+1,n> or |m,n−1> the system is in. An advantage of embodiments of the present invention is that the final system state can be assessed probabilistically without interrupting the monitoring process. Moreover, the method summarized by Eqs. 1-4 allows the final system state to be estimated probabilistically even when multiple SPRT5 and SPRT6 alarms occur, by sequentially applying the event operators U and V (Eqs. 1 and 3) as discussed below. Finally, the probability ratio between a Type 1 failure event and a Type 2 recovery event (see Eq. 1) need not equal the probability ratio between a Type 1 recovery event and a Type 2 failure event (see Eq. 3), although it may. Therefore, neither the equalities c=c′ and s=s′ nor the equalities c=s′ and s=c′ are required to hold. See the limiting cases discussed below.
The results obtained from Eqs. 1 and 3 are summarized in Table 1.
<tables id="TABLEUS00001" num="00001"><table frame="none" colsep="0" rowsep="0"><tgroup align="left" colsep="0" rowsep="0" cols="4"><colspec colname="1" colwidth="77pt" align="left"/><colspec colname="2" colwidth="21pt" align="left"/><colspec colname="3" colwidth="77pt" align="left"/><colspec colname="4" colwidth="42pt" align="left"/><thead><row><entry namest="1" nameend="4" rowsep="1">TABLE 1</entry></row><row><entry namest="1" nameend="4" align="center" rowsep="1"/></row><row><entry>Initial defect combination</entry><entry>Event</entry><entry>Final defect combination</entry><entry>Coefficient</entry></row><row><entry namest="1" nameend="4" align="center" rowsep="1"/></row></thead><tbody valign="top"><row><entry>m, n></entry><entry>U</entry><entry>m + 1, n></entry><entry>c</entry></row><row><entry/><entry/><entry>m, n − 1></entry><entry>s</entry></row><row><entry/><entry>V</entry><entry>m − 1, n></entry><entry>s′</entry></row><row><entry/><entry/><entry>m, n + 1></entry><entry>c′</entry></row><row><entry namest="1" nameend="4" align="center" rowsep="1"/></row></tbody></tgroup></table></tables>
FIG. 8A illustrates a situation where an SPRT5 alarm is followed at a later time by an SPRT6 alarm. By sequentially applying the operators U and then V to the initial system state |m,n>, one can assess the final state of the system probabilistically.
Using Eqs. 1 and 3, one obtains:
$$
\begin{aligned}
VU\,|m,n\rangle &= V\big(U\,|m,n\rangle\big) \\
&= V\big(c\,|m+1,n\rangle + s\,|m,n-1\rangle\big) \\
&= c\,V|m+1,n\rangle + s\,V|m,n-1\rangle \\
&= c\big(s'\,|m,n\rangle + c'\,|m+1,n+1\rangle\big) + s\big(s'\,|m-1,n-1\rangle + c'\,|m,n\rangle\big) \\
&= s's\,|m-1,n-1\rangle + (c's + s'c)\,|m,n\rangle + c'c\,|m+1,n+1\rangle
\end{aligned}
\tag{5}
$$
FIG. 8B illustrates a situation where an SPRT6 alarm is followed at a later time by an SPRT5 alarm. Similar to Eq. 5, one can also show that:
$$UV\,|m,n\rangle = VU\,|m,n\rangle \tag{6}$$
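A quick numerical check of Eqs. 5 and 6 is sketched below. It assumes the state is stored as a mapping from (m, n) to coefficients, uses illustrative coefficient values that satisfy Eqs. 2 and 4, and starts from (m, n) = (5, 5) so that both Eq. 1 and Eq. 3 apply; the helper names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Tuple

Amplitudes = Dict[Tuple[int, int], float]


def apply_u(state: Amplitudes, c: float, s: float) -> Amplitudes:
    out: Amplitudes = defaultdict(float)
    for (m, n), a in state.items():
        out[(m + 1, n)] += c * a      # Eq. 1, Type 1 failure
        out[(m, n - 1)] += s * a      # Eq. 1, Type 2 recovery
    return dict(out)


def apply_v(state: Amplitudes, c_p: float, s_p: float) -> Amplitudes:
    out: Amplitudes = defaultdict(float)
    for (m, n), a in state.items():
        out[(m - 1, n)] += s_p * a    # Eq. 3, Type 1 recovery
        out[(m, n + 1)] += c_p * a    # Eq. 3, Type 2 failure
    return dict(out)


c, s = 0.8, 0.6          # illustrative values satisfying Eq. 2
c_p, s_p = 0.6, 0.8      # illustrative values satisfying Eq. 4
m, n = 5, 5
start = {(m, n): 1.0}

vu = apply_v(apply_u(start, c, s), c_p, s_p)   # FIG. 8A: SPRT5, then SPRT6
uv = apply_u(apply_v(start, c_p, s_p), c, s)   # FIG. 8B: SPRT6, then SPRT5

expected = {                                   # right-hand side of Eq. 5
    (m - 1, n - 1): s_p * s,
    (m, n): c_p * s + s_p * c,
    (m + 1, n + 1): c_p * c,
}
```

Both orderings land on the same three defect combinations with the coefficients listed in Table 2, which is the content of Eq. 6.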
The results obtained from Eqs. 5 and 6 are summarized in Table 2.
<tables id="TABLEUS00002" num="00002"><table frame="none" colsep="0" rowsep="0"><tgroup align="left" colsep="0" rowsep="0" cols="4"><colspec colname="1" colwidth="49pt" align="left"/><colspec colname="2" colwidth="35pt" align="left"/><colspec colname="3" colwidth="84pt" align="left"/><colspec colname="4" colwidth="49pt" align="left"/><thead><row><entry namest="1" nameend="4" rowsep="1">TABLE 2</entry></row><row><entry namest="1" nameend="4" align="center" rowsep="1"/></row><row><entry>Initial defect</entry><entry/><entry/><entry/></row><row><entry>combination</entry><entry>Events</entry><entry>Final defect combination</entry><entry>Coefficient</entry></row><row><entry namest="1" nameend="4" align="center" rowsep="1"/></row></thead><tbody valign="top"><row><entry>m, n></entry><entry>UV</entry><entry>m − 1, n − 1></entry><entry>s′s</entry></row><row><entry/><entry>or</entry><entry>m, n></entry><entry>(sc′ + s′c)</entry></row><row><entry/><entry>VU</entry><entry>m + 1, n + 1></entry><entry>c′c</entry></row><row><entry namest="1" nameend="4" align="center" rowsep="1"/></row></tbody></tgroup></table></tables>
Based on the results summarized in Table 2, the events illustrated in FIGS. 8A-8B yield 3 possible defect combinations in the final system state: (i) |m+1,n+1>, wherein the system acquires 1 extra Type 1 defect and 1 extra Type 2 defect; (ii) |m,n>, wherein the system returns to the initial state after undergoing either a Type 1 failure followed by a Type 1 recovery, or a Type 2 recovery followed by a Type 2 failure; and (iii) |m−1,n−1>, wherein the system sheds one Type 1 defect and one Type 2 defect. The probability of each final defect count (i)-(iii) is proportional to the square of the coefficient of the respective term in Table 2.
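Because the probabilities are only proportional to the squared coefficients, a normalization step is needed to turn them into numbers that sum to 1. Expanding (c′s + s′c)² and using Eqs. 2 and 4, the squared coefficients in Table 2 sum to s²(s′² + c′²) + c²(s′² + c′²) + 2cc′ss′ = 1 + 2cc′ss′ rather than 1. The sketch below assumes that dividing by this sum is how the proportionality is applied; the function name and the coefficient values are illustrative.

```python
from typing import Dict, Tuple


def fig8_probabilities(c: float, s: float, c_p: float, s_p: float) -> Dict[Tuple[int, int], float]:
    """Normalized probabilities of the Table 2 outcomes, keyed by (dm, dn)."""
    coeffs = {
        (-1, -1): s_p * s,            # |m-1, n-1>
        (0, 0): c_p * s + s_p * c,    # |m, n>
        (1, 1): c_p * c,              # |m+1, n+1>
    }
    total = sum(a * a for a in coeffs.values())  # 1 + 2*c*c_p*s*s_p, given Eqs. 2 and 4
    return {k: a * a / total for k, a in coeffs.items()}


probs = fig8_probabilities(c=0.8, s=0.6, c_p=0.6, s_p=0.8)
# roughly 0.16, 0.68 and 0.16 for |m-1,n-1>, |m,n> and |m+1,n+1>
```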
FIG. 9A illustrates a situation where an SPRT5 alarm is followed at a later time by an SPRT6 alarm, which is followed by another SPRT6 alarm, after which the event sequence ends with an SPRT5 alarm. By sequentially applying the operators U, V, V, U to the initial system state |m,n>, one can assess the final state of the system probabilistically.
Using Eqs. 1 and 3 and performing the calculation in the manner detailed by Eq. 5 above, one obtains the results summarized in Table 3. FIG. 9B illustrates a situation where an SPRT6 alarm is followed at a later time by an SPRT5 alarm, which is then followed by another SPRT5 alarm, then the event sequence ends with an SPRT6 alarm. The results corresponding to FIG. 9B are also summarized in Table 3.
Based on the results summarized in Table 3, the events illustrated in FIGS. 9A-9B yield 5 possible defect combinations in the final system state: (i) |m+2,n+2>, wherein the system acquires 2 extra Type 1 defects and 2 extra Type 2 defects; (ii) |m+1,n+1>, wherein the system acquires 1 extra Type 1 defect and 1 extra Type 2 defect; (iii) |m,n>, wherein the system returns to the initial state after going through several Type 1 and Type 2 failures and recoveries; (iv) |m−1,n−1>, wherein the system sheds 1 Type 1 defect and 1 Type 2 defect; and (v) |m−2,n−2>, wherein the system sheds 2 Type 1 defects and 2 Type 2 defects. The probability of each final defect count (i)-(v) is proportional to the square of the coefficient of the respective term in Table 3.
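The Table 3 coefficients can be checked by applying Eqs. 1 and 3 directly, in time order, for the FIG. 9A sequence (U, V, V, U) and the FIG. 9B sequence (V, U, U, V). The sketch assumes the same mapping-from-(m, n) representation used for illustration elsewhere, deliberately picks c ≠ c′ and s ≠ s′ so the check is not trivial, and uses hypothetical helper names.

```python
from collections import defaultdict
from typing import Dict, Tuple

Amplitudes = Dict[Tuple[int, int], float]


def apply_event(state: Amplitudes, event: str,
                c: float, s: float, c_p: float, s_p: float) -> Amplitudes:
    """U (Eq. 1) for an SPRT5 alarm, V (Eq. 3) for an SPRT6 alarm."""
    out: Amplitudes = defaultdict(float)
    for (m, n), a in state.items():
        if event == "U":
            out[(m + 1, n)] += c * a
            out[(m, n - 1)] += s * a
        elif event == "V":
            out[(m - 1, n)] += s_p * a
            out[(m, n + 1)] += c_p * a
        else:
            raise ValueError(f"unknown event {event!r}")
    return dict(out)


def apply_sequence(events: str, start: Tuple[int, int],
                   c: float, s: float, c_p: float, s_p: float) -> Amplitudes:
    state: Amplitudes = {start: 1.0}
    for event in events:              # events listed in time order
        state = apply_event(state, event, c, s, c_p, s_p)
    return state


# Illustrative values; Eqs. 2 and 4 hold (0.28**2 + 0.96**2 == 1).
c, s, c_p, s_p = 0.8, 0.6, 0.28, 0.96

table3 = {  # offset d of |m+d, n+d>  ->  coefficient listed in Table 3
    -2: s_p**2 * s**2,
    -1: 2 * (s**2 * s_p * c_p + s_p**2 * s * c),
    0: s_p**2 * c**2 + 4 * s_p * s * c_p * c + s**2 * c_p**2,
    1: 2 * (s * c * c_p**2 + s_p * c_p * c**2),
    2: c_p**2 * c**2,
}
fig9a = apply_sequence("UVVU", (5, 5), c, s, c_p, s_p)
fig9b = apply_sequence("VUUV", (5, 5), c, s, c_p, s_p)
```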
<tables id="TABLEUS00003" num="00003"><table frame="none" colsep="0" rowsep="0"><tgroup align="left" colsep="0" rowsep="0" cols="4"><colspec colname="1" colwidth="56pt" align="left"/><colspec colname="2" colwidth="35pt" align="left"/><colspec colname="3" colwidth="49pt" align="left"/><colspec colname="4" colwidth="77pt" align="left"/><thead><row><entry namest="1" nameend="4" rowsep="1">TABLE 3</entry></row><row><entry namest="1" nameend="4" align="center" rowsep="1"/></row><row><entry>Initial defect</entry><entry/><entry>Final defect</entry><entry/></row><row><entry>combination</entry><entry>Event</entry><entry>combination</entry><entry>Coefficient</entry></row><row><entry namest="1" nameend="4" align="center" rowsep="1"/></row></thead><tbody valign="top"><row><entry>m, n></entry><entry>UVVU</entry><entry>m − 2, n − 2></entry><entry>s′<sup>2</sup>s<sup>2</sup></entry></row><row><entry/><entry>or</entry><entry>m − 1, n − 1></entry><entry>2(s<sup>2</sup>s′c′ + s′<sup>2</sup>sc)</entry></row><row><entry/><entry>VUUV</entry><entry>m, n></entry><entry>(s′<sup>2</sup>c<sup>2 </sup>+ 4s′sc′c + s<sup>2</sup>c′<sup>2</sup>)</entry></row><row><entry/><entry/><entry>m + 1, n + 1></entry><entry>2(scc′<sup>2 </sup>+ s′c′c<sup>2</sup>)</entry></row><row><entry/><entry/><entry>m + 2, n + 2></entry><entry>c′<sup>2</sup>c<sup>2</sup></entry></row><row><entry namest="1" nameend="4" align="center" rowsep="1"/></row></tbody></tgroup></table></tables>
A few limiting cases are discussed below.
(a) Small Probability for Recovery Events: {s², s′², (s′s)²} ≪ 1, {c², c′²} ≈ 1.
The effect of the events illustrated in FIGS. 8A-8B is summarized in Table 4.
<tables id="TABLEUS00004" num="00004"><table frame="none" colsep="0" rowsep="0"><tgroup align="left" colsep="0" rowsep="0" cols="5"><colspec colname="offset" colwidth="14pt" align="left"/><colspec colname="1" colwidth="42pt" align="center"/><colspec colname="2" colwidth="56pt" align="center"/><colspec colname="3" colwidth="56pt" align="left"/><colspec colname="4" colwidth="49pt" align="left"/><thead><row><entry/><entry namest="offset" nameend="4" rowsep="1">TABLE 4</entry></row><row><entry/><entry namest="offset" nameend="4" align="center" rowsep="1"/></row><row><entry/><entry>Initial defect</entry><entry/><entry>Final defect</entry><entry/></row><row><entry/><entry>combination</entry><entry>Events</entry><entry>combination</entry><entry>Coefficient</entry></row><row><entry/><entry namest="offset" nameend="4" align="center" rowsep="1"/></row></thead><tbody valign="top"><row><entry/><entry>m, n></entry><entry>VU or</entry><entry>m, n></entry><entry>(c′s + s′c)</entry></row><row><entry/><entry/><entry>UV</entry><entry>m + 1, n + 1></entry><entry>c′c</entry></row><row><entry/><entry namest="offset" nameend="4" align="center" rowsep="1"/></row></tbody></tgroup></table></tables>
Based on the results summarized in Table 4, the events illustrated in FIGS. 8A-8B yield 2 possible defect combinations in the final system state: (i) |m+1,n+1>, wherein the system acquires 1 extra Type 1 defect and 1 extra Type 2 defect; and (ii) |m,n>, wherein the system returns to the initial state after undergoing either a Type 1 failure followed by a Type 1 recovery, or a Type 2 recovery followed by a Type 2 failure. Note that the probability of final defect count (i) is much larger than the probability of final defect count (ii).
The effect of the events illustrated in FIGS. 9A-9B is summarized in Table 5.
<tables id="TABLEUS00005" num="00005"><table frame="none" colsep="0" rowsep="0"><tgroup align="left" colsep="0" rowsep="0" cols="5"><colspec colname="offset" colwidth="14pt" align="left"/><colspec colname="1" colwidth="42pt" align="center"/><colspec colname="2" colwidth="49pt" align="center"/><colspec colname="3" colwidth="56pt" align="left"/><colspec colname="4" colwidth="56pt" align="left"/><thead><row><entry/><entry namest="offset" nameend="4" rowsep="1">TABLE 5</entry></row><row><entry/><entry namest="offset" nameend="4" align="center" rowsep="1"/></row><row><entry/><entry>Initial defect</entry><entry/><entry>Final defect</entry><entry/></row><row><entry/><entry>combination</entry><entry>Events</entry><entry>combination</entry><entry>Coefficient</entry></row><row><entry/><entry namest="offset" nameend="4" align="center" rowsep="1"/></row></thead><tbody valign="top"><row><entry/><entry>m, n></entry><entry>VUUV or</entry><entry>m + 1, n + 1></entry><entry>2(scc′<sup>2 </sup>+ s′c′c<sup>2</sup>)</entry></row><row><entry/><entry/><entry>UVVU</entry><entry>m + 2, n + 2></entry><entry>c′<sup>2</sup>c<sup>2</sup></entry></row><row><entry/><entry namest="offset" nameend="4" align="center" rowsep="1"/></row></tbody></tgroup></table></tables>
Based on the results summarized in Table 5, the events illustrated in FIGS. 9A-9B yield 2 possible defect combinations in the final system state: (i) |m+2,n+2>, wherein the system acquires 2 extra Type 1 defects and 2 extra Type 2 defects; and (ii) |m+1,n+1>, wherein the system acquires 1 extra Type 1 defect and 1 extra Type 2 defect. The probability of final defect count (i) is much larger than the probability of final defect count (ii).
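A small numerical example shows why Table 5 keeps only two terms. The sketch below takes the full Table 3 coefficients, assumes illustrative recovery amplitudes s = s′ = 0.1 (so c = c′ = √0.99 by Eqs. 2 and 4), and normalizes the squared coefficients into probabilities; the function name and the normalization step are assumptions.

```python
import math
from typing import Dict


def table3_probabilities(c: float, s: float, c_p: float, s_p: float) -> Dict[int, float]:
    """Normalized squared Table 3 coefficients, keyed by offset d of |m+d, n+d>."""
    coeffs = {
        -2: s_p**2 * s**2,
        -1: 2 * (s**2 * s_p * c_p + s_p**2 * s * c),
        0: s_p**2 * c**2 + 4 * s_p * s * c_p * c + s**2 * c_p**2,
        1: 2 * (s * c * c_p**2 + s_p * c_p * c**2),
        2: c_p**2 * c**2,
    }
    total = sum(a * a for a in coeffs.values())
    return {d: a * a / total for d, a in coeffs.items()}


s = s_p = 0.1                     # illustrative small recovery amplitudes
c = c_p = math.sqrt(1.0 - s * s)  # from Eqs. 2 and 4
p = table3_probabilities(c, s, c_p, s_p)
# p[2] is roughly 0.86 and p[1] roughly 0.14; p[0], p[-1] and p[-2] together stay below 0.01
```

The two retained outcomes carry more than 99% of the probability in this example, and |m+2,n+2> dominates |m+1,n+1>, consistent with the statement above.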
(b) No Recovery Events Allowed: s=s′=0, c=c′=1.
The effect of the events illustrated in FIGS. 7-9 is summarized in Table 6.
Based on the results summarized in Table 6, one recognizes that the event operators U and V correctly render the exact (deterministic) defect count of the final system state. In this limiting case, during each event U the system acquires 1 extra Type 1 defect, and during each event V the system acquires 1 extra Type 2 defect. At each intermediate step, the defect count is known with probability 1; therefore, the assessment of the final system state is deterministic.
<tables id="TABLEUS00006" num="00006"><table frame="none" colsep="0" rowsep="0"><tgroup align="left" colsep="0" rowsep="0" cols="4"><colspec colname="1" colwidth="63pt" align="center"/><colspec colname="2" colwidth="42pt" align="left"/><colspec colname="3" colwidth="49pt" align="left"/><colspec colname="4" colwidth="63pt" align="center"/><thead><row><entry namest="1" nameend="4" rowsep="1">TABLE 6</entry></row><row><entry namest="1" nameend="4" align="center" rowsep="1"/></row><row><entry>Initial defect</entry><entry/><entry>Final defect</entry><entry/></row><row><entry>combination</entry><entry>Events</entry><entry>combination</entry><entry>Coefficient</entry></row><row><entry namest="1" nameend="4" align="center" rowsep="1"/></row></thead><tbody valign="top"><row><entry>m, n></entry><entry>U</entry><entry>m + 1, n></entry><entry>1</entry></row><row><entry/><entry>V</entry><entry>m, n + 1></entry><entry>1</entry></row><row><entry/><entry>VU or UV</entry><entry>m + 1, n + 1></entry><entry>1</entry></row><row><entry/><entry>UVVU or</entry><entry>m + 2, n + 2></entry><entry>1</entry></row><row><entry/><entry>VUUV</entry></row><row><entry namest="1" nameend="4" align="center" rowsep="1"/></row></tbody></tgroup></table></tables>
FIGS. 10-11 illustrate case histories of system monitoring wherein multiple resources/components became unavailable. The effect of the sequence of events shown in FIG. 10 is summarized in Table 7.
<tables id="TABLEUS00007" num="00007"><table frame="none" colsep="0" rowsep="0"><tgroup align="left" colsep="0" rowsep="0" cols="5"><colspec colname="1" colwidth="42pt" align="center"/><colspec colname="2" colwidth="42pt" align="center"/><colspec colname="3" colwidth="49pt" align="center"/><colspec colname="4" colwidth="42pt" align="center"/><colspec colname="5" colwidth="42pt" align="center"/><thead><row><entry namest="1" nameend="5" rowsep="1">TABLE 7</entry></row><row><entry namest="1" nameend="5" align="center" rowsep="1"/></row><row><entry>Initial defect</entry><entry/><entry>Final defect</entry><entry/><entry/></row><row><entry>combination</entry><entry>Events</entry><entry>combination</entry><entry>Coefficient</entry><entry>Probability</entry></row><row><entry namest="1" nameend="5" align="center" rowsep="1"/></row></thead><tbody valign="top"><row><entry>m, n></entry><entry>UUUVUU</entry><entry>m + 5, n + 1></entry><entry>1</entry><entry>100%</entry></row><row><entry namest="1" nameend="5" align="center" rowsep="1"/></row></tbody></tgroup></table></tables>
The effect of the sequence of events illustrated in FIG. 11 is summarized in Table 8.
<tables id="TABLEUS00008" num="00008"><table frame="none" colsep="0" rowsep="0"><tgroup align="left" colsep="0" rowsep="0" cols="5"><colspec colname="1" colwidth="42pt" align="center"/><colspec colname="2" colwidth="42pt" align="center"/><colspec colname="3" colwidth="49pt" align="center"/><colspec colname="4" colwidth="42pt" align="center"/><colspec colname="5" colwidth="42pt" align="center"/><thead><row><entry namest="1" nameend="5" rowsep="1">TABLE 8</entry></row><row><entry namest="1" nameend="5" align="center" rowsep="1"/></row><row><entry>Initial defect</entry><entry/><entry>Final defect</entry><entry/><entry/></row><row><entry>combination</entry><entry>Events</entry><entry>combination</entry><entry>Coefficient</entry><entry>Probability</entry></row><row><entry namest="1" nameend="5" align="center" rowsep="1"/></row></thead><tbody valign="top"><row><entry>m, n></entry><entry>UVUVUVU</entry><entry>m + 4, n + 3></entry><entry>1</entry><entry>100%</entry></row><row><entry namest="1" nameend="5" align="center" rowsep="1"/></row></tbody></tgroup></table></tables>
Based on the results summarized in Tables 6-8, one notices that by the time the system reaches its final state, it has acquired a number of additional Type 1 defects equal to the number of recorded U events, and a number of additional Type 2 defects equal to the number of recorded V events.
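The deterministic behavior in Tables 7 and 8 can be reproduced by applying Eqs. 1 and 3 to the event sequences of FIGS. 10 and 11. In this sketch, which uses hypothetical helper names and the same mapping-from-(m, n) representation as elsewhere, a branch whose coefficient is exactly zero is treated as impossible and dropped; the initial state is taken as the ground state |0,0> for concreteness.

```python
from collections import defaultdict
from typing import Dict, Tuple

State = Tuple[int, int]
Amplitudes = Dict[State, float]


def apply_event(state: Amplitudes, event: str,
                c: float, s: float, c_p: float, s_p: float) -> Amplitudes:
    """U (Eq. 1) or V (Eq. 3); branches with a zero coefficient are dropped."""
    out: Amplitudes = defaultdict(float)
    for (m, n), a in state.items():
        if event == "U":
            branches = [((m + 1, n), c), ((m, n - 1), s)]
        elif event == "V":
            branches = [((m - 1, n), s_p), ((m, n + 1), c_p)]
        else:
            raise ValueError(f"unknown event {event!r}")
        for target, coeff in branches:
            if coeff != 0.0:          # a zero coefficient marks an impossible branch
                out[target] += coeff * a
    return dict(out)


def apply_sequence(events: str, start: State,
                   c: float, s: float, c_p: float, s_p: float) -> Amplitudes:
    state: Amplitudes = {start: 1.0}
    for event in events:              # events in time order
        state = apply_event(state, event, c, s, c_p, s_p)
    return state


# Limiting case (b): no recovery events, s = s' = 0 and c = c' = 1.
no_recovery = dict(c=1.0, s=0.0, c_p=1.0, s_p=0.0)
fig10 = apply_sequence("UUUVUU", (0, 0), **no_recovery)   # {(5, 1): 1.0}, Table 7
fig11 = apply_sequence("UVUVUVU", (0, 0), **no_recovery)  # {(4, 3): 1.0}, Table 8
```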
(c) No Defects of Type 2 Allowed: c=s′=1, s=c′=0.
When no Type 2 defects are permitted, the system state is written as |m,n=0> = |m>, where m is the count of Type 1 defects. The effect of the events illustrated in FIGS. 7-9 is summarized in Table 9.
<tables id="TABLEUS00009" num="00009"><table frame="none" colsep="0" rowsep="0"><tgroup align="left" colsep="0" rowsep="0" cols="4"><colspec colname="1" colwidth="63pt" align="center"/><colspec colname="2" colwidth="56pt" align="left"/><colspec colname="3" colwidth="56pt" align="left"/><colspec colname="4" colwidth="42pt" align="center"/><thead><row><entry namest="1" nameend="4" rowsep="1">TABLE 9</entry></row><row><entry namest="1" nameend="4" align="center" rowsep="1"/></row><row><entry>Initial defect count</entry><entry>Events</entry><entry>Final defect count</entry><entry>Coefficient</entry></row><row><entry namest="1" nameend="4" align="center" rowsep="1"/></row></thead><tbody valign="top"><row><entry>m, 0> = m></entry><entry>U</entry><entry>m + 1></entry><entry>1</entry></row><row><entry/><entry>V</entry><entry>m − 1></entry><entry>1</entry></row><row><entry/><entry>VU or UV</entry><entry>m></entry><entry>1</entry></row><row><entry/><entry>UVVU or VUUV</entry><entry>m></entry><entry>1</entry></row><row><entry namest="1" nameend="4" align="center" rowsep="1"/></row></tbody></tgroup></table></tables>
Based on the results summarized in Table 9, one recognizes that the event operators U and V correctly render the exact (deterministic) defect count of the final system state. In this limiting case, during each event U the system acquires one extra Type 1 defect, and during each event V the system sheds one Type 1 defect. At each intermediate step, the defect count is known with probability 1. Therefore, the assessment of the final system state is deterministic.
The effect of the sequence of events illustrated in FIG. 10 is summarized in Table 10.
<tables id="TABLEUS00010" num="00010"><table frame="none" colsep="0" rowsep="0"><tgroup align="left" colsep="0" rowsep="0" cols="5"><colspec colname="1" colwidth="42pt" align="center"/><colspec colname="2" colwidth="42pt" align="center"/><colspec colname="3" colwidth="42pt" align="center"/><colspec colname="4" colwidth="42pt" align="center"/><colspec colname="5" colwidth="49pt" align="center"/><thead><row><entry namest="1" nameend="5" rowsep="1">TABLE 10</entry></row><row><entry namest="1" nameend="5" align="center" rowsep="1"/></row><row><entry>Initial defect</entry><entry/><entry>Final defect</entry><entry/><entry/></row><row><entry>combination</entry><entry>Events</entry><entry>combination</entry><entry>Coefficient</entry><entry>Probability</entry></row><row><entry namest="1" nameend="5" align="center" rowsep="1"/></row></thead><tbody valign="top"><row><entry>m></entry><entry>UUUVUU</entry><entry>m + 4></entry><entry>1</entry><entry>100%</entry></row><row><entry namest="1" nameend="5" align="center" rowsep="1"/></row></tbody></tgroup></table></tables>
The effect of the sequence of events illustrated in FIG. 11 is summarized in Table 11.
<tables id="TABLEUS00011" num="00011"><table frame="none" colsep="0" rowsep="0"><tgroup align="left" colsep="0" rowsep="0" cols="5"><colspec colname="1" colwidth="42pt" align="center"/><colspec colname="2" colwidth="42pt" align="center"/><colspec colname="3" colwidth="42pt" align="center"/><colspec colname="4" colwidth="42pt" align="center"/><colspec colname="5" colwidth="49pt" align="center"/><thead><row><entry namest="1" nameend="5" rowsep="1">TABLE 11</entry></row><row><entry namest="1" nameend="5" align="center" rowsep="1"/></row><row><entry>Initial defect</entry><entry/><entry>Final defect</entry><entry/><entry/></row><row><entry>combination</entry><entry>Events</entry><entry>combination</entry><entry>Coefficient</entry><entry>Probability</entry></row><row><entry namest="1" nameend="5" align="center" rowsep="1"/></row></thead><tbody valign="top"><row><entry>m></entry><entry>UVUVUVU</entry><entry>m + 1></entry><entry>1</entry><entry>100%</entry></row><row><entry namest="1" nameend="5" align="center" rowsep="1"/></row></tbody></tgroup></table></tables>
Based on the results summarized in Tables 9-11, one notices that by the time the system reaches its final state, it has acquired a number of additional Type 1 defects equal to the difference between the number of recorded U events and the number of recorded V events.
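The same kind of sketch reproduces Tables 10 and 11 for limiting case (c). It again uses hypothetical helper names and drops branches whose coefficient is exactly zero; the initial count m = 5 is an arbitrary illustrative choice that keeps the Type 1 count positive throughout.

```python
from collections import defaultdict
from typing import Dict, Tuple

State = Tuple[int, int]
Amplitudes = Dict[State, float]


def apply_event(state: Amplitudes, event: str,
                c: float, s: float, c_p: float, s_p: float) -> Amplitudes:
    """U (Eq. 1) or V (Eq. 3); branches with a zero coefficient are dropped."""
    out: Amplitudes = defaultdict(float)
    for (m, n), a in state.items():
        if event == "U":
            branches = [((m + 1, n), c), ((m, n - 1), s)]
        elif event == "V":
            branches = [((m - 1, n), s_p), ((m, n + 1), c_p)]
        else:
            raise ValueError(f"unknown event {event!r}")
        for target, coeff in branches:
            if coeff != 0.0:          # a zero coefficient marks an impossible branch
                out[target] += coeff * a
    return dict(out)


def apply_sequence(events: str, start: State,
                   c: float, s: float, c_p: float, s_p: float) -> Amplitudes:
    state: Amplitudes = {start: 1.0}
    for event in events:              # events in time order
        state = apply_event(state, event, c, s, c_p, s_p)
    return state


# Limiting case (c): no Type 2 defects, c = s' = 1 and s = c' = 0.
no_type2 = dict(c=1.0, s=0.0, c_p=0.0, s_p=1.0)
m = 5
fig10 = apply_sequence("UUUVUU", (m, 0), **no_type2)    # {(9, 0): 1.0}, i.e. |m+4>
fig11 = apply_sequence("UVUVUVU", (m, 0), **no_type2)   # {(6, 0): 1.0}, i.e. |m+1>
```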
(d) Equal Probability Ratios for Failure & Recovery Between Defects d₁ & d₂: c=c′, s=s′.
The effect of the events illustrated in FIGS. 8A-8B is summarized in Table 12.
<tables id="TABLEUS00012" num="00012"><table frame="none" colsep="0" rowsep="0"><tgroup align="left" colsep="0" rowsep="0" cols="5"><colspec colname="offset" colwidth="14pt" align="left"/><colspec colname="1" colwidth="42pt" align="center"/><colspec colname="2" colwidth="49pt" align="center"/><colspec colname="3" colwidth="49pt" align="left"/><colspec colname="4" colwidth="63pt" align="center"/><thead><row><entry/><entry namest="offset" nameend="4" rowsep="1">TABLE 12</entry></row><row><entry/><entry namest="offset" nameend="4" align="center" rowsep="1"/></row><row><entry/><entry>Initial defect</entry><entry/><entry>Final defect</entry><entry/></row><row><entry/><entry>combination</entry><entry>Events</entry><entry>combination</entry><entry>Coefficient</entry></row><row><entry/><entry namest="offset" nameend="4" align="center" rowsep="1"/></row></thead><tbody valign="top"><row><entry/><entry>m, n></entry><entry>UV or VU</entry><entry>m − 1, n − 1></entry><entry>s<sup>2</sup></entry></row><row><entry/><entry/><entry/><entry>m, n></entry><entry>2sc</entry></row><row><entry/><entry/><entry/><entry>m + 1, n + 1></entry><entry>c<sup>2</sup></entry></row><row><entry/><entry namest="offset" nameend="4" align="center" rowsep="1"/></row></tbody></tgroup></table></tables>
Based on the results summarized in Table 12, the events illustrated in FIGS. 8A-8B yield 3 possible defect combinations in the final system state: (i) |m+1,n+1>, wherein the system acquires 1 extra Type 1 defect and 1 extra Type 2 defect; (ii) |m,n>, wherein a failure and a recovery of the same defect type cancel and the system returns to its initial state (for FIG. 8A, a Type 1 failure followed by a Type 1 recovery, or a Type 2 recovery followed by a Type 2 failure); (iii) |m−1,n−1>, wherein the system sheds one Type 1 defect and one Type 2 defect. Note that the probability of each final defect count (i)-(iii) is proportional to the square of the coefficient of the respective term in Table 12.
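The expansion in Table 12 can be checked numerically. The sketch below is a minimal illustration, not the patented implementation: it stores a system state as a dictionary mapping (m, n) to a coefficient, reads U as c|m+1,n> + s|m,n−1> and V as s′|m−1,n> + c′|m,n+1> (the forms implied by Eqs. 12 and 13), and sets c′=c and s′=s as in case (d). The helper names, the values c=0.8 and s=0.6, and the starting counts are hypothetical; the starting counts are chosen large enough that no defect count can become negative.

```python
from collections import defaultdict


def apply_u(state, c, s):
    """Event U: Type 1 failure |m+1, n> (coefficient c) or Type 2 recovery |m, n-1> (coefficient s)."""
    out = defaultdict(float)
    for (m, n), amp in state.items():
        out[(m + 1, n)] += c * amp
        out[(m, n - 1)] += s * amp
    return dict(out)


def apply_v(state, c_p, s_p):
    """Event V: Type 1 recovery |m-1, n> (coefficient s') or Type 2 failure |m, n+1> (coefficient c')."""
    out = defaultdict(float)
    for (m, n), amp in state.items():
        out[(m - 1, n)] += s_p * amp
        out[(m, n + 1)] += c_p * amp
    return dict(out)


# Case (d): c' = c and s' = s.  Hypothetical values for illustration only.
c, s = 0.8, 0.6
m, n = 5, 5
start = {(m, n): 1.0}
vu = apply_v(apply_u(start, c, s), c, s)  # VU: U first, then V
uv = apply_u(apply_v(start, c, s), c, s)  # UV: V first, then U

# Compare against Table 12: s^2 |m-1, n-1> + 2sc |m, n> + c^2 |m+1, n+1>
for (dm, dn), expected in [((-1, -1), s**2), ((0, 0), 2 * s * c), ((1, 1), c**2)]:
    key = (m + dm, n + dn)
    print(key, round(vu[key], 6), round(uv[key], 6), round(expected, 6))
```

Both orderings produce the same three terms, which is consistent with Table 12 listing UV and VU together.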
The effect of the events illustrated in FIGS. 9A-9B is summarized in Table 13.
<tables id="TABLEUS00013" num="00013"><table frame="none" colsep="0" rowsep="0"><tgroup align="left" colsep="0" rowsep="0" cols="4"><colspec colname="1" colwidth="70pt" align="center"/><colspec colname="2" colwidth="42pt" align="left"/><colspec colname="3" colwidth="56pt" align="left"/><colspec colname="4" colwidth="49pt" align="left"/><thead><row><entry namest="1" nameend="4" rowsep="1">TABLE 13</entry></row><row><entry namest="1" nameend="4" align="center" rowsep="1"/></row><row><entry>Initial defect</entry><entry/><entry>Final defect</entry><entry/></row><row><entry>combination</entry><entry>Events</entry><entry>combination</entry><entry>Coefficient</entry></row><row><entry namest="1" nameend="4" align="center" rowsep="1"/></row></thead><tbody valign="top"><row><entry>m, n></entry><entry>UVVU</entry><entry>m − 2, n − 2></entry><entry>s<sup>4</sup></entry></row><row><entry/><entry>or</entry><entry>m − 1, n − 1></entry><entry>4s<sup>3</sup>c</entry></row><row><entry/><entry>VUUV</entry><entry>m, n></entry><entry>6s<sup>2</sup>c<sup>2</sup></entry></row><row><entry/><entry/><entry>m + 1, n + 1></entry><entry>4sc<sup>3</sup></entry></row><row><entry/><entry/><entry>m + 2, n + 2></entry><entry>c<sup>4</sup></entry></row><row><entry namest="1" nameend="4" align="center" rowsep="1"/></row></tbody></tgroup></table></tables>
Based on the results summarized in Table 13, the events illustrated in FIGS. 9A-9B yield 5 possible defect combinations in the final system state: (i) |m+2,n+2>, wherein the system acquires 2 extra Type 1 defects and 2 extra Type 2 defects; (ii) |m+1,n+1>, wherein the system acquires 1 extra Type 1 defect and 1 extra Type 2 defect; (iii) |m,n>, wherein the system returns to the initial state after going through several Type 1 and Type 2 failures and recoveries; (iv) |m−1,n−1>, wherein the system sheds one Type 1 defect and one Type 2 defect; (v) |m−2,n−2>, wherein the system sheds 2 Type 1 defects and 2 Type 2 defects. Note that the probability of each final defect count (i)-(v) is proportional to the square of the coefficient of the respective term in Table 13.
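Table 13 follows a binomial pattern: each of the four events takes either its c branch or its s branch, and the final combination depends only on the number j of c branches, giving the coefficient C(4, j) c^j s^(4−j) for |m+j−2, n+j−2>. The sketch below checks this by brute-force expansion; it assumes the same hypothetical dictionary representation, operator forms, and illustrative values as a direct reading of Eqs. 12 and 13 with c′=c, s′=s, and it reads operator strings right to left (the rightmost operator acts first).

```python
from collections import defaultdict
from math import comb


def apply_event(state, op, c, s):
    """Case (d), c' = c and s' = s.

    U -> c|m+1, n> + s|m, n-1>,   V -> s|m-1, n> + c|m, n+1>.
    """
    out = defaultdict(float)
    for (m, n), amp in state.items():
        if op == "U":
            out[(m + 1, n)] += c * amp
            out[(m, n - 1)] += s * amp
        elif op == "V":
            out[(m - 1, n)] += s * amp
            out[(m, n + 1)] += c * amp
        else:
            raise ValueError(f"unknown event {op!r}")
    return dict(out)


def evolve(operators, state, c, s):
    """Apply an operator string such as "UVVU"; the rightmost operator acts first."""
    for op in reversed(operators):
        state = apply_event(state, op, c, s)
    return state


c, s = 0.8, 0.6  # hypothetical coefficients
m, n = 5, 5      # hypothetical starting counts, large enough to avoid boundaries
for ops in ("UVVU", "VUUV"):
    final = evolve(ops, {(m, n): 1.0}, c, s)
    for j in range(5):  # j = number of events that took the c branch
        key = (m + j - 2, n + j - 2)
        print(ops, key, round(final[key], 6), round(comb(4, j) * c**j * s ** (4 - j), 6))
```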
(e) Equal Probabilities for Failure & Recovery for Defects d<sub>1 </sub>& d<sub>2</sub>: c=s=1/√2, c′=s′=1/√2.
The effect of the events illustrated in FIGS. 7A-7B is summarized in Table 14.
<tables id="TABLEUS00014" num="00014"><table frame="none" colsep="0" rowsep="0"><tgroup align="left" colsep="0" rowsep="0" cols="5"><colspec colname="1" colwidth="49pt" align="center"/><colspec colname="2" colwidth="28pt" align="center"/><colspec colname="3" colwidth="42pt" align="left"/><colspec colname="4" colwidth="49pt" align="center"/><colspec colname="5" colwidth="49pt" align="center"/><thead><row><entry namest="1" nameend="5" rowsep="1">TABLE 14</entry></row><row><entry namest="1" nameend="5" align="center" rowsep="1"/></row><row><entry>Initial defect</entry><entry/><entry>Final defect</entry><entry/><entry/></row><row><entry>combination</entry><entry>Event</entry><entry>combination</entry><entry>Coefficient</entry><entry>Probability</entry></row><row><entry namest="1" nameend="5" align="center" rowsep="1"/></row></thead><tbody valign="top"><row><entry>m, n></entry><entry>U</entry><entry>m + 1, n></entry><entry>1/√2</entry><entry>50%</entry></row><row><entry/><entry/><entry>m, n − 1></entry><entry>1/√2</entry><entry>50%</entry></row><row><entry/><entry>V</entry><entry>m − 1, n></entry><entry>1/√2</entry><entry>50%</entry></row><row><entry/><entry/><entry>m, n + 1></entry><entry>1/√2</entry><entry>50%</entry></row><row><entry namest="1" nameend="5" align="center" rowsep="1"/></row></tbody></tgroup></table></tables>
Based on the results summarized in Table 14, after an event U there is a 50% probability that the system ends up in a state containing 1 extra Type 1 defect and a 50% probability that it ends up in a state containing 1 fewer Type 2 defect. After an event V there is a 50% probability that the system ends up in a state containing 1 fewer Type 1 defect and a 50% probability that it ends up in a state containing 1 extra Type 2 defect.
The effect of the events illustrated in FIGS. 8A-8B is summarized in Table 15.
<tables id="TABLEUS00015" num="00015"><table frame="none" colsep="0" rowsep="0"><tgroup align="left" colsep="0" rowsep="0" cols="5"><colspec colname="1" colwidth="42pt" align="center"/><colspec colname="2" colwidth="35pt" align="left"/><colspec colname="3" colwidth="49pt" align="left"/><colspec colname="4" colwidth="42pt" align="center"/><colspec colname="5" colwidth="49pt" align="center"/><thead><row><entry namest="1" nameend="5" rowsep="1">TABLE 15</entry></row><row><entry namest="1" nameend="5" align="center" rowsep="1"/></row><row><entry>Initial defect</entry><entry/><entry>Final defect</entry><entry/><entry/></row><row><entry>combination</entry><entry>Events</entry><entry>combination</entry><entry>Coefficient</entry><entry>Probability</entry></row><row><entry namest="1" nameend="5" align="center" rowsep="1"/></row></thead><tbody valign="top"><row><entry>m, n></entry><entry>UV</entry><entry>m − 1, n − 1></entry><entry>½</entry><entry>17%</entry></row><row><entry/><entry>or</entry><entry>m, n></entry><entry>1</entry><entry>66%</entry></row><row><entry/><entry>VU</entry><entry>m + 1, n + 1></entry><entry>½</entry><entry>17%</entry></row><row><entry namest="1" nameend="5" align="center" rowsep="1"/></row></tbody></tgroup></table></tables>
Based on the results summarized in Table 15, the events illustrated in FIGS. 8A-8B yield 3 possible defect combinations in the final system state: (i) |m+1,n+1>, wherein the system acquires 1 extra Type 1 defect and 1 extra Type 2 defect; (ii) |m,n>, wherein a failure and a recovery of the same defect type cancel and the system returns to its initial state (for FIG. 8A, a Type 1 failure followed by a Type 1 recovery, or a Type 2 recovery followed by a Type 2 failure); (iii) |m−1,n−1>, wherein the system sheds one Type 1 defect and one Type 2 defect. The probability of each final defect count (i)-(iii) was obtained by dividing the square of its coefficient by the sum of the squares of all the coefficients, as shown in the equations below:
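<FORM>$$\text{Sum of squares of coefficients} = \left(\tfrac{1}{2}\right)^{2} + 1^{2} + \left(\tfrac{1}{2}\right)^{2} = \tfrac{3}{2}$$</FORM>
<FORM>$$\text{Probability (i)} = \text{Probability (iii)} = \frac{1/4}{3/2} = \frac{1}{6}$$</FORM>
<FORM>$$\text{Probability (ii)} = \frac{1}{3/2} = \frac{2}{3} \tag{7}$$</FORM>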
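The normalization of Eq. 7 can be reproduced with exact rational arithmetic. The sketch below uses Python's `fractions` module; the dictionary labels and the helper name are illustrative. It yields 1/6 for (i) and (iii) and 2/3 for (ii), i.e., about 16.7% and 66.7%, which Table 15 lists as 17% and 66%.

```python
from fractions import Fraction


def normalize(coefficients):
    """Divide each squared coefficient by the sum of all squared coefficients (Eq. 7)."""
    total = sum(a * a for a in coefficients.values())
    return total, {state: a * a / total for state, a in coefficients.items()}


# Table 15 coefficients (c = s = c' = s' = 1/sqrt(2)) for the UV / VU events
table15 = {
    "|m+1, n+1>": Fraction(1, 2),  # (i)
    "|m, n>": Fraction(1),         # (ii)
    "|m-1, n-1>": Fraction(1, 2),  # (iii)
}
total, probs = normalize(table15)
print("sum of squares:", total)  # sum of squares: 3/2
for state, p in probs.items():
    print(state, p, f"{float(p):.1%}")
```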
The effect of the events illustrated in FIGS. 9A-9B is summarized in Table 16.
<tables id="TABLEUS00016" num="00016"><table frame="none" colsep="0" rowsep="0"><tgroup align="left" colsep="0" rowsep="0" cols="5"><colspec colname="1" colwidth="42pt" align="center"/><colspec colname="2" colwidth="35pt" align="left"/><colspec colname="3" colwidth="49pt" align="left"/><colspec colname="4" colwidth="42pt" align="center"/><colspec colname="5" colwidth="49pt" align="center"/><thead><row><entry namest="1" nameend="5" rowsep="1">TABLE 16</entry></row><row><entry namest="1" nameend="5" align="center" rowsep="1"/></row><row><entry>Initial defect</entry><entry/><entry>Final defect</entry><entry/><entry/></row><row><entry>combination</entry><entry>Events</entry><entry>combination</entry><entry>Coefficient</entry><entry>Probability</entry></row><row><entry namest="1" nameend="5" align="center" rowsep="1"/></row></thead><tbody valign="top"><row><entry>m, n></entry><entry>UVVU</entry><entry>m − 2, n − 2></entry><entry>¼</entry><entry>1.5% </entry></row><row><entry/><entry>or</entry><entry>m − 1, n − 1></entry><entry>1</entry><entry>23%</entry></row><row><entry/><entry>VUUV</entry><entry>m, n></entry><entry> 3/2</entry><entry>51%</entry></row><row><entry/><entry/><entry>m + 1, n + 1></entry><entry>1</entry><entry>23%</entry></row><row><entry/><entry/><entry>m + 2, n + 2></entry><entry>¼</entry><entry>1.5% </entry></row><row><entry namest="1" nameend="5" align="center" rowsep="1"/></row></tbody></tgroup></table></tables>
Based on the results summarized in Table 16, the events illustrated in FIGS. 9A-9B yield 5 possible defect combinations in the final system state: (i) |m+2,n+2>, wherein the system acquires 2 extra Type 1 defects and 2 extra Type 2 defects; (ii) |m+1,n+1>, wherein the system acquires 1 extra Type 1 defect and 1 extra Type 2 defect; (iii) |m,n>, wherein the system returns to the initial state after going through several Type 1 and Type 2 failures and recoveries; (iv) |m−1,n−1>, wherein the system sheds one Type 1 defect and one Type 2 defect; (v) |m−2,n−2>, wherein the system sheds 2 Type 1 defects and 2 Type 2 defects. Note that the probability of each final defect count (i)-(v) was calculated using the technique presented in Eq. 7.
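The probabilities in Table 16 follow from the same normalization. This sketch assumes c = s = c′ = s′ = 1/√2, so that each four-event coefficient of the binomial form in Table 13 reduces to C(4, j)/4, with j counting the events that took the c (or c′) branch; variable names are illustrative. The exact results are 1/70 ≈ 1.43%, 8/35 ≈ 22.86%, and 18/35 ≈ 51.43%, for comparison with the rounded entries of Table 16.

```python
from fractions import Fraction
from math import comb

# Case (e): every coefficient is 1/sqrt(2), so a four-event term is comb(4, j) * (1/sqrt(2))**4 = comb(4, j) / 4.
coeffs = [Fraction(comb(4, j), 4) for j in range(5)]  # 1/4, 1, 3/2, 1, 1/4, as in Table 16
labels = ["|m-2, n-2>", "|m-1, n-1>", "|m, n>", "|m+1, n+1>", "|m+2, n+2>"]

total = sum(a * a for a in coeffs)  # 35/8
probs = [a * a / total for a in coeffs]
for label, a, p in zip(labels, coeffs, probs):
    print(f"{label:12} coefficient {a!s:>4}  probability {p!s:>5} = {float(p):.2%}")
```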
The effect of the sequence of events illustrated in FIG. 10 is summarized in Table 17.
<tables id="TABLEUS00017" num="00017"><table frame="none" colsep="0" rowsep="0"><tgroup align="left" colsep="0" rowsep="0" cols="5"><colspec colname="1" colwidth="42pt" align="center"/><colspec colname="2" colwidth="35pt" align="center"/><colspec colname="3" colwidth="49pt" align="left"/><colspec colname="4" colwidth="42pt" align="center"/><colspec colname="5" colwidth="49pt" align="center"/><thead><row><entry namest="1" nameend="5" rowsep="1">TABLE 17</entry></row><row><entry namest="1" nameend="5" align="center" rowsep="1"/></row><row><entry>Initial defect</entry><entry/><entry>Final defect</entry><entry/><entry/></row><row><entry>combination</entry><entry>Events</entry><entry>combination</entry><entry>Coefficient</entry><entry>Probability</entry></row><row><entry namest="1" nameend="5" align="center" rowsep="1"/></row></thead><tbody valign="top"><row><entry/></row></tbody></tgroup><tgroup align="left" colsep="0" rowsep="0" cols="5"><colspec colname="1" colwidth="42pt" align="center"/><colspec colname="2" colwidth="35pt" align="center"/><colspec colname="3" colwidth="49pt" align="left"/><colspec colname="4" colwidth="42pt" align="center"/><colspec colname="5" colwidth="49pt" align="char" char="."/><tbody valign="top"><row><entry>m, n></entry><entry>UUUVUU</entry><entry>m − 1, n − 5></entry><entry>⅛</entry><entry>0.1%</entry></row><row><entry/><entry/><entry>m, n − 4></entry><entry>¾</entry><entry>3.9%</entry></row><row><entry/><entry/><entry>m + 1, n − 3></entry><entry> 15/8 </entry><entry>24.4%</entry></row><row><entry/><entry/><entry>m + 2, n − 2></entry><entry> 5/2</entry><entry>43.2%</entry></row><row><entry/><entry/><entry>m + 3, n − 1></entry><entry> 15/8 </entry><entry>24.4%</entry></row><row><entry/><entry/><entry>m + 4, n></entry><entry>¾</entry><entry>3.9%</entry></row><row><entry/><entry/><entry>m + 5, n + 1></entry><entry>⅛</entry><entry>0.1%</entry></row><row><entry namest="1" nameend="5" align="center" 
rowsep="1"/></row></tbody></tgroup></table></tables>
The effect of the sequence of events illustrated in FIG. 11 is summarized in Table 18.
<tables id="TABLEUS00018" num="00018"><table frame="none" colsep="0" rowsep="0"><tgroup align="left" colsep="0" rowsep="0" cols="5"><colspec colname="1" colwidth="42pt" align="center"/><colspec colname="2" colwidth="42pt" align="center"/><colspec colname="3" colwidth="49pt" align="left"/><colspec colname="4" colwidth="42pt" align="center"/><colspec colname="5" colwidth="42pt" align="center"/><thead><row><entry namest="1" nameend="5" rowsep="1">TABLE 18</entry></row><row><entry namest="1" nameend="5" align="center" rowsep="1"/></row><row><entry>Initial defect</entry><entry/><entry>Final defect</entry><entry/><entry/></row><row><entry>combination</entry><entry>Events</entry><entry>combination</entry><entry>Coefficient</entry><entry>Probability</entry></row><row><entry namest="1" nameend="5" align="center" rowsep="1"/></row></thead><tbody valign="top"><row><entry/></row></tbody></tgroup><tgroup align="left" colsep="0" rowsep="0" cols="5"><colspec colname="1" colwidth="42pt" align="center"/><colspec colname="2" colwidth="42pt" align="center"/><colspec colname="3" colwidth="49pt" align="left"/><colspec colname="4" colwidth="42pt" align="center"/><colspec colname="5" colwidth="42pt" align="char" char="."/><tbody valign="top"><row><entry>m, n></entry><entry>UVUVUVU</entry><entry>m − 3, n − 4></entry><entry> 1/(8√2)</entry><entry>0.03%</entry></row><row><entry/><entry/><entry>m − 2, n − 3></entry><entry> 7/(8√2)</entry><entry>1.43%</entry></row><row><entry/><entry/><entry>m − 1, n − 2></entry><entry>21/(8√2)</entry><entry>12.85%</entry></row><row><entry/><entry/><entry>m, n − 1></entry><entry>35/(8√2)</entry><entry>35.69%</entry></row><row><entry/><entry/><entry>m + 1, n></entry><entry>35/(8√2)</entry><entry>35.69%</entry></row><row><entry/><entry/><entry>m + 2, n + 1></entry><entry>21/(8√2)</entry><entry>12.85%</entry></row><row><entry/><entry/><entry>m + 3, n + 2></entry><entry> 7/(8√2)</entry><entry>1.43%</entry></row><row><entry/><entry/><entry>m + 4, n + 
3></entry><entry> 1/(8√2)</entry><entry>0.03%</entry></row><row><entry namest="1" nameend="5" align="center" rowsep="1"/></row></tbody></tgroup></table></tables>
Based on the results summarized in Tables 14-18, one concludes that if the initial state of the system is |m,n> and the system experiences k occurrences of event U and l occurrences of event V (say, with k≥l), then the possible defect combinations in the final state are |m−l,n−k>, |m−l+1,n−k+1>, . . . , |m+k,n+l>. In general, the coefficient of the final defect combination |m+k−l+i,n+i> is a<sub>m+k−l+i,n+i</sub>, where the index i is an integer such that −k≤i≤l. The probability p<sub>m+k−l+i,n+i</sub> of encountering this defect combination in the final state is given by:
<maths id="MATHUS00004" num="00004">$$p_{m+k-l+i,\,n+i} = \frac{\left|a_{m+k-l+i,\,n+i}\right|^{2}}{\sum_{j=-k}^{l}\left|a_{m+k-l+j,\,n+j}\right|^{2}} \tag{8}$$</maths>
The probabilities of all possible defect combinations in the final system state sum to 1:
<maths id="MATHUS00005" num="00005">$$\sum_{i=-k}^{l} p_{m+k-l+i,\,n+i} = 1 \tag{9}$$</maths>
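Under the equal-probability assumption of case (e), and ignoring the ground-state boundary, each coefficient a in Eq. 8 is a binomial path count times the common factor (1/√2)^(k+l), which cancels in the ratio. The sketch below (function name hypothetical) lists every final combination with its probability; for k=4, l=3 it reproduces the percentages of Table 18, and `final_distribution(k=5, l=1)` covers the FIG. 10 sequence of Table 17.

```python
from fractions import Fraction
from math import comb


def final_distribution(k, l, m=0, n=0):
    """Eq. 8 for case (e) (c = s = c' = s' = 1/sqrt(2)), ignoring boundary effects.

    After k U events and l V events, the combination |m+k-l+i, n+i> (-k <= i <= l)
    collects comb(k+l, i+k) branch paths, each with coefficient (1/sqrt(2))**(k+l).
    That common factor cancels in Eq. 8, so only the path counts are kept.
    """
    weights = {(m + k - l + i, n + i): comb(k + l, i + k) ** 2 for i in range(-k, l + 1)}
    total = sum(weights.values())
    return {state: Fraction(w, total) for state, w in weights.items()}


# FIG. 11 / Table 18: UVUVUVU contains k = 4 U events and l = 3 V events.
# With m = n = 0 the keys are offsets relative to the initial combination |m, n>.
p18 = final_distribution(k=4, l=3)
for (dm, dn), p in sorted(p18.items()):
    print(f"|m{dm:+d}, n{dn:+d}>  {float(p):.2%}")
print("Eq. 9 check:", sum(p18.values()))
```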
Initial State with No Defects, m=0 & n=0 (Ground State)
During the event illustrated in FIG. 7A (signaled by an SPRT5 alarm), the system transitions from the initial state with 0 Type 1 defects and 0 Type 2 defects, |0,0> (i.e., the ground state), into a state containing one Type 1 defect, |1,0>. This transition is equivalent to a Type 1 failure event. The operator U, defined below, describes the event in FIG. 7A mathematically.
<FORM>$$U\,|0,0\rangle = |1,0\rangle \tag{10}$$</FORM>
After operator U is applied (i.e., when an SPRT5 alarm occurs), the new system state is |1,0> with 100% certainty.
Similarly, during the event illustrated in FIG. 7B (signaled by an SPRT6 alarm), the system transitions from the initial state with 0 Type 1 defects and 0 Type 2 defects, |0,0>, into a state containing one Type 2 defect, |0,1>. This transition is equivalent to a Type 2 failure event. The operator V, defined below, describes the event in FIG. 7B mathematically.
<FORM>$$V\,|0,0\rangle = |0,1\rangle \tag{11}$$</FORM>
After operator V is applied (i.e., when an SPRT6 alarm occurs), the new system state is |0,1> with 100% certainty.
The results obtained from Eqs. 10 and 11 are summarized in Table 19.
<tables id="TABLEUS00019" num="00019"><table frame="none" colsep="0" rowsep="0"><tgroup align="left" colsep="0" rowsep="0" cols="4"><colspec colname="1" colwidth="77pt" align="center"/><colspec colname="2" colwidth="21pt" align="center"/><colspec colname="3" colwidth="77pt" align="center"/><colspec colname="4" colwidth="42pt" align="center"/><thead><row><entry namest="1" nameend="4" rowsep="1">TABLE 19</entry></row><row><entry namest="1" nameend="4" align="center" rowsep="1"/></row><row><entry>Initial defect combination</entry><entry>Event</entry><entry>Final defect combination</entry><entry>Coefficient</entry></row><row><entry namest="1" nameend="4" align="center" rowsep="1"/></row></thead><tbody valign="top"><row><entry>0, 0></entry><entry>U</entry><entry>1, 0></entry><entry>1</entry></row><row><entry/><entry>V</entry><entry>0, 1></entry><entry>1</entry></row><row><entry namest="1" nameend="4" align="center" rowsep="1"/></row></tbody></tgroup></table></tables>
FIG. 8A illustrates a situation where an SPRT5 alarm is followed at a later time by an SPRT6 alarm. By sequentially applying the operators U and then V to the initial system state |0,0>, one can assess the final state of the system probabilistically. Using Eqs. 10 and 3, one obtains:
<maths id="MATHUS00006" num="00006">$$\begin{aligned} VU\,|0,0\rangle &= V\bigl(U\,|0,0\rangle\bigr) \\ &= V\,|1,0\rangle \\ &= s'\,|0,0\rangle + c'\,|1,1\rangle \end{aligned} \tag{12}$$</maths>
Similarly, for the events illustrated in FIG. 8B, using Eqs. 11 and 1 one obtains:
<maths id="MATHUS00007" num="00007">$$\begin{aligned} UV\,|0,0\rangle &= U\bigl(V\,|0,0\rangle\bigr) \\ &= U\,|0,1\rangle \\ &= c\,|1,1\rangle + s\,|0,0\rangle \end{aligned} \tag{13}$$</maths>
The results obtained from Eqs. 12 and 13 are summarized in Table 20.
<tables id="TABLEUS00020" num="00020"><table frame="none" colsep="0" rowsep="0"><tgroup align="left" colsep="0" rowsep="0" cols="4"><colspec colname="1" colwidth="49pt" align="center"/><colspec colname="2" colwidth="35pt" align="center"/><colspec colname="3" colwidth="84pt" align="center"/><colspec colname="4" colwidth="49pt" align="center"/><thead><row><entry namest="1" nameend="4" rowsep="1">TABLE 20</entry></row><row><entry namest="1" nameend="4" align="center" rowsep="1"/></row><row><entry>Initial defect</entry><entry/><entry/><entry/></row><row><entry>combination</entry><entry>Events</entry><entry>Final defect combination</entry><entry>Coefficient</entry></row><row><entry namest="1" nameend="4" align="center" rowsep="1"/></row></thead><tbody valign="top"><row><entry>0, 0></entry><entry>VU</entry><entry>0, 0></entry><entry>s′</entry></row><row><entry/><entry/><entry>1, 1></entry><entry>c′</entry></row><row><entry/><entry>UV</entry><entry>0, 0></entry><entry>s</entry></row><row><entry/><entry/><entry>1, 1></entry><entry>c</entry></row><row><entry namest="1" nameend="4" align="center" rowsep="1"/></row></tbody></tgroup></table></tables>
Based on the results summarized in Table 20, the events illustrated in FIGS. 8A-8B yield 2 possible defect combinations in the final system state: (i) |1,1>, wherein the system acquires 1 extra Type 1 defect and 1 extra Type 2 defect; (ii) |0,0>, wherein the system returns to the initial state after going through a Type 1 failure followed by a Type 1 recovery, or a Type 2 failure followed by a Type 2 recovery. Note that the probability of each final defect count (i)-(ii) is proportional to the square of the coefficient of the respective term in Table 20.
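Eqs. 10-13 can be mechanized with a small state-vector routine. This is a sketch under stated assumptions, not the patent's implementation: states are dictionaries from (m, n) to coefficients; U and V follow the forms implied by Eqs. 12 and 13; operator strings are read right to left as in Eq. 12; and, as a generalization of Eqs. 10 and 11, a recovery branch that would produce a negative defect count is dropped while the failure branch takes coefficient 1. The helper names and coefficient values are hypothetical.

```python
from collections import defaultdict


def apply_event(state, op, c, s, c_p, s_p):
    """Apply event "U" or "V" to a state stored as {(m, n): coefficient}.

    U -> c|m+1, n> + s|m, n-1>        V -> s'|m-1, n> + c'|m, n+1>
    Assumption generalizing Eqs. 10 and 11: if the recovery branch would make a
    defect count negative, only the failure branch remains, with coefficient 1.
    """
    out = defaultdict(float)
    for (m, n), amp in state.items():
        if op == "U":
            fail, recover, a_fail, a_recover = (m + 1, n), (m, n - 1), c, s
        elif op == "V":
            fail, recover, a_fail, a_recover = (m, n + 1), (m - 1, n), c_p, s_p
        else:
            raise ValueError(f"unknown event {op!r}")
        if min(recover) < 0:
            out[fail] += amp
        else:
            out[fail] += a_fail * amp
            out[recover] += a_recover * amp
    return dict(out)


def evolve(operators, state, c, s, c_p, s_p):
    """Operator-string convention of Eq. 12: "VU" applies U first, then V."""
    for op in reversed(operators):
        state = apply_event(state, op, c, s, c_p, s_p)
    return state


c, s, c_p, s_p = 0.8, 0.6, 0.96, 0.28  # hypothetical coefficients
ground = {(0, 0): 1.0}
print("VU|0,0> =", evolve("VU", ground, c, s, c_p, s_p))  # s'|0,0> + c'|1,1>  (Eq. 12)
print("UV|0,0> =", evolve("UV", ground, c, s, c_p, s_p))  # c|1,1> + s|0,0>    (Eq. 13)
```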
FIG. 9A illustrates a situation where an SPRT5 alarm is followed at a later time by an SPRT6 alarm, which is followed by another SPRT6 alarm; the event sequence then ends with an SPRT5 alarm. By sequentially applying the operators U, V, V, U to the initial system state |0,0>, one can assess the final state of the system probabilistically. Using Eqs. 1 and 3 together with Eqs. 10 and 11, and performing the calculation with the technique detailed for Eqs. 12 and 13, one obtains the results summarized in Table 21. FIG. 9B illustrates a situation where an SPRT6 alarm is followed at a later time by an SPRT5 alarm, which is followed by another SPRT5 alarm; the event sequence then ends with an SPRT6 alarm. The results corresponding to FIG. 9B are also summarized in Table 21.
<tables id="TABLEUS00021" num="00021"><table frame="none" colsep="0" rowsep="0"><tgroup align="left" colsep="0" rowsep="0" cols="4"><colspec colname="1" colwidth="70pt" align="center"/><colspec colname="2" colwidth="28pt" align="center"/><colspec colname="3" colwidth="56pt" align="center"/><colspec colname="4" colwidth="63pt" align="left"/><thead><row><entry namest="1" nameend="4" rowsep="1">TABLE 21</entry></row><row><entry namest="1" nameend="4" align="center" rowsep="1"/></row><row><entry>Initial defect</entry><entry/><entry>Final defect</entry><entry/></row><row><entry>combination</entry><entry>Events</entry><entry>combination</entry><entry>Coefficient</entry></row><row><entry namest="1" nameend="4" align="center" rowsep="1"/></row></thead><tbody valign="top"><row><entry>0, 0></entry><entry>UVVU</entry><entry>0, 0></entry><entry>(s′s + s′sc′)</entry></row><row><entry/><entry/><entry>1, 1></entry><entry>(s′c + s′c′c + sc′<sup>2</sup>)</entry></row><row><entry/><entry/><entry>2, 2></entry><entry>c′<sup>2</sup>c</entry></row><row><entry/><entry>VUUV</entry><entry>0, 0></entry><entry>(s′s + s′sc)</entry></row><row><entry/><entry/><entry>1, 1></entry><entry>(sc′ + sc′c + s′c<sup>2</sup>)</entry></row><row><entry/><entry/><entry>2, 2></entry><entry>c′c<sup>2</sup></entry></row><row><entry namest="1" nameend="4" align="center" rowsep="1"/></row></tbody></tgroup></table></tables>
Based on the results summarized in Table 21, the events illustrated in FIGS. 9A-9B yield 3 possible defect combinations in the final system state: (i) |2,2>, wherein the system acquires 2 extra Type 1 defects and 2 extra Type 2 defects; (ii) |1,1>, wherein the system acquires 1 extra Type 1 defect and 1 extra Type 2 defect; (iii) |0,0>, wherein the system returns to the initial state after going through several Type 1 and Type 2 failures and recoveries. Note that the probability of each final defect count (i)-(iii) is proportional to the square of the coefficient of the respective term in Table 21.
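The same bookkeeping extends to the four-event sequences of FIGS. 9A-9B. The sketch below repeats the hypothetical state-vector helpers, with the same labeled assumption that a recovery branch leading to a negative count is dropped and the failure branch takes coefficient 1 (a generalization of Eqs. 10 and 11), and compares the simulated coefficients with the closed forms transcribed from Table 21 for arbitrary illustrative values of c, s, c′, s′. Because UVVU and VUUV are palindromes, reading the operator string in either direction gives the same chronological order.

```python
from collections import defaultdict


def apply_event(state, op, c, s, c_p, s_p):
    """U -> c|m+1, n> + s|m, n-1>,  V -> s'|m-1, n> + c'|m, n+1>.

    Assumption generalizing Eqs. 10 and 11: if the recovery branch would make a
    defect count negative, only the failure branch remains, with coefficient 1.
    """
    out = defaultdict(float)
    for (m, n), amp in state.items():
        if op == "U":
            fail, recover, a_fail, a_recover = (m + 1, n), (m, n - 1), c, s
        elif op == "V":
            fail, recover, a_fail, a_recover = (m, n + 1), (m - 1, n), c_p, s_p
        else:
            raise ValueError(f"unknown event {op!r}")
        if min(recover) < 0:
            out[fail] += amp
        else:
            out[fail] += a_fail * amp
            out[recover] += a_recover * amp
    return dict(out)


def evolve(operators, state, c, s, c_p, s_p):
    """Rightmost operator acts first, as in Eqs. 12 and 13."""
    for op in reversed(operators):
        state = apply_event(state, op, c, s, c_p, s_p)
    return state


c, s, c_p, s_p = 0.8, 0.6, 0.96, 0.28  # hypothetical coefficients

# Closed-form coefficients transcribed from Table 21
table21 = {
    "UVVU": {
        (0, 0): s_p * s + s_p * s * c_p,
        (1, 1): s_p * c + s_p * c_p * c + s * c_p**2,
        (2, 2): c_p**2 * c,
    },
    "VUUV": {
        (0, 0): s_p * s + s_p * s * c,
        (1, 1): s * c_p + s * c_p * c + s_p * c**2,
        (2, 2): c_p * c**2,
    },
}
for ops, expected in table21.items():
    got = evolve(ops, {(0, 0): 1.0}, c, s, c_p, s_p)
    for state, coeff in expected.items():
        print(ops, state, round(got[state], 6), round(coeff, 6))
```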
A few limiting cases are discussed below.
(a) Small Probability for Recovery Events: {s<sup>2</sup>, s′<sup>2</sup>, s′s<sup>2</sup>}≪1, {c<sup>2</sup>, c′<sup>2</sup>}≈1.
The effect of the events illustrated in FIGS. 9A-9B is summarized in Table 22.
<tables id="TABLEUS00022" num="00022"><table frame="none" colsep="0" rowsep="0"><tgroup align="left" colsep="0" rowsep="0" cols="4"><colspec colname="1" colwidth="70pt" align="center"/><colspec colname="2" colwidth="28pt" align="center"/><colspec colname="3" colwidth="56pt" align="center"/><colspec colname="4" colwidth="63pt" align="left"/><thead><row><entry namest="1" nameend="4" rowsep="1">TABLE 22</entry></row><row><entry namest="1" nameend="4" align="center" rowsep="1"/></row><row><entry>Initial defect</entry><entry/><entry>Final defect</entry><entry/></row><row><entry>combination</entry><entry>Events</entry><entry>combination</entry><entry>Coefficient</entry></row><row><entry namest="1" nameend="4" align="center" rowsep="1"/></row></thead><tbody valign="top"><row><entry>0, 0></entry><entry>UVVU</entry><entry>1, 1></entry><entry>(s′c + s′c′c + sc′<sup>2</sup>)</entry></row><row><entry/><entry/><entry>2, 2></entry><entry>c′<sup>2</sup>c</entry></row><row><entry/><entry>VUUV</entry><entry>1, 1></entry><entry>(sc′ + sc′c + s′c<sup>2</sup>)</entry></row><row><entry/><entry/><entry>2, 2></entry><entry>c′c<sup>2</sup></entry></row><row><entry namest="1" nameend="4" align="center" rowsep="1"/></row></tbody></tgroup></table></tables>
Based on the results summarized in Table 22, the events illustrated in FIGS. 9A-9B yield 2 possible defect combinations in the final system state: (i) |2,2>, wherein the system acquires 2 extra Type 1 defects and 2 extra Type 2 defects; (ii) |1,1>, wherein the system acquires 1 extra Type 1 defect and 1 extra Type 2 defect. Note that the probability of final defect count (i) is much larger than the probability of final defect count (ii), because the coefficient of (i) contains only failure factors (c′<sup>2</sup>c or c′c<sup>2</sup>, both close to 1), whereas every term in the coefficient of (ii) carries a small recovery factor s or s′.
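Plugging numbers into the Table 21 coefficients makes the ordering of (i) and (ii) concrete. This sketch assumes s=s′=0.05 and c=c′=√(1−s²) purely for illustration, and normalizes the squared coefficients as in Eq. 7. With these values, |2,2> carries roughly 98% of the probability, |1,1> about 2%, and the |0,0> term omitted from Table 22 is of order 10⁻⁵.

```python
import math

# Hypothetical small recovery coefficients for limiting case (a); c and c' chosen so that c^2 + s^2 = 1.
s = s_p = 0.05
c = c_p = math.sqrt(1 - s**2)

# Coefficients of UVVU|0,0> from Table 21; Table 22 drops the |0,0> term as negligible.
coeffs = {
    "|0,0>": s_p * s + s_p * s * c_p,
    "|1,1>": s_p * c + s_p * c_p * c + s * c_p**2,
    "|2,2>": c_p**2 * c,
}
total = sum(a * a for a in coeffs.values())
probs = {state: a * a / total for state, a in coeffs.items()}
for state, p in probs.items():
    print(state, f"{p:.4%}")
```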
(b) No Recovery Events Allowed: s=s′=0, c=c′=1.
The effect of the events illustrated in FIGS. 8-9 is summarized in Table 23.
<tables id="TABLEUS00023" num="00023"><table frame="none" colsep="0" rowsep="0"><tgroup align="left" colsep="0" rowsep="0" cols="4"><colspec colname="1" colwidth="70pt" align="center"/><colspec colname="2" colwidth="42pt" align="left"/><colspec colname="3" colwidth="42pt" align="center"/><colspec colname="4" colwidth="63pt" align="center"/><thead><row><entry namest="1" nameend="4" rowsep="1">TABLE 23</entry></row><row><entry namest="1" nameend="4" align="center" rowsep="1"/></row><row><entry>Initial defect</entry><entry/><entry>Final defect</entry><entry/></row><row><entry>combination</entry><entry>Events</entry><entry>combination</entry><entry>Coefficient</entry></row><row><entry namest="1" nameend="4" align="center" rowsep="1"/></row></thead><tbody valign="top"><row><entry>0, 0></entry><entry>VU or UV</entry><entry>1, 1></entry><entry>1</entry></row><row><entry/><entry>UVVU or</entry><entry>2, 2></entry><entry>1</entry></row><row><entry/><entry>VUUV</entry></row><row><entry namest="1" nameend="4" align="center" rowsep="1"/></row></tbody></tgroup></table></tables>
For this limiting case, during each event U the system acquires 1 extra Type 1 defect, and during each event V it acquires 1 extra Type 2 defect. At every intermediate step the defect count is known with probability 1; therefore, the assessment of the final system state is deterministic.
FIGS. 10-11 illustrate case histories of system monitoring wherein multiple resources became unavailable. The effect of the sequence of events shown in FIG. 10 is summarized in Table 24.
<tables id="TABLEUS00024" num="00024"><table frame="none" colsep="0" rowsep="0"><tgroup align="left" colsep="0" rowsep="0" cols="5"><colspec colname="1" colwidth="42pt" align="center"/><colspec colname="2" colwidth="42pt" align="center"/><colspec colname="3" colwidth="42pt" align="center"/><colspec colname="4" colwidth="49pt" align="center"/><colspec colname="5" colwidth="42pt" align="center"/><thead><row><entry namest="1" nameend="5" rowsep="1">TABLE 24</entry></row><row><entry namest="1" nameend="5" align="center" rowsep="1"/></row><row><entry>Initial defect</entry><entry/><entry>Final defect</entry><entry/><entry/></row><row><entry>combination</entry><entry>Events</entry><entry>combination</entry><entry>Coefficient</entry><entry>Probability</entry></row><row><entry namest="1" nameend="5" align="center" rowsep="1"/></row></thead><tbody valign="top"><row><entry>0, 0></entry><entry>UUUVUU</entry><entry>5, 1></entry><entry>1</entry><entry>100%</entry></row><row><entry namest="1" nameend="5" align="center" rowsep="1"/></row></tbody></tgroup></table></tables>
The effect of the sequence of events illustrated in FIG. 11 is summarized in Table 25.
<tables id="TABLEUS00025" num="00025"><table frame="none" colsep="0" rowsep="0"><tgroup align="left" colsep="0" rowsep="0" cols="5"><colspec colname="1" colwidth="49pt" align="center"/><colspec colname="2" colwidth="42pt" align="center"/><colspec colname="3" colwidth="42pt" align="center"/><colspec colname="4" colwidth="42pt" align="center"/><colspec colname="5" colwidth="42pt" align="center"/><thead><row><entry namest="1" nameend="5" rowsep="1">TABLE 25</entry></row><row><entry namest="1" nameend="5" align="center" rowsep="1"/></row><row><entry>Initial defect</entry><entry/><entry>Final defect</entry><entry/><entry/></row><row><entry>combination</entry><entry>Events</entry><entry>combination</entry><entry>Coefficient</entry><entry>Probability</entry></row><row><entry namest="1" nameend="5" align="center" rowsep="1"/></row></thead><tbody valign="top"><row><entry>0, 0></entry><entry>UVUVUVU</entry><entry>4, 3></entry><entry>1</entry><entry>100%</entry></row><row><entry namest="1" nameend="5" align="center" rowsep="1"/></row></tbody></tgroup></table></tables>
Based on the results summarized in Tables 23-25, one notices that by the time the system reaches its final state, it has acquired a number of Type 1 defects equal to the number of recorded U events, and a number of Type 2 defects equal to the number of recorded V events.
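Because every coefficient is either 0 or 1 in limiting case (b), tracking coefficients reduces to counting events. A minimal sketch (function name hypothetical) that reproduces the final combinations of Tables 23-25:

```python
def final_counts_no_recovery(operators, m=0, n=0):
    """Limiting case (b), s = s' = 0 and c = c' = 1: every U adds a Type 1 defect, every V a Type 2 defect.

    The event order does not matter here, so the operator string can be read in either direction.
    """
    for op in operators:
        if op == "U":
            m += 1
        elif op == "V":
            n += 1
        else:
            raise ValueError(f"unknown event {op!r}")
    return m, n


for ops in ("VU", "UV", "UVVU", "VUUV", "UUUVUU", "UVUVUVU"):
    print(ops, final_counts_no_recovery(ops))
```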
(c) No Defects of Type 2 Allowed: c=s′=1, s=c′=0.
When no Type 2 defects are permitted, the initial system state is defined as |0,n=0>=|0>. The event sequences shown in FIGS. 7B, 8B, and 9B are not allowed in this limiting case, because each begins with a V event and V cannot act on |0>. The effect of the events illustrated in FIGS. 7A, 8A, and 9A is summarized in Table 26.
<tables id="TABLEUS00026" num="00026"><table frame="none" colsep="0" rowsep="0"><tgroup align="left" colsep="0" rowsep="0" cols="4"><colspec colname="1" colwidth="70pt" align="center"/><colspec colname="2" colwidth="35pt" align="left"/><colspec colname="3" colwidth="63pt" align="center"/><colspec colname="4" colwidth="49pt" align="center"/><thead><row><entry namest="1" nameend="4" rowsep="1">TABLE 26</entry></row><row><entry namest="1" nameend="4" align="center" rowsep="1"/></row><row><entry>Initial defect count</entry><entry>Events</entry><entry>Final defect count</entry><entry>Coefficient</entry></row><row><entry namest="1" nameend="4" align="center" rowsep="1"/></row></thead><tbody valign="top"><row><entry>0, 0> = 0></entry><entry>U</entry><entry>1></entry><entry>1</entry></row><row><entry/><entry>V</entry><entry>N/A</entry><entry>N/A</entry></row><row><entry/><entry>VU</entry><entry>0></entry><entry>1</entry></row><row><entry/><entry>UVVU</entry><entry>0></entry><entry>1</entry></row><row><entry namest="1" nameend="4" align="center" rowsep="1"/></row></tbody></tgroup></table></tables>
Based on the results summarized in Table 26, one recognizes that the event operators U and V correctly yield the exact (deterministic) defect count of the final system state. In this limiting case, each event U adds one Type 1 defect and each event V removes one Type 1 defect. At each intermediate step, the defect count is known with probability 1; therefore, the assessment of the final system state is deterministic.
The effect of the sequence of events illustrated in FIG. 10 is summarized in Table 27.
<tables id="TABLEUS00027" num="00027"><table frame="none" colsep="0" rowsep="0"><tgroup align="left" colsep="0" rowsep="0" cols="5"><colspec colname="1" colwidth="42pt" align="center"/><colspec colname="2" colwidth="42pt" align="center"/><colspec colname="3" colwidth="42pt" align="center"/><colspec colname="4" colwidth="49pt" align="center"/><colspec colname="5" colwidth="42pt" align="center"/><thead><row><entry namest="1" nameend="5" rowsep="1">TABLE 27</entry></row><row><entry namest="1" nameend="5" align="center" rowsep="1"/></row><row><entry>Initial defect</entry><entry/><entry>Final defect</entry><entry/><entry/></row><row><entry>combination</entry><entry>Events</entry><entry>combination</entry><entry>Coefficient</entry><entry>Probability</entry></row><row><entry namest="1" nameend="5" align="center" rowsep="1"/></row></thead><tbody valign="top"><row><entry>|0></entry><entry>UUUVUU</entry><entry>|4></entry><entry>1</entry><entry>100%</entry></row><row><entry namest="1" nameend="5" align="center" rowsep="1"/></row></tbody></tgroup></table></tables>
The effect of the sequence of events illustrated in FIG. 11 is summarized in Table 28.
<tables id="TABLEUS00028" num="00028"><table frame="none" colsep="0" rowsep="0"><tgroup align="left" colsep="0" rowsep="0" cols="5"><colspec colname="1" colwidth="49pt" align="center"/><colspec colname="2" colwidth="42pt" align="center"/><colspec colname="3" colwidth="42pt" align="center"/><colspec colname="4" colwidth="42pt" align="center"/><colspec colname="5" colwidth="42pt" align="center"/><thead><row><entry namest="1" nameend="5" rowsep="1">TABLE 28</entry></row><row><entry namest="1" nameend="5" align="center" rowsep="1"/></row><row><entry>Initial defect</entry><entry/><entry>Final defect</entry><entry/><entry/></row><row><entry>combination</entry><entry>Events</entry><entry>combination</entry><entry>Coefficient</entry><entry>Probability</entry></row><row><entry namest="1" nameend="5" align="center" rowsep="1"/></row></thead><tbody valign="top"><row><entry>|0></entry><entry>UVUVUVU</entry><entry>|1></entry><entry>1</entry><entry>100%</entry></row><row><entry namest="1" nameend="5" align="center" rowsep="1"/></row></tbody></tgroup></table></tables>
Based on the results summarized in Tables 26-28, one notices that by the time the system reaches its final state, it has acquired a number of Type 1 defects equal to the difference between the numbers of recorded U and V events.
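The two limiting cases above reduce to deterministic counting. The following minimal Python sketch checks the observations on Tables 23-25 and 26-28. The function names are hypothetical, and the sketch assumes that an event string is an operator product applied right to left (so VU means U first, then V, consistent with the VU row of Table 26).

```python
from typing import Optional, Tuple


def count_additive(events: str) -> Tuple[int, int]:
    """Case of Tables 23-25: each U adds a Type 1 defect, each V a Type 2 defect."""
    return events.count("U"), events.count("V")


def count_no_type2(events: str) -> Optional[int]:
    """Case (c): each U adds a Type 1 defect and each V removes one.

    Returns None when a V event would act on |0>, which this case does not allow.
    Events are applied right to left, as in an operator product (an assumption).
    """
    type1 = 0
    for event in reversed(events):
        if event == "U":
            type1 += 1
        elif event == "V":
            if type1 == 0:
                return None  # V cannot act on |0>
            type1 -= 1
        else:
            raise ValueError(f"unknown event {event!r}")
    return type1


print(count_additive("UVUVUVU"))  # (4, 3), as in Table 25
print(count_no_type2("UUUVUU"))   # 4, as in Table 27
print(count_no_type2("UVUVUVU"))  # 1, as in Table 28
print(count_no_type2("V"))        # None (N/A in Table 26)
```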
(d) Equal Probability Ratios for Failure & Recovery Between Defects d<sub>1 </sub>& d<sub>2</sub>: c=c′, s=s′.
The effect of the events illustrated in FIGS. 9A-9B is summarized in Table 29.
<tables id="TABLEUS00029" num="00029"><table frame="none" colsep="0" rowsep="0"><tgroup align="left" colsep="0" rowsep="0" cols="4"><colspec colname="1" colwidth="49pt" align="center"/><colspec colname="2" colwidth="35pt" align="center"/><colspec colname="3" colwidth="84pt" align="center"/><colspec colname="4" colwidth="49pt" align="left"/><thead><row><entry namest="1" nameend="4" rowsep="1">TABLE 29</entry></row><row><entry namest="1" nameend="4" align="center" rowsep="1"/></row><row><entry>Initial defect</entry><entry/><entry/><entry/></row><row><entry>combination</entry><entry>Events</entry><entry>Final defect combination</entry><entry>Coefficient</entry></row><row><entry namest="1" nameend="4" align="center" rowsep="1"/></row></thead><tbody valign="top"><row><entry>|0, 0></entry><entry>UVVU</entry><entry>|0, 0></entry><entry>(s<sup>2 </sup>+ s<sup>2</sup>c)</entry></row><row><entry/><entry>or</entry><entry>|1, 1></entry><entry>(sc + 2sc<sup>2</sup>)</entry></row><row><entry/><entry>VUUV</entry><entry>|2, 2></entry><entry>c<sup>3</sup></entry></row><row><entry namest="1" nameend="4" align="center" rowsep="1"/></row></tbody></tgroup></table></tables>
Based on the results summarized in Table 29, the events illustrated in FIGS. 9A-9B yield 3 possible defect combinations in the final system state: (i) |2,2>, in which the system acquires 2 extra Type 1 defects and 2 extra Type 2 defects; (ii) |1,1>, in which the system acquires 1 extra Type 1 defect and 1 extra Type 2 defect; and (iii) |0,0>, in which the system returns to its initial state after going through several Type 1 and Type 2 failures and recoveries. The probability of each final defect combination (i)-(iii) is proportional to the square of the respective coefficient in Table 29.
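Because the Table 29 coefficients are polynomials in c and s, the final-state probabilities can be evaluated for any choice of these parameters. The sketch below is illustrative only: it copies the three coefficients from Table 29, normalizes their squares so that the probabilities sum to 1, and, with c=s=1/√2, reproduces the rounded Probability column of Table 31. The function name is hypothetical.

```python
import math


def table29_probabilities(c: float, s: float) -> dict:
    """Final-state probabilities for UVVU (or VUUV) from |0,0>, case c=c', s=s'.

    Coefficients are copied from Table 29; each probability is the squared
    coefficient divided by the sum of all squared coefficients.
    """
    coefficients = {
        (0, 0): s**2 + s**2 * c,
        (1, 1): s * c + 2 * s * c**2,
        (2, 2): c**3,
    }
    total = sum(a**2 for a in coefficients.values())
    return {state: a**2 / total for state, a in coefficients.items()}


# Case (e): c = s = 1/sqrt(2) gives the Probability column of Table 31.
p = table29_probabilities(1 / math.sqrt(2), 1 / math.sqrt(2))
print({k: round(100 * v) for k, v in p.items()})  # {(0, 0): 32, (1, 1): 63, (2, 2): 5}
```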
(e) Equal Probabilities for Failure & Recovery for Defects d<sub>1 </sub>& d<sub>2</sub>: c=s=1/√2, c′=s′=1/√2.
The effect of the events illustrated in FIGS. 8A-8B is summarized in Table 30.
<tables id="TABLEUS00030" num="00030"><table frame="none" colsep="0" rowsep="0"><tgroup align="left" colsep="0" rowsep="0" cols="5"><colspec colname="1" colwidth="42pt" align="center"/><colspec colname="2" colwidth="35pt" align="left"/><colspec colname="3" colwidth="49pt" align="center"/><colspec colname="4" colwidth="42pt" align="center"/><colspec colname="5" colwidth="49pt" align="center"/><thead><row><entry namest="1" nameend="5" rowsep="1">TABLE 30</entry></row><row><entry namest="1" nameend="5" align="center" rowsep="1"/></row><row><entry>Initial defect</entry><entry/><entry>Final defect</entry><entry/><entry/></row><row><entry>combination</entry><entry>Events</entry><entry>combination</entry><entry>Coefficient</entry><entry>Probability</entry></row><row><entry namest="1" nameend="5" align="center" rowsep="1"/></row></thead><tbody valign="top"><row><entry>|0, 0></entry><entry>UV or</entry><entry>|0, 0></entry><entry>1/√2</entry><entry>50%</entry></row><row><entry/><entry>VU</entry><entry>|1, 1></entry><entry>1/√2</entry><entry>50%</entry></row><row><entry namest="1" nameend="5" align="center" rowsep="1"/></row></tbody></tgroup></table></tables>
Based on the results summarized in Table 30, the events illustrated in FIGS. 8A-8B yield 2 possible defect combinations in the final state: (i) there is a 50% probability of obtaining |1,1>, in which the system acquires 1 extra Type 1 defect and 1 extra Type 2 defect; and (ii) there is a 50% probability of obtaining |0,0>, in which the system returns to its initial state after going through Type 1 and Type 2 failures and recoveries.
The effect of the events illustrated in FIGS. 9A-9B is summarized in Table 31.
<tables id="TABLEUS00031" num="00031"><table frame="none" colsep="0" rowsep="0"><tgroup align="left" colsep="0" rowsep="0" cols="5"><colspec colname="1" colwidth="42pt" align="center"/><colspec colname="2" colwidth="35pt" align="center"/><colspec colname="3" colwidth="42pt" align="center"/><colspec colname="4" colwidth="49pt" align="left"/><colspec colname="5" colwidth="49pt" align="center"/><thead><row><entry namest="1" nameend="5" rowsep="1">TABLE 31</entry></row><row><entry namest="1" nameend="5" align="center" rowsep="1"/></row><row><entry>Initial defect</entry><entry/><entry>Final defect</entry><entry/><entry/></row><row><entry>combination</entry><entry>Events</entry><entry>combination</entry><entry>Coefficient</entry><entry>Probability</entry></row><row><entry namest="1" nameend="5" align="center" rowsep="1"/></row></thead><tbody valign="top"><row><entry/></row></tbody></tgroup><tgroup align="left" colsep="0" rowsep="0" cols="5"><colspec colname="1" colwidth="42pt" align="center"/><colspec colname="2" colwidth="35pt" align="center"/><colspec colname="3" colwidth="42pt" align="center"/><colspec colname="4" colwidth="49pt" align="left"/><colspec colname="5" colwidth="49pt" align="char" char="."/><tbody valign="top"><row><entry>|0, 0></entry><entry>UVVU</entry><entry>|0, 0></entry><entry>[½ + 1/(2√2)]</entry><entry>32%</entry></row><row><entry/><entry>or</entry><entry>|1, 1></entry><entry>(½ + 1/√2)</entry><entry>63%</entry></row><row><entry/><entry>VUUV</entry><entry>|2, 2></entry><entry>1/(2√2)</entry><entry>5%</entry></row><row><entry namest="1" nameend="5" align="center" rowsep="1"/></row></tbody></tgroup></table></tables>
Based on the results summarized in Table 31, the events illustrated in FIGS. 9A-9B yield 3 possible defect combinations in the final system state: (i) |2,2>, in which the system acquires 2 extra Type 1 defects and 2 extra Type 2 defects; (ii) |1,1>, in which the system acquires 1 extra Type 1 defect and 1 extra Type 2 defect; and (iii) |0,0>, in which the system returns to its initial state after going through several Type 1 and Type 2 failures and recoveries. Note that the probability of each final defect combination (i)-(iii) was calculated using the technique presented in Eq. 7 above.
The effect of the sequence of events illustrated in FIG. 10 is summarized in Table 32.
<tables id="TABLEUS00032" num="00032"><table frame="none" colsep="0" rowsep="0"><tgroup align="left" colsep="0" rowsep="0" cols="5"><colspec colname="1" colwidth="42pt" align="center"/><colspec colname="2" colwidth="42pt" align="center"/><colspec colname="3" colwidth="42pt" align="center"/><colspec colname="4" colwidth="49pt" align="left"/><colspec colname="5" colwidth="42pt" align="center"/><thead><row><entry namest="1" nameend="5" rowsep="1">TABLE 32</entry></row><row><entry namest="1" nameend="5" align="center" rowsep="1"/></row><row><entry>Initial defect</entry><entry/><entry>Final defect</entry><entry/><entry/></row><row><entry>combination</entry><entry>Events</entry><entry>combination</entry><entry>Coefficient</entry><entry>Probability</entry></row><row><entry namest="1" nameend="5" align="center" rowsep="1"/></row></thead><tbody valign="top"><row><entry/></row></tbody></tgroup><tgroup align="left" colsep="0" rowsep="0" cols="5"><colspec colname="1" colwidth="42pt" align="center"/><colspec colname="2" colwidth="42pt" align="center"/><colspec colname="3" colwidth="42pt" align="center"/><colspec colname="4" colwidth="49pt" align="left"/><colspec colname="5" colwidth="42pt" align="char" char="."/><tbody valign="top"><row><entry>|0,0></entry><entry>UUUVUU</entry><entry>|4, 0></entry><entry>(½ + 1/√2)</entry><entry>92%</entry></row><row><entry/><entry/><entry>|5, 1></entry><entry>1/(2√2)</entry><entry>8%</entry></row><row><entry namest="1" nameend="5" align="center" rowsep="1"/></row></tbody></tgroup></table></tables>
The effect of the sequence of events illustrated in FIG. 11 is summarized in Table 33.
<tables id="TABLEUS00033" num="00033"><table frame="none" colsep="0" rowsep="0"><tgroup align="left" colsep="0" rowsep="0" cols="5"><colspec colname="1" colwidth="42pt" align="center"/><colspec colname="2" colwidth="42pt" align="center"/><colspec colname="3" colwidth="42pt" align="center"/><colspec colname="4" colwidth="49pt" align="left"/><colspec colname="5" colwidth="42pt" align="center"/><thead><row><entry namest="1" nameend="5" rowsep="1">TABLE 33</entry></row><row><entry namest="1" nameend="5" align="center" rowsep="1"/></row><row><entry>Initial defect</entry><entry/><entry>Final defect</entry><entry/><entry/></row><row><entry>combination</entry><entry>Events</entry><entry>combination</entry><entry>Coefficient</entry><entry>Probability</entry></row><row><entry namest="1" nameend="5" align="center" rowsep="1"/></row></thead><tbody valign="top"><row><entry/></row></tbody></tgroup><tgroup align="left" colsep="0" rowsep="0" cols="5"><colspec colname="1" colwidth="42pt" align="center"/><colspec colname="2" colwidth="42pt" align="center"/><colspec colname="3" colwidth="42pt" align="center"/><colspec colname="4" colwidth="49pt" align="left"/><colspec colname="5" colwidth="42pt" align="char" char="."/><tbody valign="top"><row><entry>|0, 0></entry><entry>UVUVUVU</entry><entry>|1, 0></entry><entry>(5/4 + √2)</entry><entry>52.9%</entry></row><row><entry/><entry/><entry>|2, 1></entry><entry>(5/4 + 3√2/4)</entry><entry>39.8%</entry></row><row><entry/><entry/><entry>|3, 2></entry><entry>(5/8 + √2/4)</entry><entry>7.2%</entry></row><row><entry/><entry/><entry>|4, 3></entry><entry>⅛</entry><entry>0.1%</entry></row><row><entry namest="1" nameend="5" align="center" rowsep="1"/></row></tbody></tgroup></table></tables>
Based on the results summarized in Tables 30-33, one concludes that if the initial state of the system is |0,0> and the system goes through m U events and n V events, with m≧n, then the possible defect combinations in the final state are |m−n,0>, |m−n+1,1>, . . . , |m,n>. In general, the coefficient of the final defect combination |m−n+i,i> is a<sub>m−n+i,i </sub>(where i is an integer such that 0≦i≦n). The probability p<sub>m−n+i,i </sub>of encountering the defect combination |m−n+i,i> in the final state is given by:
<maths id="MATHUS00008" num="00008"><math overflow="scroll"><mtable><mtr><mtd><mrow><msub><mi>p</mi><mrow><mrow><mi>m</mi><mo>−</mo><mi>n</mi><mo>+</mo><mi>i</mi></mrow><mo>,</mo><mi>i</mi></mrow></msub><mo>=</mo><mfrac><msup><mrow><mo>|</mo><msub><mi>a</mi><mrow><mrow><mi>m</mi><mo>−</mo><mi>n</mi><mo>+</mo><mi>i</mi></mrow><mo>,</mo><mi>i</mi></mrow></msub><mo>|</mo></mrow><mn>2</mn></msup><mrow><munderover><mo>∑</mo><mrow><mi>j</mi><mo>=</mo><mn>0</mn></mrow><mi>n</mi></munderover><mo></mo><msup><mrow><mo>|</mo><msub><mi>a</mi><mrow><mrow><mi>m</mi><mo>−</mo><mi>n</mi><mo>+</mo><mi>j</mi></mrow><mo>,</mo><mi>j</mi></mrow></msub><mo>|</mo></mrow><mn>2</mn></msup></mrow></mfrac></mrow></mtd><mtd><mrow><mo>(</mo><mn>14</mn><mo>)</mo></mrow></mtd></mtr></mtable></math></maths>
The probabilities of all possible defect combinations in the final system state sum to 1:
<maths id="MATHUS00009" num="00009"><math overflow="scroll"><mtable><mtr><mtd><mrow><mrow><munderover><mo>∑</mo><mrow><mi>i</mi><mo>=</mo><mn>0</mn></mrow><mi>n</mi></munderover><mo></mo><msub><mi>p</mi><mrow><mrow><mi>m</mi><mo>−</mo><mi>n</mi><mo>+</mo><mi>i</mi></mrow><mo>,</mo><mi>i</mi></mrow></msub></mrow><mo>=</mo><mn>1</mn></mrow></mtd><mtd><mrow><mo>(</mo><mn>15</mn><mo>)</mo></mrow></mtd></mtr></mtable></math></maths>
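Eqs. 14-15 amount to normalizing the squared coefficients. The following sketch (the helper name is hypothetical) applies Eq. 14 to the coefficients listed in Table 33; the computed values agree with the table's Probability column to within rounding (about 0.1 percentage point).

```python
import math


def final_state_probabilities(coefficients: dict) -> dict:
    """Eq. 14: p = |a|^2 divided by the sum of |a|^2 over all final combinations."""
    total = sum(abs(a) ** 2 for a in coefficients.values())
    return {state: abs(a) ** 2 / total for state, a in coefficients.items()}


root2 = math.sqrt(2)
# Coefficients from Table 33 (UVUVUVU from |0,0>, c = s = c' = s' = 1/sqrt(2)).
table33 = {
    (1, 0): 5 / 4 + root2,
    (2, 1): 5 / 4 + 3 * root2 / 4,
    (3, 2): 5 / 8 + root2 / 4,
    (4, 3): 1 / 8,
}
p = final_state_probabilities(table33)
assert math.isclose(sum(p.values()), 1.0)  # Eq. 15
for state, prob in p.items():
    print(state, f"{100 * prob:.1f}%")
```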
Discussion
Note that the defect-counting method described above, valid under the assumptions m>0 and n>0, led to linear combinations of possible defect combinations in the final state characterized by a high degree of symmetry. As illustrated in Tables 1-18 above, not only are the possible final-state defect combinations symmetric with respect to the initial defect combination, but so are their coefficients and, therefore, their probabilities. Furthermore, the operators describing the events in FIGS. 7-11 appear to be commutative. However, the treatment presented above implicitly assumes that the count of either Type 1 or Type 2 defects cannot decrease below 0 (i.e., no “negative” defect count is allowed).
In contrast, the defect-counting method described for the situation m=0 and n=0 (the ground state) led to linear combinations of possible defect combinations in the final state that are not symmetric. As illustrated in Tables 19-33 above, the possible final-state defect combinations are not symmetric with respect to the initial ground state, because no negative count is possible for either type of defect. Likewise, the coefficients of the possible defect combinations, and therefore their probabilities, are not symmetric. Furthermore, the operators describing the events in FIGS. 7-11 are no longer commutative. This “breaking” of symmetry for the m=0 and n=0 situation arises because recovery events are not “allowed” in the ground state. In other words, when no defect is present, a nonexistent defect cannot recover.
Moreover, when the initial defect counts are finite (m>0 and n>0) but, during successive U and/or V events, the count for at least one of the two defect types reaches 0, the possible defect count combinations in the system final state are evaluated with a hybrid of the techniques presented above. Specifically, for each term |m,n> of the state, U is given by Eqs. 1-2 when n>0 and by Eq. 10 when n=0, whereas V is given by Eqs. 3-4 when m>0 and by Eq. 11 when m=0.
Summary
The method for updating the system state is summarized below. In general, the state of a system which has been subjected to successive U and/or V events is given by the following linear combination of possible defect counts:
<maths id="MATHUS00010" num="00010"><math overflow="scroll"><mtable><mtr><mtd><mrow><munder><mo>∑</mo><mrow><mi>m</mi><mo>,</mo><mi>n</mi></mrow></munder><mo></mo><mrow><msub><mi>a</mi><mrow><mi>m</mi><mo>,</mo><mi>n</mi></mrow></msub><mo></mo><mrow><mo>|</mo><mrow><mi>m</mi><mo>,</mo><mi>n</mi></mrow><mo>〉</mo></mrow></mrow></mrow></mtd><mtd><mrow><mo>(</mo><mn>16</mn><mo>)</mo></mrow></mtd></mtr></mtable></math></maths>
When an SPRT 5 alarm occurs, one applies event operator U as follows:
<maths id="MATHUS00011" num="00011"><math overflow="scroll"><mtable><mtr><mtd><mrow><munder><mo>∑</mo><mrow><mi>m</mi><mo>,</mo><mi>n</mi></mrow></munder><mo></mo><mrow><msub><mi>a</mi><mrow><mi>m</mi><mo>,</mo><mi>n</mi></mrow></msub><mo></mo><mi>U</mi><mo></mo><mrow><mo>|</mo><mrow><mi>m</mi><mo>,</mo><mi>n</mi></mrow><mo>〉</mo></mrow></mrow></mrow></mtd><mtd><mrow><mo>(</mo><mn>17</mn><mo>)</mo></mrow></mtd></mtr></mtable></math></maths>
The event operator U in Eq. 17 is given by Eqs. 1-2 if (m≧0 and n>0), or by Eq. 10 if (m≧0 and n=0).
When an SPRT 6 alarm occurs, one applies event operator V as follows:
<maths id="MATHUS00012" num="00012"><math overflow="scroll"><mtable><mtr><mtd><mrow><munder><mo>∑</mo><mrow><mi>m</mi><mo>,</mo><mi>n</mi></mrow></munder><mo></mo><mrow><msub><mi>a</mi><mrow><mi>m</mi><mo>,</mo><mi>n</mi></mrow></msub><mo></mo><mi>V</mi><mo></mo><mrow><mo>|</mo><mrow><mi>m</mi><mo>,</mo><mi>n</mi></mrow><mo>〉</mo></mrow></mrow></mrow></mtd><mtd><mrow><mo>(</mo><mn>18</mn><mo>)</mo></mrow></mtd></mtr></mtable></math></maths>
The event operator V in Eq. 18 is given by Eqs. 3-4 if (m>0 and n≧0), or by Eq. 11 if (m=0 and n≧0). Eqs. 17-18 lead to a new linear combination of possible defect counts in the final state:
<maths id="MATHUS00013" num="00013"><math overflow="scroll"><mtable><mtr><mtd><mrow><munder><mo>∑</mo><mrow><mi>i</mi><mo>,</mo><mi>j</mi></mrow></munder><mo></mo><mrow><msubsup><mi>a</mi><mrow><mi>i</mi><mo>,</mo><mi>j</mi></mrow><mi>′</mi></msubsup><mo></mo><mrow><mo>|</mo><mrow><mi>i</mi><mo>,</mo><mi>j</mi></mrow><mo>〉</mo></mrow></mrow></mrow></mtd><mtd><mrow><mo>(</mo><mn>19</mn><mo>)</mo></mrow></mtd></mtr></mtable></math></maths>
Each possible defect count combination |i,j> has the respective probability:
<maths id="MATHUS00014" num="00014"><math overflow="scroll"><mtable><mtr><mtd><mrow><mrow><msub><mi>p</mi><mrow><mi>i</mi><mo>,</mo><mi>j</mi></mrow></msub><mo>=</mo><mfrac><msup><mrow><mo>|</mo><msubsup><mi>a</mi><mrow><mi>i</mi><mo>,</mo><mi>j</mi></mrow><mi>′</mi></msubsup><mo>|</mo></mrow><mn>2</mn></msup><mrow><munder><mo>∑</mo><mrow><mi>k</mi><mo>,</mo><mi>l</mi></mrow></munder><mo></mo><msup><mrow><mo>|</mo><msubsup><mi>a</mi><mrow><mi>k</mi><mo>,</mo><mi>l</mi></mrow><mi>′</mi></msubsup><mo>|</mo></mrow><mn>2</mn></msup></mrow></mfrac></mrow><mo></mo><mstyle><mtext></mtext></mstyle><mo></mo><mrow><mrow><mi>where</mi><mo></mo><mstyle><mspace width="0.8em" height="0.8ex"/></mstyle><mo></mo><mrow><munder><mo>∑</mo><mrow><mi>i</mi><mo>,</mo><mi>j</mi></mrow></munder><mo></mo><msub><mi>p</mi><mrow><mi>i</mi><mo>,</mo><mi>j</mi></mrow></msub></mrow></mrow><mo>=</mo><mn>1.</mn></mrow></mrow></mtd><mtd><mrow><mo>(</mo><mn>20</mn><mo>)</mo></mrow></mtd></mtr></mtable></math></maths>
In some embodiments, when the probability of a specific unwanted defect combination is deemed unacceptably high, the monitoring process is stopped and one or more remedial actions are performed. In some embodiments, after one or more remedial actions are performed, the system monitoring may be resumed.
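The update procedure of Eqs. 16-20, together with the threshold test described above, can be sketched as follows. This is a minimal illustration under stated assumptions: the state is stored as a dict from (m, n) to its coefficient, the event operators are supplied as functions, and the toy operators in the example only model the case of Tables 23-25 (each U adds a Type 1 defect, each V adds a Type 2 defect), not the general operators of Eqs. 1-4 and 10-11. The threshold value and all names are hypothetical.

```python
from typing import Callable, Dict, Tuple

Ket = Tuple[int, int]                   # (Type 1 count, Type 2 count) for |m,n>
State = Dict[Ket, float]                # ket -> coefficient a_{m,n}
Operator = Callable[[int, int], State]  # |m,n> -> U|m,n> or V|m,n>


def apply_event(state: State, op: Operator) -> State:
    """Eqs. 17-18: apply an event operator to each term and add up coefficients."""
    new_state: State = {}
    for (m, n), a in state.items():
        for ket, b in op(m, n).items():
            new_state[ket] = new_state.get(ket, 0.0) + a * b
    return new_state


def probabilities(state: State) -> Dict[Ket, float]:
    """Eq. 20: p_{i,j} = |a'_{i,j}|^2 / sum over (k,l) of |a'_{k,l}|^2."""
    total = sum(abs(a) ** 2 for a in state.values())
    return {ket: abs(a) ** 2 / total for ket, a in state.items()}


def needs_remedial_action(state: State, unwanted: Ket, threshold: float) -> bool:
    """True if the probability of one unwanted defect combination exceeds threshold."""
    return probabilities(state).get(unwanted, 0.0) > threshold


# Toy operators (assumption, for illustration only): the case of Tables 23-25.
def toy_u(m: int, n: int) -> State:
    return {(m + 1, n): 1.0}


def toy_v(m: int, n: int) -> State:
    return {(m, n + 1): 1.0}


state: State = {(0, 0): 1.0}
for alarm in ["SPRT 5", "SPRT 6", "SPRT 5"]:  # SPRT 5 -> U, SPRT 6 -> V
    state = apply_event(state, toy_u if alarm == "SPRT 5" else toy_v)
print(probabilities(state))                       # {(2, 1): 1.0}
print(needs_remedial_action(state, (2, 1), 0.5))  # True
```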
Real-Time Telemetry System
FIG. 12 illustrates real-time telemetry system 104 in accordance with an embodiment of the present invention. Referring to FIG. 12, computer system 100 can generally include any computational device including a mechanism for servicing requests from a client for computational and/or data storage resources. In one embodiment, computer system 100 is a high-end uniprocessor or multiprocessor server that is being monitored by real-time telemetry system 104.
Real-time telemetry system 104 includes telemetry device 1200, analytical resampling program 1201, sensitivity analysis tool 1202, and SPRT module 1203. Telemetry device 1200 gathers information from the various sensors and monitoring tools within computer system 100. In one embodiment, telemetry device 1200 directs the signals to a remote location that contains analytical resampling program 1201, sensitivity analysis tool 1202, and SPRT module 1203. In another embodiment of the present invention, one or more of analytical resampling program 1201, sensitivity analysis tool 1202, and SPRT module 1203 are located within computer system 100.
Analytical resampling program 1201 ensures that the monitored telemetry variables have a uniform sampling rate. In doing so, analytical resampling program 1201 uses interpolation techniques, if necessary, to fill in missing data points, or to equalize the sampling intervals when the raw data is nonuniformly sampled.
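One way analytical resampling program 1201 could equalize sampling intervals is plain linear interpolation onto a uniform grid. The sketch below assumes linear interpolation (the text does not name a specific interpolation technique) and uses only Python's standard bisect module; resample_uniform is a hypothetical name.

```python
from bisect import bisect_right
from typing import List, Sequence


def resample_uniform(times: Sequence[float], values: Sequence[float],
                     step: float) -> List[float]:
    """Linearly interpolate irregular samples onto a grid t0, t0 + step, ...

    times must be strictly increasing; the grid stops at the last timestamp.
    """
    out: List[float] = []
    k = 0
    t = times[0]
    while t <= times[-1] + 1e-12:
        i = bisect_right(times, t) - 1          # last sample at or before t
        if i >= len(times) - 1:
            out.append(float(values[-1]))       # t coincides with the last sample
        else:
            frac = (t - times[i]) / (times[i + 1] - times[i])
            out.append(values[i] + frac * (values[i + 1] - values[i]))
        k += 1
        t = times[0] + k * step                 # avoids accumulating rounding error
    return out


# The sample at t = 2 is missing; linear interpolation fills it in.
print(resample_uniform([0, 1, 3, 4], [10, 12, 16, 18], step=1.0))
# [10.0, 12.0, 14.0, 16.0, 18.0]
```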
After the telemetry variables pass through analytical resampling program 1201, they are aligned and correlated by sensitivity analysis tool 1202. For example, in one embodiment of the present invention, sensitivity analysis tool 1202 incorporates a novel moving-window technique that “slides” through the telemetry variables with systematically varying window widths. The system systematically varies the alignment between sliding windows for different telemetry variables to optimize the degree of association between the telemetry variables, as quantified by an “F-statistic,” which is computed and ranked for all telemetry variable windows by sensitivity analysis tool 1202.
When statistically comparing the quality of two fits, the F-statistic measures the strength of the regression: the higher the value of the F-statistic, the better the correlation between the two telemetry variables. The lead/lag value of the sliding window that yields the highest F-statistic is chosen, and the candidate telemetry variable is aligned to maximize this value. Sensitivity analysis tool 1202 repeats this process for each telemetry variable.
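A minimal sketch of the lead/lag search, assuming the F-statistic of a simple linear regression, F = r²(n−2)/(1−r²), as the measure of association. For brevity it varies only the lag over the full overlap rather than also varying the window width, and all names are hypothetical.

```python
import math
from typing import List, Tuple


def regression_f_statistic(x: List[float], y: List[float]) -> float:
    """F-statistic of a simple linear regression of y on x: r^2 (n - 2) / (1 - r^2)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant signal carries no association
    r2 = sxy * sxy / (sxx * syy)
    return math.inf if r2 >= 1.0 else r2 * (n - 2) / (1.0 - r2)


def best_lag(reference: List[float], candidate: List[float],
             max_lag: int) -> Tuple[int, float]:
    """Try lags -max_lag..max_lag and return (lag, F) with the largest F.

    A positive lag means the candidate lags the reference: reference[i] is
    paired with candidate[i + lag].
    """
    chosen_lag, best_f = 0, -1.0
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            x, y = reference[:len(reference) - lag], candidate[lag:]
        else:
            x, y = reference[-lag:], candidate[:len(candidate) + lag]
        n = min(len(x), len(y))
        if n < 3:
            continue  # need at least 3 pairs so that n - 2 > 0
        f = regression_f_statistic(x[:n], y[:n])
        if f > best_f:
            chosen_lag, best_f = lag, f
    return chosen_lag, best_f


reference = [0.0, 1.0, 4.0, 2.0, 7.0, 3.0, 8.0, 5.0, 9.0, 6.0, 1.0, 4.0]
candidate = [0.0, 0.0] + reference[:-2]  # same signal, delayed by 2 samples
print(best_lag(reference, candidate, max_lag=4)[0])  # 2
```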
Telemetry variables that have an F-statistic very close to 1 are “completely correlated” and can be discarded. This can happen when two telemetry variables measure the same metric but express it in different engineering units. For example, one telemetry variable can convey a temperature in degrees Fahrenheit, while a second telemetry variable conveys the same temperature in degrees Centigrade. Because these two telemetry variables are perfectly correlated, neither contains any information that the other lacks, and therefore one of them may be discarded.
Some telemetry variables may exhibit little correlation, or no correlation whatsoever. In this case, these telemetry variables may be dropped because they add little predictive information. Once a highly correlated subset of the telemetry variables has been determined, they are combined into one group or cluster for processing by the SPRT module 1203.
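The two pruning rules above can be sketched as a simple filter. This sketch swaps in the absolute Pearson correlation coefficient |r| for the F-statistic as the measure of association, and the cut-offs (0.999 for redundant copies, 0.3 for weakly correlated signals) are arbitrary illustrative values, not values taken from the text. Signal names and data are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient (0.0 if either signal is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy) if sxx and syy else 0.0


def select_cluster(signals: Dict[str, Sequence[float]],
                   redundant: float = 0.999, weak: float = 0.3) -> List[str]:
    """Drop redundant copies of an already kept signal, and signals whose
    best |r| against every other signal is below `weak`."""
    kept: List[str] = []
    for name, values in signals.items():
        if any(abs(pearson(values, signals[k])) >= redundant for k in kept):
            continue  # e.g. the same temperature expressed in another unit
        best = max((abs(pearson(values, v)) for k, v in signals.items() if k != name),
                   default=0.0)
        if best < weak:
            continue  # adds little predictive information
        kept.append(name)
    return kept


temp_c = [20.0, 21.0, 23.0, 22.0, 25.0, 27.0, 26.0, 28.0]
signals = {
    "temp_c": temp_c,
    "temp_f": [1.8 * c + 32.0 for c in temp_c],  # same metric, other unit
    "fan_rpm": [2600.0, 2900.0, 2700.0, 2800.0, 3200.0, 3100.0, 3400.0, 3300.0],
    "noise": [6.0, 4.0, 4.0, 6.0, 6.0, 4.0, 4.0, 6.0],
}
print(select_cluster(signals))  # ['temp_c', 'fan_rpm']
```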
Some embodiments of the present invention continuously monitor a variety of telemetry variables (e.g., sensor signals) in real time during operation of the server. (Note that although we refer to a single computer system in this disclosure, the present invention can also apply to a collection of computer systems.)
These telemetry variables can also include signals associated with internal performance parameters maintained by software within the computer system. For example, these internal performance parameters can include, but are not limited to, system throughput, transaction latencies, queue lengths, central processing unit (CPU) utilization, load on CPU, idle time, memory utilization, load on the memory, load on the cache, I/O traffic, bus saturation metrics, FIFO overflow statistics, network traffic, disk-related metrics, and various operational profiles gathered through “virtual sensors” located within the operating system.
These telemetry variables can also include signals associated with canary performance parameters for synthetic user transactions, which are periodically generated for the purpose of measuring quality of service from the end user's perspective.
These telemetry variables can additionally include hardware variables, including, but not limited to, internal temperatures, voltages, currents, vibration, optical power, optical wavelength, air velocity, measures of signal integrity, and fan speeds. In some embodiments, the measures of signal integrity include, but are not limited to, signal/noise ratio, a bit-error rate, the number of times an operation in the component is retried, the size of an eye-diagram opening, the height of the eye-diagram opening, and the width of the eye-diagram opening.
Furthermore, these telemetry variables can include disk-related metrics for a remote storage device, including, but not limited to, average service time, average response time, number of kilobytes (kB) read per second, number of kB written per second, number of read requests per second, number of write requests per second, and number of soft errors per second.
In one embodiment of the present invention, the foregoing telemetry variables are monitored continuously with one or more SPRT tests.
In one embodiment of the present invention, the components from which the telemetry variables originate are field replaceable units (FRUs), which can be independently monitored. Note that all major system components, including both hardware and software components, can be decomposed into FRUs. (For example, a software FRU can include: an operating system, a middleware component, a database, or an application.)
Also note that the present invention is not meant to be limited to server computer systems. In general, the present invention can be applied to any type of computer system. This includes, but is not limited to, a computer system based on a microprocessor, a mainframe computer, a digital signal processor, a portable computing device, a personal organizer, a device controller, and a computational engine within an appliance.
Furthermore, note that the telemetry signals collected by real-time telemetry system 104 do not need to be processed in real time. For example, the telemetry signals can be collected and processed at a later time (e.g., for reliability studies).
SPRT
The Sequential Probability Ratio Test (SPRT) is a statistical hypothesis test that differs from standard fixed-sample tests. In fixed-sample statistical tests, a given number of observations is used to select one hypothesis from one or more alternative hypotheses. The SPRT, however, examines one observation at a time and makes a decision as soon as it has sufficient information to ensure that prespecified confidence bounds are met.
The basic approach taken by the SPRT technique is to analyze successive observations of a discrete process. Let y<sub>n </sub>represent a sample from the process at a given moment t<sub>n </sub>in time. In one embodiment of the present invention, the sequence of values {Y<sub>n</sub>}=y<sub>0</sub>, y<sub>1</sub>, . . . , y<sub>n </sub>comes from a stationary process characterized by a Gaussian, white-noise probability density function (PDF) with mean 0. (Note that since the sequence is from nominally stationary processes, any process variables with a nonzero mean can first be normalized to a mean of zero with no loss of generality.)
The SPRT is a binary hypothesis test that analyzes process observations sequentially to determine whether or not the telemetry variable is consistent with normal behavior. When an SPRT reaches a decision about current process behavior (i.e., the telemetry variable is behaving normally or abnormally), the system reports the decision and continues to process observations.
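As a point of reference, the sequential decision rule described above can be sketched with Wald's classical SPRT: a cumulative log-likelihood ratio (LLR) is compared against two thresholds derived from the target false-alarm probability alpha and missed-alarm probability beta, and after each decision the sum is reset so that monitoring continues. The default alpha, beta, sigma and m values are illustrative assumptions, and the function names are hypothetical.

```python
import math
from typing import Callable, Iterable, List


def sprt(observations: Iterable[float], llr_increment: Callable[[float], float],
         alpha: float = 0.01, beta: float = 0.01) -> List[str]:
    """Wald SPRT that reports a decision and then restarts, one result per sample.

    "H1" = alternative accepted (alarm), "H0" = null accepted, "" = keep sampling.
    alpha and beta are the target false-alarm and missed-alarm probabilities.
    """
    upper = math.log((1 - beta) / alpha)
    lower = math.log(beta / (1 - alpha))
    llr = 0.0
    decisions: List[str] = []
    for y in observations:
        llr += llr_increment(y)
        if llr >= upper:
            decisions.append("H1")
            llr = 0.0  # report the decision and continue processing observations
        elif llr <= lower:
            decisions.append("H0")
            llr = 0.0
        else:
            decisions.append("")
    return decisions


def positive_mean_llr(y: float, sigma: float = 1.0, m: float = 1.0) -> float:
    """Increment for a positive-mean test: ln[N(y; +m, sigma^2) / N(y; 0, sigma^2)]."""
    return (m / sigma**2) * (y - m / 2)


print(sprt([2.0] * 4, positive_mean_llr))  # ['', '', '', 'H1']
```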
For each of the eight types of tandem SPRT tests described below, the telemetry variable data adheres to a Gaussian PDF with mean 0 and variance σ<sup>2 </sup>for normal signal behavior, referred to as the null hypothesis, H<sub>0</sub>. The system computes eight specific SPRT hypothesis tests in parallel for each telemetry variable monitored. One embodiment of the present invention applies an SPRT to an electrical current time series. Other embodiments of the present invention apply an SPRT to other telemetry variables, including voltage, internal temperature, stress variables, and other hardware and software variables.
The SPRT surveillance module executes all eight tandem hypothesis tests in parallel. Each test determines whether the current sequence of process observations is consistent with the null hypothesis versus an alternative hypothesis. The first four tests are: (SPRT 1) the positive-mean test, (SPRT 2) the negative-mean test, (SPRT 3) the nominal-variance test, and (SPRT 4) the inverse-variance test. For the positive-mean test, the telemetry variable data for the corresponding alternative hypothesis, H<sub>1</sub>, adheres to a Gaussian PDF with mean +M and variance σ<sup>2</sup>. For the negative-mean test, the telemetry variable data for the corresponding alternative hypothesis, H<sub>2</sub>, adheres to a Gaussian PDF with mean −M and variance σ<sup>2</sup>. For the nominal-variance test, the telemetry variable data for the corresponding alternative hypothesis, H<sub>3</sub>, adheres to a Gaussian PDF with mean 0 and variance Vσ<sup>2 </sup>(with scalar factor V). For the inverse-variance test, the telemetry variable data for the corresponding alternative hypothesis, H<sub>4</sub>, adheres to a Gaussian PDF with mean 0 and variance σ<sup>2</sup>/V.
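For the Gaussian hypotheses of SPRTs 1-4, the per-observation LLR increments have closed forms. The following sketch derives them from the Gaussian density (lowercase m and v stand for M and V above) and includes a direct log-density helper so that the closed forms can be cross-checked; these functions would plug into a sequential test such as Wald's SPRT. The function names are hypothetical.

```python
import math

# Per-observation log-likelihood-ratio increments ln[f(y | H_k) / f(y | H0)]
# for SPRTs 1-4, with H0: Gaussian, mean 0, variance sigma^2.
# m (mean shift) and v (variance factor, v > 1) are user-chosen tuning values.


def llr_positive_mean(y: float, sigma: float, m: float) -> float:
    """SPRT 1, H1: mean +m, variance sigma^2."""
    return (m / sigma**2) * (y - m / 2)


def llr_negative_mean(y: float, sigma: float, m: float) -> float:
    """SPRT 2, H2: mean -m, variance sigma^2."""
    return (m / sigma**2) * (-y - m / 2)


def llr_nominal_variance(y: float, sigma: float, v: float) -> float:
    """SPRT 3, H3: mean 0, variance v * sigma^2."""
    return -0.5 * math.log(v) + (y**2 / (2 * sigma**2)) * (1 - 1 / v)


def llr_inverse_variance(y: float, sigma: float, v: float) -> float:
    """SPRT 4, H4: mean 0, variance sigma^2 / v."""
    return 0.5 * math.log(v) + (y**2 / (2 * sigma**2)) * (1 - v)


def gaussian_logpdf(y: float, mean: float, var: float) -> float:
    """Reference log-density used to cross-check the closed forms above."""
    return -0.5 * math.log(2 * math.pi * var) - (y - mean) ** 2 / (2 * var)


y, sigma, m, v = 2.5, 1.5, 0.8, 2.0
base = gaussian_logpdf(y, 0.0, sigma**2)
print(math.isclose(llr_nominal_variance(y, sigma, v),
                   gaussian_logpdf(y, 0.0, v * sigma**2) - base))  # True
```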
The next two tandem SPRT tests (SPRTs 5 and 6) are performed not on the raw telemetry variables as above, but on the first-difference function of the telemetry variable. For discrete time series, the first-difference function (i.e., the difference between each observation and the observation preceding it) gives an estimate of the numerical derivative of the time series. During uninteresting time periods, the observations in the first-difference function are a nominally stationary random process centered about zero. If an upward or downward trend suddenly appears in the telemetry variable, SPRTs 5 and 6 observe an increase or decrease, respectively, in the slope of the telemetry variable.
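SPRTs 5 and 6 run the same kind of mean test on the first-difference series. A minimal sketch, assuming that the differences are Gaussian with standard deviation sigma_d under H0 and using an arbitrary slope shift m_d (both are hypothetical tuning values, as are the function names):

```python
from typing import List, Sequence


def first_difference(series: Sequence[float]) -> List[float]:
    """y[k] - y[k-1] for k >= 1: a numerical estimate of the derivative."""
    return [b - a for a, b in zip(series, series[1:])]


def slope_llr(diffs: Sequence[float], sigma_d: float, m_d: float,
              direction: int = +1) -> float:
    """Cumulative LLR of a slope test on first differences.

    direction=+1 mimics SPRT 5 (H1 mean +m_d), direction=-1 mimics SPRT 6
    (H1 mean -m_d); H0 is mean 0, variance sigma_d^2 in both cases.
    """
    return sum((m_d / sigma_d**2) * (direction * d - m_d / 2) for d in diffs)


ramp = [5.0 + 0.5 * k for k in range(6)]  # upward trend
print(first_difference(ramp))              # [0.5, 0.5, 0.5, 0.5, 0.5]
print(slope_llr(first_difference(ramp), sigma_d=0.2, m_d=0.5) > 0)  # True
```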
For example, if there is a decrease in the value of the telemetry variable, SPRT alarms are triggered for SPRTs 2 and 6. SPRT 2 generates a warning because the sequence of raw observations drops with time. And SPRT 6 generates a warning because the slope of the telemetry variable changes from zero to something less than zero. The advantage of monitoring the mean SPRT and slope SPRT in tandem is that the system correlates the SPRT readings from the eight tests and determines if the component has failed. For example, if the telemetry variable levels off to a new stationary value (or plateau), the alarms from SPRT 6 cease because the slope returns to zero when the raw telemetry variable reaches a plateau. However, SPRT 2 will continue generating a warning because the new mean value of the telemetry variable is different from the value prior to the degradation. Therefore, the system correctly identifies that the component has failed.
If SPRTs 3 or 4 generate a warning, the variance of the telemetry variable is either increasing or decreasing, respectively. An increasing variance that is not accompanied by a change in mean (inferred from SPRTs 1 and 2 and SPRTs 5 and 6) signifies an episodic event that is “bursty” or “spiky” with time. A decreasing variance that is not accompanied by a change in mean is a common symptom of a failing component that is characterized by an increasing time constant. Therefore, having variance SPRTs available in parallel with slope and mean SPRTs provides a wealth of supplementary diagnostic information.
The final two tandem SPRT tests, SPRT 7 and SPRT 8, are performed on the first-difference function of the variance estimates for the telemetry variable. The first-difference function of the variance estimates is a numerical approximation of the derivative of the sequence of variance estimates. As such, SPRT 7 triggers a warning flag if the variance of the telemetry variable is increasing, while SPRT 8 triggers a warning flag if the variance is decreasing. Comparing the alarms from SPRTs 3, 4, 7, and 8 yields considerable diagnostic information on a class of failure modes known collectively as a “change-in-gain without a change-in-mean.” For example, if SPRTs 3 and 7 both trigger warning flags, there has been a sudden increase in the variance of the process. If SPRT 3 continues to trigger warning flags but SPRT 7 stops, the degradation mode responsible for the increased noisiness has gone to completion. Such information can help identify the root cause of the degradation and eliminate it from future product designs.
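The input to SPRTs 7 and 8 can be sketched as below. The text does not specify how the variance estimates are formed; this sketch assumes sample variances over a sliding window of hypothetical length `window`, computed with Python's `statistics.variance`. In this illustrative example, noise whose amplitude grows gives all-positive first differences of the variance estimates (the case SPRT 7 flags), and reversing the series gives all-negative ones (the case SPRT 8 flags).

```python
from statistics import variance

def sliding_variance(series, window):
    """Sample variance of each length-`window` sliding window (assumed estimator)."""
    if window < 2:
        raise ValueError("window must be at least 2")
    return [variance(series[i:i + window]) for i in range(len(series) - window + 1)]

def first_difference(series):
    """Differences between consecutive values: a numerical derivative."""
    return [b - a for a, b in zip(series, series[1:])]

# Zero-mean noise whose amplitude grows from 1 to 3 (illustrative data).
signal = [1.0, -1.0, 1.0, -1.0, 2.0, -2.0, 2.0, -2.0, 3.0, -3.0, 3.0, -3.0]
var_est = sliding_variance(signal, window=4)
var_slope = first_difference(var_est)
print(all(d > 0 for d in var_slope))  # True: the variance is increasing
```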
Similarly, if SPRTs 4 and 8 both start triggering alarms, the variance of the process is decreasing. If SPRT 4 continues to issue warning flags but SPRT 8 stops, the degradation mode has gone to completion. In safety-critical processes, this failure mode (decreasing variance without a change in mean) is dangerous in conventional systems that are monitored only by threshold limit tests, because a shrinking variance, such as occurs when a transducer is losing its ability to respond, never trips a threshold limit. (In contrast, degradation that manifests as a linear decalibration bias, or even as an increasing variance, eventually trips a high or low threshold limit and sounds a warning.) A sustained decrease in variance, which occurs, for example, when oil-filled pressure transmitters leak their oil or electrolytic capacitors leak their electrolyte, never trips a threshold in conventional systems but is readily detected by the suite of eight tandem SPRT tests taught in this invention.
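The variance diagnostics described in the preceding paragraphs can be collected into one decision function. This is a hypothetical sketch: the name, boolean inputs, and labels are assumptions, and `mean_alarm` stands for any alarm from the mean or slope tests (SPRTs 1, 2, 5, 6), which the text uses to decide whether a change is a change-in-gain without a change-in-mean.

```python
def interpret_variance(sprt3, sprt4, sprt7, sprt8, mean_alarm):
    """Map variance-SPRT flags to the change-in-gain diagnoses described in the text.

    sprt3 / sprt4: variance above / below its nominal (training) value.
    sprt7 / sprt8: variance estimates currently increasing / decreasing.
    """
    if mean_alarm:
        return "not-gain-only"  # a mean change accompanies the variance change
    if sprt3:
        # Increased noisiness; still growing if SPRT 7 also fires,
        # otherwise the degradation mode has gone to completion.
        return "variance-rising" if sprt7 else "variance-high-settled"
    if sprt4:
        # Shrinking variance, e.g. a transducer losing its ability to respond.
        return "variance-falling" if sprt8 else "variance-low-settled"
    if sprt7 or sprt8:
        return "unclassified"   # assumption: case not discussed in the text
    return "nominal"

# An oil-filled transmitter slowly losing its oil, then the leak running to completion:
print(interpret_variance(False, True, False, True, False))   # variance-falling
print(interpret_variance(False, True, False, False, False))  # variance-low-settled
```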
The SPRT technique provides a quantitative framework that permits a decision to be made between the null hypothesis and the eight alternative hypotheses with specified misidentification probabilities. If the SPRT accepts one of the alternative hypotheses, an alarm flag is set and data is transmitted.
The SPRT operates as follows. At each time step, the system calculates a test index and compares it to the logarithms of two stopping boundaries, A and B (defined below). The test index is the natural logarithm of a likelihood ratio ($L_n$), which for a given SPRT is the ratio of the probability of the observed sequence given that the alternative hypothesis for the test ($H_j$, where j is the subscript for the SPRT in question) is true to the probability of the observed sequence given that the null hypothesis ($H_0$) is true.
$$
L_n = \frac{\text{probability of observed sequence } \{Y_n\} \text{ given } H_j \text{ is true}}{\text{probability of observed sequence } \{Y_n\} \text{ given } H_0 \text{ is true}} \tag{21}
$$
If the logarithm of the likelihood ratio is greater than or equal to the logarithm of the upper threshold limit [i.e., $\ln(L_n) \ge \ln(B)$], the alternative hypothesis is accepted. If it is less than or equal to the logarithm of the lower threshold limit [i.e., $\ln(L_n) \le \ln(A)$], the null hypothesis is accepted. If the log-likelihood ratio falls between the two limits [i.e., $\ln(A) < \ln(L_n) < \ln(B)$], there is not yet enough information to make a decision (and, incidentally, no other statistical test could yet reach a decision with the same Type I and Type II misidentification probabilities).
Equation (22) relates the threshold limits to the misidentification probabilities α and β:
$$
A = \frac{\beta}{1-\alpha}, \qquad B = \frac{1-\beta}{\alpha} \tag{22}
$$
where α is the probability of accepting $H_j$ when $H_0$ is true (i.e., the false-alarm probability), and β is the probability of accepting $H_0$ when $H_j$ is true (i.e., the missed-alarm probability).
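The boundaries in (22) and the three-way decision rule can be sketched as two small functions. The function names are hypothetical, and α = 0.01 and β = 0.10 are illustrative values, not values given in the text.

```python
import math

def sprt_boundaries(alpha, beta):
    """Return (ln A, ln B) from (22): A = beta / (1 - alpha), B = (1 - beta) / alpha."""
    if not (0 < alpha < 1 and 0 < beta < 1):
        raise ValueError("alpha and beta must lie strictly between 0 and 1")
    return math.log(beta / (1 - alpha)), math.log((1 - beta) / alpha)

def sprt_decision(index, ln_a, ln_b):
    """Compare a log-likelihood-ratio index with the two stopping boundaries."""
    if index >= ln_b:
        return "accept H_j"  # alternative accepted: raise an alarm
    if index <= ln_a:
        return "accept H_0"  # null accepted: process healthy
    return "continue"        # not enough information yet

ln_a, ln_b = sprt_boundaries(alpha=0.01, beta=0.10)
print(round(ln_a, 3), round(ln_b, 3))  # -2.293 4.5
```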
The first two SPRT tests for normal distributions examine the mean of the process observations. If the distribution of observations exhibits a nonzero mean (e.g., a mean of either +M or −M, where M is the pre-assigned system disturbance magnitude for the mean test), the mean tests determine that the system is degraded. Assuming that the sequence $\{Y_n\}$ follows a Gaussian probability density function (PDF), the probability density of the observations when the null hypothesis $H_0$ is true (i.e., mean 0 and variance $\sigma^2$) is:
$$
P(y_1, y_2, \ldots, y_n \mid H_0) = \frac{1}{(2\pi\sigma^2)^{n/2}} \exp\left[-\frac{1}{2\sigma^2}\sum_{k=1}^{n} y_k^2\right] \tag{23}
$$
Similarly, the probability density of the observations when the alternative hypothesis $H_1$ is true (i.e., mean M and variance $\sigma^2$) is:
$$
P(y_1, y_2, \ldots, y_n \mid H_1) = \frac{1}{(2\pi\sigma^2)^{n/2}} \exp\left[-\frac{1}{2\sigma^2}\left(\sum_{k=1}^{n} y_k^2 - 2\sum_{k=1}^{n} y_k M + \sum_{k=1}^{n} M^2\right)\right] \tag{24}
$$
The ratio of the probability in (24) to that in (23) gives the likelihood ratio $L_n$ for the positive-mean test:
$$
L_n = \exp\left[-\frac{1}{2\sigma^2}\sum_{k=1}^{n} M\,(M - 2y_k)\right] \tag{25}
$$
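As a numerical sanity check of the step from (23) and (24) to (25), the sketch below computes the logarithm of each Gaussian joint density directly and compares their difference with the closed form. The observations and the values of M and σ² are illustrative assumptions; the function names are hypothetical.

```python
import math

def gaussian_log_density(ys, mean, var):
    """Log of the joint density of i.i.d. Gaussian samples, as in (23) and (24)."""
    n = len(ys)
    sq = sum((y - mean) ** 2 for y in ys)
    return -0.5 * n * math.log(2 * math.pi * var) - sq / (2 * var)

def log_lr_positive_mean(ys, m, var):
    """Natural log of (25): -(1 / (2 sigma^2)) * sum over k of M * (M - 2 * y_k)."""
    return -sum(m * (m - 2 * y) for y in ys) / (2 * var)

ys = [0.3, -0.1, 0.8, 0.5]  # illustrative mean-removed observations
m, var = 0.5, 1.0           # illustrative M and sigma^2
direct = gaussian_log_density(ys, m, var) - gaussian_log_density(ys, 0.0, var)
closed_form = log_lr_positive_mean(ys, m, var)
print(math.isclose(direct, closed_form))  # True
```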
Taking the logarithm of the likelihood ratio given by (25) produces the SPRT index for the positive-mean test ($\mathrm{SPRT}_{\mathrm{pos}}$):
$$
\begin{aligned}
\mathrm{SPRT}_{\mathrm{pos}} &= -\frac{1}{2\sigma^2}\sum_{k=1}^{n} M\,(M - 2y_k) \\
&= \frac{M}{\sigma^2}\sum_{k=1}^{n}\left(y_k - \frac{M}{2}\right)
\end{aligned} \tag{26}
$$
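Because (26) is a sum over samples, the index can be maintained incrementally by adding (M/σ²)(y_k − M/2) at each time step, without storing past observations. A minimal sketch; the class name and data are hypothetical.

```python
class PositiveMeanSPRT:
    """Running SPRT_pos index, equation (26), for mean-removed observations y_k."""

    def __init__(self, m, sigma2):
        self.m = m            # disturbance magnitude M, in the units of y_k
        self.sigma2 = sigma2  # nominal variance sigma^2 from training
        self.index = 0.0

    def update(self, y):
        # Add the k-th term of (26): (M / sigma^2) * (y_k - M / 2)
        self.index += (self.m / self.sigma2) * (y - self.m / 2)
        return self.index

sprt = PositiveMeanSPRT(m=0.5, sigma2=1.0)
for y in [0.3, -0.1, 0.8, 0.5]:
    sprt.update(y)
print(round(sprt.index, 12))  # 0.25, matching the batch form of (26)
```

For healthy observations with mean 0, the expected increment is −M²/(2σ²), so the index drifts toward the lower boundary; a sustained shift of +M makes the expected increment +M²/(2σ²).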
The SPRT index for the negative-mean test ($\mathrm{SPRT}_{\mathrm{neg}}$) is derived by substituting −M for each instance of M in (24) through (26) above, resulting in:
$$
\mathrm{SPRT}_{\mathrm{neg}} = \frac{M}{\sigma^2}\sum_{k=1}^{n}\left(-y_k - \frac{M}{2}\right) \tag{27}
$$
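Equation (27) is the positive-mean test applied to the mirrored observations −y_k, which the sketch below checks numerically. The function names and data are illustrative.

```python
def sprt_pos(ys, m, sigma2):
    """Positive-mean index, equation (26): (M / sigma^2) * sum(y_k - M/2)."""
    return (m / sigma2) * sum(y - m / 2 for y in ys)

def sprt_neg(ys, m, sigma2):
    """Negative-mean index, equation (27): (M / sigma^2) * sum(-y_k - M/2)."""
    return (m / sigma2) * sum(-y - m / 2 for y in ys)

ys = [-0.6, -0.4, -0.9]  # illustrative observations below the training mean
mirrored = [-y for y in ys]
print(sprt_neg(ys, 0.5, 1.0) == sprt_pos(mirrored, 0.5, 1.0))  # True
```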
The next two SPRT tests examine the variance of the sequence. This capability lets the SPRT module detect and quantitatively characterize changes in process variability, which is important for Six Sigma QA/QC improvement initiatives. In the variance tests, the system is degraded if the sequence exhibits a change in variance by a factor of V or 1/V, where V, the pre-assigned system disturbance magnitude for the variance test, is a positive scalar. The probability density of the observations when the alternative hypothesis $H_3$ is true (i.e., mean 0 and variance $V\sigma^2$) is given by (23) with $\sigma^2$ replaced by $V\sigma^2$:
$$
P(y_1, y_2, \ldots, y_n \mid H_3) = \frac{1}{(2\pi V\sigma^2)^{n/2}} \exp\left[-\frac{1}{2V\sigma^2}\sum_{k=1}^{n} y_k^2\right] \tag{28}
$$
The likelihood ratio for the variance test is given by the ratio of (28) to (23):
$$
L_n = V^{-n/2} \exp\left[-\frac{1}{2\sigma^2}\,\frac{1-V}{V}\sum_{k=1}^{n} y_k^2\right] \tag{29}
$$
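The intermediate step from (28) and (23) to (29), written out, shows why the exponent contains the sum of squared observations $y_k^2$:

$$
\frac{P(y_1,\ldots,y_n \mid H_3)}{P(y_1,\ldots,y_n \mid H_0)}
= \left(\frac{V\sigma^2}{\sigma^2}\right)^{-n/2}
\exp\left[\left(\frac{1}{2\sigma^2} - \frac{1}{2V\sigma^2}\right)\sum_{k=1}^{n} y_k^2\right]
= V^{-n/2}\exp\left[-\frac{1}{2\sigma^2}\,\frac{1-V}{V}\sum_{k=1}^{n} y_k^2\right]
$$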
Taking the logarithm of the likelihood ratio given in (29) produces the SPRT index for the nominal-variance test ($\mathrm{SPRT}_{\mathrm{nom}}$):
$$
\mathrm{SPRT}_{\mathrm{nom}} = \frac{1}{2\sigma^2}\left(\frac{V-1}{V}\right)\sum_{k=1}^{n} y_k^2 - \frac{n}{2}\ln V \tag{30}
$$
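A direct transcription of (30), with V = 2 and σ² = 1 chosen purely for illustration; the function name is hypothetical. Samples at the nominal amplitude ($y_k^2 = \sigma^2$) push the index down, while samples with four times the nominal variance push it up.

```python
import math

def sprt_nom(ys, v, sigma2):
    """Nominal-variance SPRT index, equation (30)."""
    n = len(ys)
    return (1 / (2 * sigma2)) * ((v - 1) / v) * sum(y * y for y in ys) - (n / 2) * math.log(v)

quiet = [1.0, -1.0, 1.0, -1.0]  # variance matches sigma^2 = 1
noisy = [2.0, -2.0, 2.0, -2.0]  # variance four times sigma^2
print(sprt_nom(quiet, v=2.0, sigma2=1.0) < 0 < sprt_nom(noisy, v=2.0, sigma2=1.0))  # True
```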
The SPRT index for the inverse-variance test ($\mathrm{SPRT}_{\mathrm{inv}}$) is derived by substituting 1/V for each instance of V in (28) through (30), resulting in:
$$
\mathrm{SPRT}_{\mathrm{inv}} = \frac{1}{2\sigma^2}\,(1-V)\sum_{k=1}^{n} y_k^2 + \frac{n}{2}\ln V \tag{31}
$$
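The substitution V → 1/V can be checked numerically: a general index written from (30) for a hypothetical variance ratio `ratio`, evaluated at 1/V, matches the closed form (31). Function names and data are illustrative; a signal with half the nominal amplitude (one quarter of the nominal variance) drives the inverse-variance index upward.

```python
import math

def variance_sprt(ys, ratio, sigma2):
    """Equation (30) for an alternative variance of ratio * sigma^2."""
    n = len(ys)
    return (1 / (2 * sigma2)) * ((ratio - 1) / ratio) * sum(y * y for y in ys) - (n / 2) * math.log(ratio)

def sprt_inv(ys, v, sigma2):
    """Inverse-variance SPRT index, equation (31)."""
    n = len(ys)
    return (1 / (2 * sigma2)) * (1 - v) * sum(y * y for y in ys) + (n / 2) * math.log(v)

shrunk = [0.5, -0.5, 0.5, -0.5]  # variance one quarter of sigma^2 = 1
print(math.isclose(sprt_inv(shrunk, 2.0, 1.0), variance_sprt(shrunk, 0.5, 1.0)))  # True
print(sprt_inv(shrunk, 2.0, 1.0) > 0)  # True
```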
The tandem SPRT module performs mean and variance SPRT tests on the raw process telemetry variable, and slope SPRT tests on the first-difference functions of the variable and of its variance estimates. To initialize the module for analysis of a telemetry variable time series, the user specifies the system disturbance magnitudes for the tests (M and V), the false-alarm probability (α), and the missed-alarm probability (β).
Then, during the training phase (before the first failure of a component being monitored), the module calculates the mean and variance of the monitored process signal. For most telemetry variables, the mean of the raw observations will be nonzero; in this case, the mean calculated during the training phase is used to normalize the telemetry variable during the monitoring phase, so that the zero-mean null hypothesis of (23) applies. The system disturbance magnitude for the mean tests specifies the number of standard deviations (or fractions thereof) by which the distribution must shift in the positive or negative direction to trigger an alarm. The system disturbance magnitude for the variance tests specifies the fractional change in variance necessary to trigger an alarm.
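A sketch of the initialization and training steps, under stated assumptions: the configuration field names are hypothetical, M is supplied in units of training standard deviations (as the text describes) and converted to the variable's own units, and "normalize" is taken to mean subtracting the training mean.

```python
import math
from dataclasses import dataclass
from statistics import fmean, variance

@dataclass
class TandemSPRTConfig:
    m_sigmas: float  # M, as a number (or fraction) of training standard deviations
    v: float         # V, variance disturbance magnitude (positive scalar)
    alpha: float     # false-alarm probability
    beta: float      # missed-alarm probability

def train(samples, cfg):
    """Training phase on fault-free data: return (mean, variance, M in signal units)."""
    mu = fmean(samples)
    sigma2 = variance(samples, mu)
    return mu, sigma2, cfg.m_sigmas * math.sqrt(sigma2)

cfg = TandemSPRTConfig(m_sigmas=0.5, v=2.0, alpha=0.01, beta=0.10)
mu, sigma2, m = train([20.0, 22.0, 21.0, 19.0, 23.0, 21.0], cfg)
# Monitoring phase (assumed normalization): subtract the training mean.
residuals = [y - mu for y in [21.5, 22.0, 22.5]]
print(mu, sigma2, residuals)  # 21.0 2.0 [0.5, 1.0, 1.5]
```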
At the beginning of the monitoring phase, the system sets all eight SPRT indices to 0. Then, at each time step, the system updates the SPRT indices using (26), (27), (30), and (31), applied to the raw telemetry variable or to the appropriate first-difference function. The system compares each SPRT index to the upper [i.e., $\ln((1-\beta)/\alpha)$] and lower [i.e., $\ln(\beta/(1-\alpha))$] decision boundaries, with three possible outcomes:
 1. the lower limit is reached, in which case the process is declared healthy, the test statistic is reset to zero, and sampling continues;
 2. the upper limit is reached, in which case the process is declared degraded, an alarm flag is raised indicating a sensor or process fault, the test statistic is reset to zero, and sampling continues; or
 3. neither limit has been reached, in which case no decision concerning the process can yet be made, and the sampling continues.
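The monitoring loop and its three outcomes can be sketched as follows. This hypothetical sketch covers only the two mean tests, (26) and (27), on the raw variable; under the same scheme the remaining six indices would be updated with (30), (31), and the first-difference inputs. The function name, data, and parameter values are illustrative.

```python
import math

def run_mean_sprts(residuals, m, sigma2, alpha, beta):
    """Positive- and negative-mean SPRTs, (26) and (27), with the three-outcome rule.

    residuals: observations with the training mean already removed.
    Returns (time_step, test_name) for every upper-boundary crossing (alarm).
    """
    ln_a = math.log(beta / (1 - alpha))  # lower decision boundary
    ln_b = math.log((1 - beta) / alpha)  # upper decision boundary
    indices = {"pos": 0.0, "neg": 0.0}   # indices start at 0 for monitoring
    alarms = []
    for k, y in enumerate(residuals):
        steps = {
            "pos": (m / sigma2) * (y - m / 2),   # per-sample term of (26)
            "neg": (m / sigma2) * (-y - m / 2),  # per-sample term of (27)
        }
        for name, step in steps.items():
            indices[name] += step
            if indices[name] >= ln_b:    # outcome 2: degraded, alarm, reset
                alarms.append((k, name))
                indices[name] = 0.0
            elif indices[name] <= ln_a:  # outcome 1: healthy, reset
                indices[name] = 0.0
            # outcome 3: between the boundaries, keep sampling
    return alarms

# Illustrative run: five nominal samples, then a sustained shift of +1.5 sigma.
alarms = run_mean_sprts([0.0] * 5 + [1.5] * 10, m=1.0, sigma2=1.0, alpha=0.01, beta=0.10)
print(alarms)  # [(9, 'pos'), (14, 'pos')]
```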
The advantages of using an SPRT are twofold:
 1. early detection of very subtle anomalies in noisy process variables; and
 2. pre-specification of quantitative false-alarm and missed-alarm probabilities.
The foregoing descriptions of embodiments of the present invention have been presented only for purposes of illustration and description. They are not intended to be exhaustive or to limit the present invention to the forms disclosed. Accordingly, many modifications and variations will be apparent to practitioners skilled in the art. Additionally, the above disclosure is not intended to limit the present invention. The scope of the present invention is defined by the appended claims.