Systems and methods for selecting training data and generating fault models for use in sensor-based monitoring

Abstract
A system for generating a sensor model for use in sensor-based monitoring is provided. The system includes a segmenting module for segmenting a collection of sensor vectors into a plurality of bins comprising distinct sensor vectors. The system also includes a set-generating module for generating a set of statistically significant sensor vectors for each bin. The system further includes a consistency determination module for generating at least one consistent set of sensor vectors from the sets of statistically significant sensor vectors. Additionally, the system includes a model-generating module for generating a sensor model based upon the at least one consistent set.
22 Claims
1. A system for generating a sensor model for use in sensor-based monitoring, the system comprising:
a segmenting module for segmenting a collection of sensor vectors into a plurality of bins comprising distinct sensor vectors;
a set-generating module for generating a set of statistically significant sensor vectors for each bin;
a consistency determination module for generating at least one consistent set of sensor vectors from the sets of statistically significant sensor vectors; and
a model-generating module for generating a sensor model based upon the at least one consistent set.
7. A system for generating statistically significant and consistent sets of training data usable in training a statistical model for purposes of sensor-based monitoring, the system comprising:
a segmenting module for segmenting a plurality of sensor vectors into at least two different bins, each bin containing distinct sensor vectors;
a set-generating module for generating a first set by selecting and including in the first set at least one sensor vector from the first bin if the sensor vector from the first bin is statistically significant, and generating a second set by selecting and including in the second set at least one sensor vector from the second bin if the sensor vector from the second bin is statistically significant; and
a consistency determination module for adding the second set to the first set if the second set is consistent with the first set.
11. A method for generating a sensor model for use in sensor-based monitoring, the method comprising the steps of:
segmenting a collection of sensor vectors into a plurality of bins comprising distinct sensor vectors;
generating a set of statistically significant sensor vectors for each bin;
generating at least one consistent set of sensor vectors from the sets of statistically significant sensor vectors; and
generating a sensor model based upon the at least one consistent set.
17. A method of selecting training data usable in generating a statistical model for purposes of sensor-based monitoring, the method comprising the steps of:
segmenting a plurality of sensor vectors into at least two distinct bins, each bin containing distinct sensor vectors;
generating a first set by selecting and including in the first set a sensor vector from the first bin if the sensor vector from the first bin is statistically significant;
generating a second set by selecting and including in the second set a sensor vector from the second bin if the sensor vector from the second bin is statistically significant; and
combining the second set with the first set if the second training-data set is consistent with the first set.
21. A computer-readable storage medium for use in sensor-based monitoring, the storage medium comprising computer instructions for:
segmenting training data comprising a plurality of sensor vectors into at least two distinct bins, each bin containing distinct sensor vectors;
generating a first set by selecting and including in the first set a sensor vector from the first bin if the sensor vector from the first bin is statistically significant;
generating a second set by selecting and including in the second set a sensor vector from the second bin if the sensor vector from the second bin is statistically significant; and
forming a consistent set by combining the second training-data set with the first training-data set if the second training-data set is consistent with the first training-data set.
1 Specification
The present invention pertains to the field of sensor-based monitoring, and, more particularly, to the field of sensor-based monitoring of power generating systems.
BACKGROUND OF THE INVENTION
Sensor-based monitoring can be used in a variety of industrial settings. Power generating systems, manufacturing processes, and a host of other industrial operations involving the coordinated functioning of large-scale, multi-component systems can all be efficiently controlled through sensor-based monitoring. Indeed, sensor-based monitoring can be advantageously employed in virtually any environment in which various system-specific parameters need to be monitored over time under different conditions.
The control of a system or process typically entails monitoring various physical indicators under different operating conditions, and can be facilitated by sensor-based monitoring. Monitored indicators can include temperature, pressure, flows of both inputs and outputs, and various other operating conditions. The physical indicators are typically monitored using one or more transducers or other types of sensors.
An example of a system with which sensor-based monitoring can be advantageously used is an electrical power generation system. The generation of electrical power typically involves a large-scale power generator such as a gas or steam turbine that converts mechanical energy into electrical energy through the process of electromagnetic induction to thereby provide an output of alternating electrical current. A power generator typically acts as a reversed electric motor, in which a rotor carrying one or more coils is rotated within a magnetic field generated by an electromagnet. Important operating variables that should be closely monitored during the operation of a power generator include pressure and temperature in various regions of the power generator as well as the vibration of critical components. Accordingly, sensor-based monitoring is a particularly advantageous technique for monitoring the operation of a power generator.
Regardless of the setting in which it is used, a key task of sensor-based monitoring can be to evaluate data provided by a multitude of sensors. This can be done so as to detect and localize faults so that the faults can be corrected in a timely manner. With a power generating plant, in particular, the timely detection of faults can prevent equipment damage, reduce maintenance costs, and avoid costly, unplanned plant shutdowns.
Monitoring typically involves receiving sensor-supplied data, which can be mathematically represented in the form of sensor vectors. These sensor vectors provide data input into a model and are compared with estimated output values obtained by applying the model to the data input. Large deviations between the actual values of the sensor vectors and the estimated values generated by the model can indicate that a fault has occurred or is about to occur. Accordingly, accurate monitoring can depend critically on the accuracy of the model employed.
There are principally two approaches to constructing such a model. The first approach is referred to as principle or physical modeling, and involves constructing a largely deterministic model representing the physical phenomena that underlie the operation of a particular system or process. It can be the case, however, that the physical dimensions of the system are too numerous or too complex to lend themselves to an accurate representation using the physical model. Accordingly, it is sometimes necessary to resort to the second approach, that of statistical modeling. Sensor-based monitoring of a power generation system, largely because it can require the use of literally hundreds of sensors, can necessitate the construction of such a statistical model. Constructing a statistical model involves “training” a probabilistic model using historical data samples of the system. The purpose of training the model is to glean from the historical data the distribution of the sensor vectors when the system is operating normally.
An oft-overlooked fact with respect to conventional statistical modeling is that just as the actual monitoring depends critically on the accuracy of the model employed, so, in turn, the accuracy of the model depends critically on the data set used to train the model. Several drawbacks inherent in statistical modeling flow inevitably from difficulties associated with acquiring good data for training a model, especially in the context of monitoring a power generation system, for example.
Firstly, it is often not known whether there is a fault that has occurred during the training period in which data was collected. If there has been, then the inclusion of that data will obscure faults that may occur during actual testing or monitoring of a system or process.
Secondly, even if the training data is fault free, there can yet be large variations within the set of training data. This can occur if the data is collected during different modes of operation of a system. For example, in the context of a power generation system, the power generator can be operated in both a full-load (or base) mode as well as a part-load mode. Because these operating modes are sufficiently different, the resulting training data will likely exhibit significant variability. This makes the difficult task of modeling a complex sensor vector distribution with a single model all the more problematic.
Thirdly, the training data can include data generated during transition periods as the system transitions from one mode of operation to another. For example, in the context of a power generation system, data collected during the time period in which the power generator is in transition between states will inevitably reflect an other-than-normal physical state of the generator. Inclusion of such data among the set of training data, accordingly, can skew the resulting model.
Conventional models have typically been constructed using simple threshold rules, with different thresholds set for individual sensors. Models so constructed generally tend to neglect the inherent problems already described. They also tend to obscure the fact that constructing models using conventional techniques with data that has wide variability results in a second-best trade-off. This trade-off can necessitate a choice between relying on a limited, threshold-based model or, alternatively, constructing multiple models from a data set that excludes relevant data.
Accordingly, there is a need for a system and method directed to the selection of data for training a model, especially one that can be used for sensor-based monitoring of a power generator or similar type of system. Moreover, there is a need for a system and method that addresses the problem of having to either construct a limited threshold-based model or construct multiple models on the basis of a reduced data set.
SUMMARY OF THE INVENTION
The present invention provides a system for generating a sensor model for use in sensor-based monitoring. The system can include a segmenting module that segments a collection of sensor vectors into a plurality of bins comprising distinct sensor vectors. The system can also include a set-generating module that generates a set of statistically significant sensor vectors for each of the bins. The system, moreover, can include a consistency determination module for generating at least one consistent set of sensor vectors from the sets of statistically significant sensor vectors. Additionally, the system can include a model-generating module that generates a sensor model based upon the at least one consistent set.
The present invention also provides a system for generating statistically significant and consistent sets of training data that can be used in training a statistical model for purposes of sensor-based monitoring. The system can include a segmenting module for segmenting a plurality of sensor vectors into at least two different bins, each bin containing distinct sensor vectors. The system further can include a set-generating module. The set-generating module generates a first set by selecting and including in the first set at least one sensor vector from the first bin if the sensor vector from the first bin is statistically significant. The set-generating module also generates a second set by selecting and including in the second set at least one sensor vector from the second bin if the sensor vector from the second bin is statistically significant. The system further includes a consistency determination module for adding the second set to the first set if the second set is consistent with the first set.
A method aspect of the present invention pertains to a method for generating a sensor model for use in sensor-based monitoring. The method can include the step of segmenting a collection of sensor vectors into a plurality of bins comprising distinct sensor vectors. The method also can include the steps of generating a set of statistically significant sensor vectors for each bin, and generating at least one consistent set of sensor vectors from the sets of statistically significant sensor vectors. The method further can include the step of generating a sensor model based upon the at least one consistent set.
An additional method aspect of the present invention pertains to a method of selecting training data usable in generating a statistical model for purposes of sensor-based monitoring. The method can include the step of segmenting a plurality of sensor vectors into at least two distinct bins, each bin containing distinct sensor vectors. The method also can include the steps of generating a first set by selecting and including in the first set a sensor vector from the first bin if the sensor vector from the first bin is statistically significant, and generating a second set by selecting and including in the second set a sensor vector from the second bin if the sensor vector from the second bin is statistically significant. The method further includes the step of combining the second set with the first set if the second training-data set is consistent with the first set.
BRIEF DESCRIPTION OF THE DRAWINGS
There are shown in the drawings embodiments which are presently preferred, it being understood, however, that the invention is not limited to the precise arrangements and instrumentalities shown.
FIG. 1 is a schematic diagram of an electrical power generator monitored by a sensor-based monitor, wherein the sensor-based monitor utilizes a system according to one embodiment of the present invention.
FIG. 2 is a schematic diagram of the system shown in FIG. 1.
FIG. 3 is a schematic diagram of a system for use in sensor-based monitoring according to another embodiment of the present invention.
FIG. 4 provides a flowchart illustrative of a method of generating a non-deterministic model for sensor-based monitoring according to yet another embodiment of the present invention.
FIG. 5 provides a flowchart illustrative of a method of generating sets of statistically significant and consistent sensor vectors according to still another embodiment of the present invention.
DETAILED DESCRIPTION OF PREFERRED EMBODIMENTS
The present invention provides a system 20 that performs two distinct functions related to sensor-based monitoring. The functions can be performed separately or jointly. First, the system 20 generates one or more unique sets of training data for training a non-deterministic sensor-based monitoring model. The training data contained in the one or more sets generated by the system 20 is characterized as being both statistically significant and consistent. Training data having these properties can be advantageously used with any of a number of distinct procedures for training a non-deterministic sensor-based monitoring model. Second, the system 20 generates one or more hybrid models. The system 20 generates one or more hybrid models by a unique clustering of available training data, the clustering being based on the statistical properties of the available training data.
FIG. 1 provides a schematic diagram of the system 20 according to one embodiment of the present invention, the system being used in conjunction with a sensor-based monitor 22 for monitoring an electrical power generator 24. The system 20 illustratively connects to the sensor monitor 22, which, in turn, connects to a plurality of sensors 26a-f. The sensors 26a-f, as shown, are each connected to the power generator 24 for supplying sensing data to the sensor monitor 22. The power generator 24 as used in the present description is merely representative of the various types of devices and processes for which sensor-based monitoring can be facilitated by the system 20.
The system 20, as described in greater detail below, generates a sensor model that can be used by the sensor monitor 22 as the latter monitors the state of the power generator 24. More particularly, the sensor monitor 22 can use the model to detect faults that may occur in the power generator 24 by identifying data that, based on the model, indicates the power generator is not operating within a set of predefined parameters as would be expected were the power generator 24 functioning properly.
As illustrated, the system 20 can be implemented as a standalone device comprising dedicated circuitry for receiving and processing data derived from signals generated by the plurality of sensors 26a-f and received by the sensor monitor 22. The system 20, as shown, can be closely adjacent to the power generator 24 and/or the sensor monitor 22. Alternately, the system 20 can be removed from the vicinity of one or the other of the power generator 24 and sensor monitor 22. The system 20 also can communicate directly with the sensor monitor 22, or, alternately, can communicate with the sensor monitor through various types of data communications networks ranging from a local area network (LAN) to the Internet.
The system 20, though illustrated as a separate device, can alternately be incorporated as circuitry within the sensor monitor 22. As will be readily understood by those of ordinary skill in the art, the circuitry can include logic gates, memory components, and/or data buses for implementing each of the functions described. Alternately, the system 20 can be implemented as software configured to run on a general purpose computer or on an application-specific device, including the sensor monitor 22 itself. Accordingly, the system 20 can be implemented in one or more hardwired circuits or in a software-based set of instructions for carrying out the functions described below. The system 20 also can be implemented in a combination of hardwired circuits and software-based instructions.
Virtually any number of sensors 26a-f can be employed, the number being practically limited only by the processing capacity of the circuitry used to implement the system 20 and sensor monitor 22. In practice, for example in monitoring a power generator or similar type of plant, the actual sensors 26a-f are likely to number in the hundreds. A sensor 26a-f preferably is a transducer or similar type of device that, as readily understood by those of ordinary skill in the art, can generate a signal such as an electrical signal by converting energy from one form into another form. For example, one or more sensors can convert heat energy into an electrical signal so as to measure the temperature of the power generator in a selected region. Similarly, other sensors can be used to convert mechanical energy into an electrical signal so as to measure, for example, pressure in a selected region of the power generator. Still others can be used to generate electrical signals that indicate vibrations or rotations of components of the power generator. Accordingly, it will be readily appreciated that any of the various physical phenomena associated with the operation of a power plant or similar type of device can be monitored using sensor-based monitoring.
Referring additionally now to FIG. 2, the system 20 illustratively includes a segmenting module 28 for segmenting a collection of sensor vectors into a plurality of bins, each bin containing distinct sensor vectors. As used herein, the term sensor vector denotes a mathematically-oriented representation of the data derived from the signals generated by the plurality of sensors 26a-f. Accordingly, a sensor vector is representative of the different data forms that can be employed by the system 20. A sensor vector can thus be an n-tuple or scalar, the values of which correspond to the sensor-supplied signals. For example, a simple sensor vector might be a 3-tuple, with the first element representing a temperature associated with the generator, the second element representing vibration within the generator's stator, and the third element representing the rotation of the rotor within the stator. As already noted, in practice, actual sensor vectors associated with power generation are likely to be n-tuples having many more than three elements.
The bins created by operation of the segmenting module 28 can be virtual bins, representing distinct sets of sensor vectors. The different sets of sensor vectors thus can correspond to different data collected at different times. In the context of power generation, the different data is likely to be collected, for example, over time periods that last several days in order to obtain a collection of training data that can be used to train a statistical model of the power generator as it operates under normal conditions.
In general, if d denotes the number of sensors used for monitoring a power generator, then d is the dimension of each sensor vector, and, thus, each sensor vector is a d-tuple. The power generator 24 can be monitored by the sensor-based monitor 22 continuously over some time span, thereby generating a collection of sensor vectors, each having dimension d. The segmenting module 28 segments the collection of sensor vectors into discrete bins, each containing a different set of sensor vectors. Note that the number of sensor vectors per bin can be equal among all the bins so created. The computational burden of the procedures to be described hereafter can be eased somewhat by making the number of sensor vectors in each bin equal, but this is not necessary to the results achieved by the system 20. Accordingly, the number of sensor vectors for each bin can alternatively vary rather than being uniform.
To facilitate the description, however, let the number of sensor vectors in each bin generated by the segmenting module 28 be equal to N. Note that N can be a function of the size of the entire collection of sensor vectors, which, in turn, is a function of the duration of time over which the data was collected. The time duration is influenced by the time period that the power generator functions normally, because it is desirable to collect data that reflects the normal operating state of the power generator so that an accurate model can be “trained” using the collected data.
To further facilitate the description, assume herein that the segmenting module 28 segments the collection of sensor vectors into K bins. Let D<sub>k </sub>represent the k<sup>th </sup>bin, where k=1, 2, . . . , K. Each sensor vector is denoted herein as x<sub>ki</sub>, where i=1, 2, . . . N. Thus, as assumed above, the number of sensor vectors in each bin is taken to be uniform among all the bins.
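The segmentation into K bins of N sensor vectors each can be sketched as follows. This is a minimal illustration only, not the patent's implementation: the function name `segment_into_bins` is hypothetical, the collection is assumed to be time-ordered, bins are assumed to be consecutive slices, and any trailing vectors that do not fill a complete bin are assumed to be dropped.

```python
from typing import List, Sequence

Vector = Sequence[float]  # one d-tuple of sensor readings


def segment_into_bins(vectors: List[Vector], n_per_bin: int) -> List[List[Vector]]:
    """Split a time-ordered collection into K consecutive bins of N vectors.

    Assumption (not stated in the source): leftover vectors that do not
    fill a complete bin are discarded so that every bin has exactly N members.
    """
    if n_per_bin <= 0:
        raise ValueError("n_per_bin must be positive")
    k = len(vectors) // n_per_bin  # number of complete bins, K
    return [vectors[i * n_per_bin:(i + 1) * n_per_bin] for i in range(k)]


# Example: 7 sensor vectors of dimension d = 2, bins of N = 3
collection = [(float(i), 2.0 * i) for i in range(7)]
bins = segment_into_bins(collection, 3)  # K = 2 bins; the 7th vector is dropped
```

Equal-sized bins match the simplifying assumption made in the description; a variant with unequal bins would only need a different slicing rule.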
As noted above in the context of the system 20 generally, the segmenting module 28 can be implemented using one or more dedicated circuits having analog and/or digital components, including one or more logic gates and memory elements connected by one or more buses or other signal-relaying connectors. Alternately, the segmenting module 28 can be implemented in one or more sets of software-based, machine-readable instructions configured to run on a general purpose computer or application-specific device. The segmenting module 28 also can be implemented in a combination of hardwired circuits and software-based instructions.
The system 20 further includes a set-generating module 30 for generating a set of statistically significant sensor vectors for each of the bins created by the segmenting module 28. The set-generating module 30 can generate a set of statistically significant sensor vectors by determining, for each sensor vector in each bin, a likelihood that the sensor vector has a predefined probability distribution. For example, if the power generator 24 is operating normally and is stable throughout the time in which the data is collected, then a reasonable assumption is that each of the x<sub>ki</sub>, i=1, 2, . . ., N, are identically and independently distributed, the specific distribution being a normal or Gaussian distribution.
Although the embodiment described herein is based upon the stated assumption that the samples of sensor vectors in each bin have independent, identical Gaussian distributions, this assumption is not essential to the invention. Other distributions can be assumed. As will be readily understood by those of ordinary skill in the art, other distributions can be used when the operation of the particular system or process that is the basis of the model produces physical phenomena whose corresponding data have some distribution other than a Gaussian distribution. Moreover, as will also be readily understood by those of ordinary skill in the art, the law of large numbers can make the assumption of a normal or Gaussian distribution valid regardless of the underlying distribution of the data provided that the sample size, in terms of the number of sensor vectors in each bin, is sufficiently large. The manner in which the set-generating module 30 responds to a small N is described below.
The set-generating module 30 is configured so as to compute the average of each element of the sensor vectors of the k<sup>th </sup>bin, and thereby generate a corresponding mean vector, m<sub>k</sub>, for the k<sup>th </sup>bin. The set-generating module 30 is also configured to compute the pairwise covariance for each pair of sensor vectors of the k<sup>th </sup>bin, from which the set-generating module 30 also generates a covariance matrix, Σ<sub>k</sub>, for the k<sup>th </sup>bin. Under the above-stated assumptions regarding the normality of the x<sub>ki </sub>and the equal number of sensor vectors in each bin, it follows that the power generator 24 can be modeled by K N-member sets of sensor vectors having a Gaussian distribution with mean m<sub>k </sub>and covariance Σ<sub>k</sub>, the distribution being concisely denoted as N(m<sub>k</sub>, Σ<sub>k</sub>). If N is not sufficiently large with respect to d, then an estimate of Σ<sub>k </sub>can be obtained by simplifying the covariance matrix to be σ<sub>k</sub><sup>2</sup>I, where σ<sub>k</sub><sup>2</sup> is the variance of the sensor vectors of the k<sup>th </sup>bin and I is the identity matrix (a matrix whose diagonal elements are unity and whose off-diagonal elements are zero).
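The per-bin mean vector and covariance estimate, including the simplified isotropic form used when N is small relative to d, can be sketched with NumPy as follows. This is an illustrative sketch under stated assumptions: the function name `bin_statistics` and the cut-off `min_ratio` are hypothetical (the source gives no numeric rule for "sufficiently large"), the covariance is taken to be the d-by-d sample covariance across sensor elements, and the single variance of the simplified form is assumed to be the average of the per-element sample variances.

```python
import numpy as np


def bin_statistics(X_k, min_ratio=10.0):
    """Return (m_k, Sigma_k) for one bin given as an (N, d) array, N >= 2.

    min_ratio is a hypothetical cut-off: the full covariance is used only
    when N >= min_ratio * d; otherwise Sigma_k is simplified to sigma^2 * I.
    """
    X_k = np.asarray(X_k, dtype=float)
    n, d = X_k.shape
    m_k = X_k.mean(axis=0)  # element-wise average over the bin
    if n >= min_ratio * d:
        # Full d x d sample covariance (rows are observations)
        sigma_k = np.cov(X_k, rowvar=False).reshape(d, d)
    else:
        # Assumption: pool the per-element variances into one scalar
        var = X_k.var(axis=0, ddof=1).mean()
        sigma_k = var * np.eye(d)
    return m_k, sigma_k


# Small bin (N = 3, d = 2): falls back to the isotropic estimate
m, S = bin_statistics([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
```

The isotropic fallback trades accuracy for stability: with few samples the full covariance matrix may be singular or poorly conditioned, whereas a scaled identity is always invertible when the variance is positive.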
During a faulty period when the power generator 26 is not functioning within acceptable limits, or is in a transition state, the various x<sub>ki </sub>may not conform to the assumed Gaussian distribution. Accordingly, the set-generating module 30 generates a statistically significant set corresponding to each bin by culling from each bin only those sensor vectors that satisfy the stated assumption regarding the sensor vectors' underlying probability distribution. That is, for each bin, each sensor vector in the bin is tested by the set-generating module 30 to determine the likelihood that the sensor vector has the stated probability distribution.
If N is small, standard statistical tests such as the Kolmogorov-Smirnov test may not be appropriate for making the determination. Thus, according to one embodiment of the present invention, the set-generating module 30 performs a chi-squared test of normality in generating a set of statistically significant vectors for each bin. As is known, if the x<sub>ki </sub>have a Gaussian distribution, N(m<sub>k</sub>, Σ<sub>k</sub>), then the statistic y<sub>ki</sub>=(x<sub>ki</sub>−m<sub>k</sub>)<sup>T</sup>Σ<sub>k</sub><sup>−1</sup>(x<sub>ki</sub>−m<sub>k</sub>) has a chi-squared distribution with d degrees of freedom. The test for each sensor vector in each bin is accordingly whether the statistic computed for each sensor vector satisfies the chi-squared test at a given confidence level.
Instead of rejecting all sensor vectors in a particular bin if one of the sensor vectors fails to satisfy the chi-squared test, the set-generating module 30 tests each sensor vector in each bin individually, keeping the statistically significant sensor vectors and discarding the rest.
The chi-squared test for determining statistical significance is illustratively implemented by the set-generating module 30 performing each of the following operations. First, the set-generating module 30 initializes the sensor vectors belonging to the k<sup>th </sup>bin, X<sub>k</sub>={x<sub>ki</sub>}, i=1, 2, . . . , N. The set-generating module 30 then computes a mean vector, m<sub>k</sub>, and covariance matrix Σ<sub>k </sub>for the k<sup>th </sup>bin based upon the sensor vectors X<sub>k</sub>={x<sub>ki</sub>} belonging to the k<sup>th </sup>bin. Next, for each sensor vector, x<sub>ki</sub>, contained in the set of sensor vectors, X<sub>k</sub>, in the k<sup>th </sup>bin, the set-generating module 30 computes the above-described statistic y<sub>ki</sub>=(x<sub>ki</sub>−m<sub>k</sub>)<sup>T</sup>Σ<sub>k</sub><sup>−1</sup>(x<sub>ki</sub>−m<sub>k</sub>). The set-generating module 30 performs the chi-squared test by comparing y<sub>ki </sub>to a predetermined threshold, ρ, where the threshold is preselected based upon a desired confidence level. Accordingly, if y<sub>ki</sub><ρ, then the corresponding x<sub>ki </sub>is deemed to be significant and included by the set-generating module 30 in X<sub>k</sub>′, where X<sub>k</sub>′ denotes the set of statistically significant sensor vectors for the k<sup>th </sup>bin. The complete set of statistically significant sensor vectors for the k<sup>th </sup>bin is thus obtained by performing these operations on each of the sensor vectors in the k<sup>th </sup>bin, X<sub>k</sub>={x<sub>ki</sub>}, i=1, 2, . . . , N.
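The operations just described can be sketched, under stated assumptions, as a short routine. The function name is hypothetical, the full sample covariance is assumed to be invertible, and the threshold ρ is supplied by the caller rather than derived from a confidence level.

```python
import numpy as np

def significant_vectors(X_k, rho):
    """Return (X_k_prime, y): the vectors of one bin with y_ki < rho, and all y_ki."""
    X_k = np.asarray(X_k, dtype=float)
    m_k = X_k.mean(axis=0)                                # mean vector of the bin
    Sigma_k = np.atleast_2d(np.cov(X_k, rowvar=False))   # covariance matrix of the bin
    diff = X_k - m_k
    # y_ki = (x_ki - m_k)^T Sigma_k^{-1} (x_ki - m_k), computed row by row
    # without forming the explicit inverse.
    y = np.einsum("ij,ij->i", diff, np.linalg.solve(Sigma_k, diff.T).T)
    keep = y < rho                                        # per-vector chi-squared test
    return X_k[keep], y
```

For two-element sensor vectors (d=2), a 95% confidence level corresponds to ρ of approximately 5.991, the 0.95 quantile of the chi-squared distribution with two degrees of freedom.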
The set-generating module 30 tests whether σ<sub>k</sub><T<sub>σ</sub>, where T<sub>σ</sub> is a predefined threshold. If the inequality holds, then the set of statistically significant sensor vectors for the k<sup>th </sup>bin, X<sub>k</sub>′, is deemed to satisfy the conditions for normality and is retained. Otherwise, the set of statistically significant sensor vectors for the k<sup>th </sup>bin, X<sub>k</sub>′, is dropped. This last procedure operates to exclude a set of sensor vectors for which the σ<sub>k </sub>is undesirably large. This helps ensure, for example, that data collected when the power generator 26 is operating in a transition state are not included among the sets of training data generated for use in training the model.
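A minimal sketch of this spread test follows. The description does not state exactly how σ<sub>k </sub>is computed, so the sketch assumes it is the square root of the per-element variance averaged over the d elements of the retained vectors; the function name and the treatment of an empty set are likewise assumptions.

```python
import numpy as np

def passes_spread_test(X_sig, T_sigma):
    """Return True if the set X_k' (shape (n, d)) should be retained."""
    X_sig = np.asarray(X_sig, dtype=float)
    if X_sig.shape[0] == 0:
        return False  # assumption: an empty set is dropped
    # Assumed definition of sigma_k: root of the mean per-element variance.
    sigma_k = np.sqrt(np.mean(np.var(X_sig, axis=0)))
    return bool(sigma_k < T_sigma)
```

A set gathered during a transition state would be expected to have a large σ<sub>k </sub>and so fail the test.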
By repeated application of the chi-squared test, the set-generating module 30 generates a training set for each bin. In particular, the set-generating module 30 generates a first training-data set by selecting and including in the first training-data set each sensor vector from the first bin that is statistically significant. Subsequently, the set-generating module 30 generates a second training-data set by selecting and including in the second training-data set each sensor vector from the second bin that is statistically significant. The operation is repeated by the set-generating module 30 until each sensor vector of each bin has been tested. Accordingly, the set-generating module 30 generates a set of statistically significant sensor vectors for each bin.
As already discussed with respect to the system 20 generally and the segmenting module 28 specifically, the set-generating module 30 can be implemented in one or more dedicated circuits having analog and/or digital components that can include one or more logic gates and memory elements connected by one or more signal-relaying connectors. The set-generating module 30 alternatively can be implemented in one or more sets of software-based, machine-readable instructions configured to run on a general purpose computer or application-specific device. Additionally, the set-generating module 30 also can be implemented in a combination of hardwired circuits and machine-readable, software-based instructions.
The system 20 additionally includes a consistency determination module 32 that generates at least one consistent set of sensor vectors from the sets of statistically significant sensor vectors generated by the setgenerating module 30. Once an initial set of statistically significant sensor vectors for a bin has been generated by the setgenerating module 30, then any succeeding set of statistically significant sensor vectors obtained from another bin is checked for consistency with the initial set by the consistency determination module 32. The consistency determination module 32 combines any two or more sets of statistically significant sensor vectors that are also consistent with one another.
By the iterative repetition of the set generating operations and the combining of statistically significant sets based upon consistency, the system 20 generates one or more statistically significant and consistent sets of sensor vectors. Since a set is consistent with itself, the system 20 accordingly generates at least one set of statistically significant and consistent sensor vectors, provided that at least one bin contains at least one statistically significant sensor vector.
According to one embodiment, the consistency determination module 32 deems two sets to be consistent if a squared difference between a first vector mean computed for the sensor vectors of one of the two sets and a second vector mean computed for the sensor vectors of the other of the two sets is less than a preselected threshold.
Thus, after a set of statistically significant sensor vectors X<sub>k</sub>′={x<sub>ki</sub>′} has been generated by the set generating module 30, the consistency determination module 32 determines the set's consistency with respect to any set of statistically significant sensor vectors already generated. If none has already been generated, then X<sub>k</sub>′ is selected as the initial set, and a corresponding mean vector m<sub>k </sub>is computed based on the sensor vectors contained in X<sub>k</sub>′. A mean vector, w<sub>c</sub>, is selected based upon the m<sub>k</sub>. Preferably, w<sub>c</sub>=m<sub>k</sub>, c=1, 2, . . . , C, where w<sub>c </sub>denotes the mean vector of the statistically significant and consistent sets generated, and where C corresponds to the total number of physical states of the power generator 26.
The consistency determination module 32, according to one embodiment, determines whether a set of statistically significant sensor vectors X<sub>j </sub>is consistent by performing the following operations. A mean vector m<sub>j </sub>is determined based upon the statistically significant sensor vectors contained in X<sub>j</sub>. The squared difference between m<sub>j </sub>and w<sub>c</sub>, ∥m<sub>j</sub>−w<sub>c</sub>∥<sup>2</sup>, is then computed and its minimum over c=1, 2, . . . , C is determined. The minimum squared difference so determined is subsequently compared to a predetermined consistency threshold, T<sub>m</sub>. If the minimum squared difference is less than the consistency threshold, T<sub>m</sub>, such that<FORM>min<sub>c=1,2, . . . ,C</sub>∥m<sub>j</sub>−w<sub>c</sub>∥<sup>2</sup><T<sub>m</sub>, </FORM>then the X<sub>j </sub>on which the mean vector m<sub>j </sub>is based is assigned by the consistency determination module 32 to the state c*, where c* denotes the value of c at which the minimum is attained. Otherwise, the consistency determination module 32 does not combine X<sub>j </sub>with any existing set of statistically significant and consistent sensor vectors. If, however, X<sub>j </sub>is added by the consistency determination module 32 to an existing set, then the consistency determination module 32 revises w<sub>c* </sub>based upon all the sensor vectors associated with state c*.
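One possible way to carry out this consistency check is sketched below. The function name and the list-of-arrays representation of the states are assumptions. In addition, a set that is not consistent with any existing state is here used to start a new state, which is one reading of the description rather than a stated requirement.

```python
import numpy as np

def assign_to_state(X_j, states, T_m):
    """Assign the set X_j (shape (n, d)) to a state and return its index c*.

    states is a list of arrays, each holding all sensor vectors pooled so
    far for one state; the list is modified in place.
    """
    X_j = np.asarray(X_j, dtype=float)
    m_j = X_j.mean(axis=0)
    if states:
        # w_c is recomputed from every vector associated with state c.
        w = np.array([S.mean(axis=0) for S in states])
        dist2 = np.sum((w - m_j) ** 2, axis=1)  # squared differences ||m_j - w_c||^2
        c_star = int(np.argmin(dist2))
        if dist2[c_star] < T_m:
            # Consistent: pool the vectors, which implicitly revises w_c*.
            states[c_star] = np.vstack([states[c_star], X_j])
            return c_star
    # No consistent state exists (assumption: open a new one).
    states.append(X_j)
    return len(states) - 1
```

Because each w<sub>c </sub>is recomputed from the pooled vectors on every call, the revision of w<sub>c* </sub>after a merge requires no separate step in this sketch.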
Accordingly, if there are multiple states, for each of which there exists a statistically significant and consistent set of sensor vectors as determined above, then it follows that there are multiple sets of statistically significant and consistent sensor vectors. The resulting one or more sets of statistically significant and consistent sets of sensor vectors provide the training data for training the model that can be used for sensorbased monitoring of the power generator 26.
As used herein, the training of a model connotes the building of a representative probabilistic model. More particularly, the model can be built using the training data in combination with various statistical techniques, including linear and nonlinear regression, multivariate analysis, and nonparametric methods.
The consistency determination module 32, like the set-generating module 30 and the segmenting module 28, can be implemented in one or more dedicated circuits having analog and/or digital components, including one or more logic gates and memory elements connected by one or more signal-relaying connectors, or alternatively in one or more sets of software-based, machine-readable instructions configured to run on a general purpose computer or application-specific device. The consistency determination module 32 similarly can be implemented in a combination of hardwired circuits and machine-readable, software-based instructions.
As noted above, the generation of sets of statistically significant and consistent sensor vectors is a distinct aspect of the present invention. The sensor vectors belonging to the one or more sets of statistically significant and consistent sensor vectors provide the training data for training a model, and accordingly can be used for training virtually any model. Another aspect of the present invention, however, pertains to the generation of a specific model for use in sensor-based monitoring.
Accordingly, the system 20 further includes a model-generating module 34 for generating a sensor model based upon at least one consistent set of sensor vectors as generated by cooperative operation of the set-generating module 30 and the consistency determination module 32. If the set-generating module 30 and the consistency determination module 32 have cooperatively generated at least one other consistent set as well, then the model-generating module 34 computes a minimum residual for this other consistent set using the sensor model already generated. That is, the model-generating module 34 computes an estimated sensor vector for each sensor vector belonging to the at least one other consistent set, and computes a residual based upon the absolute value of the difference between the estimated sensor vector and the corresponding actual sensor vector on which the estimated sensor vector is based. The model-generating module 34 then determines the smallest residual so computed, the smallest residual defining a minimum residual.
The model-generating module 34 combines the at least one consistent set with the at least one other consistent set, and replaces the sensor model with a revised sensor model based upon the combination, if the minimum residual is less than a preselected residual threshold. Otherwise, if the minimum residual is not less than the preselected threshold, then the model-generating module 34 constructs an additional sensor model based upon the at least one other consistent set. The procedure can be repeated by the model-generating module 34 for each additional consistent set that has also been cooperatively generated by the set-generating module 30 and the consistency determination module 32. The repetition continues until, for each consistent set, either the set has been combined with another set or a distinct model has been generated for the consistent set.
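The combine-or-create procedure can be illustrated with the following sketch. The description does not specify the form of the sensor model, so `MeanModel` is a hypothetical stand-in that estimates every sensor vector by the mean of its training set. The sum of absolute element differences as the residual, and the choice of the best-matching model when several models already exist, are also assumptions.

```python
import numpy as np

class MeanModel:
    """Hypothetical stand-in sensor model: the estimate is the training mean."""
    def __init__(self, X):
        self.X = np.asarray(X, dtype=float)   # training vectors, shape (n, d)
        self.mean = self.X.mean(axis=0)

    def estimate(self, X):
        return np.broadcast_to(self.mean, np.shape(X))

def min_residual(model, X):
    """Smallest residual over the vectors of X under the given model."""
    X = np.asarray(X, dtype=float)
    residuals = np.abs(model.estimate(X) - X).sum(axis=1)
    return float(residuals.min())

def build_models(consistent_sets, residual_threshold):
    """Merge each consistent set into an existing model or start a new one."""
    models = []
    for X in consistent_sets:
        X = np.asarray(X, dtype=float)
        scores = [min_residual(m, X) for m in models]
        if scores and min(scores) < residual_threshold:
            best = int(np.argmin(scores))
            # Combine the sets and replace the model with a revised one.
            models[best] = MeanModel(np.vstack([models[best].X, X]))
        else:
            # No existing model explains this set: build an additional model.
            models.append(MeanModel(X))
    return models
```

Any estimator that exposes a comparable `estimate` operation could be substituted for `MeanModel` without changing the loop.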
As with each of the other modules of the system 20, the model-generating module 34 can be implemented in one or more hardwired circuits utilizing analog and/or digital components, and including one or more logic gates and memory elements connected by one or more signal-relaying connectors. Alternatively, the model-generating module 34 can be implemented in one or more sets of software-based, machine-readable instructions configured to run on a general purpose computer or application-specific device. The model-generating module 34 also can be implemented in a combination of dedicated circuits and machine-readable, software-based instructions.
As stated above, the generation of a specific model for use in sensor-based monitoring is a distinct aspect of the invention, and it can be utilized with types of training data other than that which is selected or obtained through the generation of sets of statistically significant and consistent sensor vectors. Accordingly, as illustrated in FIG. 3, an alternative embodiment of the present invention is a system 120 that includes a segmenting module 128 for segmenting a collection of sensor vectors into a plurality of bins. The system 120, according to this embodiment, also includes a model-generating module 134 for training one or more hybrid models, each hybrid model being generated using certain of the sensor vectors belonging to one or more of the plurality of bins.
After the segmenting module 128 segments the collection of sensor vectors, the modelgenerating module 134 generates an initial model using the sensor vectors belonging to one of the plurality of bins. Having generated an initial model, the modelgenerating module 134 computes residuals for another of the plurality of bins, the residuals being determined using the initial model. If the minimum of the residuals is less than a preselected threshold, then the modelgenerating module 134 combines the sensor vectors for which the residuals were determined with the sensor vectors which were used to generate the model. Otherwise, the modelgenerating module 134 generates a new model based upon the sensor vectors for which the residuals were determined. The operations are iteratively repeated until the sensor vectors for each of the bins have either been combined with those of another bin or used to form a distinct model.
The method aspects of the present invention include a method for generating a sensor model for use in sensor-based monitoring. One embodiment of the method is illustrated in the flowchart provided in FIG. 4. The method 400 illustratively includes at step 410 segmenting a collection of sensor vectors into a plurality of bins comprising distinct sensor vectors. At step 412, a set of statistically significant sensor vectors is generated for each bin. The set of statistically significant sensor vectors is illustratively generated by determining, for each sensor vector in a bin, the likelihood that the sensor vector has a predefined probability distribution. The likelihood is illustratively based upon a chi-squared statistic.
The method 400 further includes at step 414 generating at least one consistent set of sensor vectors from the sets of statistically significant sensor vectors. The step of generating at least one consistent set illustratively includes combining at least two sets of statistically significant vectors if one of the two sets is consistent with the other. Two sets are consistent if a squared difference between a first mean vector computed for the sensor vectors of one of the two sets and a second mean vector computed for the sensor vectors of the other of the two sets is less than a preselected threshold. Finally, the method 400 includes at step 416 generating a sensor model based upon the at least one consistent set.
An additional method aspect of the present invention pertains to a method of generating sets of training data usable for generating a statistical model for purposes of sensorbased monitoring. One embodiment of this aspect of the present invention is illustrated by the flowchart provided in FIG. 5. The method 500 illustratively includes at step 510 segmenting a plurality of sensor vectors into at least two different bins, each bin containing distinct sensor vectors.
The method further includes at step 512 generating a first set by selecting and including in the first set at least one sensor vector from the first bin if the sensor vector from the first bin is statistically significant. The method also includes at step 514 generating a second set by selecting and including in the second set at least one sensor vector from the second bin if the sensor vector from the second bin is statistically significant. The determination of whether a sensor vector is statistically significant according to this embodiment is based upon a likelihood that the sensor vector has a predefined probability distribution. The likelihood is illustratively based upon a chi-squared test of normality.
The method concludes at step 516 by combining the second set with the first set if the second set is consistent with the first training-data set. The step 516 of combining includes combining the second set with the first set only if a squared difference between a first vector mean based upon the first training-data set and a second vector mean based upon the second training-data set is less than a preselected threshold.
The present invention can be realized in hardware, software, or a combination of hardware and software. The present invention can be realized in a centralized fashion in one computer system, or in a distributed fashion where different elements are spread across several interconnected computer systems. Any kind of computer system or other apparatus adapted for carrying out the methods described herein is suited. A typical combination of hardware and software can be a general purpose computer system with a computer program that, when being loaded and executed, controls the computer system such that it carries out the methods described herein.
The present invention also can be embedded in a computer program product, which comprises all the features enabling the implementation of the methods described herein, and which when loaded in a computer system is able to carry out these methods. Computer program in the present context means any expression, in any language, code or notation, of a set of instructions intended to cause a system having an information processing capability to perform a particular function either directly or after either or both of the following: a) conversion to another language, code or notation; b) reproduction in a different material form.