Method for interpreting complex data and detecting abnormal instrument or process behavior
Abstract
An improved method is provided for determining when a set of multivariate data (such as a chromatogram or a spectrum) is an outlier. The method uses a procedure such as Principal Component Analysis to create a model describing a calibration set of spectra or chromatograms known to be normal, and to create residuals describing the portion of a particular spectrum or chromatogram which is not described by the model. The improvement comprises using an average residual spectrum calculated for the calibration set, rather than the origin of the model, as the reference point for comparing a spectrum or chromatogram obtained from an unknown sample. The present invention also includes separating a complex set of data into sub-parts, such as sub-chromatograms or sub-spectra, so that outliers in any sub-part can be detected more readily. In one particular embodiment, the invention is directed towards a method for dividing a chromatogram into the sub-parts of peak information, baseline shape, baseline offset, and noise.
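To illustrate the change of reference point described above, the sketch below compares a residual's distance from the model origin (the zero vector) with its distance from the calibration set's average residual. It is a minimal pure-Python sketch that assumes the residuals have already been computed; the helper names `average_residual` and `euclidean` and the toy residual vectors are hypothetical.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def average_residual(residuals: Sequence[Vector]) -> List[float]:
    """Element-wise mean of the calibration residuals."""
    n = len(residuals)
    return [sum(r[j] for r in residuals) / n for j in range(len(residuals[0]))]


def euclidean(u: Vector, v: Vector) -> float:
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


# Hypothetical calibration residuals sharing a systematic component (about 1.0 in
# the first channel) that the model does not capture.
calibration = [[1.0, 0.1], [1.0, -0.1], [1.1, 0.0], [0.9, 0.0]]
avg = average_residual(calibration)
origin = [0.0] * len(avg)

sample = [1.0, 0.0]  # a residual that is typical of the calibration set
print(euclidean(sample, origin))  # distance from the model origin
print(euclidean(sample, avg))     # distance from the average residual
```

Here the typical residual lies 1.0 from the origin but 0.0 from the average residual, so measuring from the origin would make a normal sample look displaced.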
26 Claims
1. A method of using a computing device to conduct an analysis of a sample, comprising:
(a) performing an analytical technique on the sample, said analytical technique being selected from the group consisting of chromatography and spectrometry so that a set of multivariate data which corresponds to the sample is produced;
(b) obtaining a series of representative multivariate data sets, wherein the representative multivariate data sets are obtained from the same type of analysis as was performed to produce the set of multivariate data in step (a);
(c) creating a model of the series of multivariate data sets obtained in step (b);
(d) creating individual residuals describing the portion of the multivariate data set obtained for each member of the calibration set which is not described by the model created in step (c);
(e) creating an average residual by averaging the individual residuals created in step (d);
(f) determining the distance between the individual residual for each member of the calibration set and the average residual;
(g) creating a residual describing the portion of the multivariate data set produced in step (a) which is not described by the model created in step (c);
(h) determining the distance between a residual obtained in step (g) and the average residual created in step (e);
(i) labeling as an outlier any set of multivariate data whose distance obtained in step (h) is statistically different from the set of distances determined in step (f); and
(j) checking for changes in feedstock, chemical processes and/or instruments used to make or evaluate the sample whenever one or more sets of multivariate data has been labelled as an outlier.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13.
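Steps (c) through (i) of claim 1 can be sketched in pure Python as follows. This is a minimal, hypothetical illustration rather than the patented implementation. It assumes the model is a principal component model of the uncentered calibration matrix, computed by power iteration with deflation; `principal_components`, `residual` and `outlier_test` are illustrative names; and "statistically different" is assumed to be a one-sided test that flags the sample when its distance exceeds AVE + 3 * STD of the calibration distances.

```python
import math
import statistics
from typing import List, Sequence, Tuple

Vector = List[float]


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    return sum(a * b for a, b in zip(u, v))


def principal_components(data: Sequence[Sequence[float]], k: int,
                         iterations: int = 500) -> List[Vector]:
    """Top-k unit loading vectors of the uncentered matrix X, via power iteration
    with deflation on X^T X (adequate for small illustrative data only)."""
    d = len(data[0])
    m = [[sum(row[i] * row[j] for row in data) for j in range(d)] for i in range(d)]
    loadings: List[Vector] = []
    for _ in range(k):
        v = [1.0 / math.sqrt(d)] * d
        for _ in range(iterations):
            w = [dot(m_row, v) for m_row in m]
            norm = math.sqrt(dot(w, w))
            if norm == 0.0:
                break  # v lies in the null space of the deflated matrix
            v = [x / norm for x in w]
        lam = dot(v, [dot(m_row, v) for m_row in m])
        # Deflate: remove the component just found so the next pass finds the next one.
        m = [[m[i][j] - lam * v[i] * v[j] for j in range(d)] for i in range(d)]
        loadings.append(v)
    return loadings


def residual(x: Sequence[float], loadings: Sequence[Vector]) -> Vector:
    """Portion of x not described by the model: x minus its projection on the loadings."""
    r = list(x)
    for p in loadings:
        score = dot(x, p)
        r = [ri - score * pi for ri, pi in zip(r, p)]
    return r


def outlier_test(calibration: Sequence[Sequence[float]], sample: Sequence[float],
                 k: int, n_std: float = 3.0) -> Tuple[float, List[float], bool]:
    loadings = principal_components(calibration, k)              # step (c)
    cal_res = [residual(x, loadings) for x in calibration]      # step (d)
    avg = [sum(col) / len(cal_res) for col in zip(*cal_res)]    # step (e)
    cal_dist = [math.dist(r, avg) for r in cal_res]             # step (f)
    sam_dist = math.dist(residual(sample, loadings), avg)       # steps (g)-(h)
    ave, std = statistics.mean(cal_dist), statistics.stdev(cal_dist)
    return sam_dist, cal_dist, sam_dist > ave + n_std * std     # step (i)


# Hypothetical 4-channel calibration set: two signal channels plus small noise.
calibration = [
    [2.0, 1.0, 0.01, -0.009],
    [3.0, 1.0, -0.01, 0.001],
    [2.0, 2.0, -0.01, -0.003],
    [3.0, 2.0, 0.01, 0.007],
]
print(outlier_test(calibration, [2.5, 1.5, 0.0, 0.0], k=2)[2])  # typical sample
print(outlier_test(calibration, [2.5, 1.5, 0.5, 0.0], k=2)[2])  # extra feature in the third channel
```

With this data the typical sample is not flagged, while the sample with an unmodelled feature is. The model is deliberately left uncentered: if the data were mean-centered before PCA, the calibration residuals would average to exactly zero by construction, and the average residual would coincide with the model origin.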
14. A method of using a computing device to examine multivariate data to determine outliers, comprising the steps of:
(a) selecting a calibration set of multivariate data;
(b) representing each member of the calibration set as a single point in a multidimensional axes system;
(c) constructing a model describing the points of step (b);
(d) obtaining a residual for each member of the calibration set by calculating the portion of each member which is not depicted by the model constructed in step (c);
(e) creating an average residual by averaging the residuals of all of the calibration set members;
(f) determining the distance between each of the residuals obtained in step (d) and the average residual obtained in step (e);
(g) determining the average and standard deviation of the distances obtained in step (f);
(h) calculating a t-distance for each member of the calibration set according to the formula
$$t_i = \frac{\mathrm{Dis}_i - \mathrm{AVE}}{\mathrm{STD}}$$
where $\mathrm{Dis}_i$ is the distance obtained in step (f) for any member $i$, and AVE and STD are the average and standard deviation values obtained in step (g);
(i) acquiring a set of multivariate data from a sample;
(j) obtaining a residual for the sample by calculating the portion of the sample which was not depicted by the model constructed in step (c);
(k) determining the distance between the residual obtained in step (j) and the average residual obtained in step (e);
(l) calculating a t-distance for the sample according to the formula
$$t_{\mathrm{sam}} = \frac{\mathrm{Dis}_{\mathrm{sam}} - \mathrm{AVE}}{\mathrm{STD}}$$
where $\mathrm{Dis}_{\mathrm{sam}}$ is the distance obtained in step (k), and AVE and STD are the average and standard deviation values obtained in step (g);
(m) labeling as an outlier any sample whose t-distance is statistically different from the t-distances obtained in step (h); and
(n) checking for changes in feedstock, chemical processes and/or instruments used to make or evaluate the sample whenever a sample has been labelled as an outlier.
Dependent claims: 15.
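Steps (g), (h), (l) and (m) of claim 14 amount to standardizing the residual distances. The following is a minimal sketch, assuming the distances from steps (f) and (k) are already available and that STD is the sample standard deviation. The decision rule in `is_outlier` is hypothetical: the claim only says "statistically different", so this sketch flags the sample when its t-distance exceeds both a fixed critical value (3.0 here) and every calibration t-distance.

```python
import statistics
from typing import List, Sequence, Tuple


def t_distances(cal_distances: Sequence[float],
                sample_distance: float) -> Tuple[List[float], float]:
    """t = (Dis - AVE) / STD for every calibration member (step h) and the sample (step l)."""
    ave = statistics.mean(cal_distances)    # AVE, step (g)
    std = statistics.stdev(cal_distances)   # STD, step (g); sample standard deviation
    t_cal = [(dis - ave) / std for dis in cal_distances]
    t_sam = (sample_distance - ave) / std
    return t_cal, t_sam


def is_outlier(t_cal: Sequence[float], t_sam: float, critical: float = 3.0) -> bool:
    """Hypothetical rule for step (m): flag when t_sam exceeds the critical value
    and every calibration t-distance."""
    return t_sam > max(critical, max(t_cal))


# Hypothetical distances from steps (f) and (k).
cal = [1.0, 1.2, 0.8, 1.1, 0.9]
t_cal, t_far = t_distances(cal, 2.0)
_, t_near = t_distances(cal, 1.05)
print(is_outlier(t_cal, t_far), is_outlier(t_cal, t_near))
```

For these distances AVE is 1.0 and STD is about 0.158, so a sample distance of 2.0 gives a t-distance of about 6.3 and is flagged, while 1.05 gives about 0.3 and is not.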
16. A method of using a computing device to separate a set of multivariate data into a plurality of sub-parts, wherein each sub-part comprises at least one member selected from the group consisting of peak information, baseline shape, baseline offset, and noise, comprising the steps of:
(a) performing an analysis on a sample to obtain a set of multivariate data which includes peak information;
(b) calculating the values for the second derivative of the set of multivariate data obtained in step (a);
(c) selecting a region in the set of multivariate data which is known to contain substantially no peak information;
(d) averaging the values for the second derivative of the points in the region;
(e) calculating a standard deviation for the values for the second derivative of the points in the region;
(f) defining any point whose second derivative is further than a preselected number of standard deviations from the average value for the second derivative in the region to be part of a peak;
(g) removing the portions identified in step (f) from the set of multivariate data obtained in step (a);
(h) replacing the points removed in step (g) from the set of multivariate data obtained in step (a), so that a first approximation of the baseline is formed; and
(i) subtracting the first approximation of the baseline formed in step (h) from the set of multivariate data obtained in step (a), thereby forming a set of data comprising peak information.
Dependent claims: 17, 18, 19, 20, 21, 22, 23.
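Steps (b) through (i) of claim 16 can be sketched as below. The claim does not say how the removed peak points are replaced in step (h), so this sketch assumes linear interpolation between the nearest retained neighbours. It also assumes a central second difference as the second derivative, excludes the two end points (where that difference is undefined) from peak detection, and uses hypothetical names (`second_derivative`, `split_peaks`) and synthetic data. Only the peak-information split is shown, not the baseline shape, offset and noise sub-parts.

```python
import statistics
from typing import List, Sequence, Tuple


def second_derivative(y: Sequence[float]) -> List[float]:
    """Central second difference y[i-1] - 2*y[i] + y[i+1]; end points left at 0.0."""
    d2 = [0.0] * len(y)
    for i in range(1, len(y) - 1):
        d2[i] = y[i - 1] - 2.0 * y[i] + y[i + 1]
    return d2


def split_peaks(y: Sequence[float], quiet: Tuple[int, int], n_std: float = 3.0
                ) -> Tuple[List[int], List[float], List[float]]:
    """Return (peak indices, first baseline approximation, peak information)."""
    d2 = second_derivative(y)                        # step (b)
    lo, hi = quiet                                   # step (c): half-open peak-free range
    ave = statistics.mean(d2[lo:hi])                 # step (d)
    std = statistics.stdev(d2[lo:hi])                # step (e)
    # Step (f): interior points whose second derivative is far from the quiet-region mean.
    peaks = [i for i in range(1, len(y) - 1) if abs(d2[i] - ave) > n_std * std]
    peak_set = set(peaks)
    # Steps (g)-(h): drop peak points and refill them by linear interpolation between
    # the nearest retained neighbours (an assumed replacement rule).
    kept = [i for i in range(len(y)) if i not in peak_set]
    baseline = list(y)
    for i in peaks:
        left = max(j for j in kept if j < i)
        right = min(j for j in kept if j > i)
        frac = (i - left) / (right - left)
        baseline[i] = y[left] + frac * (y[right] - y[left])
    # Step (i): subtract the baseline approximation to isolate the peak information.
    peak_info = [a - b for a, b in zip(y, baseline)]
    return peaks, baseline, peak_info


# Synthetic chromatogram: flat baseline of 1.0, alternating +/-0.01 noise,
# and a triangular peak centred on index 10.
peak = {8: 0.5, 9: 1.5, 10: 3.0, 11: 1.5, 12: 0.5}
y = [1.0 + (0.01 if i % 2 == 0 else -0.01) + peak.get(i, 0.0) for i in range(20)]
idx, base, info = split_peaks(y, quiet=(1, 6))
print(idx)
print(round(info[10], 6))
```

Note that the flagged region (indices 7 to 13) is wider than the nonzero part of the peak, because the second derivative is already large at the points flanking the peak.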
24. In a method of conducting an analysis of a sample wherein a set of multivariate data characteristic of the sample is produced through physical manipulations of the sample, and this set of multivariate data is compared to multivariate data obtained from samples having known properties which were similarly manipulated, the improvement comprising:
using a computing device to rapidly identify when problems exist in either the sample or the instrumentation by determining whether the set of multivariate data produced for the sample is within an expected range;
wherein the determination of whether the set of multivariate data produced for the sample is within an expected range is made by:
(a) obtaining a series of representative multivariate data sets;
(b) creating a model of the series of multivariate data sets obtained in step (a);
(c) creating individual residuals describing the portion of the multivariate data set obtained for each member of the calibration set which is not described by the model created in step (b);
(d) creating an average residual by averaging the individual residuals created in step (c);
(e) determining the distance between the individual residual for each member of the calibration set and the average residual;
(f) performing the same type of physical manipulations as was performed to create the series of multivariate data sets obtained in step (a) on a sample, thereby obtaining an additional multivariate data set;
(g) creating a residual describing the portion of the multivariate data set obtained in step (f) which is not described by the model created in step (b);
(h) determining the distance between a residual obtained in step (g) and the average residual created in step (d);
(i) labeling as an outlier any set of multivariate data whose distance obtained in step (h) is statistically different from the set of distances determined in step (e).
25. A method of conducting an analysis of a sample comprising:
(A) physically manipulating the sample so that a set of multivariate data characteristic of the sample is produced;
(B) using a computing device to determine whether the set of data produced in step (A) is an outlier;
(C) if the set of data produced in step (A) is not an outlier, then estimating the properties of the sample by comparing the set of multivariate data produced in step (A) with multivariate data obtained under similar circumstances for samples having known properties;
(D) if the set of data produced in step (A) is an outlier, then checking for changes in feedstock, chemical processes and/or instrumentation used to make or evaluate the sample;
wherein step (B) is accomplished by:
(a) obtaining a series of representative multivariate data sets;
(b) creating a model of the series of multivariate data sets obtained in step (a);
(c) creating individual residuals describing the portion of the multivariate data set obtained for each member of the calibration set which is not described by the model created in step (b);
(d) creating an average residual by averaging the individual residuals created in step (c);
(e) determining the distance between the individual residual for each member of the calibration set and the average residual;
(f) creating a residual describing the portion of the multivariate data set obtained in step (A) which is not described by the model created in step (b);
(g) determining the distance between a residual obtained in step (f) and the average residual created in step (d);
(h) labeling as an outlier any set of multivariate data whose distance obtained in step (g) is statistically different from the set of distances determined in step (e).
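Claim 25 gates property estimation on the outlier test. A minimal sketch of that control flow follows. It assumes the outlier test is supplied as a callable (for instance a residual-distance test as in claim 1) and that the comparison in step (C) is a nearest-neighbour match against reference spectra; the matching rule, the function name `analyze`, and the `density` property in the usage example are all hypothetical.

```python
from typing import Callable, Dict, Sequence, Tuple

Spectrum = Sequence[float]
Reference = Tuple[Spectrum, Dict[str, float]]


def analyze(sample: Spectrum,
            is_outlier: Callable[[Spectrum], bool],
            references: Sequence[Reference]) -> Dict[str, object]:
    """Steps (B)-(D) of claim 25 expressed as control flow."""
    if is_outlier(sample):                                   # step (B)
        # Step (D): report no properties; list what should be checked instead.
        return {"status": "outlier",
                "check": ["feedstock", "chemical processes", "instrumentation"]}

    def sq_dist(ref: Spectrum) -> float:
        return sum((a - b) ** 2 for a, b in zip(sample, ref))

    # Step (C): nearest-neighbour comparison with samples of known properties (assumed rule).
    _, properties = min(references, key=lambda ref: sq_dist(ref[0]))
    return {"status": "ok", "properties": properties}


# Hypothetical reference library and stand-in outlier tests.
refs = [([1.0, 0.0], {"density": 0.80}), ([0.0, 1.0], {"density": 0.85})]
print(analyze([0.9, 0.1], lambda s: False, refs))
print(analyze([0.9, 0.1], lambda s: True, refs))
```

Keeping the outlier test as a parameter separates the claim's gating logic from whichever model is used to decide what counts as an outlier.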
26. In a method of carrying out a chemical reaction wherein feedstocks are reacted under conditions sufficient to produce reaction products and wherein the reaction products are sampled and wherein a set of multivariate data describing the sample is produced, and the set of multivariate data is analyzed to ensure that the reaction products are within a desired range, the improvement comprising:
automatically determining when the analysis is an outlier and checking for changes in the feedstock, reaction conditions, and/or the instrumentation used to perform the analysis whenever an outlier is determined;
wherein the automatic determination is accomplished by:
(a) obtaining a series of multivariate data sets representative of the range of samples expected to be obtained;
(b) creating a model of the series of multivariate data sets obtained in step (a);
(c) creating individual residuals describing the portion of the multivariate data set obtained for each member of the calibration set which is not described by the model created in step (b);
(d) creating an average residual by averaging the individual residuals created in step (c);
(e) determining the distance between the individual residual for each member of the calibration set and the average residual;
(f) creating a residual describing the portion of the multivariate data set obtained in step (A) which is not described by the model created in step (b);
(g) determining the distance between a residual obtained in step (f) and the average residual created in step (d);
(h) labeling as an outlier any set of multivariate data whose distance obtained in step (g) is statistically different from the set of distances determined in step (e).
Specification