Orthogonal signal projection
Abstract
The invention provides a method and an arrangement for filtering or pre-processing almost any type of multivariate data, exemplified by NIR or NMR spectra measured on samples, in order to remove systematic noise such as base-line variation and multiplicative scatter effects. This is accomplished by differentiating the spectra to first or second derivatives, by Multiplicative Signal Correction (MSC), or by similar filtering methods. The pre-processing may, however, also remove information regarding (Y) (the response variables) from the spectra, as well as from other multiple measurement arrays. Provided is a variant of PLS that can be used to achieve a signal correction that is as close to orthogonal as possible to a given (y) vector or (Y) matrix, thereby ensuring that the signal correction removes as little information as possible regarding (Y). A filter according to the present invention is named Orthogonal Partial Least Squares (OPLS).
20 Claims
1. A method for concentration or property calibration of input data from samples of substances or matter, said calibration determining a filter model for further samples of the same substance or matter comprising to optionally transform, center, and scale the input data to provide a descriptor set (X) and a concentration or property set (y, Y), characterized in that the method removes information or systematic variation in the input data that is not correlated to the concentration or property set by providing the steps of:
producing a descriptor weight set (w), which is normalized, by projecting the descriptor set (X) on the concentration or property set (y, Y), projecting the descriptor set (X) on the descriptor weight set (w) producing a descriptor score set (t), projecting the descriptor set (X) on the descriptor score set (t), producing a descriptor loading set (p), projecting the property set (y) on the descriptor score set (t), producing a property weight set (c), projecting the property set (y) on the property weight set (c), producing a property score set (u);
comparing the descriptor loading set (p) and the descriptor weight set (w), and their difference (p−w), thus obtaining the part of the descriptor loading set (p) that is unrelated to the property set (y);
using said difference weight set (wortho), normalized, as a starting set for partial least squares analysis;
calculating the corresponding orthogonal descriptor score set (tortho) as the projection between the descriptor set (X) and said normalized orthogonal difference weight set (wortho), and calculating a corresponding orthogonal descriptor loading set (portho) as the projection of the descriptor set (X) onto the orthogonal descriptor score set (tortho);
removing the outer product of the orthogonal descriptor score set (tortho) and the orthogonal descriptor loading set (portho′) from the descriptor set (X), thus providing residuals data (E), which is provided as the descriptor set (X) in a next latent variable component;
repeating the above steps for each orthogonal latent variable component;
filtering from the residuals data (E) strong systematic variation that can be bilinearly modeled as the outer product of the orthogonal descriptor score set and the orthogonal descriptor loading set (Tortho*Portho′), thus providing an orthogonal descriptor set (Xortho) being orthogonal to the property set (y, Y);
optionally providing a principal component analysis (PCA) on the orthogonal descriptor set (Xortho), producing a bilinear decomposition of the orthogonal descriptor set (Xortho) as the outer product of the principal component analysis score set and the principal component analysis loading set and principal component analysis residuals (Tpcaortho*Ppcaortho′+Epcaortho), adding the principal component analysis residuals data (Epcaortho) back into filtered residuals data (E);
filtering new data with the following steps;
projecting a new descriptor set (xnew′) onto the normalized orthogonal difference weight set (wortho), thus producing a new orthogonal descriptor score set (tnewortho) and removing the product between the new orthogonal descriptor score set (tnewortho) and the orthogonal descriptor loading set (portho′) from the new descriptor set (xnew′), thus providing new residuals (enew′), which are provided as a new descriptor set (xnew′) in a next orthogonal component;
repeating said two filtering steps for new data for all estimated orthogonal components;
computing a new orthogonal descriptor set (xnewortho′=tnewortho*Portho′) as the outer product of the new orthogonal descriptor score set (tnewortho) and the orthogonal descriptor loading set (portho′), computing a new orthogonal principal component score set (tnewpcaortho) from the projection of the new orthogonal descriptor set onto the principal component analysis loading set (xnewortho′*Ppcaortho′), whereby the new principal component analysis residuals formed (enewpcaortho=xnewortho′−tnewpcaortho*Ppcaortho′) are added back into the new residuals (enew′) if principal component analysis was used on the orthogonal descriptor set (Xortho), and only the outer product of the principal component analysis score sets and the principal components loading set (Tpcaortho*Ppcaortho′) was removed from the original descriptor set (X); and
for multiple concentration or property sets (Y), calculating a principal component analysis model on said property sets (Y=TP′+E) and repeating the above steps for each separate principal component analysis score set (t) and using the orthogonal descriptor (Xortho) as the input descriptor set (X) for each subsequent principal component analysis score set (t), thus making up said filtering model for filtering of further samples of the same type.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10
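The calibration and new-data filtering steps of claim 1 can be sketched for a single property vector y as follows. This is a minimal NumPy sketch under stated assumptions, not the patented implementation: the function names fit_opls_filter and apply_opls_filter are hypothetical, X and y are assumed to be centered already, and the property weight (c) and property score (u) sets are left out because a single y does not need them for filtering. One further assumption: where the claim compares p and w through their difference (p−w), the sketch removes from p its projection on w, so that wortho comes out exactly orthogonal to w.

```python
import numpy as np

def fit_opls_filter(X, y, n_ortho):
    """Estimate n_ortho y-orthogonal components from centered X (n x k) and y (n,).

    Returns W_ortho and P_ortho (one row per component) and the residuals E.
    """
    E = np.asarray(X, dtype=float).copy()
    y = np.asarray(y, dtype=float).ravel()
    W_ortho, P_ortho = [], []
    for _ in range(n_ortho):
        w = E.T @ y / (y @ y)          # descriptor weight set: X projected on y
        w /= np.linalg.norm(w)         # normalized
        t = E @ w                      # descriptor score set
        p = E.T @ t / (t @ t)          # descriptor loading set
        # Assumption: orthogonalize p against w instead of the plain p - w.
        w_o = p - (w @ p) / (w @ w) * w
        w_o /= np.linalg.norm(w_o)     # normalized orthogonal weight set
        t_o = E @ w_o                  # orthogonal descriptor score set
        p_o = E.T @ t_o / (t_o @ t_o)  # orthogonal descriptor loading set
        E = E - np.outer(t_o, p_o)     # residuals become X for the next component
        W_ortho.append(w_o)
        P_ortho.append(p_o)
    return np.array(W_ortho), np.array(P_ortho), E

def apply_opls_filter(x_new, W_ortho, P_ortho):
    """Filter new (already centered) rows with the stored orthogonal components."""
    e = np.atleast_2d(np.asarray(x_new, dtype=float)).copy()
    for w_o, p_o in zip(W_ortho, P_ortho):
        t_new = e @ w_o                # new orthogonal descriptor scores
        e = e - np.outer(t_new, p_o)   # new residuals feed the next component
    return e
```

Under these assumptions each orthogonal score vector is orthogonal to y, so the filtering leaves the covariance between X and y unchanged, and filtering the calibration rows with the stored components reproduces the calibration residuals E.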
11. An arrangement for concentration or property calibration of input data from samples of substances or matter, said calibration determining a filter model for further samples of the same substance or matter comprising to optionally transform, center, and scale the input data to provide a descriptor set (X) and a concentration or property set (y, Y), characterized in that said filter model removes information or systematic variation in the input data that is not correlated to the concentration or property set, comprising:
projecting means for producing a descriptor weight set (w), which is normalized, by projecting the descriptor set (X) on the concentration or property set (y, Y);
projecting means for the descriptor set (X) on the descriptor weight set (w) producing a descriptor score set (t);
projecting means for the descriptor set (X) on the descriptor score set (t) producing a descriptor loading set (p);
projecting means for the property set (y) on the descriptor score set (t) producing a property weight set (c);
projecting means for the property set (y) on the property weight set (c) producing a property score set (u);
comparing means for the descriptor loading set (p) and the descriptor weight set (w), and their difference (p−w), thus obtaining the part of the descriptor loading set (p) that is unrelated to the property set (y, Y);
using said difference weight set (wortho), normalized, as a starting set for partial least squares analysis;
first calculating means for the corresponding orthogonal descriptor score set (tortho) as the projection between the descriptor set (X) and said normalized orthogonal difference weight set (wortho), and for calculating a corresponding orthogonal descriptor loading set (portho) as the projection of the descriptor set (X) onto the orthogonal descriptor score set (tortho);
second calculating means for removing the outer product of the orthogonal descriptor score set (tortho) and the orthogonal descriptor loading set (portho′) from the descriptor set (X), thus providing residuals data (E), which is provided as the descriptor set (X) in a next component;
the above means being repeatedly used for each orthogonal latent variable component;
first filtering means for the residuals data (E) from strong systematic variation that can be bilinearly modeled as the outer product of the orthogonal descriptor score set and the orthogonal descriptor loading set (Tortho*Portho′), thus providing an orthogonal descriptor set (Xortho) being orthogonal to the property set (y, Y);
optionally providing analyzing means for a principal component analysis (PCA) on the orthogonal descriptor set (Xortho), producing a bilinear decomposition of the orthogonal descriptor set (Xortho) as the outer product of the principal component analysis score set and the principal component analysis loading set and principal component analysis residuals (Tpcaortho*Ppcaortho′+Epcaortho), adding the principal component analysis residuals data (Epcaortho) back into filtered residuals data (E);
second filtering means for new data including;
projecting means for a new descriptor set (xnew′) onto the normalized orthogonal difference weight set (wortho), thus producing a new orthogonal descriptor score set (tnewortho); and
calculating means for removing the product between the new orthogonal descriptor score set (tnewortho) and the orthogonal descriptor loading set (portho′) from the new descriptor set (xnew′), thus providing new residuals (enew′), which are provided as a new descriptor set (xnew′) in a next orthogonal component;
said second filtering means of new data being repeatedly used for all estimated orthogonal components;
calculating means for a new orthogonal descriptor set (xnewortho′=tnewortho*Portho′) as the outer product of the new orthogonal descriptor score set (tnewortho) and the orthogonal descriptor loading set (portho′), calculating a new orthogonal principal component score set (tnewpcaortho) from the projection of the new orthogonal descriptor set onto the principal component analysis loading set (xnewortho′*Ppcaortho′), whereby the new principal component analysis residuals formed (enewpcaortho=xnewortho′−tnewpcaortho*Ppcaortho′) are added back into the new residuals (enew′) if principal component analysis was used on the orthogonal descriptor set (Xortho), and only the outer product of the principal component analysis score sets and the principal components loading set (Tpcaortho*Ppcaortho′) was removed from the original descriptor set (X); and
for multiple concentration or property sets (Y), means for calculating a principal component analysis model on said property sets (Y=TP′+E) and repeatedly using the above means for each separate principal component analysis score set (t) and using the orthogonal descriptor (Xortho) as the input descriptor set (X) for each subsequent principal component analysis score set (t), thus making up said filtering model for filtering of further samples of the same type.
Dependent claims: 12, 13, 14, 15, 16, 17, 18, 19, 20
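The optional principal component analysis on the orthogonal descriptor set (Xortho), and the matching treatment of new data, can be illustrated with the sketch below. It is a hedged illustration rather than the arrangement itself: the function names pca_trim_orthogonal and pca_trim_new are hypothetical, the PCA is computed with a plain singular value decomposition, and Xortho is assumed to need no further centering because it is built from centered data. Loadings are stored one row per component.

```python
import numpy as np

def pca_trim_orthogonal(T_ortho, P_ortho, E, n_pca):
    """Keep n_pca principal components of Xortho; add the PCA residuals back to E.

    T_ortho is (n x a), P_ortho is (a x k), E is (n x k).
    """
    X_ortho = T_ortho @ P_ortho                 # Xortho = Tortho * Portho'
    U, s, Vt = np.linalg.svd(X_ortho, full_matrices=False)
    T_pca = U[:, :n_pca] * s[:n_pca]            # PCA score set
    P_pca = Vt[:n_pca]                          # PCA loading set (rows)
    E_pca = X_ortho - T_pca @ P_pca             # PCA residuals
    return T_pca, P_pca, E + E_pca              # residuals returned to filtered data

def pca_trim_new(t_new_ortho, P_ortho, P_pca, e_new):
    """Apply the same correction to one new sample's orthogonal scores and residuals."""
    x_new_ortho = t_new_ortho @ P_ortho         # new orthogonal descriptor set
    t_new_pca = x_new_ortho @ P_pca.T           # projection onto the PCA loadings
    e_new_pca = x_new_ortho - t_new_pca @ P_pca # new PCA residuals
    return e_new + e_new_pca                    # added back into the new residuals
```

With this choice, only the part Tpcaortho*Ppcaortho′ ends up removed from the original descriptor set, and the retained PCA part plus the updated residuals still sum to Xortho plus the original residuals.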
Specification