Orthogonal signal projection

US 6,853,923 B2
Filed: 02/22/2001
Issued: 02/08/2005
Est. Priority Date: 02/22/2000
Status: Expired due to Fees

Abstract

The invention provides a method and an arrangement for filtering or pre-processing almost any type of multivariate data, exemplified by NIR or NMR spectra measured on samples, in order to remove systematic noise such as baseline variation and multiplicative scatter effects. Such filtering is commonly accomplished by differentiating the spectra to first or second derivatives, by Multiplicative Signal Correction (MSC), or by similar filtering methods. The pre-processing may, however, also remove information regarding the response variables (Y) from the spectra, as well as from other multiple measurement arrays. Provided is a variant of PLS that can be used to achieve a signal correction that is as close to orthogonal as possible to a given (y) vector or (Y) matrix, thereby ensuring that the signal correction removes as little information as possible regarding (Y). A filter according to the present invention is named Orthogonal Partial Least Squares (OPLS).
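
The abstract names Multiplicative Signal Correction (MSC) as one of the conventional filters. For orientation, a minimal Python sketch of the textbook form of MSC follows. The function name `msc`, the default of using the mean spectrum as the reference, and the NumPy implementation are assumptions made for illustration and are not taken from the patent.

```python
import numpy as np

def msc(spectra, reference=None):
    """Textbook MSC sketch (hypothetical helper, not the patent's method).

    Each row of `spectra` is regressed on a reference spectrum, and the
    fitted offset and slope are then removed from that row.
    """
    spectra = np.asarray(spectra, dtype=float)
    # Assumption: fall back to the mean spectrum when no reference is given.
    ref = spectra.mean(axis=0) if reference is None else np.asarray(reference, dtype=float)
    corrected = np.empty_like(spectra)
    for i, row in enumerate(spectra):
        # Least-squares fit of row ≈ intercept + slope * ref
        slope, intercept = np.polyfit(ref, row, 1)
        corrected[i] = (row - intercept) / slope
    return corrected
```

Nothing in this correction consults (y), which illustrates why, as the abstract notes, such pre-processing may also remove information regarding (Y).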

20 Claims

1. A method for concentration or property calibration of input data from samples of substances or matter, said calibration determining a filter model for further samples of the same substance or matter comprising to optionally transform, center, and scale the input data to provide a descriptor set (X) and a concentration or property set (y, Y), characterized in that the method removes information or systematic variation in the input data that is not correlated to the concentration or property set by providing the steps of:
- producing a descriptor weight set (w), which is normalized, by projecting the descriptor set (X) on the concentration or property set (y, Y), projecting the descriptor set (X) on the descriptor weight set (w) producing a descriptor score set (t), projecting the descriptor set (X) on the descriptor score set (t), producing a descriptor loading set (p), projecting the property set (y) on the descriptor score set (t), producing a property weight set (c), projecting the property set (y) on the property weight set (c), producing a property score set (u);

  comparing the descriptor loading set (p) and the descriptor weight set (w), and their difference (p − w), thus obtaining the part of the descriptor loading set (p) that is unrelated to the property set (y);

  using said difference weight set (w_ortho), normalized, as a starting set for partial least squares analysis;

  calculating the corresponding orthogonal descriptor score set (t_ortho) as the projection between the descriptor set (X) and said normalized orthogonal difference weight set (w_ortho), and calculating a corresponding orthogonal descriptor loading set (p_ortho) as the projection of the descriptor set (X) onto the orthogonal descriptor score set (t_ortho);

  removing the outer product of the orthogonal descriptor score set (t_ortho) and the orthogonal descriptor loading set (p_ortho′) from the descriptor set (X), thus providing residuals data (E), which is provided as the descriptor set (X) in a next latent variable component;

  repeating the above steps for each orthogonal latent variable component;

  filtering from the residuals data (E) strong systematic variation that can be bilinearly modeled as the outer product of the orthogonal descriptor score set and the orthogonal descriptor loading set (p_ortho′) from the descriptor set (X), thus providing residuals data (E), which is provided as the descriptor set (X) in a next latent variable component;

  repeating the above steps for each orthogonal latent variable component;

  filtering from the residuals data (E) strong systematic variation that can be bilinearly modeled as the outer product of the orthogonal descriptor score set and the orthogonal descriptor loading set (T_ortho*P_ortho′), thus providing an orthogonal descriptor set (X_ortho) being orthogonal to the property set (y, Y);

  optionally providing a principal component analysis (PCA) on the orthogonal descriptor set (X_ortho), producing a bilinear decomposition of the orthogonal descriptor set (X_ortho) as the outer product of the principal component analysis score set and the principal component analysis loading set and principal component analysis residuals (T_pcaortho*P_pcaortho′ + E_pcaortho), adding the principal component analysis residuals data (E_pcaortho) back into filtered residuals data (E);

  filtering new data with the following steps;

  projecting a new descriptor set (x_new′) onto the normalized orthogonal difference weight set (w_ortho), thus producing a new orthogonal descriptor score set (t_newortho) and removing the product between the new orthogonal descriptor score set (t_newortho) and the orthogonal descriptor loading set (p_ortho′) from the new descriptor set (x_new′), thus providing new residuals (e_new′), which are provided as a new descriptor set (x_new′) in a next orthogonal component;

  repeating said two filtering steps for new data for all estimated orthogonal components;

  computing a new orthogonal descriptor set (x_newortho′ = t_newortho*P_ortho′) as the outer product of the new orthogonal descriptor score set (t_newortho) and the orthogonal descriptor loading set (p_ortho′), computing a new orthogonal principal component score set (t_newpcaortho) from the projection of the new orthogonal descriptor set onto the principal component analysis loading set (x_newortho′*P_pcaortho′), whereby the new principal component analysis residuals formed (e_newpcaortho = x_newortho′ − t_newpcaortho*P_pcaortho′) are added back into the new residuals (e_new′) if principal component analysis was used on the orthogonal descriptor set (X_ortho), and only the outer product of the principal component analysis score sets and the principal components loading set (T_pcaortho*P_pcaortho′) was removed from the original descriptor set (X); and

  for multiple concentration or property sets (Y), calculating a principal component analysis model on said property sets (Y = TP′ + E) and repeating the above steps for each separate principal component analysis score set (t) and use the orthogonal descriptor (X_ortho) as the input descriptor set (X) for each subsequent principal component analysis score set (t), thus making up said filtering model for filtering of further samples of the same type.
- Dependent claims (2, 3, 4, 5, 6, 7, 8, 9, 10)
  - 2. A method according to claim 1, characterized in that:
    - performing an ordinary PLS analysis with the filtered residuals data (E) and the concentration or property set (y, Y);

      performing an ordinary PLS analysis with said filtered new residuals set (e_new′) as prediction set.
  - 3. A method according to claim 2, characterized in that by finding said orthogonal components for each component separately an amount of disturbing variation in each partial least square component can be analyzed.
  - 4. A method according to claim 2, characterized in that the method uses crossvalidation and/or eigenvalue criteria for reducing overfitting.
  - 5. A method according to claim 2, characterized in that said principal component analysis (PCA) components are chosen according to a crossvalidation or eigenvalue criteria.
  - 6. A method according to claim 2, characterized in that the method includes a step to remove specific types of variation in the descriptor set (X), when an unwanted or non-relevant concentration or property set (y) or (Y) exists by using the orthogonal descriptor (X_ortho) as a data set of interest, as it contains no correlated variation to the concentration or property set (y, Y).
  - 7. A method according to claim 1, characterized in that by finding said orthogonal components for each component separately an amount of disturbing variation in each partial least square component can be analyzed.
  - 8. A method according to claim 1, characterized in that the method uses crossvalidation and/or eigenvalue criteria for reducing overfitting.
  - 9. A method according to claim 1, characterized in that said principal component analysis (PCA) components are chosen according to a crossvalidation or eigenvalue criteria.
  - 10. A method according to claim 1, characterized in that the method includes a step to remove specific types of variation in the descriptor set (X), when an unwanted or non-relevant concentration or property set (y) or (Y) exists by using the orthogonal descriptor (X_ortho) as a data set of interest, as it contains no correlated variation to the concentration or property set (y, Y).
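
The calibration and new-data filtering steps recited in claim 1 can be sketched in Python for the single-response case. This is a minimal illustration under stated assumptions, not the patented implementation: the function names `opls_fit` and `opls_filter` are hypothetical, X and y are assumed to be already centered, and the difference weight is formed as p − (w′p)w (the orthogonalized form used in the published O-PLS algorithm) where the claim writes (p − w). The property weight (c) and property score (u) are not computed because they do not enter the later steps in this single-y form.

```python
import numpy as np

def opls_fit(X, y, n_ortho=1):
    """Sketch of the calibration steps for one centered response y.

    X: (n_samples, n_vars), already centered/scaled; y: (n_samples,), centered.
    Returns filtered residuals E plus the stored w_ortho, p_ortho, t_ortho.
    """
    E = np.array(X, dtype=float)        # working copy, deflated per component
    y = np.asarray(y, dtype=float)
    W_o, P_o, T_o = [], [], []
    for _ in range(n_ortho):
        w = E.T @ y                     # project X on y -> weight set w
        w /= np.linalg.norm(w)          # normalize w
        t = E @ w                       # project X on w -> score set t
        p = E.T @ t / (t @ t)           # project X on t -> loading set p
        # Assumption: remove p's component along w before normalizing,
        # rather than taking the raw difference p - w.
        w_o = p - (w @ p) * w
        w_o /= np.linalg.norm(w_o)      # undefined if p is parallel to w
        t_o = E @ w_o                   # orthogonal score set t_ortho
        p_o = E.T @ t_o / (t_o @ t_o)   # orthogonal loading set p_ortho
        E = E - np.outer(t_o, p_o)      # remove t_ortho * p_ortho'; E is the next X
        W_o.append(w_o)
        P_o.append(p_o)
        T_o.append(t_o)
    return E, np.array(W_o), np.array(P_o), np.column_stack(T_o)

def opls_filter(X_new, W_o, P_o):
    """Filter new rows that were centered/scaled the same way as X."""
    E_new = np.atleast_2d(np.array(X_new, dtype=float))
    for w_o, p_o in zip(W_o, P_o):
        t_new = E_new @ w_o             # new orthogonal scores t_newortho
        E_new = E_new - np.outer(t_new, p_o)
    return E_new
```

Under these assumptions each t_ortho is orthogonal to y, so the removed part T_ortho*P_ortho′ carries no covariance with y and E keeps the same X′y as the input. The filtered E and the filtered new rows are what claim 2 then passes to an ordinary PLS analysis.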

11. An arrangement for concentration or property calibration of input data from samples of substances or matter, said calibration determining a filter model for further samples of the same substance or matter comprising to optionally transform, center, and scale the input data to provide a descriptor set (X) and a concentration or property set (y, Y), characterized in that said filter model removes information or systematic variation in the input data that is not correlated to the concentration or property set, comprising:
- projecting means for producing a descriptor weight set (w), which is normalized, by projecting the descriptor set (X) on the concentration or property set (y, Y);

  projecting means for the descriptor set (X) on the descriptor weight set (w) producing a descriptor score set (t);

  projecting means for the descriptor set (X) on the descriptor score set (t) producing a descriptor loading set (p);

  projecting means for the property set (y) on the descriptor score set (t) producing a property weight set (c);

  projecting means for the property set (y) on the property weight set (c) producing a property score set (u);

  comparing means for the descriptor loading set (p) and the descriptor weight set (w), and their difference (p − w), thus obtaining the part of the descriptor loading set (p) that is unrelated to the property set (y, Y);

  using said difference weight set (w_ortho), normalized, as a starting set for partial least squares analysis;

  first calculating means for the corresponding orthogonal descriptor score set (t_ortho) as the projection between the descriptor set (X) and said normalized orthogonal difference weight set (w_ortho), and for calculating a corresponding orthogonal descriptor loading set (p_ortho) as the projection of the descriptor set (X) onto the orthogonal descriptor score set (t_ortho);

  second calculating means for removing the outer product of the orthogonal descriptor score set (t_ortho) and the orthogonal descriptor loading set (p_ortho′) from the descriptor set (X), thus providing residuals data (E), which is provided as the descriptor set (X) in a next component;

  the above means being repeatedly used for each orthogonal latent variable component;

  first filtering means for the residuals data (E) from strong systematic variation that can be bilinearly modeled as the outer product of the orthogonal descriptor score set and the orthogonal descriptor loading set (T_ortho*P_ortho′), thus providing an orthogonal descriptor set (X_ortho) being orthogonal to the property set (y, Y);

  optionally providing analyzing means for a principal component analysis (PCA) on the orthogonal descriptor set (X_ortho), producing a bilinear decomposition of the orthogonal descriptor set (X_ortho) as the outer product of the principal component analysis score set and the principal component analysis loading set and principal component analysis residuals (T_pcaortho*P_pcaortho′ + E_pcaortho), adding the principal component analysis residuals data (E_pcaortho) back into filtered residuals data (E);

  second filtering means for new data including;

  projecting means for a new descriptor set (x_new′) onto the normalized orthogonal difference weight set (w_ortho), thus producing a new orthogonal descriptor score set (t_newortho); and

  calculating means for removing the product between the new orthogonal descriptor score set (t_newortho) and the orthogonal descriptor loading set (p_ortho′) from the new descriptor set (x_new′), thus providing new residuals (e_new′), which are provided as a new descriptor set (x_new′) in a next orthogonal component;

  said second filtering means of new data being repeatedly used for estimated orthogonal components;

  calculating means for a new orthogonal descriptor set (x_newortho′ = t_newortho*P_ortho′) as the outer product of the new orthogonal descriptor score set (t_newortho) and the orthogonal descriptor loading set (p_ortho′), calculating a new orthogonal principal component score set (t_newpcaortho) from the projection of the new orthogonal descriptor set onto the principal component analysis loading set (x_newortho′*P_pcaortho′), whereby the new principal component analysis residuals formed (e_newpcaortho = x_newortho′ − t_newpcaortho*P_pcaortho′) are added back into the new residuals (e_new′) if principal component analysis was used on the orthogonal descriptor set (X_ortho), and only the outer product of the principal component analysis score sets and the principal components loading set (T_pcaortho*P_pcaortho′) was removed from the original descriptor set (X); and

  for multiple concentration or property sets (Y), means for calculating a principal component analysis model on said property sets (Y = TP′ + E) and repeatedly using the above means for each separate principal component analysis score set (t) and using the orthogonal descriptor (X_ortho) as the input descriptor set (X) for each subsequent principal component analysis score set (t), thus making up said filtering model for filtering of further samples of the same type.
- Dependent claims (12, 13, 14, 15, 16, 17, 18, 19, 20)
  - 12. An arrangement according to claim 11, characterized in that:
    - partial least square analysis means for the filtered residuals data (E) and the concentration or property set (y, Y), and for said filtered new residuals set (e_new′) as prediction set.
  - 13. An arrangement according to claim 12, characterized in that by finding said orthogonal components for each component separately an amount of disturbing variation in each partial least square component can be analyzed by said analyzing means.
  - 14. An arrangement according to claim 12, characterized in that the arrangement uses crossvalidation and/or eigenvalue criteria for reducing overfitting.
  - 15. An arrangement according to claim 12, characterized in that said principal component analysis (PCA) components are chosen according to a crossvalidation or eigenvalue criteria by said analyzing means.
  - 16. An arrangement according to claim 12, characterized in that the arrangement removes specific types of variation in the descriptor set (X), when an unwanted or non-relevant concentration or property set (y) exists by using the orthogonal descriptor (X_ortho) as a data set of interest, as it contains no correlated variation to the concentration or property set (y, Y).
  - 17. An arrangement according to claim 11, characterized in that by finding said orthogonal components for each component separately an amount of disturbing variation in each partial least square component can be analyzed by said analyzing means.
  - 18. An arrangement according to claim 11, characterized in that the arrangement uses crossvalidation and/or eigenvalue criteria for reducing overfitting.
  - 19. An arrangement according to claim 11, characterized in that said principal component analysis (PCA) components are chosen according to a crossvalidation or eigenvalue criteria by said analyzing means.
  - 20. An arrangement according to claim 11, characterized in that the arrangement removes specific types of variation in the descriptor set (X), when an unwanted or non-relevant concentration or property set (y) exists by using the orthogonal descriptor (X_ortho) as a data set of interest, as it contains no correlated variation to the concentration or property set (y, Y).
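
The optional principal component analysis step recited in claims 1 and 11 can be sketched as follows, assuming the orthogonal descriptor set X_ortho has already been formed as T_ortho*P_ortho′. The helper names `pca_split` and `pca_project_new`, the use of an SVD to obtain the principal components, the absence of any re-centering, and the fixed component count `n_pca` are assumptions made for illustration. The crossvalidation or eigenvalue criterion of claims 9 and 19 for choosing the number of components is not shown.

```python
import numpy as np

def pca_split(X_ortho, n_pca):
    """Decompose X_ortho = T_pca @ P_pca.T + E_pca using a truncated SVD."""
    X_ortho = np.asarray(X_ortho, dtype=float)
    U, s, Vt = np.linalg.svd(X_ortho, full_matrices=False)
    T_pca = U[:, :n_pca] * s[:n_pca]    # PCA score set
    P_pca = Vt[:n_pca].T                # PCA loading set (orthonormal columns)
    E_pca = X_ortho - T_pca @ P_pca.T   # PCA residuals, added back into E
    return T_pca, P_pca, E_pca

def pca_project_new(x_new_ortho, P_pca):
    """Project new orthogonal descriptors onto the stored PCA loadings."""
    x = np.atleast_2d(np.asarray(x_new_ortho, dtype=float))
    t_new = x @ P_pca                   # new PCA scores t_newpcaortho
    e_new_pca = x - t_new @ P_pca.T     # new PCA residuals, added back into e_new
    return t_new, e_new_pca
```

In this reading, only T_pcaortho*P_pcaortho′ ends up removed from X: the filtered data become E + E_pcaortho for calibration samples and e_new + e_newpcaortho for new samples.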


Current Assignee: Umetrics AB (Sartorius AG)
Original Assignee: Umetrics AB (Sartorius AG)
Inventors: Trygg, Johan; Wold, Svante
Primary Examiner(s): Hoff, Marc S.
Assistant Examiner(s): Raymond, Edward
Application Number: US10/204,646
Publication Number: US 20030200040A1
Time in Patent Office: 1,447 Days
Field of Search: 702/23, 702/19, 702/27, 702/30, 702/32, 702/86, 702/190, 703/12, 436/8, 436/43, 436/55
US Class Current: 702/23
CPC Class Codes:
G01N 21/3563   for analysing solids; Prepa...
G01N 21/3577   for analysing liquids, e.g....
G01N 21/359   using near infrared light
G06F 17/18   for evaluating statistical ...
Y10T 436/10   Composition for standardiza...
