Model for spectral and chromatographic data

US 6,487,523 B2
Filed: 05/25/2001
Issued: 11/26/2002
Est. Priority Date: 04/07/1999
Status: Expired due to Fees

Abstract

A method and apparatus using a spectral analysis technique are disclosed. In one form of the invention, probabilities are selected to characterize the presence (and in another form, also a quantification of a characteristic) of peaks in an indexed data set for samples that match a reference species, and other probabilities are selected for samples that do not match the reference species. An indexed data set is acquired for a sample, and a determination is made according to techniques exemplified herein as to whether the sample matches or does not match the reference species. When quantification of peak characteristics is undertaken, the model is appropriately expanded, and the analysis accounts for the characteristic model and data. Further techniques are provided to apply the methods and apparatuses to process control, cluster analysis, hypothesis testing, analysis of variance, and other procedures involving multiple comparisons of indexed data.

29 Claims

1. A method of determining whether a sample matches a reference species, the method comprising:
- selecting N indices l₁, l₂, . . . l_N of peaks in an indexed data set characterizing the reference species;

  selecting a first set of probabilities p₁, p₂, . . . p_N that peaks will occur at indices l₁, l₂, . . . l_N, respectively, of an indexed data set that characterizes the sample when the sample matches the reference species;

  selecting a second set of probabilities q₁, q₂, . . . q_N that peaks will occur at indices l₁, l₂, . . . l_N, respectively, of an indexed data set that characterizes the sample when the sample does not match the reference species;

  choosing a threshold K_c;

  obtaining an indexed observation data set x₁, x₂, . . . x_N, where x_j ∈ {0, 1} and x_j = 1 if and only if a peak is present in the sample at l_j;

  deciding that the sample matches the reference species if λ ≦ K_c, where $λ = \sum_{1 \leq j \leq N} \log \left( \frac{1 - p_{j}}{1 - q_{j}} \right) + \sum_{1 \leq j \leq N} x_{j} \log \left[ \frac{p_{j} (1 - q_{j})}{q_{j} (1 - p_{j})} \right]$; and

  deciding that the sample does not match the reference species if λ > K_c.
- View Dependent Claims (2, 3, 4, 5, 6)
- - 2. The method of claim 1, wherein K_c is selected such that, given that the sample matches the reference species, P{λ > K_c} ≦ α for a predetermined type I error probability α.
  - 3. The method of claim 1, wherein said selecting steps comprise iterative proportional scaling calculations.
  - 4. The method of claim 1, wherein said selecting steps comprise iterative weighted least squares calculations.
  - 5. The method of claim 1, wherein said selecting steps comprise application of a Lancaster model.
  - 6. The method of claim 1, wherein said selecting steps comprise application of a latent class model.
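
The statistic in claim 1 and the threshold condition in claim 2 can be illustrated with a minimal Python sketch. The function names are hypothetical and do not come from the patent. The sketch assumes every p_j and q_j lies strictly between 0 and 1, so that all logarithms are finite. Algebraically, λ as written equals the sum over j of x_j log(p_j/q_j) + (1 − x_j) log((1 − p_j)/(1 − q_j)), which is the log of the ratio of independent Bernoulli likelihoods under p and under q. The direction of the comparison with K_c is copied from the claim text as written. The exact enumeration used to pick K_c is an illustrative choice that is practical only for small N; it is not a procedure stated in the claims.

```python
import itertools
import math


def log_likelihood_ratio(p, q, x):
    """Claim 1 statistic: sum_j log((1-p_j)/(1-q_j)) + x_j*log(p_j(1-q_j)/(q_j(1-p_j)))."""
    lam = 0.0
    for pj, qj, xj in zip(p, q, x):
        lam += math.log((1 - pj) / (1 - qj))
        if xj:  # x_j is 1 only when a peak is present at index l_j
            lam += math.log(pj * (1 - qj) / (qj * (1 - pj)))
    return lam


def matches_reference(p, q, x, k_c):
    """Decision rule as worded in claim 1: match if lambda <= K_c."""
    return log_likelihood_ratio(p, q, x) <= k_c


def threshold_for_alpha(p, q, alpha):
    """Smallest attained lambda value K_c with P{lambda > K_c} <= alpha,
    where the probability is computed under the 'match' probabilities p.

    Assumption: exact enumeration of all 2**N peak patterns (small N only).
    """
    outcomes = []
    for x in itertools.product((0, 1), repeat=len(p)):
        prob = math.prod(pj if xj else 1 - pj for pj, xj in zip(p, x))
        outcomes.append((log_likelihood_ratio(p, q, x), prob))
    outcomes.sort()  # ascending in lambda

    tail = 1.0  # probability mass not yet passed while walking upward
    for lam, prob in outcomes:
        tail -= prob  # mass of outcomes sorted after this one
        if tail <= alpha + 1e-12:
            return lam
    return outcomes[-1][0]
```

For example, `threshold_for_alpha([0.9, 0.8], [0.1, 0.2], 0.05)` returns a K_c that can then be passed to `matches_reference` together with an observed pattern such as `[1, 0]`.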

7. A method of determining whether a sample matches a reference species, the method comprising:
- selecting N indices l₁, l₂, . . . l_N of peaks in an indexed data set characterizing the reference species;

  selecting a first set of probabilities p₁, p₂, . . . p_N that peaks will occur at indices l₁, l₂, . . . l_N of an indexed data set that characterizes the sample when the sample matches the reference species;

  selecting a first set of probability density functions g_i(y_i; θ_i) that characterize a measurable feature y_i of the peak at index l_i given the presence of a peak at index l_i of a data set that characterizes the sample when the sample matches the reference species;

  selecting a second set of probabilities q₁, q₂, . . . q_N that peaks will occur at indices l₁, l₂, . . . l_N of an indexed data set that characterizes the sample when the sample does not match the reference species;

  selecting a second set of probability density functions g_i(y_i; Ω_i) that characterize the measurable feature y_i of the peak at index l_i given the presence of a peak at index l_i of a data set that characterizes the sample when the sample does not match the reference species;

  selecting a threshold K_c;

  obtaining an indexed observation data set x₁, x₂, . . . x_N, where x_i ∈ {0, 1} and x_i = 1 if and only if a peak is present in the sample at l_i;

  obtaining a feature data set y₁, y₂, . . . y_N; and

  deciding that the sample matches the reference species if λ ≦ K_c, where $λ = \sum_{i = 1}^{N} \left[ \log \frac{1 - p_{i}}{1 - q_{i}} + x_{i} \left\{ \log \frac{p_{i} (1 - q_{i})}{q_{i} (1 - p_{i})} + \log \frac{g_{i} (y_{i}; θ_{i})}{g_{i} (y_{i}; Ω_{i})} \right\} \right]$; and

  deciding that the sample does not match the reference species if λ > K_c.
- View Dependent Claims (8, 9, 10, 11, 12, 13)
- - 8. The method of claim 7, wherein one or more g_i(·) is a lognormal density given by $g_{i} (y_{i}; θ_{i}) = g_{i} (y_{i}; μ_{i}, σ_{i}^{2}) = \frac{1}{y_{i} \sqrt{2 π σ_{i}^{2}}} \exp \left\{ - \frac{(\log y_{i} - μ_{i})^{2}}{2 σ_{i}^{2}} \right\}, \quad y_{i} \geq 0.$
  - 9. The method of claim 7, wherein one or more g_i(·) is a gamma density given by $g_{i} (y_{i}; θ_{i}) = g_{i} (y_{i}; α_{i}, β_{i}) = \frac{1}{Γ(α_{i}) β_{i}^{α_{i}}} y_{i}^{α_{i} - 1} \exp (- y_{i} / β_{i}), \quad y_{i} \geq 0.$
  - 10. The method of claim 7, wherein one or more g_i(·) is a Poisson density given by $g_{i} (y_{i}; θ_{i}) = \frac{θ_{i}^{y_{i}} \exp (- θ_{i})}{y_{i}!}, \quad y_{i} = 0, 1, 2, \dots .$
  - 11. The method of claim 7, wherein the measurable feature is the intensity of the peak at index l_i.
  - 12. The method of claim 7, wherein the measurable feature is the width of the peak at index l_i.
  - 13. The method of claim 7, wherein the measurable feature is a quantification of the skew of the peak at index l_i.
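
The extended statistic of claim 7 adds, for each index where a peak is present, the log ratio of the feature density under the "match" parameters θ_i to the density under the "non-match" parameters Ω_i. The following is a minimal sketch under stated assumptions: all function names are hypothetical, parameters are passed as tuples, every p_i and q_i lies strictly between 0 and 1, and feature values are strictly positive for the lognormal and gamma cases so that the logarithms are defined. The three log-densities correspond to the forms named in claims 8 through 10.

```python
import math


def lognormal_logpdf(y, mu, sigma2):
    """Log of the lognormal density with parameters (mu, sigma^2); requires y > 0."""
    return (-math.log(y) - 0.5 * math.log(2 * math.pi * sigma2)
            - (math.log(y) - mu) ** 2 / (2 * sigma2))


def gamma_logpdf(y, alpha, beta):
    """Log of the gamma density with shape alpha and scale beta; requires y > 0."""
    return (-math.lgamma(alpha) - alpha * math.log(beta)
            + (alpha - 1) * math.log(y) - y / beta)


def poisson_logpmf(y, theta):
    """Log of the Poisson probability of count y with mean theta."""
    return y * math.log(theta) - theta - math.lgamma(y + 1)


def feature_log_likelihood_ratio(p, q, x, y, logpdf, theta, omega):
    """Claim 7 statistic.

    theta[i] and omega[i] are parameter tuples for the 'match' and
    'non-match' feature densities at index l_i. The feature term is
    only evaluated where x_i == 1, since it is multiplied by x_i.
    """
    lam = 0.0
    for pi, qi, xi, yi, th, om in zip(p, q, x, y, theta, omega):
        lam += math.log((1 - pi) / (1 - qi))
        if xi:
            lam += math.log(pi * (1 - qi) / (qi * (1 - pi)))
            lam += logpdf(yi, *th) - logpdf(yi, *om)
    return lam
```

When θ_i and Ω_i coincide for every i, the feature term vanishes and the value reduces to the presence-only statistic of claim 1.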

14. A method, wherein the status of a process at any point t in time is characterized by an indexed observation data set X_t = {x_{1,t}, x_{2,t}, . . . x_{N,t}}, where x_{j,t} ∈ {0, 1} and x_{j,t} = 1 if and only if a peak is present at time t in the sample at index l_j, the method comprising:
- selecting a first set of probabilities p₁, p₂, . . . p_N that peaks will occur at x_{1,t}, x_{2,t}, . . . x_{N,t}, respectively, when the process is operating normally;

  selecting a second set of probabilities q₁, q₂, . . . q_N that peaks will occur at x_{1,t}, x_{2,t}, . . . x_{N,t}, respectively, when the process is not operating normally;

  acquiring a sequence X₁, X₂, . . . X_T of indexed observation data sets;

  intervening in the process when it is determined that C_n equals or exceeds a predetermined value A, where
- View Dependent Claims (15, 16)
- - 15. The method of claim 14, wherein A is selected as a function of the desired false alarm rate for the test.
  - 16. The method of claim 14, wherein said intervening comprises stopping the process.
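
The definition of C_n is not reproduced in the claim text above, so the following sketch does not implement the claimed statistic. As a stand-in, it uses a standard one-sided CUSUM recursion, C_n = max(0, C_{n−1} + s_n), where s_n is the log-likelihood ratio of "not operating normally" (probabilities q) to "operating normally" (probabilities p) for the n-th observation. Both the recursion and the function names are assumptions made for illustration only. The sketch also assumes every p_j and q_j lies strictly between 0 and 1.

```python
import math


def abnormal_vs_normal_llr(p, q, x):
    """log[P(x | not normal) / P(x | normal)] for independent peak indicators."""
    return sum(
        math.log(qj / pj) if xj else math.log((1 - qj) / (1 - pj))
        for pj, qj, xj in zip(p, q, x)
    )


def first_alarm(p, q, observations, a):
    """Return the 1-based time n at which the CUSUM statistic first reaches A,
    or None if it never does.

    Assumption: C_n = max(0, C_{n-1} + s_n) with C_0 = 0. This is a textbook
    CUSUM, swapped in because the claim's own formula for C_n is missing here.
    """
    c = 0.0
    for n, x in enumerate(observations, start=1):
        c = max(0.0, c + abnormal_vs_normal_llr(p, q, x))
        if c >= a:
            return n  # the point at which the claim would call for intervening
    return None
```

Under this stand-in, a larger A lowers the false alarm rate at the cost of slower detection, which is the trade-off claim 15 refers to when it ties A to the desired false alarm rate.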

17. A method, wherein the status of a process at any point t in time is characterized by an indexed observation data set X_t = {x_{1,t}, x_{2,t}, . . . x_{N,t}}, where x_{j,t} ∈ {0, 1} and x_{j,t} = 1 if and only if a peak is present at time t in the sample at index l_j, and a feature data set Y_t = {y_{1,t}, y_{2,t}, . . . y_{N,t}}, where if x_{j,t} = 0, y_{j,t} = 0, and if x_{j,t} = 1, y_{j,t} quantifies a feature of the peak at time t in the sample at index l_j, the method comprising:
- selecting a first set of probabilities p₁, p₂, . . . p_N that peaks will occur at x_{1,t}, x_{2,t}, . . . x_{N,t}, respectively, when the process is operating normally;

  selecting a first set of probability density functions g_i(y_i; θ_i) that characterize a measurable feature y_i of the peak at index l_i given the presence of a peak at index l_i of a data set that characterizes the process when it is operating normally;

  selecting a second set of probabilities q₁, q₂, . . . q_N that peaks will occur at x_{1,t}, x_{2,t}, . . . x_{N,t}, respectively, when the process is not operating normally;

  selecting a second set of probability density functions g_i(y_i; Ω_i) that characterize the measurable feature y_i of the peak at index l_i given the presence of a peak at index l_i of a data set that characterizes the process when it is operating normally;

  acquiring a sequence X₁, X₂, . . . X_T of indexed observation data sets;

  acquiring a sequence Y₁, Y₂, . . . Y_T of feature data sets;

  intervening in the process when it is determined that C_n equals or exceeds a predetermined value A, where
- View Dependent Claims (18, 19, 20, 21, 22, 23, 24, 25)
- - 18. The method of claim 17, wherein one or more g_i(·) is a lognormal density function given by $g_{i} (y_{i}; θ_{i}) = g_{i} (y_{i}; μ_{i}, σ_{i}^{2}) = \frac{1}{y_{i} \sqrt{2 π σ_{i}^{2}}} \exp \left\{ - \frac{(\log y_{i} - μ_{i})^{2}}{2 σ_{i}^{2}} \right\}, \quad y_{i} \geq 0.$
  - 19. The method of claim 17, wherein one or more g_i(·) is a gamma density given by $g_{i} (y_{i}; θ_{i}) = g_{i} (y_{i}; α_{i}, β_{i}^{2}) = \frac{1}{Γ(α_{i}) β_{i}^{α_{i}}} y_{i}^{α_{i} - 1} \exp (- y_{i} / β_{i}), \quad y_{i} \geq 0.$
  - 20. The method of claim 17, wherein one or more g_i(·) is a Poisson density given by $g_{i} (y_{i}; θ_{i}) = \frac{θ_{i}^{y_{i}} \exp (- θ_{i})}{y_{i}!}, \quad y_{i} = 0, 1, 2, \dots .$
  - 21. The method of claim 17, wherein the measurable feature is the intensity of the peak at index l_i.
  - 22. The method of claim 17, wherein the measurable feature is the width of the peak at index l_i.
  - 23. The method of claim 17, wherein the measurable feature is a quantification of the skew of the peak at index l_i.
  - 24. The method of claim 17, wherein A is selected as a function of the desired false alarm rate for the test.
  - 25. The method of claim 17, wherein said intervening comprises stopping the process.

26. A system for analyzing a sample in comparison with a reference species, comprising:
- a processor;

  a memory storing data indicative of:

  probabilities p₁, p₂, . . . p_N that peaks will occur at indices l₁, l₂, . . . l_N of an indexed data set that characterizes the sample when the sample matches the reference species;

  probabilities q₁, q₂, . . . q_N that peaks will occur at indices l₁, l₂, . . . l_N of an indexed data set that characterizes the sample when the sample does not match the reference species;

  a threshold value; and

  an indexed sample data set x₁, x₂, . . . x_N characterizing the sample, wherein each x_i is a binary value that indicates whether or not a peak is present at index l_i; and

  a computer-readable medium encoded with programming instructions executable by said processor to:

  calculate a log-likelihood ratio λ, where $λ = \sum_{1 \leq j \leq N} \log \left( \frac{1 - p_{j}}{1 - q_{j}} \right) + \sum_{1 \leq j \leq N} x_{j} \log \left[ \frac{p_{j} (1 - q_{j})}{q_{j} (1 - p_{j})} \right]$;

  generate a first signal when λ is less than said threshold value; and

  generate a second signal when λ is greater than said threshold value.

27. A method of performing discriminant analysis, the method comprising:
- selecting N indices l₁, l₂, . . . l_N of peaks in an indexed data set characterizing a first reference species or a second reference species;

  selecting a first set of probabilities p_{1,1}, p_{2,1}, . . . p_{N,1} that peaks will occur at indices l₁, l₂, . . . l_N, respectively, of an indexed data set that characterizes the sample when the sample matches the first reference species;

  selecting a second set of probabilities p_{1,2}, p_{2,2}, . . . p_{N,2} that peaks will occur at indices l₁, l₂, . . . l_N, respectively, of an indexed data set that characterizes the sample when the sample matches the second reference species;

  selecting a third set of probabilities q_{1,1}, q_{2,1}, . . . q_{N,1} that peaks will occur at indices l₁, l₂, . . . l_N, respectively, of an indexed data set that characterizes the sample when the sample matches a second reference species;

  selecting a fourth set of probabilities q_{1,2}, q_{2,2}, . . . q_{N,2} that peaks will occur at indices l₁, l₂, . . . l_N, respectively, of an indexed data set that characterizes the sample when the sample matches a second reference species;

  obtaining an indexed observation data set x₁, x₂, . . . x_N, where x_j ∈ {0, 1} and x_j = 1 if and only if a peak is present in the sample at l_j;

  calculating $λ_{1} = \sum_{1 \leq j \leq N} \log \left( \frac{1 - p_{j, 1}}{1 - q_{j, 1}} \right) + \sum_{1 \leq j \leq N} x_{j} \log \left[ \frac{p_{j, 1} (1 - q_{j, 1})}{q_{j, 1} (1 - p_{j, 1})} \right]$ and $λ_{2} = \sum_{1 \leq j \leq N} \log \left( \frac{1 - p_{j, 2}}{1 - q_{j, 2}} \right) + \sum_{1 \leq j \leq N} x_{j} \log \left[ \frac{p_{j, 2} (1 - q_{j, 2})}{q_{j, 2} (1 - p_{j, 2})} \right]$; and

  deciding that the sample matches the first reference species if λ₁ ≦ λ₂; and

  the sample matches the second reference species if λ₁ > λ₂.

28. A method of performing a cluster analysis of M samples, comprising:
- selecting N indices l₁, l₂, . . . l_N of possible peak locations in indexed data sets characterizing the M samples;

  obtaining indexed data sets X_i = {x_{1,i}, x_{2,i}, . . . x_{N,i}}; i = 1, 2, . . . M, each data set corresponding to a different sample, wherein x_{j,i} ∈ {0, 1} and x_{j,i} = 1 if and only if a peak exists in the data set for sample i at index l_j; and

  defining P groups of samples by selecting a first array of probabilities p_{k,i}; k = 1, 2, . . . P; i = 1, 2, . . . N that peaks will occur at indices l₁, l₂, . . . l_N, respectively, of an indexed data set that characterizes a sample when the sample is in group k;

  selecting a second array of probabilities q_{k,i}; k = 1, 2, . . . P; i = 1, 2, . . . N that peaks will occur at indices l₁, l₂, . . . l_N, respectively, of an indexed data set that characterizes the sample when the sample is not in group k; and

  selecting g_j ∈ {1, 2, . . . P}; j = 1, 2, . . . M, where sample j is in group g_j;

  wherein p_{k,i}, q_{k,i}, and g_j are selected to maximize $λ = \sum_{1 \leq j \leq M} \left\{ \left. \sum_{1 \leq i \leq N} \left[ \log \left( \frac{1 - p_{k, i}}{1 - q_{k, i}} \right) + x_{i, j} \log \left[ \frac{p_{k, i} (1 - q_{k, i})}{q_{k, i} (1 - p_{k, i})} \right] \right] \right|_{k = g_{j}} \right\}.$
- View Dependent Claims (29)
- - 29. The method of claim 28, wherein P is also selected to maximize λ.
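
The clustering objective in claim 28 is a sum over samples, and each sample contributes only through its own group label g_j. For fixed probability arrays, the best label for each sample can therefore be chosen independently. The sketch below evaluates the objective and performs only that assignment step. It is not a full procedure for selecting p_{k,i} and q_{k,i}, which the claim leaves open. Function names are hypothetical, groups are indexed from 0 rather than 1, and all probabilities are assumed to lie strictly between 0 and 1.

```python
import math


def sample_term(p_k, q_k, x):
    """Inner sum over peak indices i for one sample x, evaluated at group k."""
    return sum(
        math.log((1 - pi) / (1 - qi)) + xi * math.log(pi * (1 - qi) / (qi * (1 - pi)))
        for pi, qi, xi in zip(p_k, q_k, x)
    )


def objective(p, q, g, samples):
    """Claim 28 objective: sum over samples j of the inner sum at k = g_j.

    p[k][i] and q[k][i] are the in-group and out-of-group peak probabilities;
    g[j] is the (0-based) group label of sample j.
    """
    return sum(sample_term(p[gj], q[gj], x) for gj, x in zip(g, samples))


def best_assignments(p, q, samples):
    """Assignment step only: with p and q held fixed, pick for each sample
    the group label that maximizes its own term of the objective."""
    groups = range(len(p))
    return [max(groups, key=lambda k: sample_term(p[k], q[k], x)) for x in samples]
```

One plausible way to use these helpers is to alternate `best_assignments` with a re-estimation of `p` and `q` from the current groups and stop when `objective` no longer increases. That loop is an assumption about how the maximization might be organized, not something stated in the claims.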

Current Assignee
Battelle Memorial Institute
Original Assignee
Battelle Memorial Institute
Inventors
Jarman, Kristin; Wahl, Karen; Wahl, Jon; Willse, Alan
Primary Examiner(s)
ASSOUAD, PATRICK J

Application Number

US09/866,201
Publication Number

US 20020035449A1
Time in Patent Office

550 Days
Field of Search

702/189, 702/22, 702/23, 702/24, 702/25, 702/27, 702/28, 702/181, 356/300, 250/281, 250/282
US Class Current

702/189
CPC Class Codes

G01N 30/8631   Peaks

G01N 30/8675   Evaluation, i.e. decoding o...

G01N 30/8679   Target compound analysis, i...

G01N 30/8693   Models, e.g. prediction of ...

G06F 2218/14   by matching peak patterns

H01J 49/0036   Step by step routines descr...
