Systems and methods for diagnosing a biological specimen using probabilities

US 7,747,547 B1
Filed: 02/10/2009
Issued: 06/29/2010
Est. Priority Date: 10/31/2007
Status: Expired due to Fees

Abstract

Apparatus, systems and methods for determining, for each respective phenotypic characterization in a set of {T₁, . . . , T_k} characterizations, that a test specimen has the respective characterization are provided. A pairwise probability function g_pq(X, W_pq), for a phenotypic pair (T_p, T_q) in {T₁, . . . , T_k} is learned using a training population. W_pq is a set of parameters derived from Y for (T_p, T_q) by substituting each Y_i in Y into g_pq(X, W_pq), as X, where Y_i is the set of cellular constituent abundance values from sample i in the training population exhibiting T_p or T_q. The learning step is repeated for each (T_p, T_q) in {T₁, . . . , T_k}, thereby deriving pairwise probability functions G={g_1,2(X, W_1,2), . . . , g_{k-1, k}(X, W_{k-1, k})}. Pairwise probability values P={p_1,2, . . . , p_{k-1, k}} are computed, where each p_pq is equal to g_pq(Z, W_pq) in G, the probability that the test specimen has T_p and not T_q, where Z is cellular constituent abundance values of the test specimen.
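The flow described in the abstract (learn one function per unique pair, then evaluate every function on the test specimen) can be illustrated with a minimal Python sketch. Everything below is an assumption made for illustration: the nearest-centroid learner, the logistic squashing, the function names, and the toy abundance values are hypothetical and are not taken from the patent, which lists many possible learners.

```python
import math
from itertools import combinations

def centroid(rows):
    """Mean abundance vector of a list of samples."""
    n = len(rows[0])
    return [sum(r[d] for r in rows) / len(rows) for d in range(n)]

def sq_dist(a, b):
    """Squared Euclidean distance between two abundance vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def learn_pairwise(training, p, q):
    """Derive W_pq from the samples exhibiting T_p or T_q.
    Assumption: W_pq is just the two class centroids (a stand-in learner)."""
    return {"c_p": centroid(training[p]), "c_q": centroid(training[q])}

def g(Z, W):
    """Score in (0, 1) that Z has T_p rather than T_q (hypothetical form)."""
    margin = sq_dist(Z, W["c_q"]) - sq_dist(Z, W["c_p"])
    margin = max(-700.0, min(700.0, margin))  # keep math.exp in range
    return 1.0 / (1.0 + math.exp(-margin))

def learn_all_pairs(training):
    """Repeat the learning step for every unique pair: k(k-1)/2 functions."""
    labels = sorted(training)
    return {(p, q): learn_pairwise(training, p, q)
            for p, q in combinations(labels, 2)}

def pairwise_probabilities(G, Z):
    """Evaluate every learned pairwise function on the test specimen Z."""
    return {pair: g(Z, W) for pair, W in G.items()}

# Toy training population: k = 3 characterizations, five samples each, n = 2.
training = {
    "T1": [[0.0, 0.1], [0.2, 0.0], [0.1, 0.3], [-0.1, 0.2], [0.0, -0.2]],
    "T2": [[5.0, 5.1], [4.8, 5.0], [5.2, 4.9], [5.1, 5.2], [4.9, 4.8]],
    "T3": [[0.1, 5.0], [-0.2, 4.9], [0.0, 5.2], [0.2, 5.1], [-0.1, 4.8]],
}
G = learn_all_pairs(training)
P = pairwise_probabilities(G, [0.1, 0.2])
```

With k = 3 the sketch produces three pairwise values, keyed by the pair (T_p, T_q); any learner that returns a value between 0 and 1 could be substituted for `g`.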

25 Claims

1. A computer implemented method of determining, for each respective phenotypic characterization in a set of {T₁, . . . , T_k} phenotypic characterizations, a probability that a test biological specimen has the respective phenotypic characterization, the method comprising:
- (A) learning a pairwise probability function g_pq(X, W_pq) using a training population, for a pair of phenotypic characterizations (T_p, T_q) in the set of {T₁, . . . , T_k} phenotypic characterizations, wherein
  
  (i) there are at least five training samples in the training population for each phenotypic characterization in the set of {T₁, . . . , T_k} phenotypic characterizations;
  
  (ii) Y is the set of all training samples in the training population that exhibits either phenotypic characterization T_p or phenotypic characterization T_q, and each Y_i in Y is the set of {y_i1, . . . , y_in} cellular constituent abundance values for a plurality of cellular constituents measured from a sample i, from the training population, which exhibits either phenotypic characterization T_p or phenotypic characterization T_q;
  
  (iii) W_pq is a set of parameters derived from Y in the learning step (A) for a pair of phenotypic characterizations (T_p, T_q) by substituting each Y_i into g_pq(X, W_pq), as X, during said learning step (A);
  
  (iv) k is 3 or greater;
  
  (v) n is at least 1; and
  
  (vi) p is not equal to q;
  
  (B) repeating the learning step (A) for a different pair of phenotypic characterizations (T_p, T_q), using the training population, for all unique pairs of phenotypic characterizations in the set of {T₁, . . . , T_k} phenotypic characterizations, thereby deriving a plurality of pairwise probability functions G={g_1,2(X, W_1,2), . . . , g_{k-1, k}(X, W_{k-1, k})};
  
  (C) computing a plurality of pairwise probability values P={p_1,2, . . . , p_{k-1, k}}, wherein each pairwise probability value p_pq in P is equal to g_pq(Z, W_pq) in G, the probability that the test biological specimen has phenotypic characterization T_p and does not have phenotypic characterization T_q, wherein Z is a set of {z₁, . . . , z_n} cellular constituent abundance values measured from the test biological specimen for said plurality of cellular constituents;
  
  (D) optionally converting P to a set M of k probabilities, wherein M={p₁, p₂, . . . , p_k}, wherein each probability p_j in M is a probability for a phenotypic characterization in the set of {T₁, . . . , T_k} phenotypic characterizations that the test biological specimen has the phenotypic characterization such that
- - 2. The computer implemented method of claim 1, wherein
  - 3. The computer implemented method of claim 2, wherein s(Z, R_i) is equal to the value of the kernel function e^(−γ[(z₁ − r_i1)² + (z₂ − r_i2)² + . . . + (z_n − r_in)²]), wherein z₁, . . . , z_n are cellular constituent abundance values in Z that respectively correspond to cellular constituent abundance values r_i1, . . . , r_in in R_i; and
    
    wherein s(Z, R_j) is equal to the value of the kernel function e^(−γ[(z₁ − r_j1)² + (z₂ − r_j2)² + . . . + (z_n − r_jn)²]), where z₁, . . . , z_n are cellular constituent abundance values in Z that respectively correspond to cellular constituent abundance values r_j1, . . . , r_jn in R_j.
  - 4. The computer implemented method of claim 2, the method further comprising determining values, for the given pair of phenotypic characterization (T_p, T_q), for the set of weights w′_i, w″_j, and b used in g_pq(Z, W_pq) before the computing step (C) by subjecting each set of cellular constituent abundance values in the training population that was measured from samples that have phenotypic characterization T_p or T_q to a support vector machine.
  - 5. The computer implemented method of claim 2, wherein
  - 6. The computer implemented method of claim 1, wherein a phenotypic characterization in the plurality of phenotypic characterizations is an organ type, an abnormal state in an organ, a tissue type, an abnormal state in a tissue, a cell type, an abnormal cell type, a cell morphology, an abnormal cell morphology, a disease state, a disease prognosis, or a therapeutic response.
  - 7. The computer implemented method of claim 1, wherein the set of cellular constituent abundance values Z for the plurality of cellular constituents measured from the test biological specimen and the set of cellular constituent abundance values Y_i for the plurality of cellular constituents measured from the sample i from the training population are measured from a microarray comprising probes arranged with a density of 100 different probes per 1 cm² or higher.
  - 8. The computer implemented method of claim 1, wherein the set of cellular constituent abundance values Z for the plurality of cellular constituents measured from the test biological specimen and the set of cellular constituent abundance values Y_i for the plurality of cellular constituents measured from the sample i from the training population are measured from a microarray comprising probes arranged with a density of at least 2,500 different probes per 1 cm².
  - 9. The computer implemented method of claim 1, wherein the set of cellular constituent abundance values Z for the plurality of cellular constituents measured from the test biological specimen and the set of cellular constituent abundance values Y_i for the plurality of cellular constituents measured from the sample i from the training population are measured from a microarray comprising at least 10,000 different probes.
  - 10. The computer implemented method of claim 1, wherein the set of cellular constituent abundance values Z for the plurality of cellular constituents measured from the test biological specimen and the set of cellular constituent abundance values Y_i for the plurality of cellular constituents measured from the sample i from the training population are measured from an expression microarray, a comparative genomic hybridization microarray, an exon microarray, or a microRNA microarray.
  - 11. The computer implemented method of claim 1, wherein the set of cellular constituent abundance values Z for the plurality of cellular constituents measured from the test biological specimen and the set of cellular constituent abundance values Y_i for the plurality of cellular constituents measured from the sample i from the training population are measured from a microarray comprising between 10 oligonucleotides and 5×10⁶ oligonucleotides.
  - 12. The computer implemented method of claim 1, wherein the plurality of cellular constituents is between 5 mRNAs and 50,000 mRNAs and the cellular constituent abundance values are amounts of each mRNA.
  - 13. The computer implemented method of claim 1, wherein the plurality of cellular constituents is between 50 proteins and 200,000 proteins and the cellular constituent abundance values are amounts of each protein.
  - 14. The computer implemented method of claim 1, wherein, for each respective phenotypic characterization in the plurality of phenotypic characterizations, the training population comprises at least three samples that have the respective phenotypic characterization.
  - 15. The computer implemented method of claim 1, wherein each phenotypic characterization in the plurality of phenotypic characterizations is a cancer tissue of origin and wherein the plurality of phenotypic characterizations comprises bladder cancer, breast cancer, colorectal cancer, gastric cancer, germ cell cancer, kidney cancer, hepatocellular cancer, non-small cell lung cancer, non-Hodgkin's lymphoma, melanoma, ovarian cancer, pancreatic cancer, prostate cancer, soft tissue sarcoma, and thyroid cancer.
  - 16. The computer implemented method of claim 1, wherein the plurality of phenotypic characterizations is between 2 phenotypic characterizations and 100 phenotypic characterizations.
  - 17. The computer implemented method of claim 1, further comprising receiving the set of cellular constituent abundance values Z from a remote source over a computer network, and communicating the one or more pairwise probabilities p_pq in P and/or the one or more p_j in M to the remote source over said computer network.
  - 18. The computer implemented method of claim 17, wherein the remote source is a remote computer or a remote computer system.
  - 19. The computer implemented method of claim 1, wherein the converting step (C) comprises deeming the set of probabilities {p₁, p₂, . . . , p_k} that minimize the criterion f(p₁, p₂, . . . , p_k) over p_i, where f( ) is defined as
  - 20. The computer implemented method of claim 1, wherein said learning of the pairwise probability function g_pq(X, W_pq) comprises using a decision tree, predictive analysis of microarrays, a multiple additive regression tree, a neural network, a clustering algorithm, principal component analysis, a nearest neighbor analysis, a linear discriminant analysis, a quadratic discriminant analysis, a support vector machine, an evolutionary method, a projection pursuit, a radial basis function, or weighted voting.
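The kernel function recited in claim 3 is a Gaussian (radial basis function) kernel over the abundance values, and it can be written in a few lines of Python. The sketch below is illustrative only: the function names are hypothetical, and because the form of g_pq in claim 2 is not reproduced in this text, `decision_value` is merely a generic support-vector-style weighted kernel sum using the weights w′_i, w″_j and b named in claim 4, not the claimed function.

```python
import math

def rbf_kernel(z, r, gamma):
    """s(Z, R) = exp(-gamma * sum_d (z_d - r_d)^2), as recited in claim 3."""
    if len(z) != len(r):
        raise ValueError("Z and R must have the same number of values")
    return math.exp(-gamma * sum((zd - rd) ** 2 for zd, rd in zip(z, r)))

def decision_value(z, refs_p, refs_q, w_p, w_q, b, gamma):
    """Hypothetical SVM-style score: weighted similarity to the T_p reference
    samples minus weighted similarity to the T_q reference samples, plus b.
    Assumption: this combination is illustrative, not the patent's g_pq."""
    pos = sum(w * rbf_kernel(z, r, gamma) for w, r in zip(w_p, refs_p))
    neg = sum(w * rbf_kernel(z, r, gamma) for w, r in zip(w_q, refs_q))
    return pos - neg + b

same = rbf_kernel([1.0, 2.0], [1.0, 2.0], gamma=0.5)
far = rbf_kernel([0.0, 0.0], [1.0, 1.0], gamma=0.5)
score = decision_value([0.0, 0.0], [[0.0, 0.0]], [[3.0, 4.0]],
                       [1.0], [1.0], b=0.0, gamma=1.0)
```

The kernel equals 1 when the two vectors coincide and decays toward 0 as they move apart; γ controls how quickly that decay happens.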

21. An apparatus for determining, for each respective phenotypic characterization in a set of {T₁, . . . , T_k} phenotypic characterizations, a probability that a test biological specimen has the respective characterization, the apparatus comprising:
- a processor; and
  
  a memory, coupled to the processor, the memory storing a module comprising:
  
  (A) instructions for learning a pairwise probability function g_pq(X, W_pq) using a training population, for a pair of phenotypic characterizations (T_p, T_q) in the set of {T₁, . . . , T_k} phenotypic characterizations, wherein:
  
  (i) there are at least five training samples in the training population for each phenotypic characterization in the set of {T₁, . . . , T_k} phenotypic characterizations;
  
  (ii) Y is the set of all training samples in the training population that exhibits either phenotypic characterization T_p or phenotypic characterization T_q, and each Y_i in Y is the set of {y_i1, . . . , y_in} cellular constituent abundance values for a plurality of cellular constituents measured from a sample i, from the training population, which exhibits either phenotypic characterization T_p or phenotypic characterization T_q;
  
  (iii) W_pq is a set of parameters derived from Y by the instructions for learning (A) for a pair of phenotypic characterizations (T_p, T_q) by substituting each Y_i into g_pq(X, W_pq), as X, during said learning step (A);
  
  (iv) k is 3 or greater;
  
  (v) n is at least 1; and
  
  (vi) p is not equal to q,
  
  (B) instructions for repeating the instructions for learning (A) for a different pair of phenotypic characterizations (T_p, T_q), using the training population, for all unique pairs of phenotypic characterizations in the set of {T₁, . . . , T_k} phenotypic characterizations, thereby deriving a plurality of pairwise probability functions G={g_1,2(X, W_1,2), . . . , g_{k-1, k}(X, W_{k-1, k})};
  
  (C) instructions for computing a plurality of pairwise probability values P={p_1,2, . . . , p_{k-1, k}}, wherein each pairwise probability value p_pq in P is equal to g_pq(Z, W_pq) in G, the probability that the test biological specimen has phenotypic characterization T_p and does not have phenotypic characterization T_q, wherein Z is a set of {z₁, . . . , z_n} cellular constituent abundance values measured from the test biological specimen for said plurality of cellular constituents;
  
  (D) optionally, instructions for converting P to a set M of k probabilities, wherein M={p₁, p₂, . . . , p_k}, wherein each probability p_j in M is a probability for a phenotypic characterization in the set of {T₁, . . . , T_k} phenotypic characterizations that the biological specimen has the phenotypic characterization such that
- - 22. The apparatus of claim 21, said module further comprising instructions for:
    - receiving the set of cellular constituent abundance values Z from a remote source over a computer network, and
    - communicating the one or more pairwise probabilities p_pq in P or the one or more p_j in M to the remote source over said computer network.
  - 23. The apparatus of claim 21, wherein the memory further comprises Y and an indication of the phenotypic characterization of each sample i in the training population.
  - 24. The apparatus of claim 21, wherein the network is the Internet.
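The remote exchange in claim 22 (abundance values Z arrive from a remote source, and P or M is sent back) can be pictured as a small request handler. The sketch below is a hypothetical illustration only: the JSON message layout, the `"p|q"` key encoding, and the names `handle_request` and `toy_predict` are assumptions, and no actual network transport is shown.

```python
import json

def handle_request(payload, predict):
    """Decode a JSON request carrying Z, run the classifier, encode the reply.

    payload: JSON text such as '{"Z": [0.1, 0.2, 0.3]}' (assumed format).
    predict: callable taking Z and returning (P, M), where P maps pairs
             (p, q) to pairwise probabilities and M maps labels to p_j.
    """
    request = json.loads(payload)
    Z = [float(v) for v in request["Z"]]
    P, M = predict(Z)
    # JSON object keys must be strings, so the pair (p, q) becomes "p|q".
    encoded_P = {f"{p}|{q}": value for (p, q), value in P.items()}
    return json.dumps({"P": encoded_P, "M": M})

def toy_predict(Z):
    """Hypothetical stand-in for trained pairwise functions: fixed outputs."""
    P = {("T1", "T2"): 0.9, ("T1", "T3"): 0.8, ("T2", "T3"): 0.4}
    M = {"T1": 0.6, "T2": 0.1, "T3": 0.3}
    return P, M

response = json.loads(handle_request('{"Z": [0.1, 0.2, 0.3]}', toy_predict))
```

In a real deployment the payload would be received and the reply sent over whatever transport the system uses; the handler itself stays independent of that choice.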

25. A computer-readable medium storing a computer program executable by a computer to determine, for each respective phenotypic characterization in a set of {T₁, . . . , T_k} phenotypic characterizations, a probability that a test biological specimen has the respective phenotypic characterization, the computer program comprising:
- (A) instructions for learning a pairwise probability function g_pq(X, W_pq) using a training population, for a pair of phenotypic characterizations (T_p, T_q) in the set of {T₁, . . . , T_k} phenotypic characterizations, wherein:
  
  (i) there are at least five training samples in the training population for each phenotypic characterization in the set of {T₁, . . . , T_k} phenotypic characterizations;
  
  (ii) Y is the set of all training samples in the training population that exhibits either phenotypic characterization T_p or phenotypic characterization T_q, and each Y_i in Y is the set of {y_i1, . . . , y_in} cellular constituent abundance values for a plurality of cellular constituents measured from a sample i, from the training population, which exhibits either phenotypic characterization T_p or phenotypic characterization T_q;
  
  (iii) W_pq is a set of parameters derived from Y in the learning step (A) for a pair of phenotypic characterizations (T_p, T_q) by substituting each Y_i into g_pq(X, W_pq), as X, by the instructions for learning (A);
  
  (iv) k is 3 or greater;
  
  (v) n is at least 1; and
  
  (vi) p is not equal to q,
  
  (B) instructions for repeating the instructions for learning (A) for a different pair of phenotypic characterizations (T_p, T_q), using the training population, for all unique pairs of phenotypic characterizations in the set of {T₁, . . . , T_k} phenotypic characterizations, thereby deriving a plurality of pairwise probability functions G={g_1,2(X, W_1,2), . . . , g_{k-1, k}(X, W_{k-1, k})};
  
  (C) instructions for computing a plurality of pairwise probability values P={p_1,2, . . . , p_{k-1, k}}, wherein each pairwise probability value p_pq in P is equal to g_pq(Z, W_pq) in G, the probability that the test biological specimen has phenotypic characterization T_p and does not have phenotypic characterization T_q, wherein Z is a set of {z₁, . . . , z_n} cellular constituent abundance values measured from the test biological specimen for said plurality of cellular constituents;
  
  (D) optionally, instructions for converting P to a set M of k probabilities, wherein M={p₁, p₂, . . . , p_k}, wherein each probability p_j in M is a probability for a phenotypic characterization in the set of {T₁, . . . , T_k} phenotypic characterizations that the biological specimen has the phenotypic characterization such that
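The condition that follows "such that" in step (D) is not reproduced in this text, so the Python sketch below does not attempt it. It only illustrates the general idea of turning the k(k-1)/2 pairwise values in P into k per-characterization probabilities in M, using a simple averaging rule chosen here as an assumption; the function name `couple_by_averaging` and the toy values are hypothetical.

```python
def couple_by_averaging(P, labels):
    """Convert pairwise values P[(p, q)] into one probability per label.

    Assumption: P[(p, q)] is the probability of T_p rather than T_q, so the
    complementary value 1 - P[(p, q)] is credited to T_q. Each label's credits
    are summed and divided by the number of pairs, k(k-1)/2, which makes the
    resulting k probabilities sum to 1.
    """
    k = len(labels)
    totals = {t: 0.0 for t in labels}
    for (p, q), r in P.items():
        totals[p] += r
        totals[q] += 1.0 - r
    n_pairs = k * (k - 1) / 2.0
    return {t: totals[t] / n_pairs for t in labels}

# Toy pairwise values for k = 3 characterizations.
P_toy = {("T1", "T2"): 0.9, ("T1", "T3"): 0.8, ("T2", "T3"): 0.4}
M_toy = couple_by_averaging(P_toy, ["T1", "T2", "T3"])
best = max(M_toy, key=M_toy.get)
```

Here T1 collects 1.7 of the 3 available credits, so it receives the largest probability; other coupling rules, including criterion-minimizing ones, would generally give different values from the same P.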

Current Assignee: Cancer Genetics
Original Assignee: Pathwork Diagnostics, Inc. (Response Genetics, Inc)
Inventors: Buturovic, Ljubomir J.; Anderson, Glenda G.
Primary Examiner(s): VINCENT, DAVID ROBERT
Application Number: US12/378,165
Time in Patent Office: 504 Days
Field of Search: 706/12, 706/45, 706/20, 435/6, 702/19, 702/20, 702/188, 703/11
US Class Current: 706/12
CPC Class Codes:
G06F 18/2415   based on parametric or prob...
G06V 10/764   using classification, e.g. ...
G06V 2201/04   Recognition of patterns in ...
G16B 20/00   ICT specially adapted for f...
G16B 20/20   Allele or variant detection...
G16B 20/30   Detection of binding sites ...
G16B 25/00   ICT specially adapted for h...
G16B 40/00   ICT specially adapted for b...
G16B 40/20   Supervised data analysis
G16B 40/30   Unsupervised data analysis
