Data analysis and predictive systems and related methodologies

US 9,195,949 B2
Filed: 03/30/2015
Issued: 11/24/2015
Est. Priority Date: 10/15/2008
Status: Active Grant

First Claim

1. A method of decreasing a risk of disease in a person x, comprising:

(A) obtaining a single nucleotide polymorphism (SNP) transductive model Mx suitable for use in data analysis, wherein the risk of disease specific to the person x is represented as input vector x, which comprises a plurality of variable features in relation to the risk of disease for which there is a global dataset D of samples also having the same variable features relating to the risk of disease as input vector x, and for which an outcome is known,

(B) optimizing the transductive model by

a) determining what number and a subset Vx of variable features of input vector x will be used in assessing an outcome for the input vector x;

b) determining what number Kx of samples from within the global data set D will form a neighborhood about input vector x;

c) selecting suitable Kx samples from the global data set which have the variable features that most closely accord to the variable features of the person x to form the neighborhood Dx;

d) ranking the Vx variable features within the neighborhood Dx in order of importance to the outcome and obtaining a weight vector Wx for all variable features Vx;

e) creating a prognostic transductive model Mx for each input vector x, having a set of model parameters Px and the other parameters Vx and Kx from elements a)-d);

f) testing an accuracy of the model Mx for each sample from Dx by a method selected from the group consisting of;

(i) calculating Wx as normalized SNR (Signal-to-Noise Ratio) coefficients and sorting the variables in descending order;

V1, V2, . . . , Vv, where w_1 ≥ w_2 ≥ . . . ≥ w_v, calculated as follows;

w_l = abs(M_l^{(class 1,x)} − M_l^{(class 2,x)}) / (Std_l^{(class 1)} + Std_l^{(class 2)});

(ii) testing for a plurality of variables Vx a plurality of possible combinations of values of their weights Wx tested through a search to increase the overall accuracy of a model built on the data Dx;

(iii) applying a genetic statistical analysis procedure, if the number of variables prevents using method (ii) above;

(iv) applying a quantum inspired evolutionary statistical analysis technique, to select the optimal variable set Vx for every new input vector x and to weigh the variables through a probability wave function;

g) storing both the accuracy and the set of model parameters;

h) repeating elements a) and/or b) while applying an optimization procedure to optimize Vx and/or Kx, to determine their optimal values, before repeating elements c)-h) until the accuracy is maximized, wherein a number and a subset Vx of variable features of input vector x, and a number Kx of samples from within the global data set D that form a neighborhood about input vector x are determined anew each time elements a) and b) are repeated while applying an optimization procedure to optimize Vx and/or Kx;

(C) creating a SNP profile of sample x from patient x and a corresponding gene profile by mapping the SNPs from a final set Vx into genes;

(D) determining the risk of disease specific to the patient x using the optimized transductive model Mx by;

(I) forming a vector;

Fx={Vx,Wx,Kx,Dx,Mx,Px,t}, where the variable t represents the time of the model Mx creation;

(II) calculating the weighted distance D(Fx,Fd) as an aggregated indication of how much a person's profile should change to reach an average desired profile Fd;

D(Fx,Fd) = Σ_{l=1,...,v} abs(V_lx − V_ld) · w_l;

(III) designing a vector of required variable changes, defined as;

deltaF_{x,d} = (deltaV_{lx,d}), for l = 1, ..., v, as follows;

deltaV_{lx,d} = V_lx − V_ld, with an importance of w_l;

(E) modifying variable features Vx in the patient x to be closer to Kx values associated with an improved outcome relative to a prognostic outcome y determined for the patient x so as to improve the prognostic outcome of the patient x;

(F) repeating elements a) through h) to determine an improved prognostic outcome using re-optimized transductive model Mx; and

(G) creating a scenario for treatment/drug design that includes a set of SNPs/genes and required changes for the person x to match in future, average profiles of control samples from Dx in order to decrease the risk of disease.
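Elements c), d) and f)(i) of the claim can be illustrated with a minimal Python sketch. It is not the patented implementation: the function names `select_neighborhood` and `snr_weights` are hypothetical, and the Euclidean distance, the population standard deviation, and the scaling of the weights to sum to one are assumptions, since the claim only says "most closely accord" and "normalized".

```python
import numpy as np

def select_neighborhood(D, x, k):
    """Indices of the k samples in D nearest to x (Euclidean distance is an assumption)."""
    distances = np.linalg.norm(D - x, axis=1)
    return np.argsort(distances, kind="stable")[:k]

def snr_weights(Dx, labels):
    """w_l = abs(mean_1 - mean_2) / (std_1 + std_2) per feature, scaled to sum to 1."""
    class1, class2 = Dx[labels == 1], Dx[labels == 2]
    signal = np.abs(class1.mean(axis=0) - class2.mean(axis=0))
    noise = class1.std(axis=0) + class2.std(axis=0)
    # Guard against a zero denominator when a feature is constant in both classes.
    w = signal / np.where(noise == 0, np.finfo(float).eps, noise)
    total = w.sum()
    return w / total if total > 0 else w  # sum-to-one scaling is an assumption

# Toy data: feature 0 separates the two classes, feature 1 does not.
D = np.array([[1.0, 10.0], [2.0, 10.5], [3.0, 9.5],
              [7.0, 10.0], [8.0, 10.5], [9.0, 9.5],
              [100.0, 100.0]])
y = np.array([1, 1, 1, 2, 2, 2, 1])
x = np.array([5.0, 10.0])

idx = select_neighborhood(D, x, k=6)       # neighborhood Dx; the distant outlier is left out
Wx = snr_weights(D[idx], y[idx])           # weight vector Wx
ranking = np.argsort(-Wx, kind="stable")   # feature indices in descending order of weight
```

On this toy data the whole weight falls on feature 0, because the class means of feature 1 coincide inside the neighborhood.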

Abstract

A method, computer system, and computer memory medium optimizing a transductive model Mx suitable for use in data analysis and for determining a prognostic outcome specific to a particular subject are disclosed. The particular subject may be represented by an input vector, which includes a number of variable features in relation to a scenario of interest. Samples from a global dataset D also having the same features relating to the scenario and for which the outcome is known are determined. In an embodiment, a subset of the variable features within a neighborhood formed by the samples are ranked in order of importance to an outcome. The prognostic transductive model is then created based, at least in part, on the subset, the ranking, and the neighborhood. The subset and the neighborhood are then optimized until the accuracy of the transductive model is maximized.
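The loop the abstract describes (build a neighborhood, rank features, test the model, re-optimize the subset and neighborhood size) might be sketched as below. This is a simplified illustration under stated assumptions, not the method of the patent: an exhaustive grid search stands in for the optimization procedure, the neighborhood is chosen on all features with Euclidean distance, accuracy is a leave-one-out 3-nearest-neighbor vote, classes are coded 0 and 1, and every function name is hypothetical.

```python
import numpy as np

def snr(Dx, yx):
    """Per-feature signal-to-noise ratio between class 0 and class 1."""
    a, b = Dx[yx == 0], Dx[yx == 1]
    noise = a.std(axis=0) + b.std(axis=0)
    return np.abs(a.mean(axis=0) - b.mean(axis=0)) / np.where(noise == 0, 1e-12, noise)

def loo_accuracy(Dx, yx, w, k_vote=3):
    """Leave-one-out accuracy of a weighted nearest-neighbor vote inside Dx."""
    hits = 0
    for i in range(len(Dx)):
        d = np.sqrt((((Dx - Dx[i]) ** 2) * w).sum(axis=1))
        d[i] = np.inf  # exclude the held-out sample itself
        nearest = np.argsort(d, kind="stable")[:k_vote]
        pred = int(yx[nearest].mean() > 0.5)  # majority vote for 0/1 labels
        hits += int(pred == yx[i])
    return hits / len(Dx)

def optimize_personal_model(D, y, x, k_grid, v_grid):
    """Grid search over neighborhood size K and feature count V (a stand-in optimizer)."""
    best = None
    for k in k_grid:
        idx = np.argsort(np.linalg.norm(D - x, axis=1), kind="stable")[:k]
        Dx, yx = D[idx], y[idx]
        if len(np.unique(yx)) < 2:
            continue  # the ranking needs both classes in the neighborhood
        w_all = snr(Dx, yx)
        order = np.argsort(-w_all, kind="stable")
        for v in v_grid:
            feats = order[:v]
            w = w_all[feats]
            w = w / w.sum() if w.sum() > 0 else np.full(len(feats), 1.0 / len(feats))
            acc = loo_accuracy(Dx[:, feats], yx, w)
            if best is None or acc > best["accuracy"]:  # keep the first best setting
                best = {"K": k, "V": feats.tolist(), "W": w, "accuracy": acc}
    return best

# Toy data: feature 0 separates the classes, feature 1 is uninformative.
D = np.array([[0, 5], [1, 1], [2, 4], [3, 2], [4, 3],
              [10, 3], [11, 2], [12, 4], [13, 1], [14, 5]], dtype=float)
y = np.array([0, 0, 0, 0, 0, 1, 1, 1, 1, 1])
x = np.array([6.0, 3.0])
best = optimize_personal_model(D, y, x, k_grid=[6, 8, 10], v_grid=[1, 2])
```

The stored dictionary plays the role of the saved accuracy and parameter set; a genetic or other search could replace the two nested loops without changing the rest of the sketch.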

11 Citations

11 Claims

1. A method of decreasing a risk of disease in a person x, comprising:
- (A) obtaining a single nucleotide polymorphism (SNP) transductive model Mx suitable for use in data analysis, wherein the risk of disease specific to the person x is represented as input vector x, which comprises a plurality of variable features in relation to the risk of disease for which there is a global dataset D of samples also having the same variable features relating to the risk of disease as input vector x, and for which an outcome is known,
  
  (B) optimizing the transductive model by
  
  a) determining what number and a subset Vx of variable features of input vector x will be used in assessing an outcome for the input vector x;
  
  b) determining what number Kx of samples from within the global data set D will form a neighborhood about input vector x;
  
  c) selecting suitable Kx samples from the global data set which have the variable features that most closely accord to the variable features of the person x to form the neighborhood Dx;
  
  d) ranking the Vx variable features within the neighborhood Dx in order of importance to the outcome and obtaining a weight vector Wx for all variable features Vx;
  
  e) creating a prognostic transductive model Mx for each input vector x, having a set of model parameters Px and the other parameters Vx and Kx from elements a)-d);
  
  f) testing an accuracy of the model Mx for each sample from Dx by a method selected from the group consisting of;
  
  (i) calculating Wx as normalized SNR (Signal-to-Noise Ratio) coefficients and sorting the variables in descending order;
  
  V1, V2, . . . , Vv, where w_1 ≥ w_2 ≥ . . . ≥ w_v, calculated as follows;
  
  w_l = abs(M_l^{(class 1,x)} − M_l^{(class 2,x)}) / (Std_l^{(class 1)} + Std_l^{(class 2)});
  
  (ii) testing for a plurality of variables Vx a plurality of possible combinations of values of their weights Wx tested through a search to increase the overall accuracy of a model built on the data Dx;
  
  (iii) applying a genetic statistical analysis procedure, if the number of variables prevents using method (ii) above;
  
  (iv) applying a quantum inspired evolutionary statistical analysis technique, to select the optimal variable set Vx for every new input vector x and to weigh the variables through a probability wave function;
  
  g) storing both the accuracy and the set of model parameters;
  
  h) repeating elements a) and/or b) while applying an optimization procedure to optimize Vx and/or Kx, to determine their optimal values, before repeating elements c)-h) until the accuracy is maximized, wherein a number and a subset Vx of variable features of input vector x, and a number Kx of samples from within the global data set D that form a neighborhood about input vector x are determined anew each time elements a) and b) are repeated while applying an optimization procedure to optimize Vx and/or Kx;
  
  (C) creating a SNP profile of sample x from patient x and a corresponding gene profile by mapping the SNPs from a final set Vx into genes;
  
  (D) determining the risk of disease specific to the patient x using the optimized transductive model Mx by;
  
  (I) forming a vector;
  
  Fx={Vx,Wx,Kx,Dx,Mx,Px,t}, where the variable t represents the time of the model Mx creation;
  
  (II) calculating the weighted distance D(Fx,Fd) as an aggregated indication of how much a person's profile should change to reach an average desired profile Fd;
  
  D(Fx,Fd) = Σ_{l=1,...,v} abs(V_lx − V_ld) · w_l;
  
  (III) designing a vector of required variable changes, defined as;
  
  deltaF_{x,d} = (deltaV_{lx,d}), for l = 1, ..., v, as follows;
  
  deltaV_{lx,d} = V_lx − V_ld, with an importance of w_l;
  
  (E) modifying variable features Vx in the patient x to be closer to Kx values associated with an improved outcome relative to a prognostic outcome y determined for the patient x so as to improve the prognostic outcome of the patient x;
  
  (F) repeating elements a) through h) to determine an improved prognostic outcome using re-optimized transductive model Mx; and
  
  (G) creating a scenario for treatment/drug design that includes a set of SNPs/genes and required changes for the person x to match in future, average profiles of control samples from Dx in order to decrease the risk of disease.
- Dependent claims (2, 3, 4, 5, 6, 7, 8, 9, 10):
  - 2. The method as claimed in claim 1, wherein optimizing the transductive model further comprises profiling input vector x and comparing important variable features against important variable features associated with a desired outcome to provide for, or assist with, development of scenarios for improvement of the outcome for input vector x.
  - 3. The method as claimed in claim 1, wherein the prognostic transductive model Mx is a personalized model.
  - 4. The method as claimed in claim 3, wherein the personalized model is a unique personalized model.
  - 5. The method as claimed in claim 1, wherein a known outcome is associated with each sample in the global dataset and determined neighborhood.
  - 6. The method as claimed in claim 1, wherein the global dataset has samples having one of at least two different outcomes, wherein a particular outcome for each sample is known.
  - 7. The method as claimed in claim 1, wherein new data is compared with accumulated existing data samples for which a future outcome is known for each sample.
  - 8. The method as claimed in claim 1, wherein one or more variable features of input vector x are selected as incapable of being altered for step (D)(III).
  - 9. The method as claimed in claim 1, wherein step (E) comprises administration of a drug.
  - 10. A computer system which includes:
    - a hardware comprising, a processor and associated memory for performing the method of claim 1.
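Steps (D)(II) and (D)(III), together with the restriction in claim 8, can be sketched in Python as follows. The function name `profile_gap`, the boolean `fixed` mask, and the choice to report a zero required change for features marked as unalterable are assumptions made for illustration; the claims do not say how such features are represented. The feature values are toy numbers, not real SNP data.

```python
import numpy as np

def profile_gap(v_x, v_d, w, fixed=None):
    """Weighted distance D(Fx,Fd), required changes deltaV, and a priority order.

    v_x   : feature values of person x
    v_d   : average desired profile
    w     : importance weights w_l
    fixed : optional boolean mask of features treated as unalterable (assumption)
    """
    v_x, v_d, w = (np.asarray(a, dtype=float) for a in (v_x, v_d, w))
    # D(Fx,Fd) = sum over l of abs(V_lx - V_ld) * w_l, computed on all features.
    distance = float(np.sum(np.abs(v_x - v_d) * w))
    # deltaV_{lx,d} = V_lx - V_ld
    delta = v_x - v_d
    if fixed is not None:
        # Unalterable features are left out of the change vector (set to zero here).
        delta = np.where(np.asarray(fixed, dtype=bool), 0.0, delta)
    # Features ordered by weighted size of the required change, largest first.
    priority = np.argsort(-(np.abs(delta) * w), kind="stable")
    return distance, delta, priority

distance, delta, priority = profile_gap(
    v_x=[2, 1, 0], v_d=[0, 1, 1], w=[0.5, 0.3, 0.2],
    fixed=[False, False, True],
)
```

Here the third feature contributes to the distance but is excluded from the change vector, which leaves the first feature as the only one with a required change.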

11. A non-transitory computer readable medium which contains a program executed by a processor for performing a method, the method comprising:
- (A) obtaining a single nucleotide polymorphism (SNP) transductive model Mx suitable for use in data analysis, wherein the risk of disease specific to the person x is represented as input vector x, which comprises a plurality of variable features in relation to the risk of disease for which there is a global dataset D of samples also having the same variable features relating to the risk of disease as input vector x, and for which an outcome is known,
  
  (B) optimizing the transductive model by;
  
  a) determining what number and a subset Vx of variable features of input vector x will be used in assessing an outcome for the input vector x;
  
  b) determining what number Kx of samples from within the global data set D will form a neighborhood about input vector x;
  
  c) selecting suitable Kx samples from the global data set which have the variable features that most closely accord to the variable features of the person x to form the neighborhood Dx;
  
  d) ranking the Vx variable features within the neighborhood Dx in order of importance to the outcome and obtaining a weight vector Wx for all variable features Vx;
  
  e) creating a prognostic transductive model Mx for each input vector x, having a set of model parameters Px and the other parameters Vx and Kx from elements a)-d);
  
  f) testing an accuracy of the model Mx for each sample from Dx by a method selected from the group consisting of;
  
  (i) calculating Wx as normalized SNR (Signal-to-Noise Ratio) coefficients and sorting the variables in descending order;
  
  V1, V2, . . . , Vv, where w_1 ≥ w_2 ≥ . . . ≥ w_v, calculated as follows;
  
  w_l = abs(M_l^{(class 1,x)} − M_l^{(class 2,x)}) / (Std_l^{(class 1)} + Std_l^{(class 2)});
  
  (ii) testing for a plurality of variables Vx a plurality of possible combinations of values of their weights Wx tested through a search to increase the overall accuracy of a model built on the data Dx;
  
  (iii) applying a genetic statistical analysis procedure, if the number of variables prevents using method (ii) above;
  
  (iv) applying a quantum inspired evolutionary statistical analysis technique, to select the optimal variable set Vx for every new input vector x and to weigh the variables through a probability wave function;
  
  g) storing both the accuracy and the set of model parameters;
  
  h) repeating elements a) and/or b) while applying an optimization procedure to optimize Vx and Kx, to determine their optimal values, before repeating elements c)-h) until the accuracy is maximized, wherein a number and a subset Vx of variable features of input vector x, and a number Kx of samples from within the global data set D that form a neighborhood about input vector x are determined anew each time elements a) and b) are repeated while applying an optimization procedure to optimize Vx or Kx;
  
  (C) creating a SNP profile of sample x from person x and a corresponding gene profile by mapping the SNPs from a final set Vx into genes;
  
  (D) determining a prognostic outcome y specific to the person x using the optimized transductive model Mx by;
  
  (I) forming a vector;
  
  Fx={Vx,Wx,Kx,Dx,Mx,Px,t}, where the variable t represents the time of the model Mx creation;
  
  (II) calculating the weighted distance D(Fx,Fd) as an aggregated indication of how much a person's profile should change to reach an average desired profile Fd by using the following;
  
  D(Fx,Fd) = Σ_{l=1,...,v} abs(V_lx − V_ld) · w_l;
  
  (III) designing a vector of required variable changes, defined as;
  
  deltaF_{x,d} = (deltaV_{lx,d}), for l = 1, ..., v, as follows;
  
  deltaV_{lx,d} = V_lx − V_ld, with an importance of w_l;
  
  (E) modifying variable features Vx in the person x to be closer to Kx values associated with an improved outcome relative to the prognostic outcome y determined for the person x so as to improve the prognostic outcome of the person x;
  
  (F) repeating elements a) through h) to determine an improved prognostic outcome using re-optimized transductive model Mx; and
  
  (G) creating a scenario for treatment/drug design that includes a set of SNPs/genes and required changes for the person x to match in future, average profiles of control samples from Dx in order to decrease the risk of disease.

Current Assignee: Nikola Kirilov Kasabov
Original Assignee: Nikola Kirilov Kasabov
Inventors: Kasabov, Nikola Kirilov
Primary Examiner(s): Shah, Kamini S
Assistant Examiner(s): PIERRE LOUIS, ANDRE
Application Number: US14/673,697
Publication Number: US 20150261926A1
Time in Patent Office: 239 Days
Field of Search: 703/2
US Class Current: 1/1

CPC Class Codes

G06N 20/00   Machine learning
G06Q 10/04   Forecasting or optimisation...
G16B 20/00   ICT specially adapted for f...
G16B 20/20   Allele or variant detection...
G16B 40/00   ICT specially adapted for b...
G16B 40/20   Supervised data analysis
G16H 50/20   for computer-aided diagnosi...
