Data analysis and predictive systems and related methodologies
First Claim
1. A method of decreasing a risk of disease in a person x, comprising:
- (A) obtaining a single nucleotide polymorphism (SNP) transductive model Mx suitable for use in data analysis, wherein the risk of disease specific to the person x is represented as input vector x, which comprises a plurality of variable features in relation to the risk of disease for which there is a global dataset D of samples also having the same variable features relating to the risk of disease as input vector x, and for which an outcome is known,
(B) optimizing the transductive model by
a) determining what number and a subset Vx of variable features of input vector x will be used in assessing an outcome for the input vector x;
b) determining what number Kx of samples from within the global data set D will form a neighborhood about input vector x;
c) selecting suitable Kx samples from the global data set which have the variable features that most closely accord to the variable features of the person x to form the neighborhood Dx;
d) ranking the Vx variable features within the neighborhood Dx in order of importance to the outcome and obtaining a weight vector Wx for all variable features Vx;
e) creating a prognostic transductive model Mx for each input vector x, having a set of model parameters Px and the other parameters Vx and Kx from elements a)-d);
f) testing an accuracy of the model Mx for each sample from Dx by a method selected from the group consisting of;
(i) calculating Wx as normalized SNR (Signal-to-Noise Ratio) coefficients and sorting the variables in descending order;
$V_1, V_2, \ldots, V_v$, where $w_1 \ge w_2 \ge \ldots \ge w_v$, calculated as follows;
$$w_l = \frac{\left| M_l(\text{class 1}, x) - M_l(\text{class 2}, x) \right|}{\mathrm{Std}_l(\text{class 1}) + \mathrm{Std}_l(\text{class 2})};$$
(ii) testing for a plurality of variables Vx a plurality of possible combinations of values of their weights Wx tested through a search to increase the overall accuracy of a model built on the data Dx;
(iii) applying a genetic statistical analysis procedure, if the number of variables prevents using method (ii) above;
(iv) applying a quantum inspired evolutionary statistical analysis technique, to select the optimal variable set Vx for every new input vector x and to weigh the variables through a probability wave function;
g) storing both the accuracy and the set of model parameters;
h) repeating elements a) and/or b) while applying an optimization procedure to optimize Vx and/or Kx, to determine their optimal values, before repeating elements c)-h) until the accuracy is maximized, wherein a number and a subset Vx of variable features of input vector x, and a number Kx of samples from within the global data set D that form a neighborhood about input vector x are determined anew each time elements a) and b) are repeated while applying an optimization procedure to optimize Vx and/or Kx;
(C) creating a SNP profile of sample x from patient x and a corresponding gene profile by mapping the SNPs from a final set Vx into genes;
(D) determining the risk of disease specific to the patient x using the optimized transductive model Mx by;
(I) forming a vector;
Fx={Vx,Wx,Kx,Dx,Mx,Px,t}, where the variable t represents the time of the model Mx creation;
(II) calculating the weighted distance D(Fx,Fd) as an aggregated indication of how much a person's profile should change to reach an average desired profile Fd;
$$D(F_x, F_d) = \sum_{l=1}^{v} \left| V_{lx} - V_{ld} \right| \cdot w_l;$$
(III) designing a vector of required variable changes, defined as;
$$\Delta F_{x,d} = (\Delta V_{lx,d}), \quad \text{for } l = 1, \ldots, v, \text{ as follows;} \tag{20}$$
$$\Delta V_{lx,d} = V_{lx} - V_{ld}, \quad \text{with an importance of } w_l \tag{21}$$
(E) modifying variable features Vx in the patient x to be closer to Kx values associated with an improved outcome relative to a prognostic outcome y determined for the patient x so as to improve the prognostic outcome of the patient x;
(F) repeating elements a) through h) to determine an improved prognostic outcome using re-optimized transductive model Mx; and
(G) creating a scenario for treatment/drug design that includes a set of SNPs/genes and required changes for the person x to match in future, average profiles of control samples from Dx in order to decrease the risk of disease.
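Steps c), d) and f)(i) of the claim above can be illustrated with a short sketch. This is a minimal, non-authoritative Python illustration and not the patented implementation: the function names `nearest_neighbors` and `snr_weights`, the Euclidean distance, the use of the population standard deviation, and the normalization so that the weights sum to 1 are all assumptions that the claim does not specify.

```python
from statistics import mean, pstdev


def nearest_neighbors(x, dataset, k):
    """Indices of the k samples in `dataset` closest to x.

    Assumption: closeness is plain Euclidean distance over all features.
    """
    def dist(sample):
        return sum((a - b) ** 2 for a, b in zip(x, sample)) ** 0.5

    return sorted(range(len(dataset)), key=lambda i: dist(dataset[i]))[:k]


def snr_weights(samples, labels):
    """Signal-to-noise weight per feature, computed inside a neighborhood.

    samples: list of equal-length feature lists (the neighborhood Dx)
    labels:  class label (1 or 2) for each sample
    Assumption: both classes occur at least once in the neighborhood.
    """
    raw = []
    for l in range(len(samples[0])):
        class1 = [s[l] for s, y in zip(samples, labels) if y == 1]
        class2 = [s[l] for s, y in zip(samples, labels) if y == 2]
        spread = pstdev(class1) + pstdev(class2)
        # Assumption: a feature with zero spread in both classes gets weight 0.
        raw.append(abs(mean(class1) - mean(class2)) / spread if spread > 0 else 0.0)
    total = sum(raw)
    # Assumption: "normalized" means the weights are scaled to sum to 1.
    return [r / total for r in raw] if total > 0 else raw
```

Calling `nearest_neighbors` first and then passing the selected samples and their labels to `snr_weights` gives one weight per feature; sorting the features by that weight in descending order yields the ranking described in step f)(i).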
Abstract
A method, computer system, and computer memory medium for optimizing a transductive model Mx suitable for use in data analysis and for determining a prognostic outcome specific to a particular subject are disclosed. The particular subject may be represented by an input vector, which includes a number of variable features in relation to a scenario of interest. Samples from a global dataset D also having the same features relating to the scenario and for which the outcome is known are determined. In an embodiment, a subset of the variable features within a neighborhood formed by the samples is ranked in order of importance to an outcome. The prognostic transductive model is then created based, at least in part, on the subset, the ranking, and the neighborhood. The subset and the neighborhood are then optimized until the accuracy of the transductive model is maximized.
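The optimization loop summarized in the abstract can be sketched as a search over candidate feature subsets and neighborhood sizes. The Python below is a hedged illustration only: `optimize_model`, its helper functions, the exhaustive grid search, and the 1-nearest-neighbor stand-in model scored by leave-one-out accuracy are all assumptions made for the example. The source names genetic and quantum-inspired evolutionary procedures as options and does not prescribe this search or this model.

```python
from itertools import combinations


def _dist(a, b, feats):
    """Euclidean distance restricted to the feature indices in `feats`."""
    return sum((a[f] - b[f]) ** 2 for f in feats) ** 0.5


def _neighborhood(x, data, feats, k):
    """Indices of the k samples nearest to x on the chosen features."""
    return sorted(range(len(data)), key=lambda i: _dist(x, data[i], feats))[:k]


def _loo_accuracy(idx, data, labels, feats):
    """Leave-one-out accuracy of a 1-nearest-neighbor stand-in model on Dx."""
    hits = 0
    for i in idx:
        others = [j for j in idx if j != i]
        nearest = min(others, key=lambda j: _dist(data[i], data[j], feats))
        hits += labels[nearest] == labels[i]
    return hits / len(idx)


def optimize_model(x, data, labels, k_values, max_features):
    """Try every feature subset Vx (up to max_features) and every Kx in
    k_values (each must be at least 2), keeping the most accurate setting."""
    best = None
    for size in range(1, max_features + 1):
        for feats in combinations(range(len(x)), size):
            for k in k_values:
                idx = _neighborhood(x, data, feats, k)
                acc = _loo_accuracy(idx, data, labels, feats)
                # Store accuracy with the parameters that produced it.
                if best is None or acc > best["accuracy"]:
                    best = {"accuracy": acc, "Vx": feats, "Kx": k, "Dx": idx}
    return best
```

Exhaustive search is only practical for a handful of features; for SNP-scale inputs a heuristic search would be needed, which is consistent with the evolutionary alternatives the source lists.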
11 Citations
11 Claims
-
11. A non-transitory computer readable medium which contains a program executed by a processor for performing a method, the method comprising:
- (A) obtaining a single nucleotide polymorphism (SNP) transductive model Mx suitable for use in data analysis, wherein the risk of disease specific to the person x is represented as input vector x, which comprises a plurality of variable features in relation to the risk of disease for which there is a global dataset D of samples also having the same variable features relating to the risk of disease as input vector x, and for which an outcome is known,
(B) optimizing the transductive model by;
a) determining what number and a subset Vx of variable features of input vector x will be used in assessing an outcome for the input vector x;
b) determining what number Kx of samples from within the global data set D will form a neighborhood about input vector x;
c) selecting suitable Kx samples from the global data set which have the variable features that most closely accord to the variable features of the person x to form the neighborhood Dx;
d) ranking the Vx variable features within the neighborhood Dx in order of importance to the outcome and obtaining a weight vector Wx for all variable features Vx;
e) creating a prognostic transductive model Mx for each input vector x, having a set of model parameters Px and the other parameters Vx and Kx from elements a)-d);
f) testing an accuracy of the model Mx for each sample from Dx by a method selected from the group consisting of;
(i) calculating Wx as normalized SNR (Signal-to-Noise Ratio) coefficients and sorting the variables in descending order;
$V_1, V_2, \ldots, V_v$, where $w_1 \ge w_2 \ge \ldots \ge w_v$, calculated as follows;
$$w_l = \frac{\left| M_l(\text{class 1}, x) - M_l(\text{class 2}, x) \right|}{\mathrm{Std}_l(\text{class 1}) + \mathrm{Std}_l(\text{class 2})};$$
(ii) testing for a plurality of variables Vx a plurality of possible combinations of values of their weights Wx tested through a search to increase the overall accuracy of a model built on the data Dx;
(iii) applying a genetic statistical analysis procedure, if the number of variables prevents using method (ii) above;
(iv) applying a quantum inspired evolutionary statistical analysis technique, to select the optimal variable set Vx for every new input vector x and to weigh the variables through a probability wave function;
g) storing both the accuracy and the set of model parameters;
h) repeating elements a) and/or b) while applying an optimization procedure to optimize Vx and Kx, to determine their optimal values, before repeating elements c)-h) until the accuracy is maximized, wherein a number and a subset Vx of variable features of input vector x, and a number Kx of samples from within the global data set D that form a neighborhood about input vector x are determined anew each time elements a) and b) are repeated while applying an optimization procedure to optimize Vx or Kx;
(C) creating a SNP profile of sample x from person x and a corresponding gene profile by mapping the SNPs from a final set Vx into genes;
(D) determining a prognostic outcome y specific to the person x using the optimized transductive model Mx by;
(I) forming a vector;
Fx={Vx,Wx,Kx,Dx,Mx,Px,t}, where the variable t represents the time of the model Mx creation;
(II) calculating the weighted distance D(Fx,Fd) as an aggregated indication of how much a person's profile should change to reach an average desired profile Fd by using the following;
$$D(F_x, F_d) = \sum_{l=1}^{v} \left| V_{lx} - V_{ld} \right| \cdot w_l;$$
(III) designing a vector of required variable changes, defined as;
$$\Delta F_{x,d} = (\Delta V_{lx,d}), \quad \text{for } l = 1, \ldots, v, \text{ as follows;}$$
$$\Delta V_{lx,d} = V_{lx} - V_{ld}, \quad \text{with an importance of } w_l;$$
(E) modifying variable features Vx in the person x to be closer to Kx values associated with an improved outcome relative to the prognostic outcome y determined for the person x so as to improve the prognostic outcome of the person x;
(F) repeating elements a) through h) to determine an improved prognostic outcome using re-optimized transductive model Mx; and
(G) creating a scenario for treatment/drug design that includes a set of SNPs/genes and required changes for the person x to match in future, average profiles of control samples from Dx in order to decrease the risk of disease.
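Elements (D)(II), (D)(III) and (G) reduce to simple arithmetic once the selected features, their weights, and a desired profile are available. The following Python is a minimal sketch under stated assumptions: the function names are hypothetical, the desired profile Fd is taken to be the feature-wise mean of the control samples in Dx (as element (G) suggests), and the changes are ordered by weight purely for readability.

```python
def desired_profile(control_samples):
    """Assumed Fd: feature-wise average of the control samples from Dx."""
    n = len(control_samples)
    return [sum(column) / n for column in zip(*control_samples)]


def weighted_distance(vx, vd, w):
    """D(Fx, Fd) = sum over l of |V_lx - V_ld| * w_l."""
    return sum(abs(a - b) * wl for a, b, wl in zip(vx, vd, w))


def required_changes(vx, vd, w):
    """Return (feature index, V_lx - V_ld, w_l) tuples.

    Assumption: sorting by descending weight is a presentation choice,
    not something the claim requires.
    """
    deltas = [(l, a - b, wl) for l, (a, b, wl) in enumerate(zip(vx, vd, w))]
    return sorted(deltas, key=lambda item: item[2], reverse=True)
```

A positive delta means the person's value for that feature lies above the desired profile and a negative delta means it lies below; the weight indicates how much that feature contributes to the overall distance.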
Specification