Significance testing and confidence interval construction based on user-specified distributions
Abstract
A computer-implemented method and program for statistically analyzing an original data set having a first size, dimension and distribution. Multiple random data sets are generated, each having a second size, dimension and distribution related to the first size, dimension and distribution of the original data set. Numerical values of test statistics corresponding to the random data sets are calculated in accordance with a predetermined test statistic formula. A relationship is determined between the numerical values corresponding to the random data sets and the numerical value of the test statistic corresponding to the original data set, calculated in accordance with the same test statistic formula. The original data set is determined to include at least one factor not based on chance when the relationship indicates that the numerical value of the original test statistic is not within a range of the numerical values corresponding to the random data sets.
34 Claims
1. A method for testing validity of a prediction model based on an original data set, comprising:
specifying a test statistic formula;
computing a numerical value NTS of the test statistic using the test statistic formula and the original data set;
specifying a probability distribution relating to the original data set;
creating a plurality of random data sets RDB(i) using randomly generated data, in which i is a positive integer;
computing a plurality of numerical values TS(i) of the test statistic corresponding to the plurality of random data sets RDB(i), and storing each numerical value TS(i) in a numerical test statistic array; and
comparing the numerical value NTS with the numerical test statistic array to determine a non-empty set of percentile values corresponding to the numerical value NTS and an associated non-empty set of percentile indices.
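The steps of claim 1 amount to a Monte Carlo percentile test. The following is a minimal Python sketch under stated assumptions, not the patented implementation: the data set is a flat list of numbers, the user-specified probability distribution is supplied as a sampling function, and the names `percentile_of_nts`, `statistic` and `sampler` are hypothetical. Reading the "set of percentile indices" as every sorted-array position at which NTS could be inserted (more than one only when NTS ties some TS(i)) is likewise an interpretation.

```python
import bisect
import random
from typing import Callable, List, Sequence, Tuple

# Hypothetical type aliases; the claim names no concrete types.
Statistic = Callable[[Sequence[float]], float]
Sampler = Callable[[random.Random, int], List[float]]


def percentile_of_nts(
    original: Sequence[float],
    statistic: Statistic,
    sampler: Sampler,
    n_sets: int = 1000,
    seed: int = 0,
) -> Tuple[float, List[int], List[float]]:
    """Locate NTS within the sorted array of TS(i) values.

    Returns (NTS, percentile indices, percentile values). Both lists are
    non-empty; they hold more than one entry only when NTS ties some TS(i).
    """
    rng = random.Random(seed)
    # Compute NTS from the test statistic formula and the original data set.
    nts = statistic(original)
    # Create RDB(i), i = 1..n_sets, from the sampler (assumed to encode the
    # user-specified distribution) and store each TS(i) in the array.
    ts_array = [statistic(sampler(rng, len(original))) for _ in range(n_sets)]
    ts_array.sort()
    # Every insertion point that keeps the array sorted is a candidate index.
    lo = bisect.bisect_left(ts_array, nts)
    hi = bisect.bisect_right(ts_array, nts)
    indices = list(range(lo, hi + 1))
    # Index k = number of TS(i) values preceding that position; scale to 0..100.
    percentiles = [100.0 * k / n_sets for k in indices]
    return nts, indices, percentiles


if __name__ == "__main__":
    def mean(xs: Sequence[float]) -> float:
        return sum(xs) / len(xs)

    def standard_normal(rng: random.Random, n: int) -> List[float]:
        return [rng.gauss(0.0, 1.0) for _ in range(n)]

    nts, idx, pct = percentile_of_nts([10.0] * 20, mean, standard_normal, n_sets=500)
    print(nts, idx, pct)  # 10.0 [500] [100.0]
```

With `[10.0] * 20` as the original data, the sample mean as the statistic and standard normal random sets, NTS = 10.0 exceeds every TS(i), so the only index is `n_sets` and the percentile is 100.0.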
13. A computing apparatus for analyzing an original data set, the original data set having a first size, dimension and distribution, the computing apparatus comprising:
a computing device for executing computer readable code;
an input device for receiving data, the input device being in communication with the computing device;
at least one data storage device for storing computer data, the data storage device being in communication with the computing device; and
a programming code reading device that reads computer executable code, the programming code reading device being in communication with the computing device;
the computer executable code causing the computing device to generate a plurality of random data sets, each random data set having a second size, dimension and distribution relating to the original data set;
calculate a plurality of numerical values of test statistics corresponding to the plurality of random data sets, each numerical value being calculated according to a test statistic formula; and
determine a relationship between the plurality of numerical values and the numerical value of the test statistic corresponding to the original data set, calculated in accordance with the test statistic formula.
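Claim 13 requires each random data set to have a size, dimension and distribution "relating to" the original data set without fixing that relationship. One possible reading, sketched below in Python, keeps the same number of rows and columns and draws each column from a normal distribution fitted to that column's sample mean and standard deviation. The normal model, the row/column layout and the names `fit_column_normals`, `generate_random_sets` and `random_statistics` are assumptions for illustration; a user-specified distribution would replace the call to `rng.gauss`.

```python
import random
import statistics
from typing import Callable, Iterator, List, Sequence, Tuple

# Assumed layout: rows are observations, columns are variables.
Matrix = List[List[float]]


def fit_column_normals(original: Sequence[Sequence[float]]) -> List[Tuple[float, float]]:
    """Estimate (mean, sample stdev) for each column; needs at least two rows."""
    return [(statistics.mean(col), statistics.stdev(col)) for col in zip(*original)]


def generate_random_sets(
    original: Sequence[Sequence[float]], n_sets: int, seed: int = 0
) -> Iterator[Matrix]:
    """Yield random data sets with the same size and dimension as the original."""
    rng = random.Random(seed)
    params = fit_column_normals(original)
    n_rows = len(original)
    for _ in range(n_sets):
        # Assumption: each column is independently normal with the fitted parameters.
        yield [[rng.gauss(mu, sigma) for mu, sigma in params] for _ in range(n_rows)]


def random_statistics(
    original: Sequence[Sequence[float]],
    statistic: Callable[[Matrix], float],
    n_sets: int = 1000,
    seed: int = 0,
) -> List[float]:
    """Compute the test statistic for each random data set; keep only the values."""
    return [statistic(rds) for rds in generate_random_sets(original, n_sets, seed)]
```

Because `generate_random_sets` is a generator, only the array of statistic values is held in memory, not the random data sets themselves.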
24. A computer readable medium storing a computer program that determines a likelihood of at least one factor in an original data set not arising by chance, in accordance with a predetermined test statistic formula, the original data set having a first size, dimension and distribution, the program comprising:
a calculating source code segment that calculates a plurality of numerical values of test statistics corresponding to a plurality of randomly generated data sets, calculated in accordance with the predetermined test statistic formula, each randomly generated data set having a second size, dimension and distribution relating to the original data set;
a comparing source code segment that compares a numerical value of a test statistic calculated in accordance with the predetermined test statistic formula and calculated with the original data set, with the plurality of numerical values corresponding to the plurality of randomly generated data sets; and
a determining source code segment that determines that at least one factor in the original data set did not arise by chance when the numerical value of the test statistic calculated from the original data set is not within a range, within the plurality of numerical values corresponding to the plurality of randomly generated data sets, representative of numerical values arising by chance.
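Claim 24 leaves open which range of the random-set statistics counts as "representative of numerical values arising by chance". A common choice, assumed here and not stated in the claims, is the central 95% of the empirical distribution of those values. The Python sketch below takes the range bounds as nearest-rank percentiles; `chance_range`, `not_by_chance` and the `alpha` parameter are hypothetical names.

```python
import math
from typing import Sequence, Tuple


def chance_range(ts_values: Sequence[float], alpha: float = 0.05) -> Tuple[float, float]:
    """Central (1 - alpha) range of the random-set statistics, nearest-rank method."""
    if not ts_values:
        raise ValueError("need at least one random-set statistic")
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie strictly between 0 and 1")
    ordered = sorted(ts_values)
    n = len(ordered)
    # 1-based nearest ranks for the alpha/2 and 1 - alpha/2 percentiles.
    lower_rank = max(1, math.ceil(n * alpha / 2))
    upper_rank = min(n, math.ceil(n * (1 - alpha / 2)))
    return ordered[lower_rank - 1], ordered[upper_rank - 1]


def not_by_chance(nts: float, ts_values: Sequence[float], alpha: float = 0.05) -> bool:
    """True when NTS lies outside the assumed chance range (two-sided)."""
    low, high = chance_range(ts_values, alpha)
    return nts < low or nts > high
```

A one-sided test would compare NTS against only the upper (or only the lower) bound.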
Specification