Method and apparatus for significance testing and confidence interval construction based on user-specified distribution
Abstract
A computer, a computer-implemented method, and a program product for analyzing statistical data in which the data to be analyzed need not be transformed into a “Normal” distribution, thus avoiding the error such a transformation introduces. Generally, the user first specifies a test statistic (formula) and an associated null hypothesis. The user then defines the distribution from which the original data arose, consistent with the null hypothesis. The computer then produces numerous randomly generated data sets of the same size and dimensions as the original statistical data set, distributed according to that distribution. A numerical value of the test statistic is computed from the test statistic formula for each randomly generated data set and stored in a vector array. The numerical value of the test statistic computed from the original statistical data is then compared with the array, and its associated percentile is determined. From this percentile, the significance of the test statistic derived from the original data can be determined and the null hypothesis rejected where warranted. Embodiments of the invention may likewise be used in other statistical applications, including computation of confidence intervals and likelihood ratios.
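The workflow summarized in the abstract can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the patented implementation: the function name `monte_carlo_percentile`, the one-observation `sampler` callback used to stand in for the user-specified distribution, and the sample mean used as the test statistic in the example are all illustrative choices.

```python
import random
from typing import Callable, List, Sequence, Tuple


def monte_carlo_percentile(
    data: Sequence[float],
    statistic: Callable[[Sequence[float]], float],
    sampler: Callable[[random.Random], float],
    n_sim: int = 10_000,
    seed: int = 0,
) -> Tuple[float, List[float], float]:
    """Locate the observed statistic within its simulated null distribution.

    `sampler` draws ONE observation from the user-specified distribution D
    (an assumption of this sketch); the original data are never transformed.
    Returns (NTS, simulated statistics TS(1..N), fraction of TS(i) <= NTS).
    """
    rng = random.Random(seed)
    nts = statistic(data)  # test statistic of the original data set
    simulated: List[float] = []
    for _ in range(n_sim):
        # Random data set with the same size as the original, drawn from D.
        rdb = [sampler(rng) for _ in range(len(data))]
        simulated.append(statistic(rdb))
    # Percentile of NTS: share of simulated statistics at or below it.
    percentile = sum(ts <= nts for ts in simulated) / n_sim
    return nts, simulated, percentile


if __name__ == "__main__":
    # Hypothetical data: are these waiting times consistent with an
    # exponential distribution of mean 2 (rate 0.5)?
    waits = [1.2, 0.4, 3.1, 2.2, 5.0, 0.9, 1.7, 2.8]
    nts, sims, pct = monte_carlo_percentile(
        waits,
        statistic=lambda xs: sum(xs) / len(xs),
        sampler=lambda r: r.expovariate(0.5),
    )
    print(f"NTS = {nts:.3f}, percentile = {pct:.3f}")
```

Because the null distribution is simulated directly, swapping in any other distribution only means passing a different `sampler`; no normality assumption enters the comparison.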
19 Claims
1. An apparatus for analyzing statistical data, said apparatus comprising:
a computing device having an input device;
a data storage device in communication with said computing device; and
programming code means readable by said computing device which:
a. receives a set of original statistical data in a data set DB;
b. receives a test statistic formula by which said data set DB will be analyzed;
c. receives a hypothesis in terms of said test statistic formula defining one of a property and a potential relationship among data contained in said data set DB;
d. calculates the numerical value NTS of said test statistic formula using said data set DB;
e. receives a probability distribution D relating to said statistical data set DB;
f. initiates an index i;
g. generates a random data set RDB(i) that is the same size and dimension as said data set DB and distributed according to said probability distribution D;
h. calculates the numerical value TS(i) of said test statistic formula using said randomly generated data set RDB(i);
i. stores, in said storage device, said numerical value TS(i) of said test statistic formula in a numerical test statistic array;
j. increments the index i and repeats steps g through i N times to create randomly generated data sets RDB(1) through RDB(N), calculates, for each random data set, a corresponding numerical value of its test statistic TS(1) through TS(N), and stores each numerical test statistic in said numerical test statistic array in said data storage device;
k. compares said numerical value NTS with said numerical test statistic array to determine a set of percentile values P corresponding to said numerical value NTS and an associated set of percentile indices p; and
l. outputs, by a computer output device, a probability estimated from said set of percentile indices p to be used to accept or reject said hypothesis.
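Step k does not say exactly how the set of percentile values P and the percentile indices p are formed. One plausible reading, offered here only as an assumption, is that a set is needed because NTS can tie with several simulated values: sorting TS(1) through TS(N) and bisecting gives the lower and upper rank of NTS, and hence a bracketing pair of percentiles. The function name `percentile_bracket` is hypothetical.

```python
import bisect
from typing import Sequence, Tuple


def percentile_bracket(
    nts: float, ts_array: Sequence[float]
) -> Tuple[Tuple[float, float], Tuple[int, int]]:
    """Return ((P_low, P_high), (p_low, p_high)) for NTS within TS(1..N).

    p_low  = number of simulated statistics strictly below NTS
    p_high = number of simulated statistics at or below NTS
    P_low, P_high express the same ranks as percentiles (0 to 100).
    """
    if not ts_array:
        raise ValueError("the numerical test statistic array is empty")
    ts_sorted = sorted(ts_array)
    n = len(ts_sorted)
    p_low = bisect.bisect_left(ts_sorted, nts)    # first position >= NTS
    p_high = bisect.bisect_right(ts_sorted, nts)  # first position > NTS
    return (100.0 * p_low / n, 100.0 * p_high / n), (p_low, p_high)
```

For example, `percentile_bracket(2.0, [4.0, 2.0, 1.0, 3.0, 2.0])` returns `((20.0, 60.0), (1, 3))`, because two simulated values tie with NTS. Under this reading, an upper-tail probability could be estimated from the indices as (N - p_low)/N, here 4/5.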
10. A method for analyzing statistical data, comprising the steps of:
a. collecting said statistical data in a data set DB;
b. specifying a test statistic formula by which said data set DB will be analyzed;
c. specifying a hypothesis in terms of said test statistic formula defining one of a property and a potential relationship among data contained in said data set DB;
d. computing the numerical value NTS of said test statistic formula using said data set DB;
e. specifying a probability distribution D relating to said statistical data set DB;
f. initiating an index i;
g. generating random data to create a random data set RDB(i) that is the same size and dimension as said data set DB and distributed according to said probability distribution D;
h. computing the numerical value TS(i) of said test statistic using said randomly generated data set RDB(i);
i. storing said numerical value TS(i) of said test statistic in a numerical test statistic array;
j. incrementing index i and repeating steps g through i N times to create randomly generated data sets RDB(1) through RDB(N), to determine, for each random data set, a corresponding numerical value of its test statistic TS(1) through TS(N), and to store each numerical test statistic in said numerical test statistic array;
k. comparing said numerical value NTS of said test statistic with said numerical test statistic array to determine a set of percentile values P corresponding to said numerical value NTS and an associated set of percentile indices p; and
l. determining, based on a probability estimated from said set of percentile indices p, whether to accept or reject said hypothesis.
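Step l can be made concrete with a Monte Carlo p-value. The sketch below uses the (k + 1)/(N + 1) estimator, a common convention for simulation-based tests that counts the observed statistic as one of the draws so the estimate is never exactly zero. The claim does not specify this estimator; the function name, the `tail` parameter, and the default significance level of 0.05 are assumptions.

```python
from typing import Sequence, Tuple


def monte_carlo_decision(
    nts: float,
    ts_array: Sequence[float],
    alpha: float = 0.05,
    tail: str = "upper",
) -> Tuple[float, bool]:
    """Return (estimated p-value, True if the hypothesis is rejected at alpha)."""
    n = len(ts_array)
    n_ge = sum(ts >= nts for ts in ts_array)  # simulated values at least as large as NTS
    n_le = sum(ts <= nts for ts in ts_array)  # simulated values at most NTS
    if tail == "upper":
        p_value = (n_ge + 1) / (n + 1)
    elif tail == "lower":
        p_value = (n_le + 1) / (n + 1)
    elif tail == "two-sided":
        # Double the smaller tail estimate, capped at 1.
        p_value = min(1.0, 2 * (min(n_ge, n_le) + 1) / (n + 1))
    else:
        raise ValueError(f"unknown tail: {tail!r}")
    return p_value, p_value <= alpha
```

With N = 99 simulated values that all fall below NTS, the upper-tail estimate is 1/100 = 0.01, so the hypothesis would be rejected at the 0.05 level; the smallest attainable p-value, 1/(N + 1), is one practical guide to choosing N.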
18. A method for analyzing an original statistical data set, the original statistical data set having a size, a dimension and a distribution in accordance with a specified probability distribution, the method comprising:
generating a plurality of random data sets, each random data set having the size, the dimension and the distribution of the original statistical data set;
calculating a plurality of numerical values of test statistics corresponding to the plurality of random data sets, each numerical value being calculated according to a test statistic formula;
determining a relationship between the plurality of numerical values and a numerical value of a test statistic of the original data set, calculated in accordance with the test statistic formula;
determining a plurality of percentile values based on a comparison of the plurality of numerical values with the numerical value of the test statistic of the original data set; and
determining a plurality of percentile indices corresponding to the plurality of percentile values;
wherein the relationship between the plurality of numerical values and the numerical value of the test statistic of the original data set is determined based on the plurality of percentile values and the corresponding percentile indices.
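Claim 18 stresses that each random data set matches the original in size and dimension. For a two-column data set, that means simulating the same number of rows and columns. The sketch below assumes, purely for illustration, that the null hypothesis is independence of the two columns, that each column has its own user-specified marginal distribution, and that the test statistic is the absolute Pearson correlation; the function names are hypothetical.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple


def abs_pearson(rows: Sequence[Sequence[float]]) -> float:
    """Absolute Pearson correlation between column 0 and column 1."""
    n = len(rows)
    xs = [r[0] for r in rows]
    ys = [r[1] for r in rows]
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return abs(sxy) / math.sqrt(sxx * syy)


def independence_test(
    db: Sequence[Sequence[float]],
    col_samplers: Sequence[Callable[[random.Random], float]],
    n_sim: int = 2000,
    seed: int = 0,
) -> Tuple[float, float]:
    """Return (observed |r|, upper-tail Monte Carlo p-value).

    Each simulated data set has the same rows x columns as `db`, with
    column j drawn independently from col_samplers[j] (the assumed null).
    Continuous samplers are assumed so that no column has zero variance.
    """
    n_rows, n_cols = len(db), len(db[0])
    if n_cols != 2 or len(col_samplers) != n_cols:
        raise ValueError("this sketch expects exactly two columns and two samplers")
    rng = random.Random(seed)
    nts = abs_pearson(db)
    ts: List[float] = []
    for _ in range(n_sim):
        # Same shape as the original: n_rows rows, one value per column.
        rdb = [[sampler(rng) for sampler in col_samplers] for _ in range(n_rows)]
        ts.append(abs_pearson(rdb))
    exceed = sum(t >= nts for t in ts)
    return nts, (exceed + 1) / (n_sim + 1)
```

A perfectly linear data set such as the rows (i, 2i + 1) for i = 1 to 10 yields |r| = 1, and essentially no simulated independent data set reaches that value, so the estimated p-value sits near its floor of 1/(N + 1).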
Specification