Method and apparatus for significance testing and confidence interval construction based on user-specified distributions
Abstract
A computer, computer-implemented method, and program product for analyzing statistical data in which the data to be analyzed need not be transformed to fit a “Normal” distribution, thus avoiding the error such a transformation introduces. Generally, the computer first determines a test statistic (formula) and an associated null hypothesis. The distribution from which the original data arose, consistent with the null hypothesis, is then defined. The computer next produces numerous randomly generated data sets, identical in size and dimensions to the original statistical data set, according to that distribution. A numerical value of the test statistic is computed from the test statistic formula for each randomly generated data set and stored in a vectored array. The numerical value of the test statistic computed from the original statistical data is then compared with the array, and the associated percentile is determined. With this information, the significance of the value derived from the original data can be determined, including whether the null hypothesis may be rejected and, if so, at what level of significance. Embodiments of the invention may likewise be used in alternative statistical applications, including computation of confidence intervals and likelihood ratios.
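The procedure outlined in the abstract can be illustrated with a minimal Python sketch. This is not the patent's implementation: the function names `monte_carlo_p_value` and `exponential_sampler`, the sample data, the choice of the mean as test statistic, the exponential null distribution, and the upper-tail comparison are all assumptions made for illustration.

```python
import random
import statistics

def monte_carlo_p_value(data, statistic, sampler, n_sets=2000, seed=0):
    """Estimate an upper-tail p-value for statistic(data) by simulation.

    sampler(rng, n) must return n values drawn from the null distribution.
    """
    rng = random.Random(seed)
    observed = statistic(data)
    # One simulated test statistic per random data set of the same size.
    simulated = [statistic(sampler(rng, len(data))) for _ in range(n_sets)]
    # Count simulated values at least as large as the observed one.
    exceed = sum(1 for t in simulated if t >= observed)
    # The +1 terms count the observed value itself, so p is never exactly 0.
    return observed, (exceed + 1) / (n_sets + 1)

def exponential_sampler(rng, n):
    # Hypothetical null distribution: exponential with mean 1.
    return [rng.expovariate(1.0) for _ in range(n)]

# Made-up observations, used only to exercise the sketch.
data = [2.9, 3.4, 1.8, 4.1, 2.2, 3.7, 2.5, 3.0]
observed, p = monte_carlo_p_value(data, statistics.fmean, exponential_sampler)
print(f"observed mean = {observed:.3f}, estimated p-value = {p:.4f}")
```

Because the null distribution is passed in as a sampler, the data never has to be transformed toward normality; swapping in a different sampler or test statistic changes the test without changing the procedure.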
20 Claims
1. An apparatus for analyzing statistical data, the apparatus comprising:
a computing device for executing computer readable code and having an input device;
a storage device in communication with the computing device; and
a programming code reading device in communication with the computing device, which reads computer executable code, the computer executable code causing the computing device to:
receive a set of original statistical data and store the statistical data set in the storage device;
calculate a numerical value corresponding to the statistical data set according to a test statistic formula;
receive a probability distribution relating to the statistical data set;
generate a plurality of random data sets of at least the same size and dimension as the statistical data set and distributed according to the probability distribution;
calculate a numerical value corresponding to each of the plurality of random data sets according to the test statistic formula to produce a corresponding plurality of numerical values; and
compare the numerical value calculated from the statistical data set to the plurality of numerical values calculated from the plurality of random data sets to determine a relationship between them.
Dependent claims: 2, 3, 4, 5, 6
7. A method for analyzing statistical data, comprising:
collecting a set of original data;
calculating a numerical value corresponding to the statistical data set according to a specified test statistic formula;
specifying a probability distribution relating to the statistical data set;
generating a plurality of random data sets of at least the same size and dimension as the statistical data set and distributed according to the probability distribution;
calculating a numerical value corresponding to each of the plurality of random data sets according to the test statistic formula to produce a corresponding plurality of numerical values;
calculating a plurality of percentile values and corresponding percentile indices from the plurality of numerical values; and
comparing the numerical value calculated from the statistical data set to at least one of the plurality of percentile values to determine a relationship between them.
Dependent claims: 8, 9, 10, 11
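The percentile steps recited in claim 7 can be sketched as follows. This is an illustrative reading only: the helper names `percentile_table` and `exceeded_indices`, the chosen percentile indices (90, 95, 99), and the stand-in simulated values are assumptions, and the claim does not prescribe any particular interpolation rule.

```python
import statistics

def percentile_table(values, indices=(90, 95, 99)):
    """Map each percentile index (1..99) to its percentile value."""
    # With n=100, statistics.quantiles returns the 99 cut points for
    # percentiles 1 through 99; cut point k sits at list position k - 1.
    cuts = statistics.quantiles(values, n=100, method="inclusive")
    return {k: cuts[k - 1] for k in indices}

def exceeded_indices(observed, table):
    """Return the percentile indices whose values the observed statistic exceeds."""
    return [k for k, v in sorted(table.items()) if observed > v]

# Stand-in for the test statistics computed from the random data sets.
simulated = list(range(1, 1001))
table = percentile_table(simulated)
print(table)
print(exceeded_indices(960, table))
```

Here an observed value of 960 lies above the 90th and 95th percentile values but below the 99th, which is the kind of relationship the comparing step would report.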
12. A method for analyzing an original statistical data set, the original statistical data set having a size, a dimension and a distribution in accordance with a specified probability distribution, the method comprising:
generating a plurality of random data sets, each random data set having the size, the dimension and the distribution as the original statistical data set;
calculating a plurality of numerical values of test statistics corresponding to the plurality of random data sets, each numerical value being calculated according to a test statistic formula; and
determining a relationship between the plurality of numerical values and a numerical value of a test statistic of the original data set, calculated in accordance with the test statistic formula.
Dependent claims: 13
14. A method for testing validity of a prediction model based on an original data set, comprising:
deriving the prediction model;
specifying a test statistic formula relating to the derived prediction model;
computing a numerical value NTS of the test statistic using the test statistic formula and the original data set;
specifying a probability distribution relating to the original data set;
creating a plurality of random data sets RDB(i) using randomly generated data, in which i is a positive integer;
computing a plurality of numerical values TS(i) of the test statistic corresponding to the plurality of random data sets RDB(i), and storing each numerical value TS(i) in a numerical test statistic array; and
comparing the numerical value NTS with the numerical test statistic array to determine a non-empty set of percentile values corresponding to the numerical value NTS and an associated non-empty set of percentile indices.
Dependent claims: 15, 16, 17, 18, 19, 20
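One way to picture the final comparing step of claim 14 is to locate NTS within the sorted array of TS(i) values. The sketch below is an assumption about how that could be done, not the claimed method itself: the function name `percentile_bracket`, the sample TS values, and the reading of the "set of percentile indices" as a lower and upper bound are all hypothetical.

```python
from bisect import bisect_left, bisect_right

def percentile_bracket(nts, ts_array):
    """Return (lower, upper) percentile indices bracketing nts in ts_array.

    lower: percentage of TS(i) values strictly below NTS
    upper: percentage of TS(i) values less than or equal to NTS
    """
    ordered = sorted(ts_array)
    n = len(ordered)
    lower = 100.0 * bisect_left(ordered, nts) / n
    upper = 100.0 * bisect_right(ordered, nts) / n
    return lower, upper

# Made-up TS(i) values standing in for the numerical test statistic array.
ts_array = [0.2, 1.5, 0.7, 0.7, 2.4, 0.1, 1.1, 0.9, 3.0, 0.4]
print(percentile_bracket(0.7, ts_array))  # NTS ties with two array entries
print(percentile_bracket(2.0, ts_array))  # NTS falls between array entries
```

When NTS ties with entries in the array the two bounds differ, and when it falls strictly between entries they coincide, so the result is never empty.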
Specification