Method and apparatus for significance testing and confidence interval construction based on user-specified distributions

US 20040236776A1
Filed: 06/29/2004
Published: 11/25/2004
Est. Priority Date: 06/15/2000
Status: Abandoned Application

Abstract

A computer, computer-implemented method, and program product for analyzing statistical data in which the data to be analyzed need not be transformed into a “Normal” distribution, thus avoiding the introduction of error. Generally, the computer first determines a test statistic (formula) and an associated null hypothesis. Then the distribution from which the original data arose, consistent with the null hypothesis, is defined. The computer then produces numerous randomly generated data sets of the same size and dimensions as the original statistical data set, according to that distribution. A numerical value of the test statistic is computed from the test statistic formula for each randomly generated data set and stored in a vectored array. The numerical value of the test statistic computed from the original statistical data is then compared with the array, and the associated percentile is determined. With this information, the significance of the test statistic derived from the original data can be determined, along with whether the null hypothesis may be rejected and, if so, at what level of significance. Embodiments of the invention may likewise be used in alternative statistical applications, including computation of confidence intervals and likelihood ratios.
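
The procedure described in the abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the applicant's implementation: the function name `monte_carlo_percentile`, the sample mean as test statistic, the exponential null distribution, and the data values are all hypothetical choices made for the example.

```python
import random

def monte_carlo_percentile(data, test_statistic, sampler, n_sets=2000, seed=0):
    """Return the observed statistic and its percentile among simulated statistics."""
    rng = random.Random(seed)
    observed = test_statistic(data)
    # Each random data set has the same size as the original data set.
    simulated = [
        test_statistic([sampler(rng) for _ in data]) for _ in range(n_sets)
    ]
    # Fraction of simulated statistics at or below the observed value.
    percentile = sum(s <= observed for s in simulated) / n_sets
    return observed, percentile

def sample_mean(xs):
    return sum(xs) / len(xs)

# Hypothetical data; the assumed null hypothesis is "exponential with mean 1".
data = [2.9, 3.4, 1.8, 4.1, 2.2, 3.7, 2.5, 3.0]
observed, percentile = monte_carlo_percentile(
    data, sample_mean, lambda rng: rng.expovariate(1.0)
)
# One-sided tail: how often chance alone produced a larger mean than observed.
p_upper = 1.0 - percentile
print(observed, percentile, p_upper)
```

No transformation toward normality is involved: the null distribution is sampled directly, and the percentile of the observed statistic among the simulated ones indicates the level at which the null hypothesis could be rejected.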

20 Claims

1. An apparatus for analyzing statistical data, the apparatus comprising:
- a computing device for executing computer readable code and having an input device;
  
  a storage device in communication with the computing device; and
  
  a programming code reading device in communication with the computing device, which reads computer executable code, the computer executable code causing the computing device to:
  
  receive a set of original statistical data and store the statistical data set in the storage device;
  
  calculate a numerical value corresponding to the statistical data set according to a test statistic formula;
  
  receive a probability distribution relating to the statistical data set;
  
  generate a plurality of random data sets of at least the same size and dimension as the statistical data set and distributed according to the probability distribution;
  
  calculate a numerical value corresponding to each of the plurality of random data sets according to the test statistic formula to produce a corresponding plurality of numerical values; and
  
  compare the numerical value calculated from the statistical data set to the plurality of numerical values calculated from the plurality of random data sets to determine a relationship between them.
  - 2. The apparatus of claim 1, in which the relationship between the numerical value calculated from the statistical data set and the plurality of numerical values calculated from the plurality of random data sets is defined by a null hypothesis, which null hypothesis is accepted or rejected based on the relationship.
  - 3. The apparatus of claim 1, in which the relationship between the numerical value calculated from the statistical data set and the plurality of numerical values calculated from the plurality of random data sets is indicative of a confidence interval, which confidence interval is defined by certain of the plurality of numerical values calculated from the plurality of random data sets.
  - 4. The apparatus of claim 1, in which the plurality of random data sets is generated using random processes expressed in a Monte Carlo technique.
  - 5. The apparatus of claim 1, in which the computing device receives and stores a plurality of probability distributions in the computer data storage device and determines the probability distribution by comparing the original data set with the plurality of stored probability distributions.
  - 6. The apparatus of claim 1, in which the probability distribution of the statistical data set is derived by the computing device.
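
Claim 3 refers to a confidence interval "defined by certain of the plurality of numerical values" computed from the random data sets. One possible reading, sketched below, is to sort the simulated statistics and take the values at the lower and upper tail positions as the interval bounds. The function name `simulated_interval`, the uniform distribution, the median as test statistic, and the sample size of 15 are assumptions made for illustration only.

```python
import random
import statistics

def simulated_interval(sampler, test_statistic, size, level=0.95,
                       n_sets=4000, seed=1):
    """Bounds that enclose the central `level` share of simulated statistics."""
    rng = random.Random(seed)
    stats = sorted(
        test_statistic([sampler(rng) for _ in range(size)])
        for _ in range(n_sets)
    )
    tail = (1.0 - level) / 2.0
    lo_idx = round(tail * n_sets)              # first value kept at the low end
    hi_idx = round((1.0 - tail) * n_sets) - 1  # last value kept at the high end
    return stats[lo_idx], stats[hi_idx]

# Hypothetical setup: median of 15 draws from a uniform(0, 10) distribution.
low, high = simulated_interval(
    lambda rng: rng.uniform(0.0, 10.0), statistics.median, size=15
)
print(low, high)
```

Here the interval bounds are themselves two of the simulated statistic values, which is what makes this a plausible match for the claim wording; other constructions are possible.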

7. A method for analyzing statistical data, comprising:
- collecting a set of original data;
  
  calculating a numerical value corresponding to the statistical data set according to a specified test statistic formula;
  
  specifying a probability distribution relating to the statistical data set;
  
  generating a plurality of random data sets of at least the same size and dimension as the statistical data set and distributed according to the probability distribution;
  
  calculating a numerical value corresponding to each of the plurality of random data sets according to the test statistic formula to produce a corresponding plurality of numerical values;
  
  calculating a plurality of percentile values and corresponding percentile indices from the plurality of numerical values; and
  
  comparing the numerical value calculated from the statistical data set to at least one of the plurality of percentile values to determine a relationship between them.
  - 8. The method of claim 7, in which the relationship between the numerical value calculated from the statistical data set and the plurality of numerical values calculated from the plurality of random data sets is defined by a null hypothesis, which null hypothesis is accepted or rejected based on the relationship.
  - 9. The method of claim 7, in which the relationship between the numerical value calculated from the statistical data set and the plurality of numerical values calculated from the plurality of random data sets is indicative of membership in a confidence interval, which confidence interval is defined by certain of the plurality of numerical values calculated from the plurality of random data sets.
  - 10. The method of claim 7, in which the plurality of random data sets is generated using random processes expressed in a Monte Carlo technique.
  - 11. The method of claim 7, in which the steps are implemented by a computing apparatus comprising a computing device having an input device, a data storage device in communication with the computing device, and programming code readable by the computing device.
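
Claim 7 adds a step that reduces the simulated statistics to percentile values with corresponding percentile indices, and then compares the observed statistic against those values. The sketch below shows one way this could look, assuming a nearest-rank percentile definition. The helper names `percentile_table` and `bracket`, the chosen indices, the lognormal distribution, and the observed value 1.9 are hypothetical.

```python
import bisect
import random
import statistics

def percentile_table(values, indices=(1, 5, 25, 50, 75, 95, 99)):
    """Pair each percentile index with a nearest-rank percentile value."""
    ordered = sorted(values)
    n = len(ordered)
    table = []
    for p in indices:
        rank = max(1, (p * n + 99) // 100)  # ceil(p * n / 100), at least 1
        table.append((p, ordered[rank - 1]))
    return table

def bracket(observed, table):
    """Indices of the table entries at-or-below and above the observed value."""
    values = [v for _, v in table]
    k = bisect.bisect_right(values, observed)
    lower = table[k - 1][0] if k > 0 else None
    upper = table[k][0] if k < len(table) else None
    return lower, upper

# Hypothetical example: medians of 25 lognormal draws form the reference values.
rng = random.Random(3)
simulated = [
    statistics.median(rng.lognormvariate(0.0, 1.0) for _ in range(25))
    for _ in range(2000)
]
table = percentile_table(simulated)
print(table)
print(bracket(1.9, table))  # 1.9 is an assumed observed statistic
```

A result of `None` on either side means the observed statistic falls outside the tabulated range, that is, below the lowest or above the highest tabulated percentile value.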

12. A method for analyzing an original statistical data set, the original statistical data set having a size, a dimension and a distribution in accordance with a specified probability distribution, the method comprising:
- generating a plurality of random data sets, each random data set having the size, the dimension and the distribution as the original statistical data set;
  
  calculating a plurality of numerical values of test statistics corresponding to the plurality of random data sets, each numerical value being calculated according to a test statistic formula; and
  
  determining a relationship between the plurality of numerical values and a numerical value of a test statistic of the original data set, calculated in accordance with the test statistic formula.
  - 13. The method for analyzing the original statistical data set according to claim 12, in which the relationship between the plurality of numerical values and the numerical value corresponding to the original statistical data set tests whether the original statistical data set is characterized by at least one factor that is not based on chance.

14. A method for testing validity of a prediction model based on an original data set, comprising:
- deriving the prediction model;
  
  specifying a test statistic formula relating to the derived prediction model;
  
  computing a numerical value NTS of the test statistic using the test statistic formula and the original data set;
  
  specifying a probability distribution relating to the original data set;
  
  creating a plurality of random data sets RDB(i) using randomly generated data, in which i is a positive integer;
  
  computing a plurality of numerical values TS(i) of the test statistic corresponding to the plurality of random data sets RDB(i), and storing each numerical value TS(i) in a numerical test statistic array; and
  
  comparing the numerical value NTS with the numerical test statistic array to determine a non-empty set of percentile values corresponding to the numerical value NTS and an associated non-empty set of percentile indices.
  - 15. The method for testing validity of a prediction model according to claim 14, in which creating the plurality of random data sets RDB(i) comprises using randomly generated data according to a Monte Carlo technique.
  - 16. The method of claim 14, in which the prediction model is derived from at least observations of a time series made before a time t.
  - 17. The method of claim 16, further comprising determining the validity of the prediction model by comparing predictions of the time series made by the prediction model with observations of the time series made after the time t.
  - 18. The method of claim 14, further comprising modifying the prediction model based on the determined validity of the prediction model.
  - 19. The method of claim 14, in which the prediction model is selected from among at least two previously derived prediction models.
  - 20. The method of claim 14, in which the prediction model is derived from at least observed values of variables other than time series.
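
The steps of claim 14 can be illustrated with a small sketch that reuses the claim's own symbols NTS, RDB(i), and TS(i). Everything else is an assumption made for the example: the prediction model is a least-squares line, the test statistic is the absolute fitted slope, the specified distribution is uniform on (0, 13), and the observations are invented.

```python
import random

def fit_slope(xs, ys):
    """Least-squares slope of ys on xs (the hypothetical prediction model)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sxx

def percentile_index(nts, ts_array):
    """Percentage of stored TS(i) values at or below NTS."""
    return 100.0 * sum(ts <= nts for ts in ts_array) / len(ts_array)

# Invented original data set: 12 observations with an apparent linear trend.
xs = list(range(12))
ys = [1.1, 1.9, 3.2, 3.8, 5.1, 6.3, 6.8, 8.1, 9.0, 9.7, 11.2, 12.1]

# NTS: test statistic computed from the original data set.
NTS = abs(fit_slope(xs, ys))

# TS(i): same statistic on random data sets RDB(i) drawn from the assumed
# distribution, in which the response has no relationship to xs.
rng = random.Random(2)
TS = []
for i in range(1000):
    RDB_i = [rng.uniform(0.0, 13.0) for _ in xs]
    TS.append(abs(fit_slope(xs, RDB_i)))

index = percentile_index(NTS, TS)
print(NTS, index)
```

A percentile index close to 100 suggests that a slope this large rarely arises from the assumed distribution alone, which is the kind of evidence the claim uses to judge the validity of the prediction model.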

Current Assignee: Terrence B. Peace
Original Assignee: Terrence B. Peace
Inventors: Peace, Terrence B.
Application Number: US10/878,410
Publication Number: US 20040236776A1
US Class Current: 707/100
CPC Class Codes:
G06F 17/18 for evaluating statistical ...
Y10S 707/99943 Generating database or data...
