Automated generator of optimal models for the statistical analysis of data
First Claim
1. A process for producing, from sample data tables, an accurate statistical model, including choice of significant covariates and correlations between model covariates, the process comprising:
a. providing a sample data table listing either (a) the recorded occurrences of one of two or more possible events, (b) the recorded number of occurrences of a possible event, and (c) the recorded measurements of a set of variables;
b. generating statistical models fitting the sample data table;
c. solving for optimal parameters of each statistical model considered;
d. using model test statistics and the number of degrees of freedom in each model to assess the suitability of models, to arrive at a complete ordering of the models, and to determine which additional models to build, solve, and test;
e. providing a statistical model that has the highest observed ordering, and thus most closely fits the sample data table;
f. providing average table values, including the possibility of values in table entries where no sample data occurred, based on that model that attained the highest ordering when fit to the sample data table.
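Steps a through f can be illustrated with a minimal sketch. It is not the patented implementation. It assumes a small two-way table of counts, four candidate Poisson models (constant, row effect, column effect, both), and an AIC-style score (deviance plus twice the parameter count) as the ordering rule. The claim does not name a specific rule, so that score is an assumption, as are the table values and every function name.

```python
import math

# Hypothetical 3x3 table of counts; None marks a cell with no sample data.
TABLE = [
    [12, 19, 31],
    [18, 42, 59],
    [41, 79, None],
]

# Candidate models: (name, include row effect, include column effect).
CANDIDATES = [
    ("constant", False, False),
    ("row", True, False),
    ("col", False, True),
    ("row+col", True, True),
]


def fit(table, use_row, use_col, iters=200):
    """Fit Poisson means mu[i][j] = a[i] * b[j] to the observed cells.

    Assumes every row and every column has at least one observed cell.
    """
    n_rows, n_cols = len(table), len(table[0])
    obs = [(i, j) for i in range(n_rows) for j in range(n_cols)
           if table[i][j] is not None]
    total = sum(table[i][j] for i, j in obs)
    a, b = [1.0] * n_rows, [1.0] * n_cols
    for _ in range(iters):
        if use_row:
            # Rescale each row factor so fitted and observed row totals match.
            for i in range(n_rows):
                cells = [(r, c) for r, c in obs if r == i]
                a[i] = (sum(table[r][c] for r, c in cells)
                        / sum(b[c] for _, c in cells))
        else:
            # One shared scale that matches the grand total.
            a = [total / sum(b[c] for _, c in obs)] * n_rows
        if use_col:
            # Rescale each column factor to match observed column totals.
            for j in range(n_cols):
                cells = [(r, c) for r, c in obs if c == j]
                b[j] = (sum(table[r][c] for r, c in cells)
                        / sum(a[r] for r, _ in cells))
    return [[a[i] * b[j] for j in range(n_cols)] for i in range(n_rows)]


def deviance(table, mu):
    """Poisson deviance (G^2) over the observed cells only."""
    g2 = 0.0
    for i, row in enumerate(table):
        for j, n in enumerate(row):
            if n is None:
                continue
            if n > 0:
                g2 += 2.0 * n * math.log(n / mu[i][j])
            g2 -= 2.0 * (n - mu[i][j])
    return g2


def rank_models(table):
    """Fit every candidate and order them by the assumed AIC-style score."""
    n_rows, n_cols = len(table), len(table[0])
    n_obs = sum(v is not None for row in table for v in row)
    results = []
    for name, use_row, use_col in CANDIDATES:
        mu = fit(table, use_row, use_col)
        k = 1 + (n_rows - 1) * use_row + (n_cols - 1) * use_col
        g2 = deviance(table, mu)
        results.append({"name": name, "deviance": g2, "df": n_obs - k,
                        "score": g2 + 2 * k, "mu": mu})
    return sorted(results, key=lambda r: r["score"])  # lowest score ranks first


ranking = rank_models(TABLE)
best = ranking[0]
for m in ranking:
    print(f"{m['name']:9s} deviance={m['deviance']:.2f} "
          f"df={m['df']} score={m['score']:.2f}")
print("modeled value for the empty cell:", round(best["mu"][2][2], 2))
```

The fitted table of the top-ranked model supplies a value for the cell that had no data, which corresponds to step f. A different ordering rule, such as a chi-square p-value on the deviance, could be substituted in `rank_models` without changing the rest of the sketch.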
Abstract
Provided is an automated process for producing accurate statistical models from sample data tables. The process solves for the optimal parameters of each statistical model considered, computes test statistics and degrees of freedom in the model, and uses these test statistics and degrees of freedom to establish a complete ordering of the statistical models. In cases where the sample data table is sufficiently small, the process constructs and analyzes all reasonable statistical models that might fit the data table provided. In cases where the number of possible models is prohibitively high, the process begins by constructing and solving more general models and then constructs and solves those more detailed models that are similar to those general models that achieved the highest ordering. In either of these two cases, the process arrives at a statistical model that is highest in the ordering and is thus deemed most suitable to model the sample data table. The result of this process is a statistical model deemed to be most suitable to model the sample data table and a set of average table values produced by this resulting model. This resulting table may include modeled values for table entries for which no initial data was supplied.
This invention finds application in the area of credit scoring, where covariates such as age, profession, gender, and credit history are used to determine the likelihood that an individual will default on a loan. It also finds application in analyzing the effectiveness of many types of tools as they are used in various environments (e.g., the effectiveness of radar when used in different weather conditions). It also finds application in the area of insurance, where one wishes to estimate the future number of claims against a specific insurance policy based on a database of past insurance claims.
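The general-to-specific search described in the abstract, for cases where there are too many models to enumerate, might be sketched as a small beam search. This is an illustration under stated assumptions, not the method as specified. The covariate names come from the credit-scoring example, but the `GAIN` numbers, the toy `score` function (a stand-in for actually fitting a model and computing its test statistic), and the `beam_width` parameter are all hypothetical.

```python
COVARIATES = ["age", "profession", "gender", "credit_history"]

# Hypothetical misfit reduction contributed by each covariate (made-up numbers).
GAIN = {"age": 30.0, "profession": 1.5, "gender": 0.5, "credit_history": 50.0}


def score(model):
    """Toy penalized misfit; lower is better. Stand-in for a real model fit."""
    misfit = 100.0 - sum(GAIN[c] for c in model)
    return misfit + 2.0 * len(model)


def general_to_specific(covariates, score_fn, beam_width=2):
    """Start with one-covariate models, then refine only the best few."""
    scores = {}                                    # every model evaluated so far
    frontier = [frozenset([c]) for c in covariates]
    best = None
    while frontier:
        for m in frontier:
            if m not in scores:
                scores[m] = score_fn(m)
        ranked = sorted(frontier, key=scores.get)
        # Stop once the more detailed models no longer improve on the best.
        if best is not None and scores[ranked[0]] >= scores[best]:
            break
        best = ranked[0]
        survivors = ranked[:beam_width]
        # Build more detailed models that extend the top-ranked ones.
        frontier = list({m | {c} for m in survivors
                         for c in covariates if c not in m})
    return best, scores


best_model, evaluated = general_to_specific(COVARIATES, score)
print(sorted(best_model), evaluated[best_model], len(evaluated))
```

With four covariates there are only 15 non-empty subsets, so exhaustive search would be feasible here. The sketch simply shows how restricting refinement to the top-ranked models avoids evaluating every subset.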
36 Citations
6 Claims
Specification