Standardized Modeling Suite
First Claim
1. A non-transitory computer-readable storage medium having computer-executable program instructions stored thereon that when executed by a processor, cause the processor to perform steps comprising:
(i) collecting data from a plurality of sources, wherein the collected data includes data related to a plurality of independent variables;
(ii) storing the collected data;
(iii) performing a plurality of checks on the collected data, wherein the plurality of checks include handling missing values, eliminating outlier values, and eliminating highly correlated independent variables in the collected data;
(iv) analyzing the checked data to determine one or more of the plurality of independent variables to be used in predicting a target variable;
(v) generating a model to estimate the target variable using the determined independent variables; and
(vi) generating a plurality of graphs related to governance and risk compliance of the model, wherein the plurality of graphs include a variance inflation factor.
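The variance inflation factor named in step (vi) can be illustrated numerically. The following is a minimal sketch, not the claimed implementation: it assumes the independent variables arrive as equal-length numeric columns, and it relies on the standard identity that the VIF of variable j, 1 / (1 - R_j^2), equals the j-th diagonal entry of the inverse correlation matrix. The function names (`pearson`, `invert`, `variance_inflation_factors`) are hypothetical and do not come from the patent; plotting the resulting values is left out.

```python
def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def invert(matrix):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting.

    Raises ZeroDivisionError if the matrix is singular, which happens when
    one variable is an exact linear combination of the others.
    """
    n = len(matrix)
    # Augment each row with the matching row of the identity matrix.
    aug = [list(row) + [float(i == j) for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def variance_inflation_factors(columns):
    """Map each variable name to its VIF (hypothetical helper, not from the patent).

    `columns` is a dict of name -> list of numbers, all of equal length.
    """
    names = list(columns)
    corr = [[pearson(columns[a], columns[b]) for b in names] for a in names]
    inverse = invert(corr)
    return {name: inverse[i][i] for i, name in enumerate(names)}
```

A common rule of thumb treats a VIF above roughly 5 or 10 as a sign of problematic collinearity; the claim itself does not state a threshold.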
Abstract
An enhanced modeling tool associated with an entity may facilitate end-to-end modeling of problems in any application space. The enhanced modeling tool may collect modeling data from a variety of sources, check the collected data, find the best predictor variables for a given target variable, estimate the model, implement the model, and validate the model. The output of each of these steps may be in a standardized format to allow other steps to directly incorporate the output. An additional feature of the system may include a reporting capability that generates supporting documents related to model governance and risk compliance.
20 Claims
1. A non-transitory computer-readable storage medium having computer-executable program instructions stored thereon that when executed by a processor, cause the processor to perform steps comprising:
(i) collecting data from a plurality of sources, wherein the collected data includes data related to a plurality of independent variables;
(ii) storing the collected data;
(iii) performing a plurality of checks on the collected data, wherein the plurality of checks include handling missing values, eliminating outlier values, and eliminating highly correlated independent variables in the collected data;
(iv) analyzing the checked data to determine one or more of the plurality of independent variables to be used in predicting a target variable;
(v) generating a model to estimate the target variable using the determined independent variables; and
(vi) generating a plurality of graphs related to governance and risk compliance of the model, wherein the plurality of graphs include a variance inflation factor.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13
14. A computer-assisted method comprising:
collecting data from a plurality of data sources, wherein the collected data includes data related to a plurality of independent variables;
storing the collected data in a datastore associated with the computer;
using a processor associated with the computer, performing a plurality of checks on the collected data, wherein the plurality of checks include replacing missing values in the collected data with a median value of one of the plurality of independent variables, removing outlier values that are below a 0.5 percentile or above a 99.5 percentile for each of the plurality of independent variables in the collected data, and normalizing each of the plurality of independent variables;
using the processor, analyzing the checked data to identify a subset of the plurality of independent variables for inclusion in a model for a dependent variable, wherein the analysis includes transforming categorical variables into numerical variables; and
using the processor, generating the model using the identified subset of independent variables.
Dependent claims: 15, 16, 17
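The three checks recited in claim 14 (median replacement, removal of values outside the 0.5 to 99.5 percentile band, and normalization) can be sketched with the Python standard library. This is an illustration under stated assumptions, not the patented implementation: missing values are represented as `None` and filled with the median of the same variable, an outlier in any variable drops the whole row, and normalization is taken to mean z-scoring. The claim specifies none of these details, and all function names are hypothetical.

```python
from statistics import mean, median, pstdev, quantiles


def impute_median(values):
    """Replace None entries with the median of the observed values."""
    fill = median(v for v in values if v is not None)
    return [fill if v is None else v for v in values]


def in_band(values):
    """Flag values lying within the [0.5, 99.5] percentile band.

    With n=200, quantiles() returns 199 cut points spaced 0.5 percentile
    apart, so the first and last cuts are the 0.5th and 99.5th percentiles.
    """
    cuts = quantiles(values, n=200, method="inclusive")
    low, high = cuts[0], cuts[-1]
    return [low <= v <= high for v in values]


def zscore(values):
    """Center on the mean and scale by the population standard deviation."""
    m, s = mean(values), pstdev(values)
    if s == 0:
        return [0.0 for _ in values]  # constant column: nothing to scale
    return [(v - m) / s for v in values]


def run_checks(columns):
    """Apply imputation, outlier removal, and normalization in that order.

    `columns` is a dict of variable name -> list of numbers or None,
    with all lists the same length (one entry per row).
    """
    imputed = {name: impute_median(vals) for name, vals in columns.items()}
    masks = [in_band(vals) for vals in imputed.values()]
    # Assumption: a row is kept only if every variable is within its band.
    keep = [all(flags) for flags in zip(*masks)]
    trimmed = {name: [v for v, k in zip(vals, keep) if k]
               for name, vals in imputed.items()}
    return {name: zscore(vals) for name, vals in trimmed.items()}
```

Computing the percentile bounds after imputation, as done here, is one possible ordering; the claim lists the checks without fixing their sequence.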
18. An apparatus comprising:
a data collection module configured to collect data from a plurality of sources, wherein the collected data includes data related to a plurality of independent variables;
a data checking module configured to perform a plurality of checks on the collected data, wherein the plurality of checks include filling in missing values in the collected data with a median value of one of the plurality of independent variables, removing outlier values, removing highly correlated independent variables, and normalizing at least one of the plurality of independent variables;
a data analysis module configured to identify a first subset of the plurality of independent variables for inclusion in a model for a dependent variable by correlating at least one of the plurality of independent variables with the dependent variable through a plurality of statistics, wherein the data analysis module is further configured to transform categorical variables into numerical variables;
a data reduction module configured to identify a second subset of strongest independent variables from the first subset by boot screening the first subset of the plurality of independent variables; and
a modeling module configured to estimate the model based on the identified second subset of strongest independent variables.
Dependent claims: 19, 20
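Claim 18 recites "boot screening" without defining it in this section. One possible reading, offered here purely as an assumption, is a bootstrap-based screen: resample the rows with replacement many times, rank the candidate variables by absolute correlation with the dependent variable in each resample, and keep the variables that rank near the top often enough. The sketch below follows that reading; the function name `bootstrap_screen` and the parameters `top_k`, `rounds`, `min_share`, and `seed` are hypothetical and do not come from the patent.

```python
import random


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either sequence is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / (sxx * syy) ** 0.5


def bootstrap_screen(columns, target, top_k=2, rounds=200, min_share=0.5, seed=0):
    """Keep variables that rank in the top_k by |correlation| in at least
    min_share of the bootstrap resamples (an assumed reading of "boot screening").

    `columns` is a dict of variable name -> list of numbers; `target` is the
    dependent variable, with one entry per row.
    """
    rng = random.Random(seed)  # fixed seed so the screen is reproducible
    n = len(target)
    wins = {name: 0 for name in columns}
    for _ in range(rounds):
        # Draw n row indices with replacement.
        idx = [rng.randrange(n) for _ in range(n)]
        y = [target[i] for i in idx]
        scores = {name: abs(pearson([vals[i] for i in idx], y))
                  for name, vals in columns.items()}
        for name in sorted(scores, key=scores.get, reverse=True)[:top_k]:
            wins[name] += 1
    return [name for name in columns if wins[name] / rounds >= min_share]
```

Ranking by absolute correlation is only one choice of strength measure; the claim's "plurality of statistics" suggests other measures could be substituted in the scoring step.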
Specification