Method and computer program product for using data mining tools to automatically compare an investigated unit and a benchmark unit
Abstract
Sources of operational problems in business transactions often show themselves in relatively small pockets of data, which are called trouble hot spots. Identifying these hot spots from internal company transaction data is generally a fundamental step in the problem's resolution, but this analysis process is greatly complicated by huge numbers of transactions and large numbers of transaction variables to analyze. A suite of practical modifications is provided to data mining techniques and logistic regressions to tailor them for finding trouble hot spots. This approach thus allows the use of efficient automated data mining tools to quickly screen large numbers of candidate variables for their ability to characterize hot spots. One application is the screening of variables which distinguish a suspected hot spot from a reference set.
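The screening application described in the abstract depends on pooling the suspected hot spot with the reference set and labeling each record by its origin, which is the target variable the claims recite. A minimal Python sketch of that augmentation step follows; the function name `augment`, the field names, and the sample records are hypothetical and do not come from the patent.

```python
def augment(investigated_rows, reference_rows, target_name="is_investigated"):
    """Pool both sets and add a 0/1 target variable recording each row's origin."""
    pooled = [{**row, target_name: 1} for row in investigated_rows]
    pooled += [{**row, target_name: 0} for row in reference_rows]
    return pooled


# Hypothetical transaction records; the field names are illustrative only.
hot_spot = [{"delay_days": 9, "order_size": 4}, {"delay_days": 7, "order_size": 6}]
reference = [
    {"delay_days": 2, "order_size": 4},
    {"delay_days": 3, "order_size": 6},
    {"delay_days": 1, "order_size": 5},
]

pooled = augment(hot_spot, reference)
print(len(pooled), [row["is_investigated"] for row in pooled])
# prints: 5 [1, 1, 0, 0, 0]
```

The input records are copied rather than modified, so the original investigated and reference sets remain available for later comparison.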
20 Claims
1. A processor implemented method of comparing an investigated entity to a reference entity, the method comprising:
- augmenting a plurality of data points that correspond to a variable or characteristic of the investigated entity or the reference entity by creating a target variable whose value is indicative of whether the respective data point is associated with the investigated entity or the reference entity;
- performing logistic regression using said processor upon the augmented data points with the target variable used as a dependent variable in performing the logistic regression;
- receiving from the logistic regression a plurality of standardized values of regression coefficients for the submitted variables;
- identifying variables whose standardized values exceed a specified threshold and are thereby considered significant;
- determining a difference between the standardized value of the investigated entity and the standardized values of the reference entity for at least one of the identified variables in order to compare the investigated entity to the reference entity; and
- providing an output based upon a comparison of the investigated entity to the reference entity.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10
11. A computer program product, disposed on a computer readable medium, for comparing an investigated entity to a reference entity, the computer program product comprising instructions for causing a processor to:
- augment a plurality of data points that correspond to a variable or characteristic of the investigated entity or the reference entity by creating a target variable whose value is indicative of whether the respective data point is associated with the investigated entity or the reference entity;
- perform logistic regression upon the augmented data points with the target variable used as a dependent variable in performing the logistic regression;
- receive from the logistic regression a plurality of standardized values of regression coefficients for the submitted variables;
- identify variables whose standardized values exceed a specified threshold and are thereby considered significant;
- determine a difference between the standardized values of the investigated entity and the standardized values of the reference entity for at least one of the identified variables in order to compare the investigated entity to the reference entity; and
- provide an output based upon a comparison of the investigated entity to the reference entity.
Dependent claims: 12, 13, 14, 15, 16, 17, 18, 19, 20
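The steps recited in claims 1 and 11 can be sketched end to end in plain Python. This is an illustration under stated assumptions, not the patented implementation. The toy data and the names `zscore`, `fit_logistic`, and `screen` are hypothetical. "Standardized values" is read here as coefficients fitted on z-scored predictors, the 0.5 threshold is arbitrary, and the final "difference" is taken as the gap between the two groups' mean z-scores. The fit uses plain gradient ascent rather than the solver of any particular data mining tool.

```python
import math
from statistics import mean, pstdev

# Toy pooled data (hypothetical): each row is (delay_days, order_size, target),
# where target = 1 marks the investigated unit and 0 the reference unit.
ROWS = [
    (7, 4, 1), (7, 6, 1), (8, 4, 1), (8, 6, 1),
    (9, 4, 1), (9, 6, 1), (3, 4, 1), (3, 6, 1),
    (2, 4, 0), (2, 6, 0), (3, 4, 0), (3, 6, 0),
    (1, 4, 0), (1, 6, 0), (8, 4, 0), (8, 6, 0),
]
NAMES = ["delay_days", "order_size"]


def zscore(values):
    """Rescale one variable to mean 0 and standard deviation 1 over the pooled rows."""
    m, s = mean(values), pstdev(values)
    return [(v - m) / s for v in values]


def fit_logistic(X, y, steps=2000, rate=0.1):
    """Fit intercept and coefficients by gradient ascent on the mean log-likelihood."""
    n, k = len(X), len(X[0])
    b0, b = 0.0, [0.0] * k
    for _ in range(steps):
        g0, g = 0.0, [0.0] * k
        for xi, yi in zip(X, y):
            z = b0 + sum(w * v for w, v in zip(b, xi))
            err = yi - 1.0 / (1.0 + math.exp(-z))  # observed minus predicted
            g0 += err
            for j in range(k):
                g[j] += err * xi[j]
        b0 += rate * g0 / n
        b = [w + rate * gj / n for w, gj in zip(b, g)]
    return b0, b


def screen(rows, names, threshold=0.5):
    """Return coefficients, flagged variables, and group gaps for flagged variables."""
    y = [r[-1] for r in rows]                      # target variable (dependent)
    cols = [zscore([r[j] for r in rows]) for j in range(len(names))]
    X = [list(x) for x in zip(*cols)]              # one z-scored row per data point
    _, b = fit_logistic(X, y)
    coefs = dict(zip(names, b))
    flagged = [n for n in names if abs(coefs[n]) > threshold]
    gaps = {}
    for n in flagged:
        col = cols[names.index(n)]
        inv = mean([v for v, t in zip(col, y) if t == 1])
        ref = mean([v for v, t in zip(col, y) if t == 0])
        gaps[n] = inv - ref                        # investigated minus reference
    return coefs, flagged, gaps


coefs, flagged, gaps = screen(ROWS, NAMES)
print(flagged, gaps)
```

On this toy data only `delay_days` is flagged, because `order_size` is distributed identically in both groups and so carries no information about group membership.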
Specification