METHOD AND COMPUTER PROGRAM PRODUCT FOR USING DATA MINING TOOLS TO AUTOMATICALLY COMPARE AN INVESTIGATED UNIT AND A BENCHMARK UNIT

US 20090112917A1
Filed: 10/15/2008
Published: 04/30/2009
Est. Priority Date: 12/05/2005
Status: Active Grant

First Claim

1. A method of comparing an investigated entity to a reference entity, the method comprising:

augmenting a plurality of data points that correspond to a variable or characteristic of the investigated entity or the reference entity by creating a target variable whose value is indicative of whether the respective data point is associated with the investigated entity or the reference entity;

performing logistic regression upon the augmented data points with the target variable used as a dependent variable in performing the logistic regression;

receiving from the logistic regression a plurality of standardized values of regression coefficients for the submitted variables; and

identifying variables whose standardized values exceed a specified threshold and are thereby considered significant.

2 Assignments

0 Petitions

Abstract

Sources of operational problems in business transactions often show themselves in relatively small pockets of data, which are called trouble hot spots. Identifying these hot spots from internal company transaction data is generally a fundamental step in resolving the problem, but this analysis is greatly complicated by huge numbers of transactions and large numbers of transaction variables to analyze. A suite of practical modifications to data mining techniques and logistic regression is provided to tailor them for finding trouble hot spots. This approach thus allows efficient automated data mining tools to quickly screen large numbers of candidate variables for their ability to characterize hot spots. One application is screening for variables that distinguish a suspected hot spot from a reference set.

14 Citations

21 Claims

1. A method of comparing an investigated entity to a reference entity, the method comprising:
- augmenting a plurality of data points that correspond to a variable or characteristic of the investigated entity or the reference entity by creating a target variable whose value is indicative of whether the respective data point is associated with the investigated entity or the reference entity;
  
  performing logistic regression upon the augmented data points with the target variable used as a dependent variable in performing the logistic regression;
  
  receiving from the logistic regression a plurality of standardized values of regression coefficients for the submitted variables; and
  
  identifying variables whose standardized values exceed a specified threshold and are thereby considered significant.
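
The steps of claim 1 can be illustrated with a minimal Python sketch. It assumes each data point is a numeric feature vector. It reads "standardized values of regression coefficients" as coefficients fitted on z-scored predictors, which is one common convention; the claim does not fix one. It fits the logistic regression by plain gradient ascent rather than with a statistics package, and it uses a hypothetical threshold of 0.5. All function names, variable names, and sample data are illustrative, not taken from the patent.

```python
import math


def augment(investigated, reference):
    """Pool both entities' data points and attach a 0/1 target variable.

    Returns (rows, target): target is 1 for rows from the investigated
    entity and 0 for rows from the reference entity.
    """
    rows = [list(r) for r in investigated] + [list(r) for r in reference]
    target = [1] * len(investigated) + [0] * len(reference)
    return rows, target


def standardize(rows):
    """Rescale each column to mean 0 and (population) standard deviation 1."""
    n, k = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(k)]
    # "or 1.0" guards against division by zero for constant columns.
    sds = [math.sqrt(sum((r[j] - means[j]) ** 2 for r in rows) / n) or 1.0
           for j in range(k)]
    return [[(r[j] - means[j]) / sds[j] for j in range(k)] for r in rows]


def sigmoid(z):
    # Clip the score so math.exp cannot overflow.
    z = max(-500.0, min(500.0, z))
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(rows, target, lr=0.1, epochs=2000):
    """Maximize the logistic log-likelihood by batch gradient ascent.

    The target variable is the dependent variable. Returns
    [intercept, b_1, ..., b_k].
    """
    n, k = len(rows), len(rows[0])
    w = [0.0] * (k + 1)
    for _ in range(epochs):
        grad = [0.0] * (k + 1)
        for x, y in zip(rows, target):
            err = y - sigmoid(w[0] + sum(b * v for b, v in zip(w[1:], x)))
            grad[0] += err
            for j in range(k):
                grad[j + 1] += err * x[j]
        w = [wj + lr * g / n for wj, g in zip(w, grad)]
    return w


def significant_variables(names, investigated, reference, threshold=0.5):
    """Names of variables whose standardized coefficient exceeds the threshold in magnitude."""
    rows, target = augment(investigated, reference)
    coefs = fit_logistic(standardize(rows), target)[1:]  # drop the intercept
    return [name for name, b in zip(names, coefs) if abs(b) > threshold]


if __name__ == "__main__":
    # Hypothetical [repair_time, time_of_day] observations.
    investigated = [[4, 10], [4, 14], [6, 10], [6, 14], [7, 10], [7, 14]]
    reference = [[2, 10], [2, 14], [3, 10], [3, 14], [5, 10], [5, 14]]
    print(significant_variables(["repair_time", "time_of_day"],
                                investigated, reference))  # prints ['repair_time']
```

In this made-up data, repair time is larger for the investigated entity, while time of day is distributed identically in both groups. As a result, only `repair_time` clears the threshold.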

2. A method, comprising:
- receiving a diagnostic data set comprising a plurality of observational values for a plurality of diagnostic variables corresponding to an investigated unit and a benchmark unit;
  
  determining a plurality of logistic regression coefficients based on the diagnostic data set, each logistic regression coefficient corresponding to at least one diagnostic variable of the plurality of diagnostic variables; and
  
  selecting a subset of most significant diagnostic variables from the plurality of diagnostic variables based on the plurality of logistic regression coefficients in reference to significance criteria.
  - 3. The method of claim 2, further comprising:
    - generating at least one interaction variable between elements of the subset of most significant diagnostic variables.
  - 4. The method of claim 3, wherein the interaction variable is generated as a cross product of elements of the subset of most significant diagnostic variables.
  - 5. The method of claim 2, further comprising:
    - extracting the plurality of diagnostic variables from a superset of diagnostic variables based on a decision tree analysis.
  - 6. The method of claim 5, wherein the decision tree analysis is only performed when the number of elements of the superset of diagnostic variables exceeds a threshold.
  - 7. The method of claim 2, wherein the selecting a subset of most significant diagnostic variables further comprises:
    - deriving a plurality of test statistic values corresponding to each of the plurality of logistic regression coefficients; and
      
      wherein the selecting a subset of most significant diagnostic variables is based on the plurality of test statistic values in reference to the significance criteria.
  - 8. The method of claim 7, wherein the significance criteria are based on an inspection of relative magnitudes of the plurality of test statistic values.
  - 9. The method of claim 7, wherein the significance criteria are based on comparison of each of the plurality of test statistic values to a threshold level.
  - 10. The method of claim 7, wherein the test statistic values comprise t-values.
  - 11. The method of claim 10, wherein the significance criteria are based on comparison of each t-value with corresponding values in a look-up table.
  - 12. The method of claim 7, wherein the test statistic values comprise p-values derived from t-values.
  - 13. The method of claim 12, wherein a particular diagnostic variable is selected as a most significant diagnostic variable if a corresponding p-value is less than 5%.
  - 14. The method of claim 2, further comprising:
    - identifying the investigated unit as an aberrant unit based on observed values for the most significant diagnostic variables.
  - 15. The method of claim 14, wherein identifying the investigated unit is based on a probability determined from the observed values for the most significant diagnostic variables.
  - 16. The method of claim 2, wherein determining a plurality of logistic regression coefficients further comprises:
    - separating the diagnostic data set into first and second diagnostic data sets depending on whether the plurality of observational values contained therein correspond to the investigated unit or to the benchmark unit respectively;
      
      deriving first and second mean value sets from the first and second diagnostic data sets respectively;
      
      determining a covariance matrix between the first and second diagnostic data sets; and
      
      determining the plurality of logistic regression coefficients based on a product of an inverse of the covariance matrix and a difference between the first and second mean value sets.
  - 17. The method of claim 16, further comprising:
    - trimming outliers in the covariance matrix; and
      
      transforming the covariance matrix to near symmetry.
  - 18. The method of claim 2, wherein the determining a plurality of logistic regression coefficients further comprises:
    - creating a target variable whose value is one if a data point is associated with the investigated unit and zero otherwise; and
      
      submitting the diagnostic data set to a logistic regression component of a data mining module, with the target variable designated as a dependent variable.
  - 19. The method of claim 2, wherein the plurality of diagnostic variables include at least a repair time, a repair type, a repair location, and a time of day.
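
Dependent claims 7, 12, and 13 (t-values, p-values, 5% cutoff) and claims 3 and 4 (interaction variables) can be sketched together in Python. This sketch rests on several assumptions. The coefficients and their standard errors are taken as already available from the fitted regression. The p-value uses a large-sample normal approximation to the t-distribution via `math.erfc`, because the claims do not specify degrees of freedom. "Cross product" is read as the elementwise product of each pair of selected variables. All names and numbers are hypothetical.

```python
import math
from itertools import combinations


def two_sided_p(t):
    """Two-sided p-value for a t-value, using the large-sample normal approximation."""
    return math.erfc(abs(t) / math.sqrt(2.0))


def select_significant(names, coefs, std_errs, alpha=0.05):
    """Keep variables whose t-value (coefficient / standard error) has p < alpha."""
    return [name for name, b, se in zip(names, coefs, std_errs)
            if two_sided_p(b / se) < alpha]


def interaction_columns(data, selected):
    """Elementwise product for every pair of selected variables.

    `data` maps a variable name to its list of observed values; the result
    maps "a*b" to the product column for variables a and b.
    """
    return {f"{a}*{b}": [x * y for x, y in zip(data[a], data[b])]
            for a, b in combinations(selected, 2)}


if __name__ == "__main__":
    names = ["repair_time", "time_of_day", "repair_location"]
    # Hypothetical coefficients and standard errors: t = 3.0, 0.5, -2.4.
    chosen = select_significant(names, [0.9, 0.1, -0.6], [0.3, 0.2, 0.25])
    print(chosen)  # prints ['repair_time', 'repair_location']
    data = {"repair_time": [1, 2, 3], "time_of_day": [9, 13, 17],
            "repair_location": [2, 0, 1]}
    print(interaction_columns(data, chosen))
```

A look-up table of t critical values (claim 11) would replace `two_sided_p` with a comparison of `abs(t)` against the tabulated value for the chosen significance level.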

20. A method, comprising:
- receiving a diagnostic data set comprising a plurality of observational values for a plurality of diagnostic variables corresponding to an investigated unit and a benchmark unit;
  
  separating the diagnostic data set into first and second diagnostic data sets depending on whether the plurality of observational values contained therein correspond to the investigated unit or to the benchmark unit respectively;
  
  deriving first and second mean value sets from the first and second diagnostic data sets respectively;
  
  determining a covariance matrix between the first and second diagnostic data sets;
  
  determining a plurality of logistic regression coefficients based on a product of an inverse of the covariance matrix and a difference between the first and second mean value sets;
  
  deriving a plurality of t-values corresponding to each of the plurality of logistic regression coefficients;
  
  selecting a subset of most significant diagnostic variables from the plurality of diagnostic variables based on the plurality of t-values in reference to significance criteria; and
  
  generating at least one interaction variable as a cross product between elements of the subset of most significant diagnostic variables.
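
The coefficient step of claim 20 (and claim 16) multiplies an inverse covariance matrix by the difference between the two groups' mean vectors. The minimal Python sketch below assumes that "covariance matrix" means the pooled within-group covariance used in Fisher's linear discriminant analysis. When both groups are multivariate normal with a shared covariance matrix, these values are exactly the slopes of the logistic model for group membership. The helper names and the small linear solver are illustrative and do not come from the patent. The t-value and interaction steps are omitted here.

```python
def mean_vector(rows):
    """Column means of a list of equal-length numeric rows."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def pooled_covariance(group1, group2):
    """Pooled within-group covariance, with n1 + n2 - 2 degrees of freedom."""
    m1, m2 = mean_vector(group1), mean_vector(group2)
    k = len(m1)
    s = [[0.0] * k for _ in range(k)]
    for rows, m in ((group1, m1), (group2, m2)):
        for r in rows:
            d = [r[j] - m[j] for j in range(k)]
            for i in range(k):
                for j in range(k):
                    s[i][j] += d[i] * d[j]
    dof = len(group1) + len(group2) - 2
    return [[v / dof for v in row] for row in s]


def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    k = len(b)
    m = [list(a[i]) + [b[i]] for i in range(k)]  # augmented matrix
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        for r in range(k):
            if r != c:
                f = m[r][c] / m[c][c]
                for j in range(c, k + 1):
                    m[r][j] -= f * m[c][j]
    return [m[i][k] / m[i][i] for i in range(k)]


def discriminant_coefficients(investigated, benchmark):
    """Coefficients = inverse(pooled covariance) x (mean_investigated - mean_benchmark)."""
    m1, m2 = mean_vector(investigated), mean_vector(benchmark)
    diff = [a - b for a, b in zip(m1, m2)]
    # Solving the linear system avoids forming the explicit inverse.
    return solve(pooled_covariance(investigated, benchmark), diff)
```

Solving the system directly is numerically preferable to inverting the covariance matrix and then multiplying. Both routes give the same product that the claim describes.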

21. A computer program product, comprising:
- processor readable instructions stored in the computer program product, wherein the processor readable instructions are issuable by a processor to:
  
  receive a diagnostic data set comprising a plurality of observational values for a plurality of diagnostic variables corresponding to an investigated unit and a benchmark unit;
  
  determine a plurality of logistic regression coefficients based on the diagnostic data set, each logistic regression coefficient corresponding to at least one diagnostic variable of the plurality of diagnostic variables; and
  
  select a subset of most significant diagnostic variables from the plurality of diagnostic variables based on the plurality of logistic regression coefficients in reference to significance criteria.

Current Assignee
Verizon Patent and Licensing Incorporated (Verizon Communications Inc.)
Original Assignee
Verizon Services Corporation (Verizon Communications Inc.)
Inventors
Drew, James Howard

Granted Patent

US 7,970,785 B2
US Class Current

1/1
CPC Class Codes

G06Q 10/0639   Performance analysis of emp...

G06Q 90/00   Systems or methods speciall...

Y10S 707/99936   Pattern matching access
