Analysis of data in cause and effect relationships

US 5,850,339 A
Filed: 10/31/1996
Issued: 12/15/1998
Est. Priority Date: 10/31/1996
Status: Expired due to Fees



Abstract

A method for analyzing a data set to determine which independent input variables, and which values of those variables, are most associated with a specific outcome. Independent and dependent variables may be either numeric (continuous) or categoric (discrete); numeric variables need not follow any particular distribution. First, each independent variable is ranked individually by a score. To compute the score, the records in the data set are counted under each of four possible conditions: independent variable in or out of range, combined with dependent variable in or out of range. These counts are combined in a scoring equation, and the range is adjusted iteratively until a high score is found. Combinations of independent variables and their values are then evaluated with the same score to determine the combinations most likely to be associated with a specific outcome or range of values of the dependent variable. One application is identifying the manufacturing variables, and their values, that tend to result in unacceptable product.
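
The four-way count described above, combined with the score that claims 2, 7, 9 and 18 define as A times B times (1-C), can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: the names `Counts`, `tally` and `score` are hypothetical, and treating an empty denominator as zero is an assumption the patent does not address.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Counts:
    """Record counts for the four in/out-of-range conditions (hypothetical helper)."""
    in_pos: int   # independent variable in range, specific outcome observed
    in_neg: int   # independent variable in range, other outcome
    out_pos: int  # independent variable out of range, specific outcome observed
    out_neg: int  # independent variable out of range, other outcome


def tally(in_region: Sequence[bool], outcome: Sequence[bool]) -> Counts:
    """Count how many records fall into each of the four conditions."""
    pairs = list(zip(in_region, outcome))
    return Counts(
        in_pos=sum(1 for r, y in pairs if r and y),
        in_neg=sum(1 for r, y in pairs if r and not y),
        out_pos=sum(1 for r, y in pairs if not r and y),
        out_neg=sum(1 for r, y in pairs if not r and not y),
    )


def _ratio(num: int, den: int) -> float:
    # Assumption: an empty denominator contributes 0 instead of raising.
    return num / den if den else 0.0


def score(counts: Counts) -> float:
    """A * B * (1 - C) as worded in claims 2, 7, 9 and 18."""
    a = _ratio(counts.in_pos, counts.in_pos + counts.in_neg)    # A: in-region records with the outcome
    b = _ratio(counts.in_pos, counts.in_pos + counts.out_pos)   # B: outcome records that are in the region
    c = _ratio(counts.out_pos, counts.out_pos + counts.out_neg)  # C: out-of-region records with the outcome
    return a * b * (1.0 - c)


in_region = [True, True, True, False, False, False, False, False]
outcome = [True, True, False, True, False, False, False, False]
example = tally(in_region, outcome)
print(example, score(example))
```

Here three of eight records are in range, two of them with the specific outcome, and one further occurrence lies out of range, so A = 2/3, B = 2/3 and C = 1/5, giving a score of 16/45 (about 0.356).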


18 Claims

1. In conjunction with a repeated process wherein a plurality of independent input process variables results in a dependent output variable having either of exactly two outcomes, a method for implementation in a computer for evaluating a data set which comprises a plurality of records each corresponding to a single operation of the process and each record including the respective values of the independent variables and the outcome of the dependent variable for that single operation of the process, wherein each independent variable can be either numeric or categoric, the method determining a combination of a specific number of the independent variables and boundaries defining an included region of values for each of said specific number of independent variables which most likely results in a specific outcome of the dependent variable, the method comprising the steps of:
- a) for each independent numeric variable:
  
  a1) determining its range of values;
  
  a2) selecting an initial boundary within the determined range;
  
  a3) calculating a score where the included region is on each side of the initial boundary, wherein each said score is a measure of the frequency of occurrence of the specific outcome of the dependent variable when said each independent numeric variable has a value in the respective included region;
  
  a4) selecting the side of the initial boundary which resulted in the higher score to define an initial included region of values for said each independent numeric variable;
  
  a5) iteratively adjusting the boundary of the included region so as to alter the size of the included region and calculating the score based upon the altered included region for each boundary adjustment; and
  
  a6) selecting as the final boundary that boundary which provided the highest score;
  
  b) for each independent categoric variable:
  
  b1) calculating a score for each value of said each independent categoric variable, wherein each said score is a measure of the frequency of occurrence of the specific outcome of the dependent variable when said each independent categoric variable has said each value; and
  
  b2) selecting that value which provided the highest score;
  
  c) ranking all the independent variables in order of their scores;
  
  d) identifying the specific number of the independent variables which have the highest scores; and
  
  e) providing as an output a list of the identified independent variables and:
  
  e1) the included region identified by the final boundary for each independent numeric variable; and
  
  e2) the selected value for each independent categoric variable.
- Dependent Claims (2, 3, 4, 5)
- - 2. The method according to claim 1 wherein each score is calculated as:
    - A times B times (1-C);
      
      where:
      
      A is the fraction of all records within the included region which have the specific outcome;
      
      B is the fraction of all records having the specific outcome which are in the included region; and
      
      C is the fraction of all records not in the included region which have the specific outcome.
  - 3. The method according to claim 1 wherein when all identified independent variables are numeric, the method comprises the further step of:
    - f) iteratively adjusting the final boundaries of all identified independent variables and calculating the score for the combination of independent variables and values which define the included region for each boundary adjustment to optimize the score.
  - 4. The method according to claim 1 wherein when the identified independent variables are both numeric and categoric, the method comprises the further step of:
    - f) iteratively adjusting the final boundaries of all identified numeric independent variables and calculating the score for the combination of independent variables and values which define the included region for each boundary adjustment to optimize the score.
  - 5. The method according to claim 1 wherein when all identified independent variables are categoric, the method comprises the further steps of:
    - f) determining the number of records with the specific outcome of the dependent variable for all combinations of identified independent variables and values;
      
      g) calculating a score for each combination; and
      
      h) selecting the combination having the highest score.
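
Steps a) through e) of claim 1 can be sketched as a single pass over the variables. This is a minimal illustration, not the patented implementation. Assumptions not taken from the patent: records are Python dicts, a variable is treated as numeric when all of its values are `int` or `float`, the initial boundary is the midpoint of the range, and "iteratively adjusting the boundary" is done by trying every observed value as a boundary on the chosen side. All function names are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple, Union

Value = Union[int, float, str]


def region_score(inside: Sequence[bool], ys: Sequence[bool]) -> float:
    """Claim 2 score A * B * (1 - C); empty denominators count as 0 (assumption)."""
    n_in, n_pos = sum(inside), sum(ys)
    in_pos = sum(1 for i, y in zip(inside, ys) if i and y)
    n_out, out_pos = len(ys) - n_in, n_pos - in_pos
    a = in_pos / n_in if n_in else 0.0
    b = in_pos / n_pos if n_pos else 0.0
    c = out_pos / n_out if n_out else 0.0
    return a * b * (1.0 - c)


def best_numeric(xs: Sequence[float], ys: Sequence[bool]) -> Tuple[float, str, float]:
    """Steps a1-a6: one-sided included region, x <= bound or x >= bound."""
    def side_score(side: str, bound: float) -> float:
        inside = [x <= bound if side == "<=" else x >= bound for x in xs]
        return region_score(inside, ys)

    lo, hi = min(xs), max(xs)                                      # a1: range of values
    start = (lo + hi) / 2                                          # a2: initial boundary (assumed midpoint)
    side = max(("<=", ">="), key=lambda s: side_score(s, start))   # a3/a4: keep the better side
    candidates = [start] + sorted(set(xs))                         # a5: boundaries tried (assumption)
    bound = max(candidates, key=lambda b: side_score(side, b))     # a6: highest-scoring boundary
    return bound, side, side_score(side, bound)


def best_categoric(xs: Sequence[str], ys: Sequence[bool]) -> Tuple[str, float]:
    """Steps b1-b2: score the region 'variable == value' for every observed value."""
    scores = {v: region_score([x == v for x in xs], ys) for v in sorted(set(xs))}
    value = max(scores, key=scores.__getitem__)
    return value, scores[value]


def rank_variables(records: List[Dict[str, Value]], outcome_key: str, k: int) -> List[Tuple[float, str, str]]:
    """Steps c-e: rank variables by best score and report the top k with their regions."""
    ys = [bool(r[outcome_key]) for r in records]
    ranked = []
    for name in records[0]:
        if name == outcome_key:
            continue
        xs = [r[name] for r in records]
        if all(isinstance(x, (int, float)) for x in xs):
            bound, side, s = best_numeric(xs, ys)
            ranked.append((s, name, f"{side} {bound}"))
        else:
            value, s = best_categoric(xs, ys)
            ranked.append((s, name, f"== {value!r}"))
    ranked.sort(key=lambda t: t[0], reverse=True)   # c: order by score
    return ranked[:k]                               # d/e: top k with their regions


# Hypothetical process log: True in "defect" marks the specific outcome.
records = [
    {"temp": 10, "line": "A", "defect": False},
    {"temp": 20, "line": "A", "defect": False},
    {"temp": 30, "line": "B", "defect": False},
    {"temp": 40, "line": "B", "defect": True},
    {"temp": 50, "line": "A", "defect": True},
    {"temp": 60, "line": "B", "defect": True},
]
print(rank_variables(records, "defect", 2))
```

In this toy log, `temp` ranks first with a score of 1.0 for the region `>= 35.0`, ahead of `line` with value `'B'` at 8/27.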

6. In conjunction with a repeated process wherein a plurality of independent input process variables results in a dependent output variable having either of exactly two outcomes, a method for implementation in a computer for evaluating a data set which comprises a plurality of records each corresponding to a single operation of the process and each record including the respective values of the independent variables and the outcome of the dependent variable for that single operation of the process, wherein each independent variable can be either numeric or categoric, the method determining a combination of a specific number of the independent variables and boundaries defining an included region of values for each of said specific number of independent variables which most likely results in a specific outcome of the dependent variable, the method comprising the steps of:
- a) for each independent numeric variable:
  
  a1) determining its range of values;
  
  a2) selecting an initial boundary within the determined range;
  
  a3) calculating a score where the included region is on each side of the initial boundary, wherein each said score is a measure of the frequency of occurrence of the specific outcome of the dependent variable when said each independent numeric variable has a value in the respective included region;
  
  a4) selecting the side of the initial boundary which resulted in the higher score to define an initial included region of values for said each independent numeric variable;
  
  a5) iteratively adjusting the boundary of the included region so as to alter the size of the included region and calculating the score based upon the altered included region for each boundary adjustment; and
  
  a6) selecting as the final boundary that boundary which provided the highest score;
  
  b) for each independent categoric variable:
  
  b1) calculating a score for each value of said each independent categoric variable, wherein each said score is a measure of the frequency of occurrence of the specific outcome of the dependent variable when said each independent categoric variable has said each value; and
  
  b2) selecting that value which provided the highest score;
  
  c) selecting combinations of the specific number of the independent variables;
  
  d) for each selected combination iteratively adjusting the final boundaries of the numeric independent variables and calculating the score for the combination of independent variables and values which define the included region for each boundary adjustment to optimize the score; and
  
  e) providing as an output a list of the independent variables in the combination which provided the highest score and:
  
  e1) the included region identified by the final boundary for each independent numeric variable in the highest scoring combination; and
  
  e2) the selected value for each independent categoric variable in the highest scoring combination.
- Dependent Claims (7)
- - 7. The method according to claim 6 wherein each score is calculated as:
    - A times B times (1-C);
      
      where:
      
      A is the fraction of all records within the included region which have the specific outcome;
      
      B is the fraction of all records having the specific outcome which are in the included region; and
      
      C is the fraction of all records not in the included region which have the specific outcome.
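
Steps c) to e) of claim 6, together with the combined boundary adjustment that claims 3 and 4 also describe, can be sketched as below. Assumptions not stated in the patent: the included region of a combination is the intersection of the per-variable regions (a record is inside only if it satisfies every rule), numeric boundaries are adjusted one at a time over the observed values (a coordinate-ascent search) until no change raises the score, and categoric values stay fixed. The rule tuples and function names are hypothetical.

```python
from itertools import combinations
from typing import Dict, List, Sequence, Tuple, Union

Value = Union[int, float, str]
Rule = Tuple[str, str, Value]  # (variable, "<=" | ">=" | "==", boundary or category)


def region_score(inside: Sequence[bool], ys: Sequence[bool]) -> float:
    """Claim 7 score A * B * (1 - C); empty denominators count as 0 (assumption)."""
    n_in, n_pos = sum(inside), sum(ys)
    in_pos = sum(1 for i, y in zip(inside, ys) if i and y)
    n_out, out_pos = len(ys) - n_in, n_pos - in_pos
    a = in_pos / n_in if n_in else 0.0
    b = in_pos / n_pos if n_pos else 0.0
    c = out_pos / n_out if n_out else 0.0
    return a * b * (1.0 - c)


def matches(record: Dict[str, Value], rule: Rule) -> bool:
    name, op, t = rule
    x = record[name]
    if op == "<=":
        return x <= t
    if op == ">=":
        return x >= t
    return x == t


def combo_score(records: List[Dict[str, Value]], ys: Sequence[bool], rules: Sequence[Rule]) -> float:
    """Score the region where every rule holds (assumed intersection of regions)."""
    inside = [all(matches(r, rule) for rule in rules) for r in records]
    return region_score(inside, ys)


def refine(records, ys, rules: Sequence[Rule]) -> Tuple[float, List[Rule]]:
    """Step d: move numeric boundaries one at a time until the score stops improving."""
    rules = list(rules)
    best = combo_score(records, ys, rules)
    improved = True
    while improved:
        improved = False
        for i in range(len(rules)):
            name, op, _ = rules[i]
            if op == "==":
                continue  # categoric value held fixed
            for cand in sorted({r[name] for r in records}):
                trial = rules[:i] + [(name, op, cand)] + rules[i + 1:]
                s = combo_score(records, ys, trial)
                if s > best:
                    best, rules, improved = s, trial, True
    return best, rules


def best_combination(records, ys, single_rules: Sequence[Rule], k: int) -> Tuple[float, List[Rule]]:
    """Steps c and e: refine every k-variable combination and keep the highest score."""
    return max((refine(records, ys, combo) for combo in combinations(single_rules, k)),
               key=lambda result: result[0])


# Hypothetical data: the specific outcome occurs only when x >= 3 and z <= 1.
rows = [(1, 1, False), (2, 1, False), (3, 1, True), (4, 1, True),
        (3, 3, False), (4, 3, False), (1, 3, False), (2, 3, False)]
records = [{"x": x, "z": z} for x, z, _ in rows]
ys = [y for *_, y in rows]
print(best_combination(records, ys, [("x", ">=", 1), ("z", "<=", 3)], 2))
```

Starting from deliberately loose single-variable rules, the search tightens them to `x >= 3` and `z <= 1`, which isolates exactly the two records with the specific outcome and scores 1.0.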

8. A method implementable in a computer for controlling a manufacturing process to increase the likelihood of occurrence of a specific one of exactly two outcomes of the process if said specific outcome is acceptable or to decrease the likelihood of occurrence of said specific one of exactly two outcomes of the process if said specific outcome is unacceptable, wherein for each operation of the process a plurality of independent input variables results in a dependent output variable having either of said exactly two outcomes, the method determining a combination of a specific number of the independent variables and boundaries defining an included region of values for each of said specific number of independent variables which most likely results in said specific outcome of the dependent variable, the method comprising the steps of:
- a) generating a data set which comprises a plurality of records each corresponding to a single operation of the process and each record including the respective values of the independent variables and the outcome of the dependent variable for that single operation of the process, wherein each independent variable can be either numeric or categoric;
  
  b) for each independent numeric variable:
  
  b1) determining its range of values;
  
  b2) selecting an initial boundary within the determined range;
  
  b3) calculating a score where the included region is on each side of the initial boundary, wherein each said score is a measure of the frequency of occurrence of the specific outcome of the dependent variable when said each independent numeric variable has a value in the respective included region;
  
  b4) selecting the side of the initial boundary which resulted in the higher score to define an initial included region of values for said each independent numeric variable;
  
  b5) iteratively adjusting the boundary of the included region so as to alter the size of the included region and calculating the score based upon the altered included region for each boundary adjustment; and
  
  b6) selecting as the final boundary that boundary which provided the highest score;
  
  c) for each independent categoric variable:
  
  c1) calculating a score for each value of said each independent categoric variable, wherein each said score is a measure of the frequency of occurrence of the specific outcome of the dependent variable when said each independent categoric variable has said each value; and
  
  c2) selecting that value which provided the highest score;
  
  d) ranking all the independent variables in order of their scores;
  
  e) identifying the specific number of the independent variables which have the highest scores; and
  
  f) controlling the process:
  
  f1) if said specific outcome is acceptable by limiting each identified independent numeric variable to the included region identified by the final boundary for that independent numeric variable and by limiting each independent categoric variable to the selected value for that independent categoric variable;
  
  or f2) if said specific outcome is unacceptable by limiting each identified independent numeric variable to be outside the included region identified by the final boundary for that independent numeric variable and by limiting each identified independent categoric variable to not have the selected value for that independent categoric variable.
- Dependent Claims (9, 10, 11, 12)
- - 9. The method according to claim 8 wherein each score is calculated as:
    - A times B times (1-C);
      
      where:
      
      A is the fraction of all records within the included region which have the specific outcome;
      
      B is the fraction of all records having the specific outcome which are in the included region; and
      
      C is the fraction of all records not in the included region which have the specific outcome.
  - 10. The method according to claim 8 wherein when all identified independent variables are numeric, the method comprises the further step of:
    - g) iteratively adjusting the final boundaries of all identified independent variables and calculating the score for the combination of independent variables and values which define the included region for each boundary adjustment to optimize the score.
  - 11. The method according to claim 8 wherein when the identified independent variables are both numeric and categoric, the method comprises the further step of:
    - g) iteratively adjusting the final boundaries of all identified numeric independent variables and calculating the score for the combination of independent variables and values which define the included region for each boundary adjustment to optimize the score.
  - 12. The method according to claim 8 wherein when all identified independent variables are categoric, the method comprises the further steps of:
    - g) determining the number of records with the specific outcome of the dependent variable for all combinations of identified independent variables and values;
      
      h) calculating a score for each combination; and
      
      i) selecting the combination having the highest score.
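
Step f) of claim 8 turns the identified regions into process limits. The sketch below only checks whether a proposed setting respects those limits; how the limits are actually imposed on the process is outside its scope. The `Rule` tuple form and the function names are hypothetical, and f2) is read literally, so every identified variable must lie outside its included region or avoid its selected value.

```python
from typing import Dict, Sequence, Tuple, Union

Value = Union[int, float, str]
Rule = Tuple[str, str, Value]  # (variable, "<=" | ">=" | "==", boundary or category)


def in_region(setting: Dict[str, Value], rule: Rule) -> bool:
    """True if the setting's value for the rule's variable falls in the included region."""
    name, op, t = rule
    x = setting[name]
    if op == "<=":
        return x <= t
    if op == ">=":
        return x >= t
    return x == t


def setting_allowed(setting: Dict[str, Value], rules: Sequence[Rule], outcome_acceptable: bool) -> bool:
    if outcome_acceptable:
        # f1: every identified variable is limited to its included region or selected value
        return all(in_region(setting, rule) for rule in rules)
    # f2: every identified variable must lie outside its region or avoid its selected value
    return not any(in_region(setting, rule) for rule in rules)


# Hypothetical rules found for an unacceptable outcome such as a defect.
defect_rules = [("temp", ">=", 35.0), ("line", "==", "B")]
print(setting_allowed({"temp": 30, "line": "A"}, defect_rules, outcome_acceptable=False))  # True
print(setting_allowed({"temp": 40, "line": "A"}, defect_rules, outcome_acceptable=False))  # False
```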

13. A method for implementation in a computer to evaluate a data set for a repeated process, said data set comprising a plurality of records each corresponding to a respective operation of the process and containing the values of a plurality of input independent variables and the resultant one of exactly two possible outcomes of a dependent output variable, the method comprising the steps of:
- a) inputting said data set into a work area of the computer memory;
  
  b) iteratively calculating a single variable score for a value range for each of said independent variables in which each said single variable score is proportional to a frequency of occurrence of a specific outcome of said dependent variable and adjusting the value range to optimize the calculated score;
  
  c) iteratively calculating a double variable score for a pair of value ranges for each combination of two of said independent variables in which each said double variable score is proportional to a frequency of occurrence of said specific outcome of said dependent variable and adjusting the pair of value ranges to optimize the calculated score;
  
  d) iteratively calculating a triple variable score for a trio of value ranges for each combination of three of said independent variables in which each said triple variable score is proportional to a frequency of occurrence of said specific outcome of said dependent variable and adjusting the trio of value ranges to optimize the calculated score;
  
  e) determining the largest said single variable score and the associated one of said independent variables;
  
  f) determining the largest said double variable score and the associated two of said independent variables;
  
  g) determining the largest said triple variable score and the associated three of said independent variables; and
  
  h) outputting said scores and associated said value ranges of said independent variables;
  
  whereby said process can be controlled by varying said value ranges of said independent variables to vary the likelihood of occurrence of said specific outcome of said dependent variable.
- Dependent Claims (14, 15, 16)
- - 14. The method according to claim 13 wherein said independent variables can be either numeric or categoric and said dependent variable is categoric.
  - 15. The method according to claim 14 wherein said value ranges of said independent variables which are numeric are determined by iteratively adjusting the ends of said value ranges while calculating said single variable score and while calculating said double variable score and while calculating said triple variable score to thereby achieve the approximately highest value of said single variable score, said double variable score and said triple variable score.
  - 16. The method according to claim 15 wherein said double variable score is calculated using those two independent variables having the two highest single variable scores and said triple variable score is calculated using those three independent variables having the three highest single variable scores, whereby computational efficiency is achieved by calculating said double variable score and said triple variable score only for those combinations of independent variables most likely to result in relatively higher scores.
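
Claim 16 limits the double and triple scores of claim 13 to the two and three variables with the highest single-variable scores. The sketch below shows only that pruning, for numeric variables; it does not re-adjust the ranges of the pairs and triples as claims 13 and 15 describe, and it assumes one-sided ranges found by trying every observed value as a boundary. The function names are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

Rule = Tuple[str, str, float]  # (variable, "<=" | ">=", boundary)


def region_score(inside: Sequence[bool], ys: Sequence[bool]) -> float:
    """A * B * (1 - C) from the claims; empty denominators count as 0 (assumption)."""
    n_in, n_pos = sum(inside), sum(ys)
    in_pos = sum(1 for i, y in zip(inside, ys) if i and y)
    n_out, out_pos = len(ys) - n_in, n_pos - in_pos
    a = in_pos / n_in if n_in else 0.0
    b = in_pos / n_pos if n_pos else 0.0
    c = out_pos / n_out if n_out else 0.0
    return a * b * (1.0 - c)


def conj_score(records: List[Dict[str, float]], ys: Sequence[bool], rules: Sequence[Rule]) -> float:
    """Score the region where every one-sided rule holds."""
    inside = [all((r[n] <= t) if op == "<=" else (r[n] >= t) for n, op, t in rules) for r in records]
    return region_score(inside, ys)


def best_single(records, ys, name: str) -> Tuple[float, Rule]:
    """Single variable score: every observed value tried as a boundary on either side."""
    options = [(name, op, t) for t in sorted({r[name] for r in records}) for op in ("<=", ">=")]
    return max(((conj_score(records, ys, [rule]), rule) for rule in options), key=lambda p: p[0])


def pruned_scores(records, ys, names: Sequence[str]) -> Dict[str, Tuple[float, List[Rule]]]:
    """Claim 16: pair and trio scores use only the top-ranked single variables."""
    singles = sorted((best_single(records, ys, n) for n in names), key=lambda p: p[0], reverse=True)
    top = [rule for _, rule in singles]
    return {
        "single": (singles[0][0], top[:1]),
        "double": (conj_score(records, ys, top[:2]), top[:2]),
        "triple": (conj_score(records, ys, top[:3]), top[:3]),
    }


# Hypothetical data: the specific outcome occurs only when a == 2 and b == 2.
rows = [(1, 1, 1, False), (1, 2, 1, False), (2, 1, 1, False), (2, 2, 1, True),
        (1, 1, 2, False), (1, 2, 2, False), (2, 1, 2, False), (2, 2, 2, True)]
records = [{"a": a, "b": b, "c": c} for a, b, c, _ in rows]
ys = [y for *_, y in rows]
print(pruned_scores(records, ys, ["a", "b", "c"]))
```

Here `a` and `b` each reach 0.5 on their own, while their pair reaches 1.0; only that one pair is scored instead of all three possible pairs, which is the computational saving claim 16 describes.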

17. In conjunction with a repeated process wherein a plurality of independent input process variables results in a dependent output variable having either of exactly two outcomes, a method for implementation in a computer for evaluating a data set which comprises a plurality of records each corresponding to a single operation of the process and each record including the respective values of the independent variables and the outcome of the dependent variable for that single operation of the process, wherein each independent variable can be either numeric or categoric, the method determining a combination of a specific number of the independent variables and boundaries defining an included region of values for each of said specific number of independent variables which most likely results in a specific outcome of the dependent variable, the method comprising the steps of:
- a) for each independent numeric variable:
  
  a1) determining its range of values;
  
  a2) selecting a pair of initial boundaries within the determined range;
  
  a3) calculating a score where the included region is between the pair of initial boundaries, wherein said score is a measure of the frequency of occurrence of the specific outcome of the dependent variable when said each independent numeric variable has a value in the respective included region;
  
  a4) iteratively adjusting the pair of boundaries within the range so as to alter the size of the included region and calculating the score based upon the altered included region for each boundary adjustment; and
  
  a5) selecting as the final boundaries that pair of boundaries which provided the highest score;
  
  b) for each independent categoric variable:
  
  b1) calculating a score for each value of said each independent categoric variable, wherein each said score is a measure of the frequency of occurrence of the specific outcome of the dependent variable when said each independent categoric variable has said each value; and
  
  b2) selecting that value which provided the highest score;
  
  c) selecting combinations of the specific number of the independent variables;
  
  d) for each selected combination iteratively adjusting the final pair of boundaries of the numeric independent variables and calculating the score for the combination of independent variables and values which define the included region for each boundary adjustment to optimize the score; and
  
  e) providing as an output a list of the independent variables in the combination which provided the highest score and:
  
  e1) the included region identified by the final pair of boundaries for each independent numeric variable in the highest scoring combination; and
  
  e2) the selected value for each independent categoric variable in the highest scoring combination.
- Dependent Claims (18)
- - 18. The method according to claim 17 wherein each score is calculated as:
    - A times B times (1-C);
      
      where:
      
      A is the fraction of all records within the included region which have the specific outcome;
      
      B is the fraction of all records having the specific outcome which are in the included region; and
      
      C is the fraction of all records not in the included region which have the specific outcome.
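
Claim 17 replaces the single boundary of claim 1 with a pair, so the included region becomes an interval. The sketch below assumes the initial pair is the full range and that "iteratively adjusting the pair of boundaries" means trying every pair of observed values with lower <= upper. The patent does not specify the search; this exhaustive form scores a number of pairs quadratic in the distinct values, each over all records, so it suits small data sets. `best_interval` is a hypothetical name.

```python
from itertools import combinations_with_replacement
from typing import Sequence, Tuple


def region_score(inside: Sequence[bool], ys: Sequence[bool]) -> float:
    """Claim 18 score A * B * (1 - C); empty denominators count as 0 (assumption)."""
    n_in, n_pos = sum(inside), sum(ys)
    in_pos = sum(1 for i, y in zip(inside, ys) if i and y)
    n_out, out_pos = len(ys) - n_in, n_pos - in_pos
    a = in_pos / n_in if n_in else 0.0
    b = in_pos / n_pos if n_pos else 0.0
    c = out_pos / n_out if n_out else 0.0
    return a * b * (1.0 - c)


def best_interval(xs: Sequence[float], ys: Sequence[bool]) -> Tuple[float, float, float]:
    """Claim 17 steps a1-a5 for one numeric variable; returns (score, lower, upper)."""
    lo, hi = min(xs), max(xs)                                        # a1: range of values
    best = (region_score([lo <= x <= hi for x in xs], ys), lo, hi)   # a2/a3: initial pair (assumed full range)
    for lower, upper in combinations_with_replacement(sorted(set(xs)), 2):  # a4: every pair lower <= upper
        s = region_score([lower <= x <= upper for x in xs], ys)
        if s > best[0]:
            best = (s, lower, upper)
    return best                                                      # a5: highest-scoring pair


xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [False, False, True, True, True, False, False, False]
print(best_interval(xs, ys))  # (1.0, 3, 5)
```

With the specific outcome confined to x = 3, 4 and 5, the interval [3, 5] scores 1.0, whereas the best one-sided region in the style of claim 1, x <= 5, reaches only 0.6.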


Current Assignee
Philip M. Giles
Original Assignee
Philip M. Giles
Inventors
Giles, Philip M.
Primary Examiner(s)
Elmore, Reba I.
Assistant Examiner(s)
PATEL, RAMESH B

Application Number
US08/741,717
Time in Patent Office
775 Days
Field of Search
364/191, 364/148, 364/151, 364/153, 364/158, 364/164, 364/194, 364/145, 364/148.06, 364/147, 364/152, 364/154, 364/411, 395/13, 395/11, 395/51, 395/600
US Class Current
700/52
CPC Class Codes
G05B 17/02 electric
G05B 23/0205 by means of a monitoring sy...
