Analysis of data in cause and effect relationships
Abstract
A method for analyzing a data set and determining the independent input variables and the values of those variables which are most associated with a specific outcome. Independent and dependent variables may be either numeric (continuous) or categoric (discrete); numeric variables need not be of a specific distribution type. First, each individual independent variable is ranked based on a score. Scoring is done by first determining the number of records in the data set having each of four possible conditions--independent variable in or out of range in combination with dependent variable in or out of range. These values are put into an equation. Iterative processes are used until a high score is found. Subsequently, combinations of variables and values of independent variables are evaluated using the score to determine the combinations most likely to be associated with a specific outcome or range of values of the dependent variable. A use for this method is the determination of manufacturing variables and their values which tend to result in unacceptable product.
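The four-condition count described in the abstract can be sketched as follows. This is only an illustration: the abstract does not give the scoring equation, so `illustrative_score` is a hypothetical stand-in (target-outcome rate inside the region minus the rate outside), and the record fields `temp` and `ok` are invented sample data.

```python
from collections import Counter

def four_way_counts(records, in_region, is_target):
    """Count records in each of the four (in region?, target outcome?) conditions."""
    counts = Counter()
    for rec in records:
        counts[(bool(in_region(rec)), bool(is_target(rec)))] += 1
    return counts

def illustrative_score(counts):
    """Hypothetical score (an assumption, not the patent's equation):
    target-outcome rate inside the region minus the rate outside it."""
    in_hit, in_miss = counts[(True, True)], counts[(True, False)]
    out_hit, out_miss = counts[(False, True)], counts[(False, False)]
    inside = in_hit / (in_hit + in_miss) if in_hit + in_miss else 0.0
    outside = out_hit / (out_hit + out_miss) if out_hit + out_miss else 0.0
    return inside - outside

# Invented sample data: the target outcome is an unacceptable product (ok == False).
records = [
    {"temp": 180, "ok": False},
    {"temp": 185, "ok": False},
    {"temp": 150, "ok": True},
    {"temp": 155, "ok": True},
    {"temp": 190, "ok": True},
]
counts = four_way_counts(records, lambda r: r["temp"] >= 175, lambda r: not r["ok"])
score = illustrative_score(counts)
```

Any equation that combines the same four counts could be substituted for `illustrative_score` without changing the counting step.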
18 Claims
1. In conjunction with a repeated process wherein a plurality of independent input process variables results in a dependent output variable having either of exactly two outcomes, a method for implementation in a computer for evaluating a data set which comprises a plurality of records each corresponding to a single operation of the process and each record including the respective values of the independent variables and the outcome of the dependent variable for that single operation of the process, wherein each independent variable can be either numeric or categoric, the method determining a combination of a specific number of the independent variables and boundaries defining an included region of values for each of said specific number of independent variables which most likely results in a specific outcome of the dependent variable, the method comprising the steps of:
a) for each independent numeric variable
a1) determining its range of values;
a2) selecting an initial boundary within the determined range;
a3) calculating a score where the included region is on each side of the initial boundary, wherein each said score is a measure of the frequency of occurrence of the specific outcome of the dependent variable when said each independent numeric variable has a value in the respective included region;
a4) selecting the side of the initial boundary which resulted in the higher score to define an initial included region of values for said each independent numeric variable;
a5) iteratively adjusting the boundary of the included region so as to alter the size of the included region and calculating the score based upon the altered included region for each boundary adjustment; and
a6) selecting as the final boundary that boundary which provided the highest score;
b) for each independent categoric variable
b1) calculating a score for each value of said each independent categoric variable, wherein each said score is a measure of the frequency of occurrence of the specific outcome of the dependent variable when said each independent categoric variable has said each value; and
b2) selecting that value which provided the highest score;
c) ranking all the independent variables in order of their scores;
d) identifying the specific number of the independent variables which have the highest scores; and
e) providing as an output a list of the identified independent variables and
e1) the included region identified by the final boundary for each independent numeric variable; and
e2) the selected value for each independent categoric variable.
Dependent claims: 2, 3, 4, 5
-
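The steps of claim 1 can be sketched roughly as below. Several choices here are assumptions made for illustration and are not stated in the claim: the score is a plain hit rate with a hypothetical `min_count` guard against tiny regions, the initial boundary is the midpoint of the range, and the boundary is adjusted by an even sweep across the range. The field names and sample records are invented.

```python
def hit_rate(records, target, keep, min_count=2):
    """Hypothetical score: share of in-region records showing the target outcome."""
    inside = [r for r in records if keep(r)]
    if len(inside) < min_count:
        return 0.0
    return sum(r["outcome"] == target for r in inside) / len(inside)

def best_numeric_region(records, var, target, steps=10):
    values = [r[var] for r in records]
    lo, hi = min(values), max(values)                      # a1) range of values
    boundary = (lo + hi) / 2                               # a2) midpoint (assumption)
    below = hit_rate(records, target, lambda r: r[var] <= boundary)
    above = hit_rate(records, target, lambda r: r[var] > boundary)   # a3) both sides
    side = "above" if above > below else "below"           # a4) keep the better side
    best_score, best_boundary = max(above, below), boundary
    for i in range(steps + 1):                             # a5) adjust the boundary
        b = lo + (hi - lo) * i / steps
        if side == "above":
            s = hit_rate(records, target, lambda r: r[var] > b)
        else:
            s = hit_rate(records, target, lambda r: r[var] <= b)
        if s > best_score:
            best_score, best_boundary = s, b
    return best_score, side, best_boundary                 # a6) final boundary

def best_category(records, var, target):
    values = sorted({r[var] for r in records})             # sorted for stable ties
    scores = {v: hit_rate(records, target, lambda r, v=v: r[var] == v)
              for v in values}                             # b1) score each value
    value = max(values, key=scores.get)                    # b2) best value
    return scores[value], value

def top_variables(records, numeric, categoric, target, k):
    results = []
    for var in numeric:
        score, side, boundary = best_numeric_region(records, var, target)
        results.append((score, var, (side, boundary)))
    for var in categoric:
        score, value = best_category(records, var, target)
        results.append((score, var, value))
    results.sort(key=lambda t: t[0], reverse=True)         # c) rank by score
    return results[:k]                                     # d), e) top k with regions

records = [
    {"temp": 150, "line": "A", "outcome": "pass"},
    {"temp": 160, "line": "B", "outcome": "pass"},
    {"temp": 170, "line": "A", "outcome": "pass"},
    {"temp": 180, "line": "B", "outcome": "fail"},
    {"temp": 190, "line": "A", "outcome": "fail"},
    {"temp": 200, "line": "B", "outcome": "fail"},
]
ranked = top_variables(records, ["temp"], ["line"], "fail", 2)
```

In this invented data set, `temp` ranks first because every record above the midpoint boundary failed, while the best value of `line` accounts for only two of three failures.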
6. In conjunction with a repeated process wherein a plurality of independent input process variables results in a dependent output variable having either of exactly two outcomes, a method for implementation in a computer for evaluating a data set which comprises a plurality of records each corresponding to a single operation of the process and each record including the respective values of the independent variables and the outcome of the dependent variable for that single operation of the process, wherein each independent variable can be either numeric or categoric, the method determining a combination of a specific number of the independent variables and boundaries defining an included region of values for each of said specific number of independent variables which most likely results in a specific outcome of the dependent variable, the method comprising the steps of:
a) for each independent numeric variable
a1) determining its range of values;
a2) selecting an initial boundary within the determined range;
a3) calculating a score where the included region is on each side of the initial boundary, wherein each said score is a measure of the frequency of occurrence of the specific outcome of the dependent variable when said each independent numeric variable has a value in the respective included region;
a4) selecting the side of the initial boundary which resulted in the higher score to define an initial included region of values for said each independent numeric variable;
a5) iteratively adjusting the boundary of the included region so as to alter the size of the included region and calculating the score based upon the altered included region for each boundary adjustment; and
a6) selecting as the final boundary that boundary which provided the highest score;
b) for each independent categoric variable
b1) calculating a score for each value of said each independent categoric variable, wherein each said score is a measure of the frequency of occurrence of the specific outcome of the dependent variable when said each independent categoric variable has said each value; and
b2) selecting that value which provided the highest score;
c) selecting combinations of the specific number of the independent variables;
d) for each selected combination iteratively adjusting the final boundaries of the numeric independent variables and calculating the score for the combination of independent variables and values which define the included region for each boundary adjustment to optimize the score; and
e) providing as an output a list of the independent variables in the combination which provided the highest score and
e1) the included region identified by the final boundary for each independent numeric variable in the highest scoring combination; and
e2) the selected value for each independent categoric variable in the highest scoring combination.
Dependent claims: 7
-
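Steps c) through e) of claim 6 might be sketched as a coordinate-wise refinement over each combination. This is an assumed strategy, not one the claim prescribes: each numeric boundary is nudged up or down by a per-variable step (the hypothetical `steps` mapping) and the change is kept only when the joint score improves. The region encoding, `min_count` guard, and sample data are likewise invented for illustration.

```python
from itertools import combinations

def joint_score(records, target, regions, min_count=2):
    """Hit rate over records inside every region (hypothetical score).
    regions maps var -> (side, boundary) for numeric, or var -> value for categoric."""
    def inside(r):
        for var, region in regions.items():
            if isinstance(region, tuple):
                side, b = region
                ok = r[var] > b if side == "above" else r[var] <= b
            else:
                ok = r[var] == region
            if not ok:
                return False
        return True
    kept = [r for r in records if inside(r)]
    if len(kept) < min_count:
        return 0.0
    return sum(r["outcome"] == target for r in kept) / len(kept)

def refine(records, target, regions, steps, rounds=5):
    """d) nudge each numeric boundary; keep a move only if the joint score rises."""
    regions = dict(regions)
    best = joint_score(records, target, regions)
    for _ in range(rounds):
        improved = False
        for var in list(regions):
            region = regions[var]
            if not isinstance(region, tuple):
                continue                      # categoric values are left unchanged
            side, b = region
            for candidate in (b - steps[var], b + steps[var]):
                trial = {**regions, var: (side, candidate)}
                s = joint_score(records, target, trial)
                if s > best:
                    best, regions, improved = s, trial, True
        if not improved:
            break
    return best, regions

def best_combination(records, target, single_regions, k, steps):
    results = []
    for combo in combinations(sorted(single_regions), k):   # c) combinations of k
        start = {v: single_regions[v] for v in combo}
        results.append(refine(records, target, start, steps))
    return max(results, key=lambda t: t[0])                 # e) highest-scoring one

records = [
    {"temp": 150, "pressure": 1.0, "line": "A", "outcome": "pass"},
    {"temp": 160, "pressure": 2.0, "line": "B", "outcome": "pass"},
    {"temp": 180, "pressure": 2.0, "line": "B", "outcome": "pass"},
    {"temp": 185, "pressure": 2.0, "line": "B", "outcome": "fail"},
    {"temp": 190, "pressure": 2.0, "line": "A", "outcome": "fail"},
    {"temp": 200, "pressure": 1.0, "line": "B", "outcome": "fail"},
]
single_regions = {"temp": ("above", 170), "pressure": ("above", 1.5), "line": "B"}
best_score, best_regions = best_combination(
    records, "fail", single_regions, k=2, steps={"temp": 10, "pressure": 0.5}
)
```

Here the single-variable boundary for `temp` starts at 170 and is moved to 180 during refinement, which excludes the one passing record that the starting region still contained.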
8. A method implementable in a computer for controlling a manufacturing process to increase the likelihood of occurrence of a specific one of exactly two outcomes of the process if said specific outcome is acceptable or to decrease the likelihood of occurrence of said specific one of exactly two outcomes of the process if said specific outcome is unacceptable, wherein for each operation of the process a plurality of independent input variables results in a dependent output variable having either of said exactly two outcomes, the method determining a combination of a specific number of the independent variables and boundaries defining an included region of values for each of said specific number of independent variables which most likely results in said specific outcome of the dependent variable, the method comprising the steps of:
a) generating a data set which comprises a plurality of records each corresponding to a single operation of the process and each record including the respective values of the independent variables and the outcome of the dependent variable for that single operation of the process, wherein each independent variable can be either numeric or categoric;
b) for each independent numeric variable
b1) determining its range of values;
b2) selecting an initial boundary within the determined range;
b3) calculating a score where the included region is on each side of the initial boundary, wherein each said score is a measure of the frequency of occurrence of the specific outcome of the dependent variable when said each independent numeric variable has a value in the respective included region;
b4) selecting the side of the initial boundary which resulted in the higher score to define an initial included region of values for said each independent numeric variable;
b5) iteratively adjusting the boundary of the included region so as to alter the size of the included region and calculating the score based upon the altered included region for each boundary adjustment; and
b6) selecting as the final boundary that boundary which provided the highest score;
c) for each independent categoric variable
c1) calculating a score for each value of said each independent categoric variable, wherein each said score is a measure of the frequency of occurrence of the specific outcome of the dependent variable when said each independent categoric variable has said each value; and
c2) selecting that value which provided the highest score;
d) ranking all the independent variables in order of their scores;
e) identifying the specific number of the independent variables which have the highest scores; and
f) controlling the process
f1) if said specific outcome is acceptable by limiting each identified independent numeric variable to the included region identified by the final boundary for that independent numeric variable and by limiting each independent categoric variable to the selected value for that independent categoric variable; or
f2) if said specific outcome is unacceptable by limiting each identified independent numeric variable to be outside the included region identified by the final boundary for that independent numeric variable and by limiting each identified independent categoric variable to not have the selected value for that independent categoric variable.
Dependent claims: 9, 10, 11, 12
-
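The control step f) of claim 8 reduces to a check on proposed process settings. The sketch below assumes a hypothetical region encoding, `(side, boundary)` for numeric variables and a bare value for categoric ones, and invented variable names; the claim itself does not specify a data representation.

```python
def in_region(value, region):
    """True if value falls in the included region (numeric) or equals the selected value."""
    if isinstance(region, tuple):
        side, boundary = region
        return value > boundary if side == "above" else value <= boundary
    return value == region

def setting_allowed(setting, regions, outcome_acceptable):
    """f1) acceptable outcome: every identified variable must be inside its region.
    f2) unacceptable outcome: every identified variable must be outside its region."""
    checks = [in_region(setting[var], region) for var, region in regions.items()]
    return all(checks) if outcome_acceptable else not any(checks)

# Invented example: regions associated with failures (an unacceptable outcome).
regions = {"temp": ("above", 180), "line": "B"}
safe = setting_allowed({"temp": 170, "line": "A"}, regions, outcome_acceptable=False)
risky = setting_allowed({"temp": 190, "line": "A"}, regions, outcome_acceptable=False)
```

A setting with even one variable inside a failure-associated region is rejected, matching the claim's requirement that each identified variable be limited individually.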
13. A method for implementation in a computer to evaluate a data set for a repeated process, said data set comprising a plurality of records each corresponding to a respective operation of the process and containing the values of a plurality of input independent variables and the resultant one of exactly two possible outcomes of a dependent output variable, the method comprising the steps of:
a) inputting said data set into a work area of the computer memory;
b) iteratively calculating a single variable score for a value range for each of said independent variables in which each said single variable score is proportional to a frequency of occurrence of a specific outcome of said dependent variable and adjusting the value range to optimize the calculated score;
c) iteratively calculating a double variable score for a pair of value ranges for each combination of two of said independent variables in which each said double variable score is proportional to a frequency of occurrence of said specific outcome of said dependent variable and adjusting the pair of value ranges to optimize the calculated score;
d) iteratively calculating a triple variable score for a trio of value ranges for each combination of three of said independent variables in which each said triple variable score is proportional to a frequency of occurrence of said specific outcome of said dependent variable and adjusting the trio of value ranges to optimize the calculated score;
e) determining the largest said single variable score and the associated one of said independent variables;
f) determining the largest said double variable score and the associated two of said independent variables;
g) determining the largest said triple variable score and the associated three of said independent variables; and
h) outputting said scores and associated said value ranges of said independent variables;
whereby said process can be controlled by varying said value ranges of said independent variables to vary the likelihood of occurrence of said specific outcome of said dependent variable.
Dependent claims: 14, 15, 16
-
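The single, double, and triple variable scores of claim 13 can be illustrated with a brute-force search. This sketch swaps the claim's iterative range adjustment for an exhaustive enumeration of candidate ranges built from observed values, which is only practical for small data sets; it also handles numeric variables only. The hit-rate score, the `min_count` guard, and the sample data are assumptions.

```python
from itertools import combinations, product

def candidate_ranges(records, var):
    """All (lo, hi) ranges whose endpoints are observed values of var."""
    vals = sorted({r[var] for r in records})
    return [(lo, hi) for i, lo in enumerate(vals) for hi in vals[i:]]

def score(records, target, variables, ranges, min_count=2):
    """Hypothetical score: target-outcome rate among records inside every range."""
    kept = [r for r in records
            if all(lo <= r[v] <= hi for v, (lo, hi) in zip(variables, ranges))]
    if len(kept) < min_count:
        return 0.0
    return sum(r["outcome"] == target for r in kept) / len(kept)

def best_for_size(records, target, variables, n):
    """Largest score over all n-variable combinations and their candidate ranges."""
    best = (0.0, None, None)
    for combo in combinations(variables, n):
        for ranges in product(*(candidate_ranges(records, v) for v in combo)):
            s = score(records, target, combo, ranges)
            if s > best[0]:
                best = (s, combo, ranges)
    return best

# Invented data: the target outcome (1) occurs only when both x and y are high.
records = [
    {"x": 1, "y": 1, "z": 1, "outcome": 0},
    {"x": 1, "y": 2, "z": 2, "outcome": 0},
    {"x": 2, "y": 1, "z": 1, "outcome": 0},
    {"x": 2, "y": 2, "z": 2, "outcome": 1},
    {"x": 2, "y": 2, "z": 1, "outcome": 1},
    {"x": 1, "y": 1, "z": 2, "outcome": 0},
]
single = best_for_size(records, 1, ["x", "y", "z"], 1)   # e)
double = best_for_size(records, 1, ["x", "y", "z"], 2)   # f)
triple = best_for_size(records, 1, ["x", "y", "z"], 3)   # g)
```

In this data no single variable isolates the target outcome, but the pair of `x` and `y` does, which is the kind of interaction the double and triple scores are meant to surface.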
17. In conjunction with a repeated process wherein a plurality of independent input process variables results in a dependent output variable having either of exactly two outcomes, a method for implementation in a computer for evaluating a data set which comprises a plurality of records each corresponding to a single operation of the process and each record including the respective values of the independent variables and the outcome of the dependent variable for that single operation of the process, wherein each independent variable can be either numeric or categoric, the method determining a combination of a specific number of the independent variables and boundaries defining an included region of values for each of said specific number of independent variables which most likely results in a specific outcome of the dependent variable, the method comprising the steps of:
a) for each independent numeric variable
a1) determining its range of values;
a2) selecting a pair of initial boundaries within the determined range;
a3) calculating a score where the included region is between the pair of initial boundaries, wherein said score is a measure of the frequency of occurrence of the specific outcome of the dependent variable when said each independent numeric variable has a value in the respective included region;
a4) iteratively adjusting the pair of boundaries within the range so as to alter the size of the included region and calculating the score based upon the altered included region for each boundary adjustment; and
a5) selecting as the final boundaries that pair of boundaries which provided the highest score;
b) for each independent categoric variable
b1) calculating a score for each value of said each independent categoric variable, wherein each said score is a measure of the frequency of occurrence of the specific outcome of the dependent variable when said each independent categoric variable has said each value; and
b2) selecting that value which provided the highest score;
c) selecting combinations of the specific number of the independent variables;
d) for each selected combination iteratively adjusting the final pair of boundaries of the numeric independent variables and calculating the score for the combination of independent variables and values which define the included region for each boundary adjustment to optimize the score; and
e) providing as an output a list of the independent variables in the combination which provided the highest score and
e1) the included region identified by the final pair of boundaries for each independent numeric variable in the highest scoring combination; and
e2) the selected value for each independent categoric variable in the highest scoring combination.
Dependent claims: 18
-
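Step a) of claim 17, where the included region lies between a pair of boundaries, might look like the sketch below. The evenly spaced grid, the choice of initial pair, the hit-rate score with a `min_count` guard, and the tie-break in favor of the wider region are all assumptions added for illustration; the sample data is invented.

```python
def interval_score(records, var, target, lo, hi, min_count=2):
    """Hypothetical score: target-outcome rate among records with lo <= value <= hi."""
    kept = [r for r in records if lo <= r[var] <= hi]
    if len(kept) < min_count:
        return 0.0
    return sum(r["outcome"] == target for r in kept) / len(kept)

def best_interval(records, var, target, steps=5):
    values = [r[var] for r in records]
    vmin, vmax = min(values), max(values)                         # a1) range
    grid = [vmin + (vmax - vmin) * i / steps for i in range(steps + 1)]
    lo, hi = grid[1], grid[-2]                                    # a2) initial pair (assumption)
    best = (interval_score(records, var, target, lo, hi), hi - lo, lo, hi)  # a3)
    for i, lo in enumerate(grid):                                 # a4) adjust both boundaries
        for hi in grid[i:]:
            cand = (interval_score(records, var, target, lo, hi), hi - lo, lo, hi)
            if cand[:2] > best[:2]:         # higher score wins; ties go to the wider region
                best = cand
    score, _, lo, hi = best                                       # a5) final pair
    return score, lo, hi

# Invented data: failures cluster in the middle of the temperature range.
records = [
    {"temp": 150, "outcome": "pass"},
    {"temp": 160, "outcome": "fail"},
    {"temp": 170, "outcome": "fail"},
    {"temp": 180, "outcome": "fail"},
    {"temp": 190, "outcome": "pass"},
    {"temp": 200, "outcome": "pass"},
]
result = best_interval(records, "temp", "fail")
```

A two-sided region captures the mid-range cluster of failures that a single boundary would mix with passing records on one side or the other.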
Specification