Fuzzy expert system for interpretable rule extraction from neural networks
Abstract
A method and apparatus for extracting an interpretable, meaningful, and concise rule set from a neural network is presented. The method involves adjustment of the gain parameter λ and the threshold Tj of the sigmoid activation function of the interactive-or operator used in the extraction of a rule set from an artificial neural network. A multi-stage procedure involving coarse and fine adjustment is used to constrain the range of the antecedents of the extracted rules to the range of values of the inputs to the artificial neural network. Furthermore, the consequents of the extracted rules are expressed as degrees of membership, so that they are easily understandable by human beings. The method disclosed may be applied to any pattern recognition task, and is particularly useful in applications such as vehicle occupant sensing and recognition, object recognition, gesture recognition, and facial pattern recognition, among others.
19 Claims
-
1. A method for interpretable rule extraction from neural networks comprising the steps of:
-
a. providing a neural network having a latent variable space and an error rate, said neural network further including a sigmoid activation function having an adjustable gain parameter λ;
b. iteratively adjusting the adjustable gain parameter λ to minimize the error rate of the neural network, producing an estimated minimum gain parameter value λest;
c. using the estimated minimum gain parameter value λest and a set of training data to train the neural network; and
d. projecting the training data onto the latent variable space to generate output clusters having cluster membership levels and cluster centers, with said cluster membership levels being determined as a function of proximity with respect to said cluster centers.
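Step d assigns membership by proximity to cluster centers but gives no formula. A common choice for membership that decreases with distance to each center is the normalized inverse-square-distance form used in fuzzy c-means, shown here purely as an illustrative assumption, not the patent's specific formula:

```python
def membership_levels(point, centers, eps=1e-12):
    """Fuzzy membership of a projected point in each output cluster:
    inversely proportional to squared distance to that cluster's center,
    normalized so the levels sum to 1 (fuzzy c-means style, m = 2)."""
    d2 = [sum((p - c) ** 2 for p, c in zip(point, center)) + eps
          for center in centers]
    inv = [1.0 / d for d in d2]
    total = sum(inv)
    return [v / total for v in inv]
```

A point equidistant from two centers receives equal membership in both clusters; moving it toward one center raises that cluster's membership level.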
2. A method for interpretable rule extraction from neural networks as set forth in claim 1, wherein:
a. the neural network provided in step a of claim 1 further includes a plurality of inputs and an output, and wherein the latent variable space of the neural network further includes at least one latent variable node having an activation point;
b. the iterative adjustment of the adjustable gain parameter λ in step b of claim 1 is further defined by the sub-steps of:
i. providing a validation data set;
ii. setting an initial gain parameter value λinit, a current gain parameter value λcurr, a final gain parameter value λfinal, a gain incrementing value Δλ, and an estimated minimum gain parameter value λest;
iii. setting the current gain parameter value λcurr equal to the initial gain parameter value λinit;
iv. setting the estimated minimum gain parameter value λest equal to the initial gain parameter value λinit;
v. training the neural network using the current gain parameter value λcurr to provide a trained neural network;
vi. inputting the validation data set into the trained neural network to generate an output data set;
vii. comparing the output data set generated by the trained neural network to the validation data set to determine the prediction error rate of the trained neural network;
viii. resetting the current gain parameter value λcurr equal to the current gain parameter value λcurr plus the gain incrementing value Δλ;
ix. after each repetition of steps v through ix, setting the estimated minimum gain parameter value λest equal to whichever of the current value of the estimated minimum gain parameter value λest and the current gain parameter value λcurr generated a lesser prediction error rate; and
x. repeating steps v through ix of the present claim until the current gain parameter value λcurr is equal to the final gain parameter value λfinal; and
c. the estimated minimum gain parameter value λest used to train the neural network is the estimated minimum gain parameter value λest resulting after sub-step ix of step b of the present claim; and
d. the projecting of the training data onto the latent variable space of step d of claim 1 is performed to set the activation points of the latent variable nodes to generate output clusters having cluster membership levels and cluster centers, with said cluster membership levels being determined as a function of proximity with respect to said cluster centers.
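The sweep in sub-steps v through x amounts to a one-dimensional grid search over λ. A minimal sketch, assuming hypothetical `train_model` and `validation_error` callables that stand in for the unspecified training and validation-comparison steps:

```python
def coarse_gain_search(train_model, validation_error,
                      lam_init, lam_final, d_lam):
    """Sweep the gain λ from λinit to λfinal in steps of Δλ,
    keeping whichever value yields the lowest validation error."""
    lam_est, best_err = lam_init, float("inf")
    lam_curr = lam_init
    while True:
        model = train_model(lam_curr)      # sub-step v: train with current λ
        err = validation_error(model)      # sub-steps vi-vii: error rate
        if err < best_err:                 # sub-step ix: keep the better λ
            lam_est, best_err = lam_curr, err
        if lam_curr >= lam_final:          # sub-step x: stop at λfinal
            break
        lam_curr += d_lam                  # sub-step viii: increment λ
    return lam_est
```

Any stand-ins with the same signatures work; the search itself depends only on the returned error rate.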
-
3. A method for interpretable rule extraction from neural networks as set forth in claim 2, further including the step of fine-tuning the adjustable gain parameter λ by performing, after step b of claim 2, at least one repetition of the sub-steps of:
i. setting the initial gain parameter value λinit equal to the estimated minimum gain parameter value λest minus the gain incrementing value Δλ from step b;
ii. setting the final gain parameter value λfinal equal to the estimated minimum gain parameter value λest plus the gain incrementing value Δλ from step b;
iii. generating a new gain incrementing value Δλ, with the new gain incrementing value Δλ being smaller than the previous gain incrementing value Δλ;
iv. setting the current gain parameter value λcurr equal to the initial gain parameter value λinit; and
v. repeating sub-steps iv through ix of step b of claim 2;
vi. using the value of the estimated minimum gain parameter value λest resulting from the step of fine-tuning the adjustable gain parameter λ in step c of claim 1 for training the neural network.
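The fine-tuning loop shrinks the search window to [λest - Δλ, λest + Δλ] around the current best value and re-sweeps it with a smaller increment. A self-contained sketch, again assuming hypothetical `train_model` and `validation_error` callables:

```python
def fine_tune_gain(train_model, validation_error,
                   lam_est, d_lam, repetitions=3, shrink=0.5):
    """Repeatedly narrow the window around the best λest and re-sweep
    with a smaller Δλ; repetitions and shrink factor are assumptions."""
    for _ in range(repetitions):
        lam_init = lam_est - d_lam          # sub-step i
        lam_final = lam_est + d_lam         # sub-step ii
        d_lam *= shrink                     # sub-step iii: smaller Δλ
        best_err = float("inf")
        lam_curr = lam_init                 # sub-step iv
        while lam_curr <= lam_final:        # sub-step v: repeat the sweep
            err = validation_error(train_model(lam_curr))
            if err < best_err:
                lam_est, best_err = lam_curr, err
            lam_curr += d_lam
    return lam_est
```

Each repetition halves the step size here, so the estimate converges geometrically toward a local minimum of the validation error.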
-
4. A method for interpretable rule extraction from neural networks as set forth in claim 1, wherein the neural network provided in step a of claim 1 further includes a plurality i of input nodes Xi for receiving inputs having a plurality N of input features and a plurality j of hidden layer nodes Hj, with each of the plurality j of hidden layer nodes Hj corresponding to one of a plurality j of rules, with each one of the plurality j of rules including a plurality of antecedents A, and the sigmoid activation function f(x) is of the form:
-
where λ represents the adjustable gain parameter; and
Wij represents the weight between the plurality i of input nodes Xi and the plurality j of hidden layer nodes Hj; and
where each of the plurality of antecedents A of each one of the plurality j of rules is of the form:
where N represents the input features of the inputs i;
λest represents the estimated minimum gain parameter value; and
Wij represents the weight between the plurality i of input nodes Xi and the plurality j of hidden layer nodes Hj.
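The equation images for f(x) and the antecedents did not survive extraction and are left as placeholders above. The standard sigmoid with an adjustable gain, as commonly used for the interactive-or operator, is sketched below as an assumption about the general form, not a reproduction of the patent's exact equation:

```python
import math

def sigmoid_activation(x_inputs, weights, lam):
    """Sigmoid activation for a hidden node Hj with adjustable gain λ,
    assumed standard form: f(x) = 1 / (1 + exp(-λ · Σi Wij·Xi)).
    A larger λ sharpens the transition; a smaller λ softens it."""
    net = sum(w * x for w, x in zip(weights, x_inputs))
    return 1.0 / (1.0 + math.exp(-lam * net))
```

The gain λ controls how crisp or fuzzy the extracted antecedents are, which is why the claims search for the λ value minimizing validation error.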
-
5. A method for interpretable rule extraction from neural networks as set forth in claim 1, wherein the clusters and cluster membership levels generated in step d of claim 1 are provided with linguistic labels.
-
6. A method for interpretable rule extraction from neural networks as set forth in claim 3, wherein:
-
a. the sigmoid activation function of the neural network provided in step a of claim 1 further includes an adjustable bias threshold Tj;
b. between steps a and c of claim 1 is included the additional step of iteratively adjusting the adjustable bias threshold Tj to minimize the error rate of the neural network, producing an estimated minimum bias threshold Tj,est; and
c. the estimated minimum bias threshold Tj,est is used along with the estimated minimum gain parameter value λest in step c of claim 1 to train the neural network.
-
7. A method for interpretable rule extraction from neural networks as set forth in claim 6, wherein the clusters and cluster membership levels generated in step d of claim 1 are provided with linguistic labels.
-
8. A method for interpretable rule extraction from neural networks as set forth in claim 6, wherein step b of claim 6 is further defined by the steps of:
-
a. adjusting the adjustable bias threshold Tj by the sub-steps of:
i. setting an initial bias threshold value Tj,init, a current bias parameter value Tj,curr, a final bias parameter value Tj,final, a bias incrementing value ΔTj, and an estimated minimum bias parameter value Tj,est;
ii. setting the current bias parameter value Tj,curr equal to the initial bias threshold value Tj,init;
iii. setting the estimated minimum bias parameter value Tj,est equal to the initial bias threshold value Tj,init;
iv. training the neural network using the current bias parameter value Tj,curr to provide a trained neural network;
v. inputting the validation data set into the trained neural network to generate an output data set;
vi. comparing the output data set generated by the trained neural network to the validation data set to determine the prediction error rate of the trained neural network;
vii. resetting the current bias parameter value Tj,curr equal to the current bias parameter value Tj,curr plus the bias incrementing value ΔTj;
viii. after each repetition of sub-steps v through vii of step a of the present claim, setting the estimated minimum bias parameter value Tj,est equal to whichever of the current value of the estimated minimum bias parameter value Tj,est and the current bias parameter value Tj,curr generated a lesser prediction error rate; and
ix. repeating sub-steps iv through viii of the present claim until the current bias parameter value Tj,curr is equal to the final bias parameter value Tj,final; and
b. the estimated minimum bias threshold Tj,est used along with the estimated minimum gain parameter value λest in step c of claim 1 to train the neural network is that from sub-step viii of the present claim.
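Claim 8 applies the same sweep to the bias threshold Tj, this time with the gain fixed at λest. A self-contained sketch, again assuming hypothetical `train_model` and `validation_error` callables for sub-steps iv through vi:

```python
def coarse_bias_search(train_model, validation_error,
                       lam_est, t_init, t_final, d_t):
    """Sweep the bias threshold Tj from Tj,init to Tj,final in steps of
    ΔTj, with the gain held fixed at λest, keeping the Tj value that
    yields the lowest validation error."""
    t_est, best_err = t_init, float("inf")
    t_curr = t_init
    while t_curr <= t_final:
        model = train_model(lam_est, t_curr)   # sub-step iv
        err = validation_error(model)          # sub-steps v-vi
        if err < best_err:                     # sub-step viii
            t_est, best_err = t_curr, err
        t_curr += d_t                          # sub-step vii
    return t_est
```

The two sweeps together perform a coordinate-wise search over (λ, Tj) rather than a joint two-dimensional grid search.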
-
9. A method for interpretable rule extraction from neural networks as set forth in claim 8, further including the step of fine-tuning the adjustable bias threshold Tj by performing, after step a of claim 8, at least one repetition of the sub-steps of:
-
a. setting the initial bias threshold value Tj,init equal to the estimated minimum bias parameter value Tj,est minus the bias incrementing value ΔTj from step a of claim 8;
b. setting the final bias parameter value Tj,final equal to the estimated minimum bias parameter value Tj,est plus the bias incrementing value ΔTj from step a of claim 8;
c. generating a new bias incrementing value ΔTj, with the new bias incrementing value ΔTj being smaller than the previous bias incrementing value ΔTj;
d. setting the current bias parameter value Tj,curr equal to the initial bias threshold value Tj,init; and
e. repeating sub-steps iv through viii of step a of claim 8;
f. using the value of the estimated minimum bias parameter value Tj,est from step a of claim 8 along with the estimated minimum gain parameter value λest developed in step c of claim 1 to train the neural network provided in step a of claim 1.
-
10. A method for interpretable rule extraction from neural networks as set forth in claim 8, wherein the neural network provided in step a of claim 1 further includes a plurality i of input nodes Xi and a plurality j of hidden layer nodes Hj, with each of the plurality j of hidden layer nodes Hj corresponding to one of a plurality j of rules, with each one of the plurality j of rules including a plurality of antecedents A, and the sigmoid activation function f(x) is of the form:
-
where λ represents the adjustable gain parameter;
Wij represents the weight between the plurality i of input nodes Xi and the plurality j of hidden layer nodes Hj; and
where Tj represents the adjustable bias threshold; and
where each of the plurality of antecedents A of each rule is of the form:
where Tj,est represents the estimated minimum bias threshold;
N represents the input features of the inputs;
λest represents the estimated minimum gain parameter value; and
Wij represents the weight between the plurality i of input nodes Xi and the plurality j of hidden layer nodes Hj.
-
11. A method for interpretable rule extraction from neural networks as set forth in claim 10, wherein the output clusters and cluster membership levels generated in step d of claim 1 are provided with linguistic labels.
-
12. A fuzzy rule set developed by the method of claim 1.
-
13. A fuzzy rule set developed by the method of claim 5.
-
14. A fuzzy rule set developed by the method of claim 6.
-
15. A fuzzy rule set developed by the method of claim 7.
-
16. An apparatus for interpretable rule extraction from neural networks comprising:
-
a. a neural network having a latent variable space and an error rate, said neural network further including a sigmoid activation function having an adjustable gain parameter λ, with the gain parameter λ iteratively adjusted to minimize the error rate of the neural network and to produce an estimated minimum gain parameter value λest;
b. a set of training data used, along with the estimated minimum gain parameter value λest, to train the neural network; and
c. output clusters generated by projection of the training data set onto the latent variable space of the neural network, each of said output clusters having cluster membership levels and cluster centers, with the cluster membership levels determined as a function of proximity with respect to the cluster centers.
-
Specification