Protecting cognitive systems from gradient based attacks through the use of deceiving gradients
Abstract
Mechanisms are provided for implementing a hardened neural network. The mechanisms configure the hardened neural network, executing in a data processing system, to introduce noise in internal feature representations of the hardened neural network. The noise introduced in the internal feature representations diverts gradient computations associated with a loss surface of the hardened neural network. The mechanisms also configure the hardened neural network to implement a merge layer of nodes that combine outputs of adversarially trained output nodes of the hardened neural network with output nodes of the hardened neural network trained based on the introduced noise. The hardened neural network processes input data to generate classification labels for the input data and thereby generates augmented input data, which is output to a computing system for processing to perform a computing operation.
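The data flow described in the abstract can be illustrated with a minimal NumPy sketch. Everything in it is an assumption made for illustration and is not taken from the patent: the class name HardenedNetSketch, the layer sizes, the Gaussian form of the noise, the use of randomly initialised (untrained) weights as stand-ins for the two sets of trained output nodes, and the choice of a weighted average as the merge operation. It shows only the forward path: noise added to an internal feature representation, two sets of output nodes, a merge step, and labels attached to the input to form augmented input data.

```python
import numpy as np


def softmax(z):
    # Numerically stable softmax over the last axis.
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


class HardenedNetSketch:
    """Hypothetical forward-pass sketch; weights are random, not trained."""

    def __init__(self, n_in, n_hidden, n_classes, noise_std=0.1, seed=0):
        self.rng = np.random.default_rng(seed)
        self.noise_std = noise_std  # assumed Gaussian noise scale
        self.W_hidden = self.rng.normal(0.0, 0.1, (n_in, n_hidden))
        # Stand-in for the adversarially trained output nodes.
        self.W_adv = self.rng.normal(0.0, 0.1, (n_hidden, n_classes))
        # Stand-in for the output nodes trained on the introduced noise.
        self.W_noise = self.rng.normal(0.0, 0.1, (n_hidden, n_classes))
        self.merge_weight = 0.5  # assumed: merge as a weighted average

    def forward(self, x):
        # Internal feature representation.
        features = np.tanh(x @ self.W_hidden)
        # Introduce noise into the internal feature representation.
        noisy = features + self.rng.normal(0.0, self.noise_std, features.shape)
        p_adv = softmax(features @ self.W_adv)
        p_noise = softmax(noisy @ self.W_noise)
        # Merge step: combine the two sets of output-node activations.
        w = self.merge_weight
        return w * p_adv + (1.0 - w) * p_noise

    def classify(self, x):
        # Attach a classification label to each input row, giving a simple
        # form of "augmented input data" for a downstream system.
        labels = self.forward(x).argmax(axis=-1)
        return [{"input": row, "label": int(lab)} for row, lab in zip(x, labels)]


net = HardenedNetSketch(n_in=4, n_hidden=8, n_classes=3)
batch = np.ones((2, 4))
augmented = net.classify(batch)
```

A weighted average is only one possible merge; the claims require that the merge layer combine the two sets of outputs without fixing the operation in the text shown here.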
20 Claims
1. A method, in a data processing system comprising a processor and a memory, the memory comprising instructions which are executed by the processor to specifically configure the processor to implement a hardened neural network, the method comprising:
configuring the hardened neural network executing in the data processing system to introduce noise in internal feature representations of the hardened neural network, wherein the noise introduced in the internal feature representations diverts gradient computations associated with a loss surface of the hardened neural network;
configuring the hardened neural network executing in the data processing system to implement a merge layer of nodes that combine outputs of adversarially trained output nodes of the hardened neural network with output nodes of the hardened neural network trained based on the introduced noise;
receiving, by the hardened neural network, input data for classification by the hardened neural network;
processing, by the hardened neural network, the input data to generate classification labels for the input data and thereby generate augmented input data; and
outputting, by the hardened neural network, the augmented input data to a computing system for processing of the augmented input data to perform a computing operation.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10
11. A computer program product comprising a computer readable storage medium having a computer readable program stored therein, wherein the computer readable program, when executed on a data processing system, causes the data processing system to:
configure a hardened neural network executing in the data processing system to introduce noise in internal feature representations of the hardened neural network, wherein the noise introduced in the internal feature representations diverts gradient computations associated with a loss surface of the hardened neural network;
configure the hardened neural network executing in the data processing system to implement a merge layer of nodes that combine outputs of adversarially trained output nodes of the hardened neural network with output nodes of the hardened neural network trained based on the introduced noise;
receive, by the hardened neural network, input data for classification by the hardened neural network;
process, by the hardened neural network, the input data to generate classification labels for the input data and thereby generate augmented input data; and
output, by the hardened neural network, the augmented input data to a computing system for processing of the augmented input data to perform a computing operation.
Dependent claims: 12, 13, 14, 15, 16, 17, 18, 19
20. An apparatus comprising:
at least one processor; and
at least one memory coupled to the at least one processor, wherein the at least one memory comprises instructions which, when executed by the at least one processor, cause the at least one processor to:
configure a hardened neural network executing on the at least one processor to introduce noise in internal feature representations of the hardened neural network, wherein the noise introduced in the internal feature representations diverts gradient computations associated with a loss surface of the hardened neural network;
configure the hardened neural network executing on the at least one processor to implement a merge layer of nodes that combine outputs of adversarially trained output nodes of the hardened neural network with output nodes of the hardened neural network trained based on the introduced noise;
receive, by the hardened neural network, input data for classification by the hardened neural network;
process, by the hardened neural network, the input data to generate classification labels for the input data and thereby generate augmented input data; and
output, by the hardened neural network, the augmented input data to a computing system for processing of the augmented input data to perform a computing operation.
Specification