Protecting cognitive systems from gradient based attacks through the use of deceiving gradients

US 10,657,259 B2
Filed: 11/01/2017
Issued: 05/19/2020
Est. Priority Date: 11/01/2017
Status: Active Grant

Abstract

Mechanisms are provided for providing a hardened neural network. The mechanisms configure the hardened neural network executing in the data processing system to introduce noise in internal feature representations of the hardened neural network. The noise introduced in the internal feature representations diverts gradient computations associated with a loss surface of the hardened neural network. The mechanisms configure the hardened neural network executing in the data processing system to implement a merge layer of nodes that combine outputs of adversarially trained output nodes of the hardened neural network with output nodes of the hardened neural network trained based on the introduced noise. The mechanisms process, by the hardened neural network, input data to generate classification labels for the input data and thereby generate augmented input data which is output to a computing system for processing to perform a computing operation.
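
The abstract does not say how the merge layer combines the two sets of output nodes. As a minimal sketch only, assuming a weighted average of the per-head softmax probabilities (the function name `merge_layer` and the `weight` parameter are hypothetical, not taken from the patent), the combination might look like this:

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax over the last axis.
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def merge_layer(logits_noise_head, logits_adv_head, weight=0.5):
    """Combine two sets of output nodes into one class distribution.

    ASSUMPTION: a convex combination of per-head probabilities. The patent
    only states that a merge layer combines the outputs, not how.
    """
    p_noise = softmax(logits_noise_head)  # head trained on introduced noise
    p_adv = softmax(logits_adv_head)      # adversarially trained head
    return weight * p_noise + (1.0 - weight) * p_adv

# Illustrative logits for one input sample and three classes.
logits_noise = np.array([[2.0, 0.5, -1.0]])
logits_adv = np.array([[1.5, 1.0, -0.5]])

merged = merge_layer(logits_noise, logits_adv)
label = int(merged.argmax(axis=-1)[0])  # classification label for the sample
```

Averaging probabilities rather than raw logits keeps the merged output a valid distribution for any `weight` between 0 and 1; other merge rules (for example a learned dense layer) would fit the claim language equally well.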

27 Citations

20 Claims

1. A method, in a data processing system comprising a processor and a memory, the memory comprising instructions which are executed by the processor to specifically configure the processor to implement a hardened neural network, the method comprising:
- configuring the hardened neural network executing in the data processing system to introduce noise in internal feature representations of the hardened neural network, wherein the noise introduced in the internal feature representations diverts gradient computations associated with a loss surface of the hardened neural network;
  
  configuring the hardened neural network executing in the data processing system to implement a merge layer of nodes that combine outputs of adversarially trained output nodes of the hardened neural network with output nodes of the hardened neural network trained based on the introduced noise;
  
  receiving, by the hardened neural network, input data for classification by the hardened neural network;
  
  processing, by the hardened neural network, the input data to generate classification labels for the input data and thereby generate augmented input data; and
  
  outputting, by the hardened neural network, the augmented input data to a computing system for processing of the augmented input data to perform a computing operation.
- Dependent claims (2, 3, 4, 5, 6, 7, 8, 9, 10):
  - 2. The method of claim 1, wherein configuring the hardened neural network executing in the data processing system to introduce noise in internal feature representations of the neural network comprises introducing noise into each class of a classification operation performed by the neural network during training of the neural network.
  - 3. The method of claim 2, wherein configuring the hardened neural network executing in the data processing system to introduce noise in internal feature representations of the neural network comprises introducing at least one noisy region in the loss surface in association with a cluster of each class, and wherein gradients in the loss surface in association with the cluster of each class have a direction pointing towards the at least one noisy region.
  - 4. The method of claim 1, wherein configuring the hardened neural network executing in the data processing system to introduce noise in internal feature representations of the neural network comprises:
    - training a neural network, based on original training data, to classify input data samples into a plurality of different classes; and
      
      performing subsequent training of the neural network to generate the hardened neural network that is protected from adversarial input generation by diverting gradient calculations associated with the loss surface of the neural network.
  - 5. The method of claim 4, wherein the subsequent training comprises:
    - training the neural network, with regard to a first set of output nodes of the neural network, based on a first set of training data corresponding to data samples of the original training data;
      
      training the neural network, with regard to the first set of output nodes of the neural network, based on a second set of training data corresponding to noisy data samples generated from the first set of training data with first size perturbations introduced into the data samples; and
      
      training the neural network, with regard to a second set of output nodes of the neural network, based on a third set of training data corresponding to adversarial data samples generated from the first set of training data with second size perturbations, larger than the first size perturbations, introduced into the data samples.
  - 6. The method of claim 5, wherein the first size perturbations and second size perturbations are introduced into the data samples of the first set of training data based on a fast gradient sign function, and wherein the first size perturbations have a smaller multiplier in the fast gradient sign function than the second size perturbations.
  - 7. The method of claim 5, wherein training the neural network, with regard to the first set of output nodes of the neural network, based on the second set of training data comprises training the neural network to purposefully misclassify data samples in the second set of training data.
  - 8. The method of claim 7, wherein training the neural network to purposefully misclassify data samples in the second set of training data comprises utilizing a confusion matrix data structure to identify an alternative classification to a correct classification for data samples in the second set of training data.
  - 9. The method of claim 8, wherein the confusion matrix data structure comprises, for each data sample in the original training data, a count of a number of times the data sample is misclassified into an incorrect class by the neural network, and wherein utilizing the confusion matrix data structure to identify an alternative classification to the correct classification for data samples in the second set of training data comprises selecting, for each data sample in the second set of training data, a class having a lowest count.
  - 10. The method of claim 1, wherein the computing system is a cognitive system and wherein the computing operation is a cognitive operation.
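
Claims 5 and 6 describe generating noisy and adversarial training samples with a fast gradient sign function, using a smaller multiplier for the noisy set than for the adversarial set. The following is a minimal sketch of that idea, assuming a linear softmax classifier so the input gradient has a closed form; the model, the helper names, and the multiplier values `EPS_NOISY` and `EPS_ADV` are illustrative assumptions, not values from the patent.

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax over the last axis.
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def cross_entropy(W, b, x, y):
    # Loss of a linear softmax classifier on one sample x with true class y.
    return -np.log(softmax(W @ x + b)[y])

def input_gradient(W, b, x, y):
    # d(loss)/dx for the linear softmax model: W^T (p - onehot(y)).
    p = softmax(W @ x + b)
    p[y] -= 1.0
    return W.T @ p

def fgsm(W, b, x, y, eps):
    # Fast gradient sign step: move each feature by eps in the direction
    # that increases the loss.
    return x + eps * np.sign(input_gradient(W, b, x, y))

rng = np.random.default_rng(0)
W = rng.normal(size=(3, 4))  # hypothetical weights: 3 classes, 4 features
b = np.zeros(3)
x = rng.normal(size=4)       # one data sample from the first training set
y = 1                        # its correct class

EPS_NOISY = 0.01  # assumed small multiplier ("first size perturbations")
EPS_ADV = 0.1     # assumed larger multiplier ("second size perturbations")

x_noisy = fgsm(W, b, x, y, EPS_NOISY)  # member of the second training set
x_adv = fgsm(W, b, x, y, EPS_ADV)      # member of the third training set
```

In the claimed scheme the noisy samples would train the first set of output nodes and the adversarial samples the second set; this sketch covers only sample generation.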

11. A computer program product comprising a computer readable storage medium having a computer readable program stored therein, wherein the computer readable program, when executed on a data processing system, causes the data processing system to:
- configure a hardened neural network executing in the data processing system to introduce noise in internal feature representations of the hardened neural network, wherein the noise introduced in the internal feature representations diverts gradient computations associated with a loss surface of the hardened neural network;
  
  configure the hardened neural network executing in the data processing system to implement a merge layer of nodes that combine outputs of adversarially trained output nodes of the hardened neural network with output nodes of the hardened neural network trained based on the introduced noise;
  
  receive, by the hardened neural network, input data for classification by the hardened neural network;
  
  process, by the hardened neural network, the input data to generate classification labels for the input data and thereby generate augmented input data; and
  
  output, by the hardened neural network, the augmented input data to a computing system for processing of the augmented input data to perform a computing operation.
- Dependent claims (12, 13, 14, 15, 16, 17, 18, 19):
  - 12. The computer program product of claim 11, wherein configuring the hardened neural network executing in the data processing system to introduce noise in internal feature representations of the neural network comprises introducing noise into each class of a classification operation performed by the neural network during training of the neural network.
  - 13. The computer program product of claim 12, wherein configuring the hardened neural network executing in the data processing system to introduce noise in internal feature representations of the neural network comprises introducing at least one noisy region in the loss surface in association with a cluster of each class, and wherein gradients in the loss surface in association with the cluster of each class have a direction pointing towards the at least one noisy region.
  - 14. The computer program product of claim 11, wherein configuring the hardened neural network executing in the data processing system to introduce noise in internal feature representations of the neural network comprises:
    - training a neural network, based on original training data, to classify input data samples into a plurality of different classes; and
      
      performing subsequent training of the neural network to generate the hardened neural network that is protected from adversarial input generation by diverting gradient calculations associated with the loss surface of the neural network.
  - 15. The computer program product of claim 14, wherein the subsequent training comprises:
    - training the neural network, with regard to a first set of output nodes of the neural network, based on a first set of training data corresponding to data samples of the original training data;
      
      training the neural network, with regard to the first set of output nodes of the neural network, based on a second set of training data corresponding to noisy data samples generated from the first set of training data with first size perturbations introduced into the data samples; and
      
      training the neural network, with regard to a second set of output nodes of the neural network, based on a third set of training data corresponding to adversarial data samples generated from the first set of training data with second size perturbations, larger than the first size perturbations, introduced into the data samples.
  - 16. The computer program product of claim 15, wherein the first size perturbations and second size perturbations are introduced into the data samples of the first set of training data based on a fast gradient sign function, and wherein the first size perturbations have a smaller multiplier in the fast gradient sign function than the second size perturbations.
  - 17. The computer program product of claim 15, wherein training the neural network, with regard to the first set of output nodes of the neural network, based on the second set of training data comprises training the neural network to purposefully misclassify data samples in the second set of training data.
  - 18. The computer program product of claim 17, wherein training the neural network to purposefully misclassify data samples in the second set of training data comprises utilizing a confusion matrix data structure to identify an alternative classification to a correct classification for data samples in the second set of training data.
  - 19. The computer program product of claim 18, wherein the confusion matrix data structure comprises, for each data sample in the original training data, a count of a number of times the data sample is misclassified into an incorrect class by the neural network, and wherein utilizing the confusion matrix data structure to identify an alternative classification to the correct classification for data samples in the second set of training data comprises selecting, for each data sample in the second set of training data, a class having a lowest count.
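
Claims 8, 9, 18 and 19 describe choosing a deliberately wrong label for each noisy sample by picking the class with the lowest count in a per-sample confusion matrix. The sketch below shows one way to read that selection. It assumes the correct class is excluded from the search (the claims call the result an "alternative" to the correct classification) and that ties go to the lowest class index; the function name and the example counts are hypothetical.

```python
import numpy as np

def pick_alternative_labels(miscount, true_labels):
    """Select, per sample, the incorrect class with the lowest count.

    miscount[i, c] is the number of times sample i was misclassified as
    class c. ASSUMPTIONS: the true class is excluded, and ties resolve to
    the lowest class index (argmin behaviour).
    """
    counts = miscount.astype(float)  # astype returns a copy
    counts[np.arange(len(true_labels)), true_labels] = np.inf
    return counts.argmin(axis=1)

# Hypothetical counts for three samples and three classes.
miscount = np.array([
    [0, 3, 1],  # sample 0, true class 0
    [2, 0, 5],  # sample 1, true class 1
    [4, 4, 0],  # sample 2, true class 2
])
true_labels = np.array([0, 1, 2])

alt = pick_alternative_labels(miscount, true_labels)
# alt is [2, 0, 0]: each sample gets the incorrect class it is least
# often confused with.
```

Picking the least-confused class sends the noisy samples toward a label the network would not naturally drift to, which is consistent with the stated goal of diverting gradients, though the claims themselves do not give this rationale.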

20. An apparatus comprising:
- at least one processor; and
  
  at least one memory coupled to the at least one processor, wherein the at least one memory comprises instructions which, when executed by the at least one processor, cause the at least one processor to:
  
  configure a hardened neural network executing on the at least one processor to introduce noise in internal feature representations of the hardened neural network, wherein the noise introduced in the internal feature representations diverts gradient computations associated with a loss surface of the hardened neural network;
  
  configure the hardened neural network executing on the at least one processor to implement a merge layer of nodes that combine outputs of adversarially trained output nodes of the hardened neural network with output nodes of the hardened neural network trained based on the introduced noise;
  
  receive, by the hardened neural network, input data for classification by the hardened neural network;
  
  process, by the hardened neural network, the input data to generate classification labels for the input data and thereby generate augmented input data; and
  
  output, by the hardened neural network, the augmented input data to a computing system for processing of the augmented input data to perform a computing operation.

Current Assignee: International Business Machines Corporation
Original Assignee: International Business Machines Corporation
Inventors: Lee, Taesung; Molloy, Ian M.; Tejani, Farhan
Primary Examiner(s): Arani, Taghi T
Assistant Examiner(s): White, Joshua R

Application Number: US15/800,697
Publication Number: US 20190130110A1
Time in Patent Office: 930 Days
Field of Search: None

CPC Class Codes
- G06F 21/57   Certifying or maintaining t...
- G06F 2221/034   Test or assess a computer o...
- G06N 3/02   Neural networks
- G06N 3/044   Recurrent networks, e.g. Ho...
- G06N 3/047   Probabilistic or stochastic...
- G06N 3/08   Learning methods
- G06N 5/022   Knowledge engineering; Know...
- G06N 5/041   Abduction
