AUTOMATIC GENERATION OF TRAINING DATA FOR ANOMALY DETECTION USING OTHER USER'S DATA SAMPLES

US 20170061322A1
Filed: 08/31/2015
Published: 03/02/2017
Est. Priority Date: 08/31/2015
Status: Active Grant

Abstract

A method (and structure) generates a classifier for an anomalous detection monitor for a target user on a system or application used by a plurality of users and includes providing an access to a memory device storing user data samples for all users of the plurality of users. A target user is selected from among the plurality of users. Data samples for the target user and data samples for other users of the plurality of users are used to generate a normal sample data set and an abnormal (anomalous) sample data set to serve as a training data set for training a model for an anomaly detection monitor for the target user.
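
The split between a target user and the remaining users described in the abstract can be sketched in a few lines of Python. The dictionary layout, the user names, and the helper name `iter_targets` are assumptions made purely for illustration; none of them appear in the patent text.

```python
from typing import Dict, Iterator, List, Tuple

Sample = Tuple[float, ...]

def iter_targets(samples_by_user: Dict[str, List[Sample]]
                 ) -> Iterator[Tuple[str, List[Sample], Dict[str, List[Sample]]]]:
    """Yield (target, target_samples, other_users_samples) for every user in turn."""
    for target, own in samples_by_user.items():
        # Every user except the current target acts as an "other user".
        others = {u: s for u, s in samples_by_user.items() if u != target}
        yield target, own, others

# Hypothetical stored samples for three users of the shared system.
store = {"alice": [(0.0, 0.1)], "bob": [(5.0, 5.2)], "carol": [(9.0, 1.0)]}
splits = {t: sorted(o) for t, _, o in iter_targets(store)}
```

Each user is the target exactly once and appears among the other users for every other target, which is one way to read the target rotation recited in claim 9.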

20 Claims

1. A method, comprising:
- for a system or application used by a plurality of users, providing an access to a memory device storing user data samples for all users of the plurality of users;
  
  selecting a target user from among the plurality of users; and
  
  using a processor on a computer and using data samples for the target user and data samples for other users of the plurality of users, generating a normal sample data set and an abnormal (anomalous) sample data set to serve as a training data set for training a model for an anomaly detection monitor for the target user.
- - 2. The method of claim 1, wherein the data samples of the target user form a cluster of data points in a data space and wherein the target user's cluster of data points provides a reference for the generating of the normal sample data set and for the generating of the abnormal sample data set for the target user.
  - 3. The method of claim 2, wherein:
    - the target user's cluster of data points serves as a basis to derive the target user's normal sample data set; and
      
      the target user's abnormal sample data set is derived from samples from low density areas of the other users' data samples relative to the target user's cluster of data points.
  - 4. The method of claim 2, wherein a local outlier factor (LOF) function is used for generating the abnormal sample data sets for the target user.
  - 5. The method of claim 4, wherein the normal sample data set for the target user is generated by one of:
    - using the target user's own data samples without modification; and
      
      executing the LOF function processing on the target user's own data samples to identify and eliminate outlier samples from the target user's data samples.
  - 6. The method of claim 5, wherein the abnormal sample data set for the target user is generated by one of:
    - a boundary sampling (LowLOFAll) processing, in which samples from all other users' data are selected that have lowest LOF scores from samples having LOF scores above a threshold value;
      
      a boundary sampling per user (LowLOFUser) processing, in which samples from each of other users' data are selected that have lowest LOF scores;
      
      an outlier sampling (HighLOFAll) processing, in which samples from all other users' data are selected that have highest LOF scores; and
      
      an outlier sampling per user (HighLOFUser) processing, in which samples from each of other users' data are selected that have highest LOF scores.
  - 7. The method of claim 4, further comprising receiving as inputs from an administrator/operator at least one of:
    - a selected method for processing from among a plurality of alternative methods;
      
      threshold information; and
      
      a desired total number of samples to be generated for the abnormal sample set.
  - 8. The method of claim 1, further comprising generating and implementing the anomalous detection monitor for the target user.
  - 9. The method of claim 1, wherein the target user comprises a first target user and wherein, upon completing the generating of the normal sample data set and the anomalous sample data set for the first target user, a second target user is selected from the plurality of users and the first target user becomes another other user of the plurality of users for purpose of generating the normal sample data set and the anomalous sample data set for the second target user.
  - 10. The method of claim 1, as embodied in a set of computer-readable instructions stored on a non-transitory storage medium.
  - 11. The method of claim 10, wherein the non-transitory storage medium comprises one of:
    - a random access memory (RAM) on a computer currently executing the method;
      
      a memory device on a computer storing the set of computer-readable instructions as an application program that can be selectively executed or that can be selectively downloaded to another computer via a network;
      
      a memory device on a computer storing the set of computer-readable instructions as an application program that can be selectively executed as a cloud service; and
      
      a standalone memory device that can be inserted into an I/O device or port on a computer to upload the computer-readable instructions onto the computer.
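
The LOF-based sampling recited in claims 3 through 6 can be illustrated with a small, dependency-free Python sketch. This is only one possible reading of the claims, not the patented implementation: the function names, the neighbourhood size `k`, the default threshold, the even per-user quota, and the choice to apply the threshold filter to all four methods are assumptions. The sketch also assumes the target cluster holds more than `k` samples and truncates distance ties to exactly `k` neighbours.

```python
import math

def _knn(point, reference, k, skip=None):
    """Return the k nearest (distance, index) pairs of `point` in `reference`."""
    dists = [(math.dist(point, r), i) for i, r in enumerate(reference) if i != skip]
    return sorted(dists)[:k]

def lof_scores(reference, queries=None, k=3):
    """LOF relative to `reference`; scores the reference itself if no queries given."""
    neigh = [_knn(r, reference, k, skip=i) for i, r in enumerate(reference)]
    k_dist = [n[-1][0] for n in neigh]  # distance to the k-th neighbour

    def lrd(neighbours):
        # Local reachability density: inverse of the mean reachability distance.
        reach = [max(k_dist[j], d) for d, j in neighbours]
        return len(reach) / max(sum(reach), 1e-12)

    ref_lrd = [lrd(n) for n in neigh]
    if queries is None:
        pairs = list(zip(neigh, ref_lrd))
    else:
        pairs = [(nq, lrd(nq)) for nq in (_knn(q, reference, k) for q in queries)]
    return [sum(ref_lrd[j] for _, j in nq) / (len(nq) * own) for nq, own in pairs]

def normal_samples(target, threshold=None, k=3):
    """Claim 5: keep the target's samples as-is, or drop its own LOF outliers."""
    if threshold is None:
        return list(target)
    return [p for p, s in zip(target, lof_scores(target, k=k)) if s <= threshold]

def abnormal_samples(target, others, method="LowLOFAll", threshold=1.2, n=4, k=3):
    """Claim 6: pick other users' samples by LOF score against the target cluster."""
    highest = method.startswith("High")
    per_user = {}
    for user, pts in others.items():
        scored = zip(lof_scores(target, pts, k), pts)
        # Assumption: the threshold filter is applied to every method.
        per_user[user] = sorted((sp for sp in scored if sp[0] > threshold),
                                reverse=highest)
    if method.endswith("All"):
        pool = sorted((sp for c in per_user.values() for sp in c), reverse=highest)
        picked = pool[:n]
    else:
        quota = max(1, n // max(len(per_user), 1))  # assumed even split per user
        picked = [sp for c in per_user.values() for sp in c[:quota]]
    return [p for _, p in picked]

# Hypothetical data: the target forms a 3x3 grid; two other users lie around it.
target = [(float(x), float(y)) for x in range(3) for y in range(3)]
others = {"u2": [(1.0, 1.4), (3.0, 3.0)], "u3": [(10.0, 10.0), (4.0, 4.0)]}
boundary = abnormal_samples(target, others, "LowLOFAll", n=1)
outliers = abnormal_samples(target, others, "HighLOFAll", n=1)
```

In this toy data set the sample at (1.0, 1.4) falls inside the target cluster and is filtered out by the threshold, so boundary sampling picks the nearest remaining outside point while outlier sampling picks the farthest one.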

12. An apparatus, comprising:
- a memory device; and
  
  a processor having access to the memory device, the memory device storing a series of machine-readable instructions to execute a method of generating a normal sample data set and an abnormal (anomalous) sample data set to serve as a classifier for training a model for an anomalous detection monitor for a target user, the target user being one of a plurality of users sharing a system or application, wherein the method comprises:
  
  providing an access to a memory device storing user data samples for all users of the plurality of users;
  
  selecting a target user from among the plurality of users; and
  
  using the processor to generate the normal sample data set and the abnormal sample data set using data samples for the target user and data samples for other users of the plurality of users.
- - 13. The apparatus of claim 12, wherein:
    - the data samples of the target user form a cluster of data points in a data space;
      
      the target user's cluster of data points provides a reference for the generating of the normal sample data set and for the generating of the abnormal sample data set for the target; and
      
      a local outlier factor (LOF) function is used for generating the normal sample data set and the abnormal sample data sets for the target user, as based on the target user's cluster of data points.
  - 14. The apparatus of claim 13, further comprising:
    - an input device permitting an operator/administrator to input values for parameters related to the generating of the normal and abnormal sample data sets; and
      
      a display device permitting the operator/administrator to view results of the generating of the normal and abnormal sample data sets.
  - 15. The apparatus of claim 14, wherein a threshold parameter for a processing of the LOF function for a target user and a set of other users in the plurality of users can be determined based on viewing the results on the display device.
  - 16. The apparatus of claim 12, as comprising a server in a network.
  - 17. The apparatus of claim 12, as executing in a cloud environment.

18. An anomaly detector, as executed by a processor on a computer, the anomaly detector comprising a monitor for detecting anomalous behavior by any user of a plurality of users sharing a system or application, the anomaly detector comprising:
- an input receiving data related to a current operation of the system or application by the users;
  
  a monitor module for each user as a target user, the monitor module for each target user executing a model of the target user to detect whether the target user's current operation of the system or application comprises anomalous behavior; and
  
  an output to provide an alert signal if any user is detected as demonstrating anomalous behavior, wherein the model for each target user is developed from a classifier based on a normal sample data set and an abnormal sample data set for the target user, and wherein data samples for the target user and data samples for other users of the plurality of users are used to generate the normal sample data set and the abnormal sample data set to serve as a classifier for training the model for the anomalous detection monitor module for the target user.
- - 19. The anomaly detector of claim 18, wherein:
    - the data samples of the target user form a cluster of data points in a data space;
      
      the target user's cluster of data points provides a reference for the generating of the normal sample data set and for the generating of the abnormal sample data set for the target; and
      
      a local outlier factor (LOF) function is used for generating the normal sample data set and the abnormal sample data sets for the target user, as based on the target user's cluster of data points.
  - 20. The anomaly detector of claim 18, as implemented on one of:
    - a server on a network; and
      
      a cloud service.
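
The detector structure of claim 18 (an input of current operations, one monitor module per user, and an alert output) might be sketched as follows. The claims do not name a model type, so a 1-nearest-neighbour rule over the normal and abnormal sample sets is used here as a stand-in; the class names, event format, and sample values are likewise hypothetical.

```python
import math
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple

Sample = Tuple[float, ...]

@dataclass
class UserMonitor:
    """Per-user model; a 1-nearest-neighbour rule stands in for the trained classifier."""
    normal: List[Sample]
    abnormal: List[Sample]

    def is_anomalous(self, sample: Sample) -> bool:
        # Flag the sample if it lies closer to the abnormal set than to the normal set.
        d_normal = min(math.dist(sample, p) for p in self.normal)
        d_abnormal = min(math.dist(sample, p) for p in self.abnormal)
        return d_abnormal < d_normal

class AnomalyDetector:
    def __init__(self, monitors: Dict[str, UserMonitor]):
        self.monitors = monitors

    def process(self, events: Iterable[Tuple[str, Sample]]) -> List[Tuple[str, Sample]]:
        """Return an alert (user, sample) for every event flagged by that user's monitor."""
        return [(u, s) for u, s in events if self.monitors[u].is_anomalous(s)]

# Hypothetical monitors: each user's abnormal set is drawn from the other user's data.
detector = AnomalyDetector({
    "alice": UserMonitor(normal=[(0.0, 0.0), (1.0, 1.0)], abnormal=[(5.0, 5.0)]),
    "bob": UserMonitor(normal=[(5.0, 5.0)], abnormal=[(0.0, 0.0)]),
})
alerts = detector.process([
    ("alice", (0.5, 0.5)),
    ("alice", (4.0, 4.5)),
    ("bob", (4.0, 4.5)),
])
```

The same observation (4.0, 4.5) raises an alert for alice but not for bob, which reflects the per-user nature of the monitor modules.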

Current Assignee: Arkose Labs Holdings Incorporated
Original Assignee: International Business Machines Corporation
Inventors: CHARI, Suresh N.; MOLLOY, Ian Michael; PARK, Youngja

Granted Patent: US 10,147,049 B2
Field of Search
US Class Current: 1/1
CPC Class Codes

G06F 21/55   Detecting local intrusion o...

G06F 21/554   involving event detection a...

G06F 2221/034   Test or assess a computer o...

G06N 20/00   Machine learning

G06N 20/20   Ensemble learning

H04L 63/1425   Traffic logging, e.g. anoma...
