ACTIVE LEARNING USING A DISCRIMINATIVE CLASSIFIER AND A GENERATIVE MODEL TO DETECT AND/OR PREVENT MALICIOUS BEHAVIOR

US 20090099988A1
Filed: 10/12/2007
Published: 04/16/2009
Est. Priority Date: 10/12/2007
Status: Active Grant


2 Assignments

0 Petitions

Abstract

A malicious behavior detection/prevention system, such as an intrusion detection system, is provided that uses active learning to classify entries into multiple classes. A single entry can correspond to either the occurrence of one or more events or the non-occurrence of one or more events. During a training phase, entries are automatically classified into one of multiple classes. After an entry is classified, a model generated for the determined class is used to determine how well the entry corresponds to that model. Ambiguously classified entries, along with entries that do not fit the model for their determined class well, are selected for labeling by a human analyst. The selected entries are presented to the analyst, and the resulting labels are used to further train the classifier and the models. During an evaluation phase, entries are automatically classified using the trained classifier, and a policy associated with the determined class is applied.
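The abstract notes that an entry can represent the occurrence or the non-occurrence of events. One way to picture this, sketched below under our own assumptions, is to bucket raw events by source and fixed time window (compare claim 6) and encode each bucket as a count vector over a fixed list of event types, so that a zero records that an event type did not occur. The `Event` record, the `EVENT_KINDS` vocabulary, `build_entries`, and the 60-second window are all hypothetical; the patent text here does not specify a feature encoding.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple

# Hypothetical event schema; the patent abstract does not define one.
@dataclass(frozen=True)
class Event:
    source: str       # e.g. an IP address or user ID (assumed)
    kind: str         # event type (assumed vocabulary below)
    timestamp: float  # seconds

# Assumed fixed vocabulary: a 0 in an entry records that this event type did NOT occur.
# Event kinds outside this tuple are ignored.
EVENT_KINDS = ("login_fail", "login_ok", "port_scan", "file_read")

def build_entries(events: Iterable[Event],
                  window_seconds: float = 60.0) -> Dict[Tuple[str, int], List[int]]:
    """Aggregate events into one count-vector entry per (source, time window)."""
    buckets: Dict[Tuple[str, int], Counter] = defaultdict(Counter)
    for ev in events:
        window = int(ev.timestamp // window_seconds)
        buckets[(ev.source, window)][ev.kind] += 1
    # Counter returns 0 for missing kinds, which encodes non-occurrence.
    return {key: [counts[kind] for kind in EVENT_KINDS]
            for key, counts in sorted(buckets.items())}

events = [
    Event("10.0.0.5", "login_fail", 3.0),
    Event("10.0.0.5", "login_fail", 10.0),
    Event("10.0.0.5", "port_scan", 70.0),
    Event("10.0.0.7", "login_ok", 5.0),
]
print(build_entries(events))
# {('10.0.0.5', 0): [2, 0, 0, 0], ('10.0.0.5', 1): [0, 0, 1, 0], ('10.0.0.7', 0): [0, 1, 0, 0]}
```

This sketch only emits entries for windows in which a source produced at least one event; representing a window that is entirely silent would require an explicit list of expected sources.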

20 Claims

1. A method comprising:
- automatically classifying each of multiple entries into one of multiple categories using a multi-class classifier;
  
  collecting entries that are ambiguously classified;
  
  collecting entries that do not fit a model of the automatically classified category of the entry;
  
  ranking collected entries;
  
  selecting at least some of the collected entries based on the ranking;
  
  presenting at least some of the selected entries to a human analyst for labeling;
  
  receiving an indication of a category from the human analyst for each of at least some of the presented entries; and
  
  improving the multi-class classifier and one or more models based on the indicated labels.
- Dependent Claims (2, 3, 4, 5, 6, 7, 8)
  - 2. The method of claim 1 wherein at least some of the entries correspond to multiple actions.
  - 3. The method of claim 2 wherein an entry is ambiguously classified if an uncertainty score indicates a high level of uncertainty in the classification.
  - 4. The method of claim 1 wherein the method is performed multiple times.
  - 5. The method of claim 1 further comprising distributing the improved classifier to a disparate organization.
  - 6. The method of claim 1 wherein an entry corresponds to multiple events occurring within a predetermined time window.
  - 7. The method of claim 1 further comprising:
    - automatically classifying an entry into one of multiple categories using the improved multi-class classifier; and
      
      applying an associated policy based on the automatically classified category of the entry.
  - 8. The method of claim 1 wherein an entry corresponds with at least one of multiple network actions, multiple actions associated with accessing a data store, multiple financial transactions, and multiple actions associated with accessing a physical location.
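Claim 1 collects two pools of candidates (ambiguously classified entries and entries that fit their class model poorly), ranks them, and selects some for the analyst. The claim does not say how the ranking is done, so the sketch below makes assumptions: ambiguity is measured as the margin between the two highest class probabilities, fit is measured as the log-likelihood under the predicted class's model, each pool is sorted by its own score, and the two ranked pools are interleaved up to a labeling budget. `select_for_labeling`, its thresholds, and the budget are hypothetical names and values.

```python
from typing import Dict, List, Sequence

def margin(probs: Sequence[float]) -> float:
    """Gap between the two largest class probabilities; a small gap means ambiguous."""
    top = sorted(probs, reverse=True)
    return top[0] - top[1]

def select_for_labeling(probs_by_entry: Dict[str, Sequence[float]],
                        loglik_by_entry: Dict[str, float],
                        margin_threshold: float = 0.2,
                        loglik_threshold: float = -10.0,
                        budget: int = 4) -> List[str]:
    """Hypothetical selection rule: rank two candidate pools and interleave them."""
    # Pool 1: ambiguously classified entries, most ambiguous first.
    ambiguous = sorted(
        (e for e, p in probs_by_entry.items() if margin(p) < margin_threshold),
        key=lambda e: margin(probs_by_entry[e]))
    # Pool 2: entries that fit the model of their predicted class poorly, worst first.
    poor_fit = sorted(
        (e for e, ll in loglik_by_entry.items() if ll < loglik_threshold),
        key=lambda e: loglik_by_entry[e])
    # Interleave the ranked pools so the analyst sees both kinds, without duplicates.
    selected: List[str] = []
    for i in range(max(len(ambiguous), len(poor_fit))):
        for pool in (ambiguous, poor_fit):
            if len(selected) < budget and i < len(pool) and pool[i] not in selected:
                selected.append(pool[i])
    return selected

probs = {"a": [0.5, 0.4, 0.1], "b": [0.9, 0.05, 0.05],
         "c": [0.36, 0.34, 0.30], "d": [0.6, 0.3, 0.1]}
loglik = {"a": -3.0, "b": -25.0, "c": -12.0, "d": -2.0}
print(select_for_labeling(probs, loglik))  # ['c', 'b', 'a']
```

Entry `c` appears in both pools but is shown only once; interleaving is one simple way to keep either pool from crowding out the other when the budget is small.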

9. An intrusion detection/prevention system comprising:
- a memory;
  
  an event acquiring component that receives an indication of multiple events;
  
  a clustering component that aggregates multiple events together into a single entry;
  
  a classifier component that automatically classifies at least some of the indicated entries into multiple event classes using one or more classifiers;
  
  multiple event models, one event model for each of multiple entry classes;
  
  an anomaly detection component that utilizes an event model for an entry class to detect potential anomalies within that class;
  
  a human labeling component that selects one or more entries for a human analyst to label, indicates at least some of the selected entries to a human analyst and receives an indication of an event class from the human analyst, the selected entries indicated are at least one of a potential anomaly or an ambiguously classified entry; and
  
  a training component that trains the one or more classifiers using one or more events that are classified by the human analyst.
- Dependent Claims (10, 11, 12, 13)
  - 10. The intrusion detection system of claim 9 wherein the one or more classifiers include logistic regression classifiers.
  - 11. The intrusion detection system of claim 9 wherein the multiple event models are Bayes models.
  - 12. The intrusion detection system of claim 9 wherein the intrusion detection/prevention system is at least one of a building-based intrusion detection/prevention system, a network-based intrusion detection/prevention system, and a host-based intrusion detection/prevention system.
  - 13. The intrusion detection system of claim 9 further comprising an evaluation component that utilizes the trained one or more classifiers to classify the entry and applies an associated policy for the class of the entry.
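Claim 9 pairs the classifier with one event model per class, and claim 11 says these are Bayes models. A minimal sketch of such a per-class model, assuming a naive-Bayes-style diagonal Gaussian over numeric entry features (our choice; the claims do not fix a distribution), fits a mean and variance per feature for each class and flags an entry as a potential anomaly when its log-likelihood under its class model falls below a threshold. `ClassModels`, the variance floor, and the threshold of -10 are illustrative.

```python
import math
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

class ClassModels:
    """One diagonal-Gaussian (naive Bayes style) model per class; an assumed choice."""

    def __init__(self, var_floor: float = 1e-2):
        self.var_floor = var_floor  # avoids zero variance for constant features
        self.params: Dict[str, List[Tuple[float, float]]] = {}

    def fit(self, entries: Sequence[Sequence[float]],
            labels: Sequence[str]) -> "ClassModels":
        grouped: Dict[str, List[Sequence[float]]] = defaultdict(list)
        for x, y in zip(entries, labels):
            grouped[y].append(x)
        for label, rows in grouped.items():
            stats = []
            for column in zip(*rows):  # one feature at a time
                mean = sum(column) / len(column)
                var = sum((v - mean) ** 2 for v in column) / len(column)
                stats.append((mean, max(var, self.var_floor)))
            self.params[label] = stats
        return self

    def log_likelihood(self, x: Sequence[float], label: str) -> float:
        # Features are treated as independent given the class (naive Bayes assumption).
        return sum(-0.5 * (math.log(2 * math.pi * var) + (v - mean) ** 2 / var)
                   for v, (mean, var) in zip(x, self.params[label]))

    def is_anomaly(self, x: Sequence[float], label: str,
                   threshold: float = -10.0) -> bool:
        return self.log_likelihood(x, label) < threshold

entries = [[1, 0], [2, 0], [1, 1], [10, 5], [11, 6], [9, 5]]
labels = ["benign"] * 3 + ["scan"] * 3
models = ClassModels().fit(entries, labels)
print(models.is_anomaly([1, 0], "benign"))    # False
print(models.is_anomaly([50, 50], "benign"))  # True
```

Because each class has its own model, an entry that the classifier confidently places in a class can still be surfaced to the analyst if it looks unlike the other members of that class.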

14. A computer-readable storage medium comprising a trained classifier that was previously trained by performing the method of:
- receiving an indication of multiple entries;
  
for each of multiple iterations, automatically classifying each entry into one of multiple classes using a classifier;
  
  for each of at least some classes, updating a model for the class; and
  
  utilizing the model to detect potential anomalies within the class;
  
  selecting one or more entries to be labeled by a human user;
  
  indicating each of the selected entries to a human user;
  
  receiving an indication of a label for each of the selected entries from the human user; and
  
  training the classifier using the indicated labels.
- Dependent Claims (15, 16, 17, 18, 19, 20)
  - 15. The computer-readable storage medium of claim 14 further comprising instructions, when executed, that performs the method comprising:
    - receiving an indication of an entry, the entry corresponding to one or more events;
      
      automatically classifying the entry into one of multiple classes using the trained classifier; and
      
      applying a policy associated with the class the entry was classified into.
  - 16. The computer-readable storage medium of claim 15 wherein the applying of the policy associated with the class the entry was classified into is one of generating one or more alerts or automatically taking corrective action associated with the entry.
  - 17. The computer-readable storage medium of claim 14 wherein the entries correspond with at least one of network activity or access to a data source.
  - 18. The computer-readable storage medium of claim 14 wherein the selecting of one or more entries to be labeled includes determining a certainty score corresponding to a level of uncertainty of a classification of an entry and determining an anomaly score corresponding to a degree that an entry is similar to the class model.
  - 19. The computer-readable storage medium of claim 14 wherein at least some of the entries correspond to multiple events.
  - 20. The computer-readable storage medium of claim 14 wherein the classifier is a multi-class classifier.
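Claims 10 and 20 describe a logistic regression, multi-class classifier, and claims 15 and 16 describe the evaluation phase: classify a new entry with the trained classifier and apply the policy for its class, such as generating an alert or taking corrective action. The sketch below trains a small multinomial logistic regression by batch gradient descent on toy two-feature entries and then dispatches a policy. The class names, features, learning rate, epoch count, and the `POLICIES` table are hypothetical.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

def softmax(scores: Sequence[float]) -> List[float]:
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

class SoftmaxClassifier:
    """Multinomial logistic regression trained by batch gradient descent (illustrative)."""

    def __init__(self, classes: Sequence[str], n_features: int):
        self.classes = list(classes)
        # One weight row per class; the last weight in each row is a bias term.
        self.w = [[0.0] * (n_features + 1) for _ in self.classes]

    def predict_proba(self, x: Sequence[float]) -> List[float]:
        xb = list(x) + [1.0]
        return softmax([sum(wj * xj for wj, xj in zip(row, xb)) for row in self.w])

    def predict(self, x: Sequence[float]) -> str:
        p = self.predict_proba(x)
        return self.classes[p.index(max(p))]

    def fit(self, xs: Sequence[Sequence[float]], ys: Sequence[str],
            lr: float = 0.1, epochs: int = 3000) -> "SoftmaxClassifier":
        for _ in range(epochs):
            grads = [[0.0] * len(row) for row in self.w]
            for x, y in zip(xs, ys):
                p = self.predict_proba(x)
                xb = list(x) + [1.0]
                for k, cls in enumerate(self.classes):
                    err = p[k] - (1.0 if cls == y else 0.0)  # d(cross-entropy)/d(score_k)
                    for j, xj in enumerate(xb):
                        grads[k][j] += err * xj
            for k, row in enumerate(self.w):
                for j in range(len(row)):
                    row[j] -= lr * grads[k][j] / len(xs)
        return self

# Hypothetical class-to-policy table (claim 16: alerts or automatic corrective action).
POLICIES: Dict[str, Callable[[Sequence[float]], str]] = {
    "benign": lambda entry: "allow",
    "scan": lambda entry: "alert",
    "brute_force": lambda entry: "block_source",
}

def evaluate(clf: SoftmaxClassifier, entry: Sequence[float]) -> Tuple[str, str]:
    """Evaluation phase: classify the entry, then apply its class's policy."""
    label = clf.predict(entry)
    return label, POLICIES[label](entry)

# Toy entries: [failed-login count, port-scan count] (assumed features).
xs = [[0, 0], [0.5, 0], [0, 0.5], [0, 3], [0.5, 4], [0, 5], [4, 0], [5, 0.5], [6, 0]]
ys = ["benign"] * 3 + ["scan"] * 3 + ["brute_force"] * 3
clf = SoftmaxClassifier(["benign", "scan", "brute_force"], n_features=2).fit(xs, ys)
print(evaluate(clf, [0.25, 4.5]))
```

In the active-learning loop, the same `predict_proba` output is what an uncertainty score (claim 18) would be computed from before the model is frozen for evaluation.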

Current Assignee
Microsoft Technology Licensing LLC (Microsoft Corporation)
Original Assignee
Microsoft Corporation
Inventors
Platt, John C., Shilman, Michael, Kravis, Joseph L., Stokes, Jack W.

Granted Patent

US 7,941,382 B2

US Class Current

706/20
CPC Class Codes

G06F 15/16 Combinations of two or more...
