Method and system for training a big data machine to defend

US 9,904,893 B2
Filed: 12/16/2016
Issued: 02/27/2018
Est. Priority Date: 04/02/2013
Status: Active Grant

Abstract

Disclosed herein are a method and system for training a big data machine to defend, retrieve log lines belonging to log line parameters of a system's data source and from incoming data traffic, compute features from the log lines, apply an adaptive rules model with identified threat labels to produce a features matrix, identify statistical outliers from execution of statistical outlier detection methods, and may generate an outlier scores matrix. Embodiments may combine a top scores model and a probability model to create a single top scores vector. The single top scores vector and the adaptive rules model may be displayed on a GUI for labeling of malicious or non-malicious scores. Labeled output may be transformed into a labeled features matrix to create a supervised learning module for detecting new threats in real time and reducing the time elapsed between threat detection of the enterprise or e-commerce system.
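The combination step mentioned in the abstract (several outlier score sets merged into a single top scores vector) can be illustrated with a small sketch. The text above does not spell out the probability model, so this example assumes one simple choice: each detector's raw scores are mapped to empirical-CDF values, and those values are averaged per entity. The function names `to_percentiles` and `combine_top_scores` are hypothetical and do not come from the patent.

```python
from bisect import bisect_right
from typing import List, Sequence, Tuple


def to_percentiles(scores: Sequence[float]) -> List[float]:
    """Map raw outlier scores to empirical-CDF values in (0, 1].

    Assumption: a higher raw score means "more anomalous".
    """
    n = len(scores)
    ordered = sorted(scores)
    # Fraction of entities whose score is less than or equal to this one.
    return [bisect_right(ordered, s) / n for s in scores]


def combine_top_scores(
    score_columns: Sequence[Sequence[float]], k: int
) -> List[Tuple[int, float]]:
    """Average per-detector percentiles and return the k highest entities.

    score_columns holds one score list per detector; position i in every
    list refers to the same entity (for example, the same user session).
    """
    if not score_columns:
        raise ValueError("at least one detector is required")
    n = len(score_columns[0])
    if any(len(col) != n for col in score_columns):
        raise ValueError("all detectors must score the same entities")

    percentiles = [to_percentiles(col) for col in score_columns]
    combined = [sum(p[i] for p in percentiles) / len(percentiles) for i in range(n)]

    # Rank entities from most to least anomalous and keep the top k.
    ranked = sorted(range(n), key=lambda i: combined[i], reverse=True)
    return [(i, combined[i]) for i in ranked[:k]]


if __name__ == "__main__":
    detector_a = [0.1, 0.9, 0.5, 0.4]    # made-up scores from one detector
    detector_b = [5.0, 40.0, 7.0, 6.0]   # made-up scores on a different scale
    print(combine_top_scores([detector_a, detector_b], k=2))
```

Normalizing to percentiles before averaging keeps a detector with a large numeric range from dominating the result; other combination rules would fit the same interface.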

19 Claims

1. A method for training a big data machine to defend an enterprise system comprising:
- retrieving log lines belonging to one or more log line parameters from one or more enterprise system data sources and from incoming data traffic to the enterprise system;
  
  computing one or more features from the log lines;
  
  wherein computing one or more features includes one or more statistical processes;
  
  applying the one or more features to an adaptive rules model;
  
  wherein the adaptive rules model comprises one or more identified threat labels;
  
  further wherein applying the one or more features to the adaptive rules model comprises;
  
  blocking one or more features that has one or more identified threat labels;
  
  generating a features matrix from said applying the one or more features to the adaptive rules model;
  
  executing at least one detection method from a first group of statistical outlier detection methods and at least one detection method from a second group of statistical outlier detection methods on one or more features matrix, to identify statistical outliers;
  
  wherein the first group of statistical outlier detection methods includes a matrix decomposition-based outlier process, a replicator neural networks process and a joint probability process and the second group of statistical outlier detection methods includes a matrix decomposition-based outlier process, a replicator neural networks process and a joint probability process;
  
  wherein the at least one detection method from the first group of statistical outlier detection methods and the at least one detection method from the second group of statistical outlier detection methods are different;
  
  generating an outlier scores matrix from each detection method of said first and second group of statistical outlier detection methods;
  
  converting each outlier scores matrix to a top scores model;
  
  combining each top scores model using a probability model to create a single top scores vector;
  
  generating a GUI (Graphical User Interface) output of at least one of;
  
  an output of the single top scores vector and the adaptive rules model;
  
  labeling the said output to create one or more labeled features matrix;
  
  creating a supervised learning module with the one or more labeled features matrix to update the one or more identified threat labels for performing at least one of;
  
  further refining the adaptive rules model for identification of statistical outliers; and
  
  preventing access by categorized threats by detecting new threats in real time and reducing the time elapsed between threat detection of the enterprise system.
- Dependent claims (2 to 10):
  - 2. The method of claim 1, wherein computing one or more features from the log lines includes activity tracking and activity aggregation.
  - 3. The method of claim 1, wherein the output of the single top scores vector comprises less than 100 single outlier scores.
  - 4. The method of claim 1, wherein labeling the output further includes classifying the severity of the threat.
  - 5. The method of claim 1, wherein the adaptive rules comprises malicious activities, non-malicious or any predetermined label.
  - 6. The method of claim 1, wherein the method is repeated daily over a specified time frame.
  - 7. The method of claim 6, wherein the specified time frame comprises at least 2 days.
  - 8. The method of claim 1, wherein the one or more log line parameters comprises at least one of:
    - user ID (Identification), session, IP (Internet Protocol) address, and URL (Uniform Resource Locator) query.
  - 9. The method of claim 1, wherein the one or more enterprise or e-commerce system data sources comprises at least one of:
    - web server access logs, firewall logs, DNS (Domain Name System) logs, forward proxy logs, external threat feeds, AV (Anti-Virus) logs, user logon audits, DLP (Data Loss Prevention) logs, LB (Load Balancer) logs, IPS (Intrusion Prevent System)/IDS (Intrusion Detection System) logs, black listed URLs, black listed IP addresses, and black listed referrers.
  - 10. The method of claim 1, wherein the one or more features comprises at least one of:
    - user session duration, length of user URL query, number of characters of user URL query, number of digits of user URL query, number of punctuations of user URL query, number of requests in user session, average time between clicks in user session, user session click rate, percentage of image requests in user sessions, percentage of 4xx responses in user session, percentage of 3xx in user sessions, percentage of 2xx responses in user session, percentage of zip responses in user session, percentage of binary responses in user session, percentage of head requests in user session, number of checkouts, number of credit cards added, number of promo codes added, number of gift cards added, number of times items were shipped overnight, number of times new shipping address was added, number of login failures, number of login successes, number of password resets, and total number of requests.
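As an illustration of the feature computation in claims 2 and 10, the sketch below aggregates log lines per user session and derives a handful of the listed features (number of requests, session duration, average time between clicks, percentage of 4xx responses, number of digits in URL queries). The `LogLine` record, its field names, and the feature keys are assumptions made for this example; the claims do not define a log schema.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class LogLine:
    """Hypothetical parsed log line; field names are illustrative only."""
    session: str
    timestamp: float  # seconds since an arbitrary epoch
    url_query: str
    status: int       # HTTP response status code


def session_features(lines: Iterable[LogLine]) -> Dict[str, Dict[str, float]]:
    """Aggregate log lines by session and compute a few per-session features."""
    by_session: Dict[str, List[LogLine]] = defaultdict(list)
    for line in lines:
        by_session[line.session].append(line)

    features: Dict[str, Dict[str, float]] = {}
    for session, rows in by_session.items():
        rows.sort(key=lambda r: r.timestamp)
        n = len(rows)
        duration = rows[-1].timestamp - rows[0].timestamp
        features[session] = {
            "num_requests": float(n),
            "session_duration": duration,
            # With a single request there is no gap between clicks.
            "avg_time_between_clicks": duration / (n - 1) if n > 1 else 0.0,
            "pct_4xx": 100.0 * sum(400 <= r.status < 500 for r in rows) / n,
            "query_digits": float(
                sum(ch.isdigit() for r in rows for ch in r.url_query)
            ),
        }
    return features


if __name__ == "__main__":
    sample = [
        LogLine("s1", 0.0, "id=12", 200),
        LogLine("s1", 10.0, "q=a", 404),
        LogLine("s1", 30.0, "p=3", 200),
        LogLine("s2", 5.0, "", 403),
    ]
    for session, row in session_features(sample).items():
        print(session, row)
```

Each inner dictionary corresponds to one row of a features matrix; stacking the rows in a fixed key order would give the matrix consumed by the outlier detection step.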

11. An apparatus for training a big data machine to defend an enterprise system, the apparatus comprising:
- one or more hardware processors;
  
  system memory coupled to the one or more processors;
  
  one or more non-transitory memory units coupled to the one or more processors; and
  
  threat identification and detection code stored on the one or more non-transitory memory units that when executed by the one or more processors are configured to perform a method, comprising;
  
  retrieving log lines belonging to one or more log line parameters from one or more enterprise system data sources and from incoming data traffic to the enterprise system;
  
  computing one or more features from the log lines;
  
  wherein computing one or more features includes one or more statistical processes;
  
  applying the one or more features to an adaptive rules model;
  
  wherein the adaptive rules model comprises one or more identified threat labels;
  
  further wherein the applying the one or more features to the adaptive rules model comprises;
  
  blocking one or more features that has one or more identified threat labels, investigating one or more features, or a combination thereof;
  
  generating a features matrix from said applying the one or more features to the adaptive rule model;
  
  executing at least one detection method from a first group of statistical outlier detection methods and at least one detection method from a second group of statistical outlier detection methods on one or more features matrix, to identify statistical outliers;
  
  wherein the first group of statistical outlier detection methods includes a matrix decomposition-based outlier process, a replicator neural networks process and a joint probability density process and the second group of statistical outlier detection methods includes a matrix decomposition-based outlier process, a replicator neural networks process and a density-based process;
  
  wherein the at least one detection method from the first group of statistical outlier detection methods and the at least one detection method from the second group of statistical outlier detection methods are different;
  
  generating an outlier scores matrix from each detection method of said first and second group of statistical outlier detection methods;
  
  converting each outlier scores matrix to a top scores model;
  
  combining each top scores model using a probability model to create a single top scores vector;
  
  generating a GUI (Graphical User Interface) output of at least one of;
  
  an output of the single top scores vector and the adaptive rules model;
  
  labeling the said output to create one or more labeled features matrix;
  
  creating a supervised learning model with the one or more labeled features matrix to update the one or more identified threat labels for performing at least one of;
  
  further refining the adaptive rules model; and
  
  preventing access by categorized threats by detecting new threats in real time and reducing the time elapsed between threat detection of the enterprise system.
- Dependent claims (12 to 19):
  - 12. The apparatus of claim 11, wherein computing one or more features from the log lines includes activity tracking and activity aggregation.
  - 13. The apparatus of claim 11, wherein the output of the single top scores vector comprises less than 100 single outlier scores.
  - 14. The apparatus of claim 11, wherein labeling the output further includes classifying the severity of the threat.
  - 15. The apparatus of claim 11, wherein the adaptive rules comprises malicious activities, non-malicious or any predetermined label.
  - 16. The apparatus of claim 11, wherein the method is repeated daily over a specified time frame.
  - 17. The apparatus of claim 11, wherein the specified time frame comprises at least 2 days.
  - 18. The apparatus of claim 11, wherein the one or more log line parameters comprises at least one of:
    - user ID, session, IP address, and URL query.
  - 19. The apparatus of claim 11, wherein the one or more enterprise or e-commerce system data sources comprises at least one of:
    - web server access logs, firewall logs, DNS (Domain Name System) logs, forward proxy logs, external threat feeds, AV logs, user logon audits, DLP logs, LB (Load Balancer) logs, IPS (Intrusion Prevent System)/IDS (Intrusion Detection System) logs, black listed URLs, black listed IP addresses, and black listed referrers.
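The feedback step recited in claims 1 and 11 (analyst labels turned into a labeled features matrix, which then trains a supervised model) can be sketched as follows. The claims do not name a particular learning algorithm, so this example substitutes a nearest-centroid classifier purely as a stand-in. The function names `fit_centroids` and `predict`, the label strings, and the sample values are all hypothetical.

```python
from math import dist
from typing import Dict, List, Sequence


def fit_centroids(
    rows: Sequence[Sequence[float]], labels: Sequence[str]
) -> Dict[str, List[float]]:
    """Compute one mean feature vector per label from a labeled features matrix."""
    if not rows or len(rows) != len(labels):
        raise ValueError("rows and labels must be non-empty and the same length")

    sums: Dict[str, List[float]] = {}
    counts: Dict[str, int] = {}
    for row, label in zip(rows, labels):
        acc = sums.setdefault(label, [0.0] * len(row))
        for j, value in enumerate(row):
            acc[j] += value
        counts[label] = counts.get(label, 0) + 1

    return {
        label: [value / counts[label] for value in acc]
        for label, acc in sums.items()
    }


def predict(centroids: Dict[str, List[float]], row: Sequence[float]) -> str:
    """Return the label whose centroid is closest in Euclidean distance."""
    return min(centroids, key=lambda label: dist(centroids[label], row))


if __name__ == "__main__":
    # Made-up rows an analyst might have labeled through the GUI.
    labeled_rows = [[1.0, 0.0], [1.2, 0.2], [9.0, 8.0], [10.0, 9.0]]
    analyst_labels = ["non-malicious", "non-malicious", "malicious", "malicious"]

    model = fit_centroids(labeled_rows, analyst_labels)
    print(predict(model, [8.0, 7.0]))
```

In a repeated (for example, daily) cycle, newly labeled rows would be appended to the labeled matrix and the model refit, so that predictions on later traffic reflect the analyst's most recent decisions.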

Current Assignee: Corelight, Inc.
Original Assignee: Patternex, Inc. (Corelight, Inc.)
Inventors: Veeramachaneni, Uday; Korrapati, Vamsi; Bassias, Constantinos; Arnaldo, Ignacio; Li, Ke
Primary Examiner(s): LEE, JASON T
Application Number: US15/382,413
Publication Number: US 20170169360A1
Time in Patent Office: 438 Days
Field of Search: 726 23

CPC Class Codes

G06F 21/552   involving long-term monitor...

G06F 21/56   Computer malware detection ...

G06N 20/00   Machine learning

G06N 3/045   Combinations of networks

G06N 5/047   Pattern matching networks; ...

G06N 7/01   Probabilistic graphical mod...

H04L 2463/102   applying security measure f...

H04L 63/0263   Rule management

H04L 63/1408   by monitoring network traff...

H04L 63/1416   Event detection, e.g. attac...

H04L 63/1425   Traffic logging, e.g. anoma...

H04L 63/1441   Countermeasures against mal...

H04L 63/20   for managing network securi...
