Method and system for learning representations for log data in cybersecurity

US 10,367,841 B2
Filed: 11/22/2017
Issued: 07/30/2019
Est. Priority Date: 12/16/2016
Status: Active Grant

Abstract

Disclosed is a data analysis and cybersecurity method that forms a time-based series of behavioral features and analyzes the series for attack detection, derivation of new features, and/or feature evaluation. Analyzing the time-based series of behavioral features may comprise using a Feed-Forward Neural Network (FFNN) method, a Convolutional Neural Network (CNN) method, a Recurrent Neural Network (RNN) method, a Long Short-Term Memory (LSTM) method, a Principal Component Analysis (PCA) method, a Random Forest pipeline method, and/or an autoencoder method. In one embodiment, the behavioral features of the time-based series comprise human engineered features and/or machine learned features, wherein the method may be used to learn new features from historic features.

17 Claims

1. A cybersecurity method comprising:
- forming a time based series of behavioral features comprising human engineered features by extracting at least one behavioral feature from a first set of log data retrieved over a first time segment, and extracting at least one behavioral feature from a second set of log data retrieved over a second time segment;
  
  analyzing the time based series of behavioral features, wherein said analyzing the time based series of behavioral features comprises using a neural network based system, a dimensionality reduction system, random forest system, or combinations thereof, deriving machine learned features from said time based series of behavioral features through said analyzing the time based series of behavioral features; and
  
  detecting an attack or threat to an enterprise or e-commerce system through said analyzing the time based series of behavioral features, wherein said detecting an attack or threat comprises determining behavioral patterns indicative of said attack or threat based on the combination of said human engineered features and said machine learned features, wherein the time based series of behavioral features is formatted into a time-based matrix, wherein each behavioral feature is associated with an entity and a time segment.
- Dependent claims (2, 3, 4, 5, 6, 7):
  - 2. The method of claim 1, wherein the time-based series of behavioral features is further analyzed for features evaluation.
  - 3. The method of claim 1, wherein each of the at least one behavioral feature is extracted by activity tracking, activity aggregation, or a combination thereof.
  - 4. The method of claim 1, wherein said analyzing the time based series of behavioral features comprises applying a first method, comprising at least one of a Feed-Forward Neural Network (FFNN), a Convolutional Neural Network (CNN), Long Short-Term Memory (LSTM), Principal Component Analysis (PCA), Recurrent Neural Network (RNN), in combination with a second method, comprising Random Forest (RF).
  - 5. The method of claim 1, wherein said analyzing the time based series of behavioral features comprises applying a method based on a Feed-Forward Neural Network (FFNN), a Convolutional Neural Network (CNN), a Recurrent Neural Network (RNN), Long Short-Term Memory (LSTM) network, Principal Component Analysis (PCA), a Random Forest pipeline, an autoencoder, or combinations thereof.
  - 6. The method of claim 1, wherein said machine learned features are derived from said human engineered features.
  - 7. The method of claim 1, wherein said forming a time based series of behavioral features comprises extracting behavioral features from at least three time intervals.
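
The time-based matrix recited in claim 1 can be pictured with a minimal Python sketch. It is an illustration under stated assumptions, not the patented implementation: the log record layout, the feature names (`login_fail`, `login_ok`), the 60-second segment length, and the helper `build_time_matrix` are all hypothetical, and only simple count aggregation is shown.

```python
from collections import defaultdict

# Hypothetical parsed log records: (timestamp_seconds, entity, event_type).
LOGS = [
    (5, "10.0.0.1", "login_fail"),
    (20, "10.0.0.1", "login_fail"),
    (70, "10.0.0.1", "login_ok"),
    (15, "10.0.0.2", "login_ok"),
    (130, "10.0.0.2", "login_fail"),
]

FEATURES = ("login_fail", "login_ok")  # assumed human engineered features
SEGMENT_SECONDS = 60                   # assumed time segment length


def build_time_matrix(logs, n_segments, segment_seconds=SEGMENT_SECONDS):
    """Return {entity: [[count per feature] per time segment]}."""
    matrix = defaultdict(
        lambda: [[0] * len(FEATURES) for _ in range(n_segments)]
    )
    for ts, entity, event in logs:
        segment = ts // segment_seconds
        # Ignore records outside the window or with unknown event types.
        if 0 <= segment < n_segments and event in FEATURES:
            matrix[entity][segment][FEATURES.index(event)] += 1
    return dict(matrix)


# Three segments, in the spirit of the "at least three time intervals" of claim 7.
matrix = build_time_matrix(LOGS, n_segments=3)
for entity, series in matrix.items():
    print(entity, series)
```

Each entity ends up with one feature vector per time segment, so every stored value is tied to both an entity and a time segment.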

8. An apparatus for learning representations of log data for cyber security, the apparatus comprising:
- one or more processors;
  
  a system memory coupled to the one or more processors;
  
  one or more non-transitory memory units coupled to the one or more processors; and
  
  features extraction codes, features formatting codes, and data analysis codes stored on the one or more non-transitory memory units, that when executed by the one or more processors, are configured to perform a method, comprising:
  
  forming a time based series of behavioral features for multiple entities by extracting behavioral features from log data retrieved over a first time segment, and extracting behavioral features from log data retrieved over a second time segment, wherein said time based series of behavioral features comprises human engineered features associated with said multiple entities; and
  
  analyzing the time based series of behavioral features, wherein said analyzing the time based series of behavioral features comprises using a neural network based system, a dimensionality reduction system, random forest system, or combinations thereof, deriving machine learned features from said time based series of behavioral features through said analyzing the time based series of behavioral features; and
  
  detecting an attack or potential threat to the enterprise or e-commerce system through said analyzing the time based series of behavioral features, wherein said detecting an attack or potential threat comprises determining behavioral patterns indicative of said attack or potential threat based on the combination of said human engineered features and said machine learned features, wherein the features extraction codes are configured to extract the behavioral features by executing an activity tracking module, an activity aggregation module, or a combination thereof, wherein the time based series of behavioral features is formatted into a time based features matrix by formatting and storing the at least one or more features into the time based features matrix, wherein each feature is associated with an entity and time segment.
- Dependent claims (9, 10, 11, 12, 13):
  - 9. The apparatus of claim 8, wherein the time-based series of behavioral features is further analyzed for features evaluation.
  - 10. The apparatus of claim 8, further comprising feeding data comprising log lines into the apparatus through a network interface to the one or more non-transitory memory units.
  - 11. The apparatus of claim 8, wherein each of the at least one behavioral feature is associated with a unique entity.
  - 12. The apparatus of claim 8, wherein the data analysis codes are configured to analyze the time based series of behavioral features by a Feed-Forward Neural Network (FFNN), a Convolutional Neural Network (CNN), a Recurrent Neural Network (RNN), Long Short-Term Memory (LSTM) network, Principal Component Analysis (PCA), a Random Forest pipeline, an autoencoder, or combinations thereof.
  - 13. The apparatus of claim 8, wherein said machine learned features are derived from said human engineered features.

14. A cybersecurity method comprising:
- extracting at least one behavioral feature from a first set of log data retrieved over a first time segment, and extracting at least one behavioral feature from a second set of log data retrieved over a second time segment;
  
  computing, for multiple entities and over multiple time segments, one or more features from the log lines by activity tracking, activity aggregation, or a combination thereof;
  
  storing the one or more features in a time based series of behavioral features matrix, wherein for each of said entities, a set of features is stored on a per time-segment basis;
  
  analyzing the time-based series of behavioral features matrix using a neural network based system, a dimensionality reduction system, random forest system, or combinations thereof;
  
  deriving machine learned features from said time based series of behavioral features matrix via said analyzing;
  
  detecting a malicious entity by determining behavioral patterns indicative of a malicious status related to said malicious entity based on the combination of the derived machine learned features and said one or more features computed from said log lines.
- Dependent claims (15, 16, 17):
  - 15. The method of claim 14, wherein the machine learned features are derived using a method comprising at least one of a Feed-Forward Neural Network (FFNN), a Convolutional Neural Network (CNN), Long Short-Term Memory (LSTM), Principal Component Analysis (PCA), and Recurrent Neural Network (RNN).
  - 16. The method of claim 15, further comprising using a random forest classifier for feature evaluation.
  - 17. The method of claim 14, comprising computing said one or more features over at least three time segments.
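
As one possible reading of the "deriving machine learned features" step in claim 14, the sketch below appends a single PCA score to each entity's flattened feature row. The claims name PCA only as one option and do not say how it is computed; the power-iteration approach, the sample data, and the helper names `first_principal_component` and `append_learned_feature` are assumptions made for illustration.

```python
import math


def first_principal_component(rows, iterations=200):
    """Approximate the top PCA direction of `rows` by power iteration."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    # Sample covariance matrix (d x d); requires at least two rows.
    cov = [
        [sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
        for i in range(d)
    ]
    # Non-uniform start vector, normalised to unit length.
    vec = [1.0 / (j + 1) for j in range(d)]
    norm = math.sqrt(sum(x * x for x in vec))
    vec = [x / norm for x in vec]
    for _ in range(iterations):
        nxt = [sum(cov[i][j] * vec[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in nxt))
        if norm == 0.0:  # no variance along the current direction
            break
        vec = [x / norm for x in nxt]
    return means, vec


def append_learned_feature(rows):
    """Return each row extended with its score on the top component."""
    means, vec = first_principal_component(rows)
    combined = []
    for r in rows:
        score = sum((r[j] - means[j]) * vec[j] for j in range(len(vec)))
        combined.append(list(r) + [score])
    return combined


# Hypothetical flattened rows: 2 features x 2 time segments per entity.
human_features = [
    [2, 0, 0, 1],
    [0, 1, 0, 0],
    [9, 0, 8, 0],
]
combined = append_learned_feature(human_features)
for row in combined:
    print(row)
```

The combined rows hold both the original (human engineered) values and the derived score, which is the kind of joint input a downstream classifier such as a random forest could consume. Power iteration can converge slowly, or fail when the start vector is orthogonal to the top direction, so a library PCA would be the more robust choice in practice.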

Current Assignee: Corelight, Inc.
Original Assignee: Patternex, Inc. (Corelight, Inc.)
Inventors: Arnaldo, Ignacio; Arun, Ankit; Lam, Mei; Bassias, Costas
Primary Examiner(s): Rahman, Mahfuzur
Assistant Examiner(s): Cruz-Franqui, Richard W
Application Number: US15/821,231
Publication Number: US 20180176243A1
Time in Patent Office: 615 Days
CPC Class Codes

G06F 21/552   involving long-term monitor...

G06N 20/20   Ensemble learning

G06N 3/044   Recurrent networks, e.g. Ho...

G06N 3/045   Combinations of networks

G06N 3/08   Learning methods

G06N 5/01   Dynamic search techniques; ...

H04L 63/1416   Event detection, e.g. attac...

H04L 63/1425   Traffic logging, e.g. anoma...
