METHOD AND SYSTEM FOR TRAINING A BIG DATA MACHINE TO DEFEND

Abstract
Disclosed herein are a method and system for training a big data machine to defend. Embodiments retrieve log lines belonging to log line parameters from a system's data sources and from incoming data traffic, compute features from the log lines, apply an adaptive rules model with identified threat labels to produce a features matrix, identify statistical outliers by executing statistical outlier detection methods, and may generate an outlier scores matrix. Embodiments may combine top scores models using a probability model to create a single top scores vector. The single top scores vector and the adaptive rules model may be displayed on a GUI for labeling as malicious or non-malicious. Labeled output may be transformed into a labeled features matrix to create a supervised learning module for detecting new threats in real time and reducing the time elapsed between threat detection of the enterprise or e-commerce system.
19 Claims
1. A method for training a big data machine to defend an enterprise system comprising:
retrieving log lines belonging to one or more log line parameters from one or more enterprise system data sources and from incoming data traffic to the enterprise system;
computing one or more features from the log lines;
wherein computing one or more features includes one or more statistical processes;
applying the one or more features to an adaptive rules model;
wherein the adaptive rules model comprises one or more identified threat labels;
further wherein applying the one or more features to an adaptive rules model comprises:
blocking one or more features that has one or more identified threat labels;
generating a features matrix from said applying the one or more features to an adaptive rule module;
executing at least one detection method from a first group of statistical outlier detection methods and at least one detection method from a second group of statistical outlier detection methods on one or more features matrix, to identify statistical outliers;
wherein the first group of statistical outlier detection methods includes a matrix decomposition-based outlier process, a replicator neural networks process and a joint probability process and the second group of statistical outlier detection methods includes a matrix decomposition-based outlier process, a replicator neural networks process and a joint probability process;
wherein the at least one detection method from a first group of statistical outlier detection methods and the at least one detection method from a second group of statistical outlier detection methods are different;
generating an outlier scores matrix from each detection method of said first and second group of statistical outlier detection methods;
converting each outlier scores matrix to a top scores model;
combining each top scores model using a probability model to create a single top scores vector;
generating a GUI output of at least one of:
an output of the single top scores vector and the adaptive rules model;
labeling the said output to create one or more labeled features matrix;
creating a supervised learning module with the one or more labeled features matrix to update the one or more identified threat labels for performing at least one of:
further refining adaptive rules model for identification of statistical outliers; and
preventing access by categorized threats by detecting new threats in real time and reducing the time elapsed between threat detection of the enterprise system.
11. An apparatus for training a big data machine to defend an enterprise system, the apparatus comprising:
one or more processors; system memory coupled to the one or more processors; one or more non-transitory memory units coupled to the one or more processors; and threat identification and detection code stored on the one or more non-transitory memory units that when executed by the one or more processors are configured to perform a method, comprising:
retrieving log lines belonging to one or more log line parameters from one or more enterprise system data sources and from incoming data traffic to the enterprise system;
computing one or more features from the log lines;
wherein computing one or more features includes one or more statistical processes;
applying the one or more features to an adaptive rules model;
wherein the adaptive rules model comprises one or more identified threat labels;
further wherein the applying the one or more features to an adaptive rules model comprises:
blocking one or more features that has one or more identified threat labels, investigating one or more features, or a combination thereof;
generating a features matrix from said applying the one or more features to an adaptive rule module;
executing at least one detection method from a first group of statistical outlier detection methods and at least one detection method from a second group of statistical outlier detection methods on one or more features matrix, to identify statistical outliers;
wherein the first group of statistical outlier detection methods includes a matrix decomposition-based outlier process, a replicator neural networks process and a joint probability density process and the second group of statistical outlier detection methods includes a matrix decomposition-based outlier process, a replicator neural networks process and a density-based process;
wherein the at least one detection method from a first group of statistical outlier detection methods and the at least one detection method from a second group of statistical outlier detection methods are different;
generating an outlier scores matrix from each detection method of said first and second group of statistical outlier detection methods;
converting each outlier scores matrix to a top scores model;
combining each top scores model using a probability model to create a single top scores vector;
generating a GUI output of at least one of:
an output of the single top scores vector and the adaptive rules model;
labeling the said output to create one or more labeled features matrix;
creating a supervised learning model with the one or more labeled features matrix to update the one or more identified threat labels for performing at least one of:
further refining adaptive rules model for identification of statistical outliers; and
preventing access by categorized threats by detecting new threats in real time and reducing the time elapsed between threat detection of the enterprise system.
Specification
This application claims the benefit of U.S. Provisional Application No. 62/340,388, filed May 23, 2016.
This application claims priority to U.S. Non-Provisional application Ser. No. 15/258,797, filed Sep. 7, 2016.
This application claims priority to U.S. Non-Provisional application Ser. No. 14/532,812, filed Nov. 4, 2014, which claims priority to U.S. Provisional Patent Application No. 61/807,699, filed Apr. 2, 2013.
All of the prior applications are incorporated herein by reference in their entirety.
The present disclosure relates generally to a security-analyst-driven and machine-learning-driven computer security system and method for detecting threats by creating statistical rules using statistical outliers for one or more enterprise or e-commerce systems.
Electronic information across networks is a crucial aspect of an enterprise or e-commerce system. However, such electronic information may expose these systems to security threats. Hackers constantly change their behavior by figuring out current rules and designing newer attacks that can sidestep detection.
In current technology, information security solutions generally fall into two categories: security-analyst-driven and unsupervised machine-learning-driven. Security-analyst-driven solutions rely on rules determined by fraud and security experts and exhibit high rates of undetected attacks. They also introduce delays between attack detection and the implementation of preventative countermeasures, and these delays are costly and time-consuming for enterprise or e-commerce systems.
Unsupervised machine-learning-driven solutions can detect rare or anomalous patterns and may improve detection of new attacks. However, these solutions trigger more false positive alarms and alerts, each of which requires substantial investigative effort before it can be dismissed.
Existing enterprise or e-commerce systems lack labeled threat examples from previous attacks, which undercuts the ability to use supervised learning models. Moreover, because attackers constantly change their behavior, such models quickly become irrelevant.
As a result, many enterprise and e-commerce systems using existing technology remain exposed to security threats, and improved security systems are needed to identify threats in real time.
Another challenge for existing technology is that malicious activities are extremely rare: attack cases represent a minor fraction of total events, generally less than 0.1%. To illustrate this fact,
The dearth of malicious activities results in extreme class imbalance when learning a supervised model and makes detection more difficult. Moreover, not all malicious activities are systematically reported, either because their incident responses were inconclusive or because they were not detected in the first place. This introduces noise into the data, since unreported attacks will be considered legitimate activity. Attack vectors can also take a wide variety of forms, and even when malicious activities are reported, users are not always aware of the specific vectors involved. It is therefore difficult to develop robust defense strategies capable of detecting as many attacks as possible.
Importantly, there is a need for a method and system capable of detecting threats in real time and collecting analysts' feedback to improve detection rates over time.
There is also a need for an active learning method that uses such feedback to reduce false positives among detected threats.
There is, further, a need for a system that incorporates behavioral predictive analytics for network intrusion and internal threat detection.
A method and system capable of addressing real-time security threats may have application in a broad array of active learning and machine learning applications that are of value to information system security professionals. Accordingly, the scope of the present disclosure extends beyond collecting and detecting threats.
The present disclosure details an end-to-end system that learns over time from the feedback of a security analyst, hereafter referred to as the analyst. The system may include a big data processing system, an outlier detection system, a feedback mechanism, a continuous learning system, and a supervised learning module.
The big data processing system comprises a platform that may quantify the features of different entities and compute them from raw data. Because the data is high-volume and high-velocity, this first component must process data at a challenging scale.
An exemplary outlier detection system may learn a descriptive model of the features extracted from the data via unsupervised learning, using one or more of a joint probability density, matrix decomposition, or replicator neural network outlier detection method. To achieve confidence and robustness when detecting rare and extreme events, the system may fuse multiple scores into a final score that indicates how far a given entity's or event's probability is from the others.
The feedback mechanism and continuous learning system may incorporate an analyst's input through a user interface. It may present the top outlier events or entities and ask the analyst to indicate whether a specific combination of features is or is not malicious. This feedback may then feed into the supervised learning module. Both the number of outlier events examined and the feedback frequency (e.g., daily or weekly) are decided by the analyst.
The supervised learning module may receive the analyst's feedback, learn a model that predicts whether a new incoming event is normal or malicious, and continually refine the model as more feedback is gathered.
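The feedback loop described above can be sketched as follows. The text does not name the supervised learner, so this minimal sketch assumes plain logistic regression trained by batch gradient descent as a stand-in; `select_for_review`, `LogisticModel`, `feedback_round`, and the `analyst` callback (returning 1 for malicious, 0 for benign) are hypothetical names used only for illustration.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

Vector = Sequence[float]


def select_for_review(scores: Dict[str, float], k: int) -> List[str]:
    """Return the k entities with the highest fused outlier score (k is set by the analyst)."""
    return sorted(scores, key=scores.get, reverse=True)[:k]


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


class LogisticModel:
    """Batch-gradient logistic regression, used here only as a stand-in learner."""

    def __init__(self, n_features: int, lr: float = 0.1, epochs: int = 1000) -> None:
        self.w = [0.0] * n_features
        self.b = 0.0
        self.lr = lr
        self.epochs = epochs

    def predict_proba(self, x: Vector) -> float:
        """Estimated probability that feature vector x is malicious."""
        return sigmoid(self.b + sum(wi * xi for wi, xi in zip(self.w, x)))

    def fit(self, X: List[Vector], y: List[int]) -> "LogisticModel":
        n = len(X)
        for _ in range(self.epochs):
            grad_w = [0.0] * len(self.w)
            grad_b = 0.0
            for x, label in zip(X, y):
                err = self.predict_proba(x) - label  # d(log-loss)/dz
                for i, xi in enumerate(x):
                    grad_w[i] += err * xi / n
                grad_b += err / n
            self.w = [wi - self.lr * gi for wi, gi in zip(self.w, grad_w)]
            self.b -= self.lr * grad_b
        return self


def feedback_round(features: Dict[str, Vector], scores: Dict[str, float], k: int,
                   analyst: Callable[[str], int],
                   labeled: List[Tuple[Vector, int]]) -> LogisticModel:
    """One review cycle: label the top-k outliers, then retrain on all labels so far."""
    for entity in select_for_review(scores, k):
        labeled.append((features[entity], analyst(entity)))
    X = [x for x, _ in labeled]
    y = [label for _, label in labeled]
    return LogisticModel(len(X[0])).fit(X, y)


features = {"a": [0.0, 0.1], "b": [5.0, 4.0], "c": [0.2, 0.0], "d": [4.5, 5.0]}
scores = {"a": 0.1, "b": 0.9, "c": 0.2, "d": 0.8}
labeled: List[Tuple[Vector, int]] = []
model = feedback_round(features, scores, k=4,
                       analyst=lambda e: int(e in {"b", "d"}), labeled=labeled)
```

Because `labeled` persists across calls, each round retrains on every label gathered so far, which mirrors the continual refinement described above. A production system would likely swap in a stronger learner.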
In some embodiments, the supervised learning module may have access to labeled features from the past (historical labels) even before the detection system is deployed. An additional parameter, $d \in \{0, 28\}$, may be introduced to represent the number of days for which labeled examples are available. For each strategy, the total number of detected attacks, the recall, and the area under the receiver operating characteristic curve (AUC) of the deployed classifier may be reported on a monthly basis.
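The two reported classifier metrics can be computed as below. This sketch is illustrative, and the function names are hypothetical. AUC uses the Mann-Whitney formulation: it is the probability that a randomly chosen attack receives a higher score than a randomly chosen benign event, with ties counted as half.

```python
from typing import Sequence


def recall(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of true attacks (label 1) that the classifier flagged."""
    positives = sum(1 for t in y_true if t == 1)
    hits = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    return hits / positives if positives else 0.0


def auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC via pairwise comparison of attack and benign scores (O(P*N))."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for sp in pos:
        for sn in neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

For monthly reporting, these functions would be applied to each month's labeled events and the deployed classifier's scores for them.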
The detection rate of the disclosed system with $d = 0$ and $d = 28$ increases over time, reaching 0.500 and 0.604, respectively, by the 12th and final week.
The performance of the classifiers at the end of the 12th week was approximately identical among the three setups of the present disclosure. With $d = 0$, the AUC of the classifier in the final week reached 0.940; with $d = 28$, it reached 0.946.
The present disclosure may defend against unseen attacks and may be bootstrapped without labeled features. Given enough interactions with the analyst, it may reach a performance similar to that obtained when historical attack examples are available.
While the present disclosure is subject to various modifications and alternative forms, specific embodiments thereof are shown by way of example in the drawings and the accompanying detailed description. It should be understood, however, that the drawings and detailed description are not intended to limit the invention to the particular embodiments. This disclosure is instead intended to cover all modifications, equivalents, and alternatives falling within the scope of the present invention as defined by the appended claims.
In light of the above, the present disclosure provides a method and system for training a big data machine to defend that addresses the need to detect threats in real time.
The present disclosure provides a method and system for training a big data machine to defend an enterprise system. The method and system provide for retrieving log lines belonging to one or more log line parameters from one or more enterprise data sources and from incoming data traffic to the enterprise. The method and system may further provide for computing one or more features from the log lines using one or more statistical processes. The one or more features may be applied to an adaptive rules model, which may comprise one or more identified threat labels.
In some embodiments, applying the one or more features to an adaptive rules model may include blocking one or more features that have one or more identified threat labels, investigating one or more features, or a combination thereof. Features that do not carry an identified threat label may be incorporated into a features matrix.
In some embodiments, identification of a set of statistical outliers may include at least one detection method from a first group of statistical outlier detection methods.
In some embodiments, identification of a set of statistical outliers may further include at least one different detection method from a second group of statistical outlier detection methods.
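The rule-application step can be sketched as follows. The text does not define how rules are represented, so this minimal sketch assumes each rule is a predicate over one entity's features paired with a "block" or "investigate" action. `Rule`, `AdaptiveRulesModel`, the rule names, and the thresholds are hypothetical. The feature names come from the IP-address features listed later in this description. As a further assumption, investigated entities are held out of the features matrix, because they already carry a label that the analyst is checking.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

Features = Dict[str, float]


@dataclass
class Rule:
    """Hypothetical rule: a predicate over one entity's features plus an action."""
    name: str
    predicate: Callable[[Features], bool]
    action: str  # "block" (identified threat label) or "investigate"


@dataclass
class AdaptiveRulesModel:
    rules: List[Rule] = field(default_factory=list)

    def add_rule(self, rule: Rule) -> None:
        """Adaptive step: rules derived from analyst labels can be appended over time."""
        self.rules.append(rule)

    def apply(self, rows: Dict[str, Features]) -> Tuple[List[str], List[str], Dict[str, Features]]:
        """Split entities into blocked, to-investigate, and unlabeled rows."""
        blocked: List[str] = []
        investigate: List[str] = []
        matrix: Dict[str, Features] = {}
        for entity, feats in rows.items():
            actions = {rule.action for rule in self.rules if rule.predicate(feats)}
            if "block" in actions:
                blocked.append(entity)      # carries an identified threat label
            elif "investigate" in actions:
                investigate.append(entity)  # sent to the analyst for review
            else:
                matrix[entity] = feats      # unlabeled: passed on to outlier detection
        return blocked, investigate, matrix


# Usage with made-up thresholds on IP-level features.
model = AdaptiveRulesModel([
    Rule("login_failure_burst", lambda f: f.get("login_failures", 0) > 50, "block"),
    Rule("many_password_resets", lambda f: f.get("password_resets", 0) > 5, "investigate"),
])
rows = {
    "10.0.0.1": {"login_failures": 120, "password_resets": 0},
    "10.0.0.2": {"login_failures": 2, "password_resets": 9},
    "10.0.0.3": {"login_failures": 1, "password_resets": 0},
}
blocked, investigate, matrix = model.apply(rows)
```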
In some embodiments, an outlier scores matrix may be generated from each detection method of said first and second group of statistical outlier detection methods.
Embodiments of the present disclosure may convert each outlier scores matrix to a top scores model. Some embodiments may further combine each top scores model using a probability model to create a single top scores vector. Some embodiments may output the single top scores vector and the adaptive rules model via a GUI.
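The text does not specify how an outlier scores matrix becomes a top scores model or which probability model combines them. This sketch assumes one simple option. Each detector's raw scores are mapped to empirical CDF values, so that scores on different scales become comparable probabilities. The per-detector probabilities are then averaged, and the k highest form the single top scores vector. `to_probabilities` and `fuse_top_scores` are hypothetical names.

```python
import bisect
from typing import Dict, List, Tuple


def to_probabilities(scores: Dict[str, float]) -> Dict[str, float]:
    """Empirical CDF: fraction of this detector's scores that are <= each score."""
    ordered = sorted(scores.values())
    n = len(ordered)
    return {entity: bisect.bisect_right(ordered, s) / n for entity, s in scores.items()}


def fuse_top_scores(per_method: List[Dict[str, float]], k: int) -> List[Tuple[str, float]]:
    """Average per-detector probabilities and keep the k highest entities."""
    entities = set(per_method[0])
    if any(set(m) != entities for m in per_method):
        raise ValueError("every detector must score the same entities")
    probs = [to_probabilities(m) for m in per_method]
    fused = {e: sum(p[e] for p in probs) / len(probs) for e in entities}
    # Sort by fused score (descending), breaking ties by entity name for determinism.
    return sorted(fused.items(), key=lambda kv: (-kv[1], kv[0]))[:k]


pca_scores = {"a": 0.1, "b": 5.0, "c": 0.3, "d": 0.2}
rnn_scores = {"a": 0.02, "b": 0.9, "c": 0.5, "d": 0.01}
top = fuse_top_scores([pca_scores, rnn_scores], k=2)
```

Rank-based normalization is only one choice. Fitting a parametric distribution to each detector's scores would be an alternative that preserves more information about how extreme the top scores are.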
Embodiments of the present disclosure may label the output of the single top scores vector and the adaptive rules model to create one or more labeled features matrices. The labeled features matrix may then be provided to a supervised learning module to update the one or more identified threat labels.
Embodiments of the present disclosure may further refine the adaptive rules model for identification of statistical outliers and prevent access by categorized threats by detecting new threats in real time and reducing the time elapsed between threat detection of the enterprise system.
The novel features believed characteristic of the disclosed subject matter will be set forth in any claims that are filed later. The disclosed subject matter itself, however, as well as a preferred mode of use, further objectives, and advantages thereof, will best be understood by reference to the following detailed description of an illustrative embodiment when read in conjunction with the accompanying drawings, wherein:
Having thus described the invention in general terms, reference will now be made to the accompanying drawings, which are not necessarily drawn to scale, and wherein:
One or more embodiments of the invention are described below. It should be noted that these and any other embodiments are exemplary and are intended to be illustrative of the invention rather than limiting. While the invention is widely applicable to different types of systems, it is impossible to include all the possible embodiments and contexts of the invention in this disclosure. Upon reading this disclosure, many alternative embodiments of the present invention will be apparent to the persons of ordinary skill in the art.
Embodiments of the present invention may process web logs, firewall logs, or a combination of the two. In a typical enterprise or e-commerce system, logs may be delivered in real, streaming time from widely distributed sources.
Typically, but not exclusively, web log analysis may facilitate the detection of web attacks, while mining firewall logs may facilitate the prevention of data exfiltration in enterprise or e-commerce setups.
As shown, processing begins at 110, whereupon log lines belonging to one or more log line parameters are retrieved from one or more enterprise or e-commerce system data sources and/or from incoming data traffic to the enterprise or e-commerce system. The one or more enterprise or e-commerce system data sources comprise at least one of: web server access logs, firewall logs, packet captures per application, active directory logs, DNS logs, forward proxy logs, external threat feeds, AV logs, user logon audits, DLP logs, LB logs, IPS/IDS logs, blacklisted URLs, blacklisted IP addresses, and blacklisted referrers. The one or more log line parameters comprise at least one of: user ID, session, IP address, and URL query.
Process flow continues to 112, wherein one or more features are computed from the grouped log lines. Feature extraction may include activity tracking, activity aggregation, or a combination thereof. As disclosed herein, embodiments performing activity tracking may consume the log stream generated by the platform, identify the entities involved in each log line (e.g., IP address, user), and update the corresponding activity records. These activity records may then be calculated and stored according to system guidelines. In one guideline arrangement, activity records are calculated and stored over a short temporal window, for example in one-minute increments. Behavioral features can then be computed for different time intervals, such as 30 minutes, 1 hour, 12 hours, and 24 hours, which allows flexibility in analysis.
In a further guideline arrangement, activity records are calculated and stored for streamlined, efficient retrieval of the user data needed for feature computation. Depending on the definition of the feature, aggregating activity records over a larger time window may involve anything from simple counters to complex data structures.
In activity aggregation, computing behavioral features over an interval of time may require two steps. The first step is retrieving all activity records that fall within the given interval. The behavioral descriptors are aggregated over 24 hours, ending at the time of the last user activity; this can be represented graphically as a rolling 24-hour window for feature computation. The second step is aggregating the minute-by-minute activity records as the feature demands. Again, this aggregation step depends on the feature type. In the simplest case, counters, one merely adds all the minute-by-minute values together. The more complex case of unique values requires retrieving the unique values of the superset formed by the union of the minute-by-minute sets.
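The tracking and two-step aggregation just described can be sketched as follows. This minimal example assumes in-memory storage, timestamps in epoch seconds, and two illustrative features: a request counter and a set of unique URLs. `MinuteRecord` and `ActivityTracker` are hypothetical names. A real deployment would keep these records in a scalable store.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, Set, Tuple


@dataclass
class MinuteRecord:
    """Activity of one entity within one minute."""
    requests: int = 0                            # counter-type feature
    urls: Set[str] = field(default_factory=set)  # unique-value-type feature


class ActivityTracker:
    def __init__(self) -> None:
        # entity -> minute index -> record
        self.records: Dict[str, Dict[int, MinuteRecord]] = defaultdict(dict)

    def absorb(self, entity: str, ts_seconds: int, url: str) -> None:
        """Activity tracking: update the entity's record for this log line's minute."""
        minute = ts_seconds // 60
        record = self.records[entity].setdefault(minute, MinuteRecord())
        record.requests += 1
        record.urls.add(url)

    def aggregate(self, entity: str, window_minutes: int = 24 * 60) -> Tuple[int, int]:
        """Aggregate over a window ending at the entity's last activity.

        Returns (total requests, number of unique URLs).
        """
        minutes = self.records.get(entity)
        if not minutes:
            return 0, 0
        last = max(minutes)
        start = last - window_minutes + 1
        # Step 1: retrieve the minute records that fall inside the window.
        in_window = [rec for m, rec in minutes.items() if start <= m <= last]
        # Step 2: merge them as each feature type demands.
        total_requests = sum(rec.requests for rec in in_window)      # counters: add
        unique_urls = set().union(*(rec.urls for rec in in_window))  # unique values: union
        return total_requests, len(unique_urls)


tracker = ActivityTracker()
tracker.absorb("10.0.0.1", 0, "/login")
tracker.absorb("10.0.0.1", 30, "/login")                # same minute as the previous line
tracker.absorb("10.0.0.1", 90, "/cart")
tracker.absorb("10.0.0.1", 2 * 24 * 3600, "/checkout")  # two days later
```

With the default 24-hour window, only the last minute's record falls in range, so `tracker.aggregate("10.0.0.1")` returns `(1, 1)`. Widening the window to three days covers all four log lines.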
Continuing the process flow, the one or more features may be applied to an adaptive rules model at block 114, where an embodiment may compare the one or more features to predetermined adaptive rules of malicious activities, non-malicious activities, or any predetermined rule. A predictive module 116 may block one or more features that have one or more identified threat labels 116a, investigate one or more features 116b, or a combination thereof. Blocking one or more features may prevent a malicious activity by issuing a warning to the system, the analyst, or both. Investigating one or more features may involve an analyst examining a labeled feature and determining whether it is correctly labeled, changing the label, or a combination thereof.
At block 118, a features matrix may be generated from applying the one or more features to the adaptive rules model. In the features matrix, the one or more features make up the columns and the one or more log line parameters make up the rows. The features of a features matrix organized or grouped by sessions comprise at least one of: user session duration, number of requests in user session, average time between clicks in user session, user session click rate, percentage of image requests in user session, percentage of 4xx responses in user session, percentage of 3xx responses in user session, percentage of 2xx responses in user session, percentage of zip responses in user session, percentage of binary responses in user session, and percentage of HEAD requests in user session. The features of a features matrix organized or grouped by URL queries comprise at least one of: length of user URL query, number of characters of user URL query, number of digits of user URL query, and number of punctuation marks in the URL query.
The features of a features matrix organized or grouped by user ID comprise at least one of: number of checkouts, number of credit cards added, number of promo codes added, number of gift cards added, number of times items were shipped overnight, and number of times a new shipping address was added. The features of a features matrix organized or grouped by IP address comprise at least one of: number of login failures, number of login successes, number of password resets, and total number of requests.
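As a concrete instance of the session-grouped matrix, the sketch below computes a subset of the listed session features from parsed web log lines. The log line fields, the image-suffix list, and the names `LogLine`, `COLUMNS`, and `session_features` are hypothetical. Rows are sessions (one log line parameter), and columns follow `COLUMNS`.

```python
from collections import defaultdict
from typing import Callable, Dict, List, NamedTuple


class LogLine(NamedTuple):
    session: str
    ts: float    # seconds since epoch
    url: str
    status: int  # HTTP status code


# Hypothetical definition of an "image request".
IMAGE_SUFFIXES = (".png", ".jpg", ".jpeg", ".gif")

COLUMNS = ["session_duration", "num_requests", "avg_time_between_clicks",
           "pct_image_requests", "pct_4xx", "pct_2xx"]


def session_features(lines: List[LogLine]) -> Dict[str, List[float]]:
    """Build a features matrix: one row per session, one column per entry of COLUMNS."""
    by_session: Dict[str, List[LogLine]] = defaultdict(list)
    for line in lines:
        by_session[line.session].append(line)

    matrix: Dict[str, List[float]] = {}
    for session, reqs in by_session.items():
        reqs.sort(key=lambda r: r.ts)
        n = len(reqs)
        duration = reqs[-1].ts - reqs[0].ts
        # Gaps between consecutive requests sum to the session duration.
        avg_gap = duration / (n - 1) if n > 1 else 0.0

        def pct(pred: Callable[[LogLine], bool]) -> float:
            return 100.0 * sum(1 for r in reqs if pred(r)) / n

        matrix[session] = [
            float(duration),
            float(n),
            avg_gap,
            pct(lambda r: r.url.lower().endswith(IMAGE_SUFFIXES)),
            pct(lambda r: 400 <= r.status < 500),
            pct(lambda r: 200 <= r.status < 300),
        ]
    return matrix


lines = [
    LogLine("s1", 0, "/", 200),
    LogLine("s1", 10, "/logo.png", 200),
    LogLine("s1", 30, "/admin", 404),
    LogLine("s1", 40, "/login", 200),
    LogLine("s2", 5, "/", 200),
]
matrix = session_features(lines)
```

The other groupings (user ID, IP address, URL query) follow the same pattern with a different grouping key and feature list.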
At block 120, process flow continues with performing at least one detection method from a first group of statistical outlier detection methods and at least a second detection method from a second group of statistical outlier detection methods on the one or more features matrices to identify statistical outliers. The first group of statistical outlier detection methods comprises at least one of: a matrix decomposition-based outlier process, a replicator neural networks process, and a joint probability density process. The second group of statistical outlier detection methods comprises at least one of: a matrix decomposition-based outlier process, a replicator neural networks process, and a joint probability density process.
Shown in
Further defining the matrix decomposition-based outlier process, let $X$ be a $p$-dimensional dataset. Its covariance matrix $\Sigma$ can be decomposed as $\Sigma = P D P^{T}$, where $P$ is an orthonormal matrix whose columns are the eigenvectors of $\Sigma$, and $D$ is the diagonal matrix containing the corresponding eigenvalues $\lambda_1, \ldots, \lambda_p$. Graphically, an eigenvector can be seen as a line in 2D space, or a plane in higher-dimensional spaces, while its corresponding eigenvalue indicates how much the data is stretched in that direction. At this stage, some embodiments may sort the columns of the eigenvector matrix $P$ and eigenvalue matrix $D$ in order of decreasing eigenvalues. In other words, the eigenvectors and their corresponding eigenvalues are sorted in decreasing order of significance: the first eigenvector accounts for the most variance, the second for the second-most, and so on. The projection of the dataset into the principal component space is given by $Y = XP$. This projection can be performed with a reduced number of principal components. Let $Y^{j}$ be the projected dataset using the top $j$ principal components: $Y^{j} = X P^{j}$. In the same way, the reverse projection, from the principal component space to the original space, is given by $R^{j} = \big(P^{j} (Y^{j})^{T}\big)^{T}$, where $R^{j}$ is the reconstructed dataset using the top $j$ principal components. This process is schematically depicted in
The outlier score of point $X_i = [x_{i1}, \ldots, x_{ip}]$ may be defined as:
Here, $ev(j) = \sum_{k=1}^{j} \lambda_k \big/ \sum_{k=1}^{p} \lambda_k$ represents the percentage of variance explained by the top $j$ principal components. Because the eigenvalues are sorted in decreasing order of significance, $ev(j)$ is monotonically increasing: the higher $j$ is, the more variance is accounted for by components 1 through $j$. With this outlier score definition, large deviations in the top principal components are not heavily weighted, while deviations in the last principal components are. Outliers that present large deviations in the last principal components may therefore receive high scores.
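The score equation itself is not reproduced in the text above, so the sketch below assumes the form $\mathrm{score}(X_i) = \sum_{j=1}^{p} \lVert X_i - R_i^{j} \rVert \cdot ev(j)$. This form matches the weighting just described, but the text does not confirm it. The sketch also centers the data before the eigendecomposition, a standard PCA step that the text does not mention. It requires NumPy, and `pca_outlier_scores` is a hypothetical name.

```python
import numpy as np


def pca_outlier_scores(X: np.ndarray) -> np.ndarray:
    """Weighted reconstruction-error outlier score for each row of X (n x p).

    Assumed score: score(X_i) = sum_{j=1..p} ||X_i - R_i^j|| * ev(j).
    """
    Xc = X - X.mean(axis=0)                     # center the data (assumption)
    eigvals, P = np.linalg.eigh(np.cov(Xc, rowvar=False))
    order = np.argsort(eigvals)[::-1]           # decreasing eigenvalues
    eigvals, P = eigvals[order], P[:, order]
    ev = np.cumsum(eigvals) / eigvals.sum()     # ev(j) for j = 1..p
    scores = np.zeros(X.shape[0])
    for j in range(1, X.shape[1] + 1):
        Pj = P[:, :j]                           # top-j eigenvectors
        Yj = Xc @ Pj                            # Y^j = X P^j
        Rj = (Pj @ Yj.T).T                      # R^j = (P^j (Y^j)^T)^T
        scores += np.linalg.norm(Xc - Rj, axis=1) * ev[j - 1]
    return scores


rng = np.random.default_rng(0)
t = rng.normal(size=200)
X = np.column_stack([t, 2 * t + 0.05 * rng.normal(size=200)])
X = np.vstack([X, [[0.0, 3.0]]])                # far from the line x2 = 2 * x1
scores = pca_outlier_scores(X)
```

The appended point lies along the low-variance direction. The top-1 reconstruction therefore misses it badly, and that error carries almost all of the weight $ev(1)$, so the point receives the highest score.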
The second outlier detection process that may be employed by an embodiment is the replicator neural network. This method is similar to the matrix decomposition-based outlier analysis in that it also relies on a compression-reconstruction analysis. In this case, however, an analyst may train a multilayer neural network to compress and reconstruct the data so that the bulk of the data is reconstructed accurately but outliers are not. The reconstruction error can then be translated directly into an outlier score.
Replicator Neural Networks (RNN), or autoencoders, are multilayer feedforward neural networks. The input and output layers have the same number of nodes, while the hidden layers are composed of a reduced number of nodes. As depicted in
where the input vector x and the output vector r are both p-dimensional. Given a trained RNN, the reconstruction error is used as the outlier score: test instances incurring a high reconstruction error are considered outliers.
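The replicator network idea can be sketched in NumPy as follows. This is a simplified stand-in, not the network of this disclosure: it assumes a single tanh bottleneck layer narrower than p, a linear output layer, full-batch gradient descent on the mean squared error, and the per-point score $e = \frac{1}{p}\sum_{j=1}^{p}(x_j - r_j)^2$. The function names and hyperparameters are hypothetical.

```python
import numpy as np

def train_replicator(X, hidden=1, lr=0.1, epochs=3000, seed=0):
    """Fit a small replicator network (autoencoder) that reproduces X from X.

    Assumed architecture: p inputs -> `hidden` tanh units -> p linear outputs,
    trained by full-batch gradient descent on the mean squared reconstruction error.
    """
    rng = np.random.default_rng(seed)
    n, p = X.shape
    W1, b1 = rng.normal(0.0, 0.1, (p, hidden)), np.zeros(hidden)
    W2, b2 = rng.normal(0.0, 0.1, (hidden, p)), np.zeros(p)
    for _ in range(epochs):
        H = np.tanh(X @ W1 + b1)            # compressed representation
        R = H @ W2 + b2                     # reconstruction r of each input x
        G = 2.0 * (R - X) / (n * p)         # gradient of the mean squared error w.r.t. R
        GH = (G @ W2.T) * (1.0 - H ** 2)    # back-propagate through tanh (uses old W2)
        W2 -= lr * (H.T @ G)
        b2 -= lr * G.sum(axis=0)
        W1 -= lr * (X.T @ GH)
        b1 -= lr * GH.sum(axis=0)
    return W1, b1, W2, b2

def reconstruction_scores(X, params):
    """Outlier score per row: e = (1/p) * sum_j (x_j - r_j)^2 (assumed normalization)."""
    W1, b1, W2, b2 = params
    R = np.tanh(X @ W1 + b1) @ W2 + b2
    return np.mean((X - R) ** 2, axis=1)
```

Because the network is trained to reproduce the bulk of the data, points that do not fit the structure it has learned are reconstructed poorly and receive the highest scores.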
A further statistical outlier detection method that may be employed by an embodiment is a joint probability density-based outlier analysis. This technique fits a multivariate model to the data, resulting in a joint probability distribution that can be used to detect rare events. The outlier score is based on the probability density of a point in the multidimensional space: points with a low density are rare and are therefore treated as likely outliers. To build a multivariate model from marginal distributions that are not all Gaussian, some embodiments may exploit copula functions. A copula framework provides a means of inference after modeling a multivariate joint probability distribution from training data.
A copula function $C(u_{1}, \ldots, u_{m}; \theta)$ with parameter $\theta$ is a joint probability distribution of m continuous random variables, each of them uniformly distributed in [0, 1]. According to Sklar's theorem, any copula function that takes probability distributions with marginals $F_{i}(x_{i})$ as its arguments defines a valid joint distribution with marginals $F_{i}(x_{i})$. Thus, a joint distribution function for $x_{1}, \ldots, x_{m}$ with arbitrary marginals can be constructed as
$F(x_{1}, \ldots, x_{m}) = C\left(F_{1}(x_{1}), \ldots, F_{m}(x_{m}); \theta\right)$ (4)
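The joint probability density function (PDF) may be obtained by taking the m-th order mixed partial derivative of equation (4):
$f(x_{1}, \ldots, x_{m}) = \frac{\partial^{m} F(x_{1}, \ldots, x_{m})}{\partial x_{1} \cdots \partial x_{m}} = c\left(F_{1}(x_{1}), \ldots, F_{m}(x_{m}); \theta\right) \prod_{i=1}^{m} f_{i}(x_{i})$ (5)
where $c(\cdot)$ is the copula density and $f_{i}$ is the marginal density of $x_{i}$.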
A multivariate Gaussian copula forms a statistical model given by:
$C_{G}(u_{1}, \ldots, u_{m}; \Sigma) = F_{G}\left(\Phi^{-1}(u_{1}), \ldots, \Phi^{-1}(u_{m}); \Sigma\right)$ (6)
where $F_{G}$ is the cumulative distribution function (CDF) of a multivariate normal distribution with zero mean vector and covariance $\Sigma$, and $\Phi^{-1}$ is the inverse CDF of the standard normal distribution.
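Let $\Psi = \{\Sigma, \Psi_{i}\}_{i=1 \ldots m}$ be the parameters of a joint probability distribution constructed with a copula and m marginals, $\Psi_{i}$ being the parameter of the i-th marginal. Given N i.i.d. observations of the variables, $x = (x_{11}, \ldots, x_{mN})$, the log-likelihood function is:
$\ell(\Psi) = \sum_{l=1}^{N} \left[ \log c\left(F_{1}(x_{1l}; \Psi_{1}), \ldots, F_{m}(x_{ml}; \Psi_{m}); \Sigma\right) + \sum_{i=1}^{m} \log f_{i}(x_{il}; \Psi_{i}) \right]$ (7)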
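Parameters Ψ are estimated via maximum log-likelihood, $\hat{\Psi} = \arg\max_{\Psi} \ell(\Psi)$ (8), where $\ell(\Psi)$ is the log-likelihood function above.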
In one configuration, the first step in modeling the copula density is to model the individual distribution of each of the one or more features, $x_{i}$. In the present invention, each feature may be modeled using a non-parametric kernel density-based method, described by:
$\hat{f}_{i}(x) = \frac{1}{N\sigma} \sum_{l=1}^{N} K\left(\frac{x - x_{il}}{\sigma}\right)$ (9)
where $K(\cdot)$ is a Gaussian kernel with bandwidth parameter σ. Applying this method to the features addressed by this disclosure raises two problems. The first problem is that most of the features produce extremely skewed distributions, making it hard to set the bandwidth for the Gaussian kernel. Therefore, an embodiment may set the bandwidth parameter using Scott's rule of thumb. The second problem is that some of the variables are discrete ordinal. For copula functions to be useful, the probability density of $u_{i} = F(x_{i})$ should be uniform, and for discrete-valued variables this condition is not met.
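A kernel density marginal with a Scott's-rule bandwidth might look like the sketch below. It assumes the one-dimensional form of Scott's factor used by `scipy.stats.gaussian_kde`, $\sigma = \hat{s}\,N^{-1/5}$ with $\hat{s}$ the sample standard deviation, and it also returns the matching CDF, since the copula needs $u_i = F(x_i)$ rather than the density alone. The function names are hypothetical.

```python
import numpy as np
from math import erf, pi, sqrt

def scott_bandwidth(samples):
    """Scott's rule of thumb in 1-D (scipy convention): sigma = std * N**(-1/5)."""
    samples = np.asarray(samples, dtype=float)
    return samples.std(ddof=1) * samples.size ** (-1.0 / 5.0)

def kde_pdf(x, samples, bw):
    """Gaussian-kernel density f(x) = 1/(N*bw) * sum_l K((x - x_l) / bw)."""
    samples = np.asarray(samples, dtype=float)
    d = (np.asarray(x, dtype=float)[:, None] - samples[None, :]) / bw
    return np.exp(-0.5 * d ** 2).sum(axis=1) / (samples.size * bw * sqrt(2.0 * pi))

def kde_cdf(x, samples, bw):
    """Matching CDF F(x) = 1/N * sum_l Phi((x - x_l) / bw), so u = F(x) lies in [0, 1]."""
    samples = np.asarray(samples, dtype=float)
    d = (np.asarray(x, dtype=float)[:, None] - samples[None, :]) / bw
    phi = 0.5 * (1.0 + np.vectorize(erf)(d / sqrt(2.0)))   # standard normal CDF per kernel
    return phi.mean(axis=1)
```

For a heavily skewed feature, Scott's rule can oversmooth the peak and undersmooth the tail; it is used here only because it gives a reproducible, data-driven default.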
As disclosed, some embodiments may perform one or more statistical outlier detection processes including a joint probability process. In some embodiments, this joint probability process may comprise identifying discrete-valued features derived from the one or more features and adding white Gaussian noise to those discrete variables. This overcomes the non-uniformity of the probability density of $u_{i} = F(x_{i})$: adding white Gaussian noise to $x_{i}$ gives a continuous-valued feature, denoted $x_{i}^{c}$. In this formulation, noise is added to each feature value as follows:
$x_{i}^{c} = x_{i} + \eta(0, n_{p})$ (10)
where $n_{p}$ is the variance of the Gaussian distribution η used to add noise. This value is determined by evaluating $n_{p} = P_{s}/\mathrm{SNR}$, where SNR is the desired signal-to-noise ratio and $P_{s}$ is the signal power, estimated from the distribution of all values of the feature $x_{i}$. In the depicted configuration, the SNR may be set to 20 for most of the features. The bottom left plot of
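The noise step can be sketched as follows. Two details are assumptions, not taken from the text: the signal power $P_s$ is estimated as the mean squared value of the feature, and SNR = 20 is treated as a linear ratio rather than in decibels. The function name is hypothetical.

```python
import numpy as np

def add_dequantization_noise(x, snr=20.0, rng=None):
    """Return (x_c, n_p) with x_c = x + eta(0, n_p) and n_p = P_s / SNR."""
    rng = np.random.default_rng() if rng is None else rng
    x = np.asarray(x, dtype=float)
    p_s = float(np.mean(x ** 2))       # signal power P_s (assumed: mean squared value)
    n_p = p_s / snr                    # noise variance n_p; SNR taken as a linear ratio
    noise = rng.normal(0.0, np.sqrt(n_p), size=x.shape)  # normal() expects a std dev
    return x + noise, n_p
```

After this step, tied discrete values are spread into distinct continuous values, so $u_i = F(x_i^{c})$ can approach the uniform distribution the copula requires.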
At block 122, process flow continues with generating an outlier scores matrix from each detection method performed. As stated above, in some embodiments, two detection methods may be performed. In other embodiments, fewer or more detection methods may be performed to obtain results more readily or more accurately. In some embodiments, each outlier scores matrix from the detection methods may be converted to a top scores model, as shown in block 124. The top scores from each outlier scores matrix may be combined using a probability model to create a single top scores vector, as shown in block 126.
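One simple stand-in for the conversion and combination steps, offered only as an assumption since no particular probability model is fixed here, maps each method's scores to empirical percentiles (so scores on different scales become comparable probabilities), averages the percentiles across methods into a single vector, and keeps the top k entries. The function names are hypothetical.

```python
import numpy as np

def to_percentiles(scores):
    """Map raw outlier scores to empirical probabilities in (0, 1] via their ranks."""
    scores = np.asarray(scores, dtype=float)
    ranks = scores.argsort().argsort()            # 0 = least anomalous
    return (ranks + 1) / scores.size

def combine_top_scores(score_lists, k):
    """Average per-method percentiles; return (indices of the top k, combined vector)."""
    combined = np.mean([to_percentiles(s) for s in score_lists], axis=0)
    top = np.argsort(-combined, kind="stable")[:k]
    return top, combined
```

Rank-based percentiles ignore how far apart the raw scores are; a fitted score distribution per method would preserve that information at the cost of a modeling assumption.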
At block 128, process flow continues with the presentation of the single top scores vector and the adaptive rules model via a graphical user interface (GUI). An analyst of the enterprise or ecommerce system may view the top scores vector and the adaptive rules model and may input, via the GUI, label information for the statistical outliers as malicious, nonmalicious, or another analyst-defined label. Responsive to these inputs, embodiments may incorporate the labeled statistical outliers into a labeled features matrix, as shown in block 130. The labeled features matrix identifies one or more rules for identifying threats to the enterprise or ecommerce system.
In some embodiments, the one or more rules comprise a random forest classifier, learning vector quantization, a neural network, and combinations thereof. The one or more rules that may be created are essentially behavioral rules based on a multidimensional view of the incoming streamed data and/or batch data. Continuing to block 132, an embodiment may create a supervised learning module using the one or more identified threat labels. In some embodiments, this supervised learning module may detect threats in real time and block and/or challenge the incoming threat. If a threat is detected, the detected threat may be used to modify the one or more statistical models and/or the one or more adaptive rules.
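Learning vector quantization, one of the rule types listed above, is simple enough to sketch in a few lines. The following is a minimal LVQ1 under stated assumptions (one prototype per analyst label, prototypes initialized at the class means, a linearly decaying learning rate). It illustrates how rows of a labeled features matrix could train a classifier and is not the supervised learning module of this disclosure; the names are hypothetical.

```python
import numpy as np

def train_lvq1(X, y, lr=0.1, epochs=30, seed=0):
    """LVQ1 with one prototype per label, initialized at the class means."""
    X, y = np.asarray(X, dtype=float), np.asarray(y)
    rng = np.random.default_rng(seed)
    labels = np.unique(y)
    protos = np.array([X[y == c].mean(axis=0) for c in labels])
    for epoch in range(epochs):
        rate = lr * (1.0 - epoch / epochs)                         # linearly decaying rate
        for i in rng.permutation(len(X)):
            k = np.argmin(np.linalg.norm(protos - X[i], axis=1))   # winning prototype
            step = rate * (X[i] - protos[k])
            protos[k] += step if labels[k] == y[i] else -step      # attract or repel
    return protos, labels

def predict_lvq(protos, labels, X):
    """Assign each row the label of its nearest prototype."""
    d = np.linalg.norm(np.asarray(X, dtype=float)[:, None, :] - protos[None, :, :], axis=2)
    return labels[np.argmin(d, axis=1)]
```

The trained prototypes act as behavioral rules: a new feature row is flagged with the label of the closest prototype.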
The process 100 may be a continuous daily cycle on the enterprise or ecommerce system. Other embodiments may operate on a different cycle as appreciated by those skilled in the art. As shown in
In some embodiments, apparatus 300 comprises one or more processors 336, system memory 338, and one or more nontransitory memory units 340, all of which may be directly or indirectly coupled to each other.
Streamed data 311, batch data 313, or a combination thereof may be fed into the apparatus 300 through a network interface 334 to a features extraction module 316, which comprises code stored on the one or more nontransitory memory units that, when executed by the one or more processors, is configured to parse the streamed data 311, batch data 313, or a combination thereof by grouping or bunching log lines belonging to one or more log line parameters and then computing one or more features from the grouped log lines.
Some embodiments may compute the one or more features by executing an activity tracking module, an activity aggregation module, or a combination thereof. As the system absorbs the log stream generated by the platform, an exemplary activity tracking module may identify the entities involved in each log line (e.g., IP address or user) and update the corresponding activity records.
Activity records may be calculated and stored according to two guidelines. The first guideline is a very short temporal window. For example, in one embodiment, activity records are computed and stored in one-minute increments. In this embodiment, behavioral features are computed over different time intervals: … minutes, 1 hour, 12 hours, and 24 hours. This allows flexibility in analysis.
The second guideline is having a design streamlined toward efficient retrieval of the user data necessary for feature computation. Depending on the definition of the feature, aggregating activity records for a larger time window can require anything from simple counters to complex data structures. In activity aggregation, computing behavioral features over an interval of time may require two steps. The first step is retrieving all activity records that fall within the given interval.
The behavioral descriptors are aggregated over 24 hours and end at the time of the last user activity. This can be graphically represented as a rolling 24-hour window for feature computation. The second step is to aggregate the minute-by-minute activity records as the feature demands; again, this step depends on the feature type. In the simplest case, counters, one merely adds all the minute-by-minute values together. The more complex case of unique values requires retrieving the unique values of the superset formed by the union of the minute-by-minute sets.
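The two-step aggregation can be sketched with per-minute activity records. This is a hypothetical data structure: it tracks one counter feature (requests) and one unique-values feature (distinct URLs) per entity, and aggregates over a rolling window that ends at the entity's last activity.

```python
from collections import defaultdict

class ActivityRecords:
    """Per-entity, minute-by-minute activity records (hypothetical structure)."""

    def __init__(self):
        # entity -> minute index -> {"requests": counter, "urls": set of unique values}
        self._records = defaultdict(lambda: defaultdict(lambda: {"requests": 0, "urls": set()}))

    def track(self, entity, timestamp, url):
        """Activity tracking: update the one-minute record touched by a log line."""
        rec = self._records[entity][int(timestamp // 60)]
        rec["requests"] += 1
        rec["urls"].add(url)

    def aggregate(self, entity, window_minutes=24 * 60):
        """Activity aggregation over a rolling window ending at the last activity."""
        minutes = self._records.get(entity)
        if not minutes:
            return {"requests": 0, "unique_urls": 0}
        end = max(minutes)
        # Step 1: retrieve all activity records that fall within the window.
        window = [minutes[m] for m in minutes if end - window_minutes < m <= end]
        # Step 2: aggregate as the feature demands.
        requests = sum(r["requests"] for r in window)          # counters: add values
        unique = set().union(*(r["urls"] for r in window))     # unique values: union sets
        return {"requests": requests, "unique_urls": len(unique)}

if __name__ == "__main__":
    acts = ActivityRecords()
    acts.track("10.0.0.1", 0, "/login")
    acts.track("10.0.0.1", 90, "/cart")
    print(acts.aggregate("10.0.0.1"))
```

Storing per-minute sets keeps unique-value features exact for any window length, at the cost of more memory than plain counters.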
Streamed data 311 may comprise incoming traffic to an enterprise or ecommerce system. Batch data 313 may comprise web server access logs, firewall logs, packet capture per application, active directory logs, DNS logs, forward proxy logs, external threat feeds, AV logs, user logon audits, DLP logs, LB logs, IPS/IDS logs, blacklisted URLs, blacklisted IP addresses, blacklisted referrers, and combinations thereof. The one or more log line parameters may comprise at least one of: user ID, session, IP address, and URL query. The one or more features may be sent to an adaptive rules model 318, where the adaptive rules model 318 comprises code stored on the one or more nontransitory memory units that, when executed by the one or more processors, is configured to compare the one or more features to predetermined adaptive rules of malicious activities, nonmalicious activities, or any predetermined rule; to block one or more features that have one or more identified threat labels, investigate one or more features, or a combination thereof; and further to generate a features matrix. In the features matrix, the one or more features make up the columns and the one or more log line parameters make up the rows. The features of a features matrix organized or grouped by sessions comprise at least one of: user session duration, number of requests in user session, average time between clicks in user session, user session click rate, percentage of image requests in user session, percentage of 4xx responses in user session, percentage of 3xx responses in user session, percentage of 2xx responses in user session, percentage of zip responses in user session, percentage of binary responses in user session, and percentage of HEAD requests in user session. The features of a features matrix organized or grouped by URL queries comprise at least one of: length of user URL query, number of characters of user URL query, number of digits of user URL query, and number of punctuation marks of user URL query.
The features of a features matrix organized or grouped by user ID comprise at least one of: number of checkouts, number of credit cards added, number of promo codes added, number of gift cards added, number of times items were shipped overnight, and number of times a new shipping address was added. The features of a features matrix organized or grouped by IP address comprise at least one of: number of login failures, number of login successes, number of password resets, and total number of requests.
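A few of the session and URL query columns listed above can be computed from parsed log lines as sketched below. The `LogLine` record, the image file extensions, and the use of percentages on a 0 to 100 scale are illustrative assumptions; each session becomes one row of the features matrix.

```python
import string
from dataclasses import dataclass

@dataclass
class LogLine:
    session: str
    timestamp: float   # seconds
    status: int        # HTTP status code
    path: str

def session_features(lines):
    """One features-matrix row per session (rows = sessions, columns = features)."""
    by_session = {}
    for line in lines:
        by_session.setdefault(line.session, []).append(line)
    rows = {}
    for sid, events in by_session.items():
        times = sorted(e.timestamp for e in events)
        gaps = [b - a for a, b in zip(times, times[1:])]
        n = len(events)
        rows[sid] = {
            "duration": times[-1] - times[0],
            "num_requests": n,
            "avg_time_between_clicks": sum(gaps) / len(gaps) if gaps else 0.0,
            "pct_4xx": 100.0 * sum(400 <= e.status < 500 for e in events) / n,
            "pct_image": 100.0 * sum(e.path.lower().endswith((".png", ".jpg", ".gif"))
                                     for e in events) / n,
        }
    return rows

def url_query_features(query):
    """URL-query columns: length, number of digits, number of punctuation marks."""
    return {
        "length": len(query),
        "num_digits": sum(c.isdigit() for c in query),
        "num_punctuation": sum(c in string.punctuation for c in query),
    }
```

The same pattern extends to the user ID and IP address groupings: group log lines by the parameter, then reduce each group to one row of counts or percentages.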
An embodiment may include a predictive module 319, which comprises code stored on the one or more nontransitory memory units that, when executed by the one or more processors, is configured to compare the one or more features to predetermined adaptive rules of malicious activities, nonmalicious activities, or any predetermined rule. The predictive module 319 may block one or more features that have one or more identified threat labels, investigate one or more features, or a combination thereof. Blocking one or more features may prevent a malicious activity by issuing a warning to the system, the analyst, or a combination thereof. Investigating one or more features may involve an analyst reviewing a labeled feature, determining whether it is correctly labeled, changing the label, or a combination thereof.
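Blocking by adaptive rules can be sketched as a filter in front of the features matrix. The rule representation (a name, a label, and a predicate over one feature row) is hypothetical, and so is the choice to exclude blocked rows from the features matrix while keeping the matching rule name for an analyst to investigate.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple

Row = Dict[str, float]

@dataclass
class AdaptiveRule:
    name: str
    label: str                         # e.g. "malicious" or "non-malicious"
    predicate: Callable[[Row], bool]   # True when the rule matches a feature row

def apply_rules(rows: Dict[str, Row], rules: List[AdaptiveRule],
                threat_labels=("malicious",)) -> Tuple[Dict[str, str], Dict[str, Row]]:
    """Split rows into blocked rows (matching a threat-labeled rule) and the features matrix."""
    blocked: Dict[str, str] = {}
    matrix: Dict[str, Row] = {}
    for key, row in rows.items():
        hit: Optional[AdaptiveRule] = next(
            (r for r in rules if r.label in threat_labels and r.predicate(row)), None)
        if hit is not None:
            blocked[key] = hit.name    # blocked; rule name kept for later investigation
        else:
            matrix[key] = row          # passed on to statistical outlier detection
    return blocked, matrix

if __name__ == "__main__":
    rules = [AdaptiveRule("too_many_login_failures", "malicious",
                          lambda r: r["login_failures"] > 20)]
    rows = {"1.2.3.4": {"login_failures": 50.0}, "5.6.7.8": {"login_failures": 1.0}}
    print(apply_rules(rows, rules))
```

Rules that a detected threat later modifies can simply be replaced in the `rules` list, which keeps the model adaptive without retraining the outlier detectors.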
The features matrix is then sent to an unsupervised learning module 320, which comprises code stored on the one or more nontransitory memory units that, when executed by the one or more processors, is configured to use two groups of statistical outlier detection methods, such as a matrix decomposition-based process, a replicator neural networks process, and a joint probability density process, to identify statistical outliers.
In some embodiments the one or more log line parameters of the features matrix are ranked by the top scores module 322 and rearranged by probability by the outlier probabilities module 324.
In some embodiments, at least one of the statistical outliers and the adaptive rules model is presented on a graphical user interface 346, so that an analyst of the enterprise or ecommerce system may manually identify the statistical outliers as malicious, nonmalicious, or another analyst-defined label via a keyboard 344 connected to a user input interface 342. The statistical outliers are then labeled as malicious, nonmalicious, or another analyst-defined label in order to create one or more labeled features matrices. The one or more labeled features matrices are then sent to a supervised learning module 328, which comprises code stored on the one or more nontransitory memory units that, when executed by the one or more processors, is configured to create, from the one or more labeled features matrices, one or more rules for identifying threats to the enterprise or ecommerce system.
The one or more rules may comprise a random forest classifier, learning vector quantization, a neural network, and combinations thereof. The one or more rules that are created are essentially behavioral rules based on a multidimensional view of the incoming streamed data 311 and/or batch data 313. The one or more rules may be sent to one or more threat detectors (not shown) for real time monitoring of the streamed data 311. The one or more rules may also be posted to a cloud server (not shown) or distributed to other third parties to be used in their firewall rules set. In some embodiments, public labelling data may be input into system rules. In some embodiments, labelling of statistical threats may be publicly available. If threats are not detected by the one or more threat detectors, the incoming data traffic is allowed to continue to the enterprise or ecommerce system. If threats are detected by the one or more threat detectors, the incoming data traffic to the enterprise or ecommerce system may be blocked and/or challenged. In some embodiments, if a threat is detected, the detected threat may be used to modify the unsupervised learning module 320 and/or to modify the one or more adaptive rules generated by the adaptive rules model 318.
In another embodiment,
A technique to produce an end-to-end system that may combine analyst intelligence with state-of-the-art machine learning techniques to detect new attacks and reduce the time elapsed between attack detection and successful prevention has been disclosed. Key advantages of the system are that it overcomes limited analyst bandwidth and the weaknesses of unsupervised learning, and that it actively adapts and synthesizes new models.
The benefits and advantages that may be provided by the present invention have been described above with regard to specific embodiments. These benefits and advantages, and any elements or limitations that may cause them to occur or to become more pronounced, are not to be construed as critical, required, or essential features of any or all of the claims. As used herein, the terms “comprises,” “comprising,” or any other variation thereof, are intended to be interpreted as nonexclusively including the elements or limitations which follow those terms. Accordingly, a system, method, or other embodiment that comprises a set of elements is not limited to only those elements, and may include other elements not expressly listed or inherent to the claimed embodiment.
The present disclosure exploits ideas from a wide range of fields, including outlier analysis, ensemble learning, active learning, information security, features analytics and big data computing.