Anomaly detection in dynamically evolving data and systems

0Associated
Cases 
0Associated
Defendants 
0Accused
Products 
31Forward
Citations 
0
Petitions 
1
Assignment
First Claim
1. A method, comprising:
 a) obtaining a data set of network traffic comprising Ndimensional data points from a traffic analyzer, wherein N>
3 and wherein the traffic analyzer is configured to generate a statistics matrix comprising the Ndimensional data points;
b) by a computer,i. processing the statistics matrix into a Markov kernel matrix,ii. processing the Markov kernel matrix to obtain processed data points with a dimension r lower than N, wherein the processing includes finding r discriminating eigenvectors by providing i=1, . . . , r eigenvalues and respective associated eigenvectors, generating for each i=2, . . . , r a respective i^{th }cluster based on the i^{th }eigenvector and generating other respective clusters based on eigenvectors 1, . . . , i−
1, i+1, . . . , r, computing a distance between each respective i^{th }cluster based on the i^{th }eigenvector and each of the other respective clusters based on eigenvectors 1, . . . , i−
1, i−
1, . . . , r, and finding r eigenvalues and associated respective eigenvectors that provide the highest distance, the associated respective eigenvectors that provide the highest distance being the discriminating eigenvectors,wherein the r discriminating eigenvectors thus found form an embedded space in which processed data points with a reduced dimension r form a normal cluster,iii. detecting an abnormal data point in the processed data points with a dimension r lower than N without relying on a signature of a threat and without use of a threshold, andiv. blocking the abnormal data point.
1 Assignment
0 Petitions
Accused Products
Abstract
Detection of abnormalities in multidimensional data is performed by processing the multidimensional data to obtain a reduced dimension embedding matrix, using the reduced dimension embedding matrix to form a lower dimension (of at least 2D) embedded space, applying an outofsample extension procedure in the embedded space to compute coordinates of a newly arrived data point and using the computed coordinates of the newly arrived data point and Euclidean distances to determine whether the newly arrived data point is normal or abnormal.
47 Citations
View as Search Results
IDENTIFYING DEVIATIONS IN DATA  
Patent #
US 20160352767A1
Filed 01/24/2014

Current Assignee
Hewlett Packard Enterprise Development LP

Sponsoring Entity
Hewlett Packard Enterprise Development LP

METHOD AND APPARATUS FOR COMPUTING CELL DENSITY BASED RARENESS FOR USE IN ANOMALY DETECTION  
Patent #
US 20160359685A1
Filed 04/05/2016

Current Assignee
Cisco Technology Incorporated

Sponsoring Entity
Cisco Technology Incorporated

METHOD AND SYSTEM FOR IDENTIFYING UNCORRELATED SUSPICIOUS EVENTS DURING AN ATTACK  
Patent #
US 20170171240A1
Filed 10/13/2016

Current Assignee
Check Point Software Technologies Limited

Sponsoring Entity
Check Point Software Technologies Limited

SCENARIO DRIVEN, TECHNOLOGY AGNOSTIC NETWORK SIMULATION  
Patent #
US 20180123900A1
Filed 10/28/2016

Current Assignee
CA Inc. dba CA Technologies

Sponsoring Entity
CA Inc. dba CA Technologies

USING FREQUENCY ANALYSIS IN ENTERPRISE THREAT DETECTION TO DETECT INTRUSIONS IN A COMPUTER SYSTEM  
Patent #
US 20180176238A1
Filed 12/15/2016

Current Assignee
SAP SE

Sponsoring Entity
SAP SE

Using natural language processing for detection of intended or unexpected application behavior  
Patent #
US 10,015,181 B2
Filed 05/19/2016

Current Assignee
International Business Machines Corporation

Sponsoring Entity
International Business Machines Corporation

Robust representation of network traffic for detecting malware variations  
Patent #
US 10,187,412 B2
Filed 11/19/2015

Current Assignee
Cisco Technology Incorporated

Sponsoring Entity
Cisco Technology Incorporated

Systems and methods for detecting obscure cyclic applicationlayer message sequences in transportlayer message sequences  
Patent #
US 10,200,259 B1
Filed 09/21/2016

Current Assignee
CA Inc. dba CA Technologies

Sponsoring Entity
Symantec Corporation

Security system monitoring techniques by mapping received security score with newly identified security score  
Patent #
US 10,341,369 B2
Filed 03/29/2016

Current Assignee
NCR Corporation

Sponsoring Entity
NCR Corporation

Detecting the status of a mesh node in a wireless mesh network  
Patent #
US 10,362,500 B2
Filed 09/12/2014

Current Assignee
ABB Schweiz AG

Sponsoring Entity
ABB Schweiz AG

Method and system for identifying uncorrelated suspicious events during an attack  
Patent #
US 10,462,160 B2
Filed 10/13/2016

Current Assignee
Check Point Software Technologies Limited

Sponsoring Entity
Check Point Software Technologies Limited

Visualization of data distributed in multiple dimensions  
Patent #
US 10,482,241 B2
Filed 08/24/2016

Current Assignee
SAP SE

Sponsoring Entity
SAP SE

Attack observation apparatus and attack observation method  
Patent #
US 10,491,628 B2
Filed 09/17/2014

Current Assignee
Mitsubishi Electric Corporation

Sponsoring Entity
Mitsubishi Electric Corporation

Method and apparatus for computing cell density based rareness for use in anomaly detection  
Patent #
US 10,505,819 B2
Filed 04/05/2016

Current Assignee
Cisco Technology Incorporated

Sponsoring Entity
Cisco Technology Incorporated

Pattern creation in enterprise threat detection  
Patent #
US 10,530,794 B2
Filed 06/30/2017

Current Assignee
SAP SE

Sponsoring Entity
SAP SE

Using frequency analysis in enterprise threat detection to detect intrusions in a computer system  
Patent #
US 10,530,792 B2
Filed 12/15/2016

Current Assignee
SAP SE

Sponsoring Entity
SAP SE

Alerts based on entities in security information and event management products  
Patent #
US 10,534,908 B2
Filed 12/06/2016

Current Assignee
SAP SE

Sponsoring Entity
SAP SE

Providing semantic connectivity between a java application server and enterprise threat detection system using a J2EE data  
Patent #
US 10,534,907 B2
Filed 12/15/2016

Current Assignee
SAP SE

Sponsoring Entity
SAP SE

Realtime triggering framework  
Patent #
US 10,536,476 B2
Filed 07/21/2016

Current Assignee
SAP SE

Sponsoring Entity
SAP SE

Location enrichment in enterprise threat detection  
Patent #
US 10,542,016 B2
Filed 08/31/2016

Current Assignee
SAP SE

Sponsoring Entity
SAP SE

System and method for clustering files and assigning a maliciousness property based on clustering  
Patent #
US 10,546,143 B1
Filed 11/14/2017

Current Assignee
Support Intelligence Inc.

Sponsoring Entity
Support Intelligence Inc.

Anomaly detection in enterprise threat detection  
Patent #
US 10,552,605 B2
Filed 12/16/2016

Current Assignee
SAP SE

Sponsoring Entity
SAP SE

Identifying deviations in data  
Patent #
US 10,560,469 B2
Filed 01/24/2014

Current Assignee
Hewlett Packard Enterprise Development LP

Sponsoring Entity
Hewlett Packard Enterprise Development LP

Adaptive modeling of data streams  
Patent #
US 10,565,331 B2
Filed 07/27/2017

Current Assignee
International Business Machines Corporation

Sponsoring Entity
International Business Machines Corporation

Method for detecting abnormal load in cloud computing oriented online service  
Patent #
US 10,581,961 B2
Filed 10/17/2017

Current Assignee
Tsinghua University

Sponsoring Entity
Tsinghua University

Realtime push API for log events in enterprise threat detection  
Patent #
US 10,630,705 B2
Filed 09/23/2016

Current Assignee
SAP SE

Sponsoring Entity
SAP SE

System, method, and computer program for detection of anomalous user network activity based on multiple data sources  
Patent #
US 10,645,109 B1
Filed 03/29/2018

Current Assignee
Exabeam Inc.

Sponsoring Entity
Exabeam Inc.

Continuous authentication system and method based on BioAura  
Patent #
US 10,652,237 B2
Filed 02/06/2017

Current Assignee
Purdue Research Foundation

Sponsoring Entity
The Trustees of Princeton University

Snapshot of a forensic investigation for enterprise threat detection  
Patent #
US 10,673,879 B2
Filed 09/23/2016

Current Assignee
SAP SE

Sponsoring Entity
SAP SE

Analysis of complex relationships among information technology securityrelevant entities using a network graph  
Patent #
US 10,681,064 B2
Filed 12/19/2017

Current Assignee
SAP SE

Sponsoring Entity
SAP SE

Intelligent preprocessing of multidimensional timeseries data  
Patent #
US 10,740,310 B2
Filed 03/19/2018

Current Assignee
Oracle International Corporation

Sponsoring Entity
Oracle International Corporation

Method and Apparatus for WholeNetwork Anomaly Diagnosis and Method to Detect and Classify Network Anomalies Using Traffic Feature Distributions  
Patent #
US 20100071061A1
Filed 06/29/2006

Current Assignee
Trustees of Boston University

Sponsoring Entity
Trustees of Boston University

Systems and Methods for Dynamic Anomaly Detection  
Patent #
US 20090234899A1
Filed 03/11/2008

Current Assignee
Paragon Science Inc.

Sponsoring Entity
Paragon Science Inc.

SYSTEM AND METHOD FOR ANALYSIS OF ELECTRONIC INFORMATION DISSEMINATION EVENTS  
Patent #
US 20090241197A1
Filed 03/19/2008

Current Assignee
Forcepoint LLC

Sponsoring Entity
Forcepoint LLC

Diffusion bases methods for segmentation and clustering  
Patent #
US 20080181503A1
Filed 01/30/2007

Current Assignee
Alon Schclar, Amir Zeev Averbuch

Sponsoring Entity
Alon Schclar, Amir Zeev Averbuch

Traffic anomaly analysis for the detection of aberrant network code  
Patent #
US 20070064617A1
Filed 09/16/2005

Current Assignee
Hewlett Packard Enterprise Development LP

Sponsoring Entity
Hewlett Packard Enterprise Development LP

Network status display device and method using traffic flowradar  
Patent #
US 20070206498A1
Filed 11/15/2006

Current Assignee
Electronics and Telecommunications Research Institute

Sponsoring Entity
Electronics and Telecommunications Research Institute

Method for generating a lowdimensional representation of highdimensional data  
Patent #
US 20060045353A1
Filed 09/02/2004

Current Assignee
Mitsubishi Electric Research Laboratories

Sponsoring Entity
Mitsubishi Electric Research Laboratories

Intrusion detection system  
Patent #
US 20060156404A1
Filed 03/03/2006

Current Assignee
Longhorn HD LLC

Sponsoring Entity
STEELCLOUD INC.

Semanticallyaware network intrusion signature generator  
Patent #
US 20060212942A1
Filed 03/21/2005

Current Assignee
Wisconsin Alumni Research Foundation

Sponsoring Entity
Wisconsin Alumni Research Foundation

Methods for performing packet classification  
Patent #
US 20060221967A1
Filed 03/31/2005

Current Assignee
Intel Corporation

Sponsoring Entity
Intel Corporation

Denial of service attacks characterization  
Patent #
US 20030145232A1
Filed 01/31/2002

Current Assignee
Riverbed Technology Incorporated

Sponsoring Entity
Riverbed Technology Incorporated

Systems and methods for network security  
Patent #
US 20030236990A1
Filed 06/03/2002

Current Assignee
Extreme Networks Inc.

Sponsoring Entity
AirDefense Incorporated

Automated benchmarking with self customization  
Patent #
US 5,446,874 A
Filed 12/23/1993

Current Assignee
International Business Machines Corporation

Sponsoring Entity
International Business Machines Corporation

Direct digital control of rubber molding presses  
Patent #
US 4,344,142 A
Filed 08/06/1975

Current Assignee
FederalMogul Worldwide Incorporated

Sponsoring Entity
FederalMogul Corporation

Secure enterprise network  
Patent #
US 8,166,554 B2
Filed 01/25/2005

Current Assignee
VMware Inc.

Sponsoring Entity
VMware Inc.

Identifying significant behaviors within network traffic  
Patent #
US 8,204,974 B1
Filed 08/30/2005

Current Assignee
Sprint Communications Company LP

Sponsoring Entity
Sprint Communications Company LP

18 Claims
 1. A method, comprising:
a) obtaining a data set of network traffic comprising Ndimensional data points from a traffic analyzer, wherein N>
3 and wherein the traffic analyzer is configured to generate a statistics matrix comprising the Ndimensional data points;b) by a computer, i. processing the statistics matrix into a Markov kernel matrix, ii. processing the Markov kernel matrix to obtain processed data points with a dimension r lower than N, wherein the processing includes finding r discriminating eigenvectors by providing i=1, . . . , r eigenvalues and respective associated eigenvectors, generating for each i=2, . . . , r a respective i^{th }cluster based on the i^{th }eigenvector and generating other respective clusters based on eigenvectors 1, . . . , i−
1, i+1, . . . , r, computing a distance between each respective i^{th }cluster based on the i^{th }eigenvector and each of the other respective clusters based on eigenvectors 1, . . . , i−
1, i−
1, . . . , r, and finding r eigenvalues and associated respective eigenvectors that provide the highest distance, the associated respective eigenvectors that provide the highest distance being the discriminating eigenvectors,wherein the r discriminating eigenvectors thus found form an embedded space in which processed data points with a reduced dimension r form a normal cluster, iii. detecting an abnormal data point in the processed data points with a dimension r lower than N without relying on a signature of a threat and without use of a threshold, and iv. blocking the abnormal data point.  View Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16)
 17. A system, comprising:
a) a traffic analyzer for providing a data set of network traffic comprising Ndimensional data points, wherein N>
3 and wherein the traffic analyzer is configured to generate a statistics matrix comprising the Ndimensional data points; andb) a computer program stored on a nontransitory computer readable medium, the computer program dedicated to; i. process the statistics matrix into a Markov kernel matrix, ii. process the Markov kernel matrix to obtain processed data points with a reduced dimension r lower than N by forming an embedded space defined by r discriminating eigenvectors, wherein r≦
N and wherein the discriminating eigenvectors are selected by providing i=1, . . . , r eigenvalues and respective associated eigenvectors, and, for each i=2, . . . , r, by generating a respective i^{th }cluster based on the i^{th }eigenvector and other clusters based on eigenvectors 1, . . . , i−
1, i+1, . . . , M, by computing a distance between each i^{th }cluster and each of the other clusters, and by finding the r eigenvalues and their respective eigenvectors that provide the highest distance to define a normal cluster of processed data points with reduced dimension M,iii detect an abnormal data point in the processed data points with reduced dimension r without relying on a signature of a threat and without use of a threshold, and iv. block the abnormal data point.  View Dependent Claims (18)
1 Specification
This application is a Continuation application of U.S. patent application Ser. No. 12/263,473 filed Nov. 2, 2008 which claims the benefit of U.S. Provisional patent applications 60/985,172 and 60/985,176, both filed Nov. 2, 2007 and both of which are incorporated herein by reference in their entirety.
The invention relates in general to methods and systems for detection of anomalies (abnormalities) that deviate from normal behavior in multidimensional data and more particularly to online (realtime) anomaly detection in such data.
Huge amounts of data are generated by many sources. “Data” refers to a collection of organized information, the result of experience, observation or experiment, other information within a computer system, or a set of premises that may consist of numbers, characters, images, or as measurements of observations. Data usually comes from conversion of physical quantities from observations or measurements into symbols (also called sampling). Data also refers to a collection of numbers, characters, images or outputs from devices that convert physical quantities into symbols. Such data is typically further processed by a human or input into a computer, stored and processed there, or transmitted to another human or computer.
Data is structured in known formats. When data is transferred or received continuously or intermittently in a time dependent fashion, the data is said to be “streamed” in a data stream. “Packetoriented” data refers to a collection of basic units of structured information in a data stream. In communication networks, packet oriented data contain headers and payload. “Connectionoriented” data refers to a collection of packetoriented data.
In many cases, the data is highdimensional (also called multidimensional), where a data dimension n (or “N”)>3. If source (“original” or “raw”) data is described for example by 25 measured parameters (“features”) that are sampled (recorded, measured) in every predetermined time interval (e.g. every minute), then the data is of dimension n=25. Multidimensional data is a collection of data points. A “data point” (also referred to herein as “sample”, “sampled data”, “point”, “vector of observations”, “vector of measurements”) is one unit of data of the original (source) multidimensional data that has the same structure as the original data. A data point maybe expressed by boolean, integer and real characters.
In this invention, “features” refers to the individual measurable properties of the phenomena being observed. Features are usually numeric, but may be structural such as strings. “Feature” is also normally used to denote a piece of information which is relevant for solving the computational task related to a certain application. More specifically, features can refer to specific structures ranging from simple structures to more complex structures such as objects. The feature concept is very general and the choice of features in a particular application may be highly dependent on the specific problem at hand.
In particular, highdimensional data, with all its measured features and available sources of information (e.g. databases), may be classified as heterogeneous highdimensional data or simply as “heterogeneous data”. The term “heterogeneous” means the data includes data points assembled from numbers and characters having different meanings, different scales and possibly different origins or sources. The process of finding similar areas that identify common (similar) trends is called clustering, and these areas are called clusters. Heterogeneous data may change constantly in time, in which case the data is called “heterogeneous dynamic data”.
In known art, highdimensional data is incomprehensible to understand, to draw conclusions from or to find anomalies in that deviate from a “normal” behavior. Throughout this invention, the terms “anomaly”, “abnormality” and “intrusion” are used interchangeably. Similarly, the terms “cluster” and “manifold” are also used interchangeably.
Network Intrusion Detection
Assume for example that an entity such as a network, device, appliance, service, system, subsystem, apparatus, equipment, resource, behavioral profile, inspection machine, performance or the like is monitored per time unit. Assume further that major activities in incoming streamed multidimensional data obtained through the monitoring are recorded, i.e. a long series of numbers and/or characters are recorded in each time unit. The numbers or characters represent different features that characterize the activities in or of the entity. Often, such multidimensional data has to be analyzed to find specific trends (abnormalities) that deviate from “normal” behavior. An intrusion detection system (“IDS”, also referred to as anomaly detection system or “ADS”) is a typical example of a system that performs such analysis.
An intrusion detection system attempts to detect all types of malicious network traffic and malicious computer uses (“attacks”) which cannot be detected by conventional protection means such as firewalls. These attacks may include network attacks against vulnerable services, data driven attacks on applications, host based attacks such as privilege escalation, unauthorized logins and access to sensitive files, malware (viruses, Trojan horses, and worms) and other sophisticated attacks that exploit every vulnerability in the data, system, device, protocol, webclient, resource and the like. A “protocol” (also called communication protocol) in the field of telecommunications is a set of standard rules for data representation, signaling, authentication and error detection required to send information over a communication channel. The communication protocols for digital computer network communication have many features intended to ensure reliable interchange of data over an imperfect communication channel. A communication protocol means basically certain rules so that the system works properly. Communication protocols such TCP/IP and UDP have a clear structure.
A network intrusion detection system (NIDS) tries to detect malicious activities such as DoS, distributed DoS (DDoS), portscans or even attempts to crack into computers by monitoring network traffic while minimizing the rate of false alarms and missdetections. A NIDS operates by scanning all the incoming packets while trying to find suspicious patterns. If, for example, a large number of requests for TCP connections to a very large number of different ports is observed, one can assume that someone is committing a port scan at some of the computers in the network.
A NIDS has unlimited ability to inspect only incoming network traffic. Often, valuable information about an ongoing intrusion can be learned from outgoing or local traffic as well. Some attacks may even be staged from inside the monitored network or network segment (“internal attacks”), and are therefore not regarded as incoming traffic at all. However, they are considered as major threats that have to be treated. Internal attacks can be either intentional or unintentional.
A NIDS has to handle large networks by processing and analyzing packets from and to many (hundreds and thousands) of network devices. In these networks, a human operator is assigned to the task. The operator has to decide if the network functions properly or if some immediate action needs undertaking. However, the operator is incapable of understanding, compiling and processing huge amounts of data or making fast decisions because of the huge volume of data. This problem can be looked at as a data mining problem—finding patterns that deviate from normal behavior in an ocean of numbers and information that is constantly dynamically changed. The operator cannot handle malicious attacks and malicious usage of networks because: these attacks can develop and evolve slowly; more and more protocols in network environments are encrypted; analysis of the payload is impractical due to encryption and privacy violation; there are rapid changes in protocols and there is an avalanche of new protocols and applications (many per year); network applications become more and more masqueraded and thus difficult to identify; identification of unauthorized applications becomes more difficult; there is a growing number of hidden attacks and applications under HTTP and P2P protocols that try to “hijack’ systems, and the like. All of these make it more difficult to detect malicious uses of network systems.
IDS and NIDS have become integral components in security systems. The challenge is to perform online IDS and NIDS without missdetections and false alarms. Throughout the rest of this disclosure, “online” is used among other things to mean an algorithm that can efficiently process the arrival of new samples from high bandwidth networks in realtime. To achieve online intrusion detection, most systems use signatures of intrusions which are developed and assembled manually after a new intrusion is exposed and distributed to the IDS clients. This approach is problematic because these systems detect only alreadyknown intrusions (yesterday'"'"'s attacks) but fail to detect new attacks (zero day attacks). In addition, they do not cover a wide range of high quality, new, sophisticated emerging attacks that exploit many network vulnerabilities.
Similar problems of identifying abnormalities in data are encountered in many network unrelated applications, e.g. in the control or monitoring of a process that requires detection of any unusual occurrences in realtime. One example is the realtime (online) detection of mastitis in dairy farming. Mastitis is expressed by abnormal somatic cell counts (SCC) and its detection may be significantly aided by detection of abnormal SCC counts or of other milk parameters during milking Automatic mastitis detection using different statistical methods is reviewed by David Cavero Pintado, PhD Dissertation, Christian Albrechts University, Kiel, Germany, 2006. In the past, statistical methods were applied to data which included number of milkings, electrical conductivity, milk yield, milk flow rate and SCC. However, such statistical methods fail to provide adequate warning for the appearance or occurrence of mastitis.
The invention provides a framework (methods, system and algorithms) based on diffusion processes and diffusion geometries for finding meaningful geometric descriptions in large, uniform, heterogeneous and distributed multidimensional data captured from different sensors. Eigenfunctions of generated underlying Markov matrices are used to construct diffusion maps (called hereinafter “RLDM” and described exemplarily in R. R. Coifman and S. Lafon, “Diffusion maps”, Applied and Computational Harmonic Analysis, 21(1), 530, 2006 and in US patent application 2006/0004753 A1) or diffusion bases (called hereinafter “AADB” and described exemplarily in A. Schclar and A. Averbuch, “Diffusion bases methods for segmentation and clustering”, U.S. patent application Ser. No. 11/699,359). These, with or without random projections (called hereinafter “RP” and described exemplarily in W. B. Johnson and J. Lindenstrauss, “Extensions of Lipshitz mapping into Hilbert space”, Volume 26 of Contemporary Mathematics, pp. 189206, Amer. Math. Soc., 1984), generate efficient representations of complex high dimensional geometric structures in a lower (reduced) dimension space (also called “embedded space”) for further analysis. An associated family of diffusion distances, obtained by iterating a Markov matrix, defines multiscale (coarsegraining) geometries. The spectral properties of Markov processes are related to their geometric counterparts. The dimensionality of the data is reduced in a nonlinear way to the reduceddimension space where all the sought after information lies. The nonlinear dimension reduction also enables to classify the data and to analyze it in the reduced—dimension space, without violating the integrity and the coherence of the original multidimensional data. The invention enables to find anomalies that deviate from normal behavior in dynamically changing multidimensional data.
Unlike in known art, the classification of multidimensional data points as normal or abnormal is done by the application of an outofsample extension algorithm which provides coordinates (parameterization) for each newly arrived data point in the embedded space. “Outofsample extension” is defined here as the action of providing diffusion coordinates to each newly arrived data point in the embedded space. Thus, the application of outofsample extension enables, upon arrival of each new data point, to determine whether the newly arrived data point in a cluster of normal activities or outside (deviates, abnormality) a cluster. The organization of the empirical observations into simpler lowdimensional structures is enabled by spectral and harmonic analysis of the nonlinear embedding and by the application of the outofsample extension.
In a method of the invention, a large number of observable quantities from the multidimensional input data are organized as data points. In some embodiments, each data point (vector of observation) comprises a plurality (normally larger than 3) of different parameters measured simultaneously in a time unit. The collection of such data points is considered to be a “surrogate to the system” and is organized as a graph in which various vectors of data points are linked by their similarity. The similarity is a measure imposed by the user. A diffusion similarity metrics imposes a similarity relationship between any two data points by computing all combinations among pairs of data points. Clustering of these data points in the similarity metrics characterizes different system regimes, such that all the normal data points are inside “normal” clusters and all abnormal data points are outside the same clusters. Various local criteria of linkage between data points lead to distinct geometries. In these geometries, the user can redefine relevance via a similarity measure, and this way filter away unrelated information. Self organization of network traffic data points is achieved through local similarity modeling. The choice of the eigenfunctions of a normalized similarity matrix provides global organization of the given set of data points. The diffusion maps and diffusion bases embed the data into a lowdimensional space and convert isometrically the (diffusion) relational inference metrics (also called “diffusion similarity matrix”) to a corresponding Euclidean distance.
Diffusion coordinates are assigned to each newly arrived data point without having to recompute the diffusion maps or diffusion bases as new data streams in. The Euclidean distance represents the computed diffusion metrics in the lowdimensional embedding using the diffusion maps or bases. The total computation time scales linearly with the data size and can be updated online. The diffusion maps/bases enable data exploration and perceptualization, since they convert complex similarity chains to an ordinary physical distance in the embedded reduced space, thus providing situational awareness of the state of the system. This performs the outofsample extension procedure, which enables to calculate the coordinates of each arrival of a new data point.
A method of the invention includes two major sequential procedures:
 1. Training (“learning”): Normal activities of incoming multidimensional data are studied and parameters from “normal data” (also called “training data”) are extracted. The training procedure is called once or a very limited number of times during an operation cycle to create an embedding matrix. The embedding matrix is created by finding the geometry (manifold) on which original “normal” multidimensional data resides, followed by dimensionality reduction of the collected normal data. This is a nonlinear transformation from multidimensional data representation to an embedded lower dimension space, which also reveals the underlying features and parameters that govern these original data. The feature extraction procedure, followed by its embedding in lower dimension space, describes faithfully the normal behavior of measured data or monitored system and data. After analysis, each training dataset (or “training set”) represents a typical normal profile of the activities in the incoming multidimensional data. The training procedure clusters the data into “normal” clusters. Since the training procedure is always done offline, it can be updated in the background all the time. Therefore, it supports steady online construction of training data to replace current training data, if the latter deviate from the current training profile. If the training data is partially corrupted, it can still be useful to determine the normal behavior of the incoming multidimensional data. The above training procedure (extraction of parameters and their embedding in lower dimension space) can overcome a situation in which a portion of the training data is corrupted.
 2. Detection: This is the offline or online application of automatic (unsupervised) tools which detect events (anomalies) that deviate from the normal behavior determined in the training procedure. The detection procedure classifies each newly arrived data point as either normal (belonging to a normal cluster derived in the training procedure) or abnormal (representing either intrusion or “strange” behavior). The classification is inventively done by the application of the outofsample extension algorithm, which provides coordinates for each newly arrived data point in the reduced dimension (embedded) space, to decide whether the newly arrived data point is “normal” (located inside a normal cluster) or “abnormal” (located outside a normal cluster). The classification procedure analyzes constantly the behavior of a dynamic data in order to detect anomalies that deviate from their normal behavior.
According to the invention, there is provided a computer implemented method for detection of abnormalities in multidimensional data comprised of multidimensional data points, the method including the steps of: processing the multidimensional data to obtain a reduced dimension embedding matrix; using the reduced dimension embedding matrix to form an embedded space; applying an outofsample extension procedure in the embedded space to compute coordinates of a newly arrived data point; and determining whether the newly arrived data point is normal or abnormal based on its computed coordinates.
According to the invention, there is provided a computer implemented method for detection of abnormalities in Ndimensional data comprised of Ndimensional data points where N>3, the method including the steps of: processing a plurality M of Ndimensional data points into a normal cluster in an embedded space having a dimension D lower than N; and applying an outofsample extension procedure to a newly arrived Ndimensional data point which does not belong to the plurality M, to determine whether the newly arrived Ndimensional data point belongs or does not belong to the normal cluster, thereby classifying the newly arrived Ndimensional data point as, respectively, normal or abnormal.
In some embodiments, the methods are applied to communication network intrusion detection.
In some embodiments, the methods are applied to hyperspectral imaging.
In some embodiments, the methods are applied to semiconductor wafer inspection.
In some embodiments, the multidimensional data includes financial data and the methods are applied to financial markets monitoring.
In some embodiments, the multidimensional data includes financial transaction data and the methods are applied to financial transactions monitoring.
In some embodiments, the multidimensional data includes physiological data and the methods are applied to human health monitoring.
In some embodiments, the methods are applied to mastitis detection.
According to the invention, there is provided an anomaly detection system (ADS) for detection of abnormalities in Ndimensional data comprised of Ndimensional data points where N>3, the ADS running a computer program stored on a computer readable medium, the computer program dedicated to processing a plurality M of Ndimensional data points into a normal cluster in an embedded space having a dimension D lower than N and to classifying a newly arrived Ndimensional data point which does not belong to the plurality M as normal or abnormal by applying an outofsample extension procedure that determines whether the newly arrived Ndimensional data point belongs or does not belong to the normal cluster.
In some embodiments, the multidimensional data includes network traffic data.
In some embodiments, the multidimensional data includes hyperspectral data.
In some embodiments, the multidimensional data includes financial data.
In some embodiments, the multidimensional data includes human physiological data.
In some embodiments, the multidimensional data includes process monitoring data.
In some process monitoring embodiments, the monitored multidimensional data is semiconductor process data.
In some embodiments, the multidimensional data includes data selected from the group consisting of optical data, electrical data, mechanical data, acoustical data, magnetic data, flow data, heat data and a combination thereof.
In some embodiments, the multidimensional data is acquired during milking or another realtime process.
The invention is herein described, by way of example only, with reference to the accompanying drawings, wherein:
It is to be understood that the X, Y, Z coordinates in
In general, the number of discriminating eigenvectors may differ from three, and more specifically may be r, where r≧2. The boxes and the numbers in
This description uses communication networks as exemplary entities, without in any way limiting the scope of the invention to networks alone. A communication network is just one example of an entity having, producing, supplying or transferring data, and the application of the methods and algorithms described in detail below with reference to metadata or packet data in networks is only one exemplary application that can be extended to any type of data of any other entity. “Metadata” is “data about data” of any sort in any medium. An item of metadata may describe an individual data point or content item, or a collection of data including multiple content items and hierarchical levels, for example a database schema.
 1. Matrices with statistical information on metadata (step 312 in
FIG. 3 ). These matrices include the statistics on various metadata values collected by the traffic analyzers. A POTA yields a matrix with the statistical occurrences of different metadata in the processed data (step 328 inFIG. 3 ), e.g. IP, ICMP and TCP. A COTA yields two matrices (step 340 inFIG. 3 ): one with the statistical occurrences on the collection of metadata, e.g. TCP, and another with statistical occurrences of the metadata in the collected data, e.g. HTTP requestresponse pairs.  2. Matrices with statistics of the payload. These matrices include the statistics of the ngram (described exemplarily in C. D. Manning, H. Schütze, Foundations of Statistical Natural Language Processing, MIT Press: 1999, called hereinafter ngram) from the connectionoriented traffic analyzer, i.e. the information collected by the COTA from the HTTP for different types of HTTP data units: request URI, request headers, request body, request payload, response status, response headers, response body and response payload.
In nonnetwork related data, statistical matrices may be generated from incoming multidimensional data of other types, for example (but not limited to) pixels in images; measurements generated by a device (equipment), by a performance monitor, for example one installed in a computer that handles financial transactions and generates data about CPU utilization, queues of transactions, virtual memory usage, disk usages, communication queues, etc; measurements generated by resource profiling which provides utilization data of this resource; measurements of a physical property including an electrical, optical, mechanical, acoustic, magnetic, heat, pressure, flow property; human physiological data including pulse, temperature, heart rate, EEG data, EKG data, blood oxygenation, blood sugar, blood pressure and the like.
 1. Matrices with statistical information on metadata (step 312 in
In step 202, each statistical matrix is normalized to obtain a respective Markov matrix. This can be done using normalization procedures known in the art, or using a specific normalization procedure described in steps 406 in
 1. Offline intrusion detection (OFID). OFID is applied to analyze offline raw multidimensional data. For example, OFID fits postmortem analysis of log data files. In this case, all the data is analyzed and processed at once. Fewer efforts are channeled to improve the efficiency of the algorithms and their operations. Furthermore, the offline data analysis contains all the information needed for performing anomaly detection. Therefore, anomaly detection is performed in a single offline stage that includes two “substeps”: training and detection, performed consecutively.
 2. Online (realtime) intrusion detection (OLID). OLID is applied to analyze realtime raw multidimensional data. For example, it is suitable for analysis and detection of online multidimensional data which is constantly streamed through a network, or of data obtained from realtime measurements in process monitoring. In this case, the data is analyzed and processed continuously. Here, the efficiency of the algorithms and their operation are critical. In contrast with OFID, OLID involves a single offline initial training step (100 in
FIG. 1 ) for a predetermined training period, followed by detection (102106 inFIG. 1 ) applied online. The training generates the infrastructure for the detection (the “normal clusters”). In communications related applications, OLID using metadata is called hereinafter OLIDMD and OLID using payloads is called hereinafter OLIDPL. More details of each process are given next.
OFID
The OFID process is described exemplarily, for ease of understanding, with reference to packet traffic over a communication system such as the Internet. It should be understood that this is a nonlimiting example and that the anomaly processing described herein may be applied to any multidimensional data stream or data from any entity, as discussed above. The original multidimensional data is collected by either POTA or COTA and processed into dense statistical matrices which contain features with different measurement scales. For example, in a communication network, the number of IP packets summarizes all the IP traffic (TCP, UDP, FTP, HTTP, etc.). A TCP session represents a TCP connection between two IP addresses but not the volume of the traffic between them. Therefore, the number of packets is not directly connected to the number of sessions. Thus, these two measurements have different scales with different ranges. Consequently, the two measurements have to be normalized to bring their columns to a common scale before further processing.
First we outline the major steps in the OFID process. Then we provide detailed descriptions of each stage in this outline.
OFID High Level Description

 1. As mentioned, in step 200, the traffic analyzer is applied to the original (source) multidimensional data to produce a matrix whose entries are modified to be the logarithm values of the original entries. Details of step 200 are shown in the flow chart of
FIG. 3 . A packet (data point) is received by the traffic analyzer (step 302). The packet lower protocol headers are parsed first (step 304), followed by parsing of the higher protocol headers (step 306). Next, the statistics of the packet are collected (step 308). The statistical data type is analyzed in step 310. If the data type is metadata, then metadata statistics are computed in step 312. Otherwise, the data type is payload and its statistics are computed in step 314. The traffic oriented type is then checked (step 316) to decide if it is payload or connectionoriented type. If the traffic oriented type was packetoriented type, then step 320 checks to decide if it is training or system mode. If the system mode is “training”, then step 322 updates the global packet statistics and if the training mode is not completed (step 326) it goes to step 302. If the training is completed, step 328 is called to produce final training statistics of the packets. If the system mode is “detection”, then step 324 is called to produce the packet statistics. If the statistical data was connectionoriented, step 318 is called to produce updated statistics of the corresponding session. If the session is not completed, step 302 is called. If the session is completed, the system mode in step 332 checks to decide if it is training or detection mode. If it is training mode, the global statistics of the session is updated (step 336) and, if the training is not completed (step 338), then step 302 is called. If the training is completed (step 338), the final training statistics matrix is produced in step 340. If the system mode (step 332) is detection mode, then the statistics matrix of the connection is produced in step 334. This completes step 200 inFIG. 2 . The same operation described above is used to generate the matrices with the statistical data in the online OLIDMD and OLIDPL algorithms. Since
FIG. 3 describes the OFID procedure, the training and detection procedures appear in the same flowchart. Therefore, in the detection mode, outputs of statistical matrices will be generated in step 334 from COTA and in step 324 from POTA.
 The same operation described above is used to generate the matrices with the statistical data in the online OLIDMD and OLIDPL algorithms. Since
 2. Step 202 is described in more detail in the flow chart of
FIG. 4 . If the statistical data type is metadata, normalization is done in step 406. If it is payload, ngram training is generated in step 408 and the size of the statistical matrix is reduced by consecutive application of coarse graining (step 410) followed by the application of random projection (step 412). The same sequence of operations will be used to generate the statistical matrices in the OLIDMD and OLIDPL algorithms. Each column (feature vector) of the statistical matrix is normalized as follows: a. Pairwise distances between statistical matrix elements are computed to produce a similarity matrix;
 b. The similarity matrix is analyzed via the application of diffusion maps (RLDM) or diffusion bases (AADB). The normalized output matrix from this procedure is described by a selected group of r discriminating eigenvectors of the distances matrix, where r≧2;
 c. Each column vector (which describes a feature) of the normalized output matrix is set to the selected discriminating eigenvectors of the distances matrix.
 3. In step 204, the normalized output matrix is processed using either RLDM or AADB to derive its embedding matrix as follows:
 a. RLDM:
 i. Pairwise distances in the normalized output matrix are computed using one of the following distance metrics:
 1. Euclidean distance;
 2. Weighted Euclidean distance;
 3. Cosine distance;
 4. Mahalanobis distance.
 ii. The distances matrix is analyzed by the application of RLDM to return its first four eigenvectors;
 iii. A selected group of r discriminating eigenvectors, where r≧2, is chosen from the embedding matrix. For twodimensional (2D) or threedimensional (3D) visualization, two or three eigenvectors from the selected group of r eigenvectors may be chosen, respectively;
 i. Pairwise distances in the normalized output matrix are computed using one of the following distance metrics:
 b. AADB:
 i. The normalized output matrix is transposed into a transposed normalized matrix;
 ii. A distances matrix of pairwise distances is computed from the transposed normalized matrix;
 iii. The distances matrix is analyzed by the application of AADB. It returns a selected group of r, r≧2, discriminating eigenvectors of the distances matrix. For 2D or 3D visualization, two or three eigenvectors from the selected group of r eigenvectors may be chosen, respectively;
 iv. The normalized distance matrix (iii) is projected on the selected group of r discriminating eigenvectors;
 v. For 2D or 3D visualization, two or three eigenvectors from the selected group of r discriminating eigenvectors form the embedding matrix.
 a. RLDM:
 4. The identification of abnormal points using the embedding matrix is performed as follows (steps 104106 in
FIG. 1 ): a. The density (the number of points in each data point'"'"'s neighborhood) of each point in the embedding matrix using Euclidean distance is computed;
 b. A histogram of the density values is generated;
 c. All the points in the smallest bin are classified as abnormal (intrusion), while all the other points are classified as normal.
Detailed Description of OFID
 1. As mentioned, in step 200, the traffic analyzer is applied to the original (source) multidimensional data to produce a matrix whose entries are modified to be the logarithm values of the original entries. Details of step 200 are shown in the flow chart of
Processing the MultiDimensional Raw Data:
Let H be a multidimensional data of raw data (packets). Let C be a matrix of size m×n produced from H by the corresponding traffic analyzer (step 316 in
 1. POTA—steps 320, 322, 326, 328 in
FIG. 3 : m is the number of time slices (can for example be a minute) and n is the number of metadata features gathered by the packet oriented traffic analyzer every time slice;  2. COTA—steps 318, 330, 332, 336340 in
FIG. 3 : m is the number of features (for example, TCP connections) and n is the number of metadata features gathered by the connection oriented traffic analyzer (for example, for every TCP connection).
Normalizing Matrix C:
 1. POTA—steps 320, 322, 326, 328 in
We take column l, 1≦l≦n, from matrix C denoted by c^{l}={c_{il}: 1≦i≦m}. We compute for this column vector its pairwise Euclidean distances matrix whose entries are {tilde over (c)}_{ij}^{l}={c_{il}−c_{jl}: i, j=1, . . . , m}. We build a Gaussian kernel
K_{ij}^{l }is symmetric and nonnegative. ε can be determined by the procedure in AADB. Then we normalize C into a Markov transition matrix P_{ij}^{l}. P_{ij}^{l }is known in the art as the normalized graph Laplacian and is constructed as follows:
P_{ij}^{l }is a Markov matrix since Σ_{q=1}^{m}P_{iq}^{l}=1 and P_{ij}^{l}≧0. Since P_{ij}^{l }is a symmetric positive semidefinite kernel, it leads to the following eigendecomposition: P_{ij}^{l}=Σ_{w≧1}^{m}λ_{w}^{l}v_{w}^{l}(c_{il})v_{w}^{l}(c) where λ_{w}^{l }are the eigenvalues and v_{w}^{l }are the eigenvectors. Finally, we build the column 1 of the normalized matrix A by taking the second eigenvector of the eigendecomposition of P^{l}, where a^{l}=v_{2}^{1}. We repeat this for each l, l=1, . . . , n. At the end of this process, the original data in matrix C is replaced by the normalized matrix A.
Processing Normalized Matrix A—Derivation of Embedding Matrix Ψ:
We reduce the dimensionality of the data from n (number of features) to r where usually in highdimensional problems r<<n. This process applies either the RLDM or AADB. We begin with RLDM.
Embedding by RLDM:
We denote the row vector i, 1≦i≦m in the normalized matrix A by {right arrow over (a)}={a_{ik}: 1≦k≦n}. We compute for A its pairwise distances matrix A whose entries are ã_{u }using one of the following distance metrics:
 1. Euclidean distance metric:

 2. Weighted Euclidean distance metric:

 where {right arrow over (w)}={w_{k}: k=1, . . . , n} is a weighting factor vector. The larger is w_{k}, the smaller is the influence of the kth feature on the distance between {right arrow over (a)}_{i }and {right arrow over (a)}_{j}.
 3. Cosine distance metric:

 4. Mahalanobis distance metric:
where Σ is the sample covariance matrix. Σ can also be the features matrix. We build a Gaussian kernel
Since ε is fixed for all entries in Ã, it gives a coarse scaling control. A finer scaling control can be achieved as follows: First, we build the initial Gaussian kernel {tilde over (K)}_{ij }with the fixed scale control ε,
Then, we build a Gaussian kernel with a finer scale control,
This finer scale control provides better and more compact description of the local geometric properties of the pairwise distances matrix Ã. This process is repeated until the scale factor is sufficiently fine and until K_{ij }represents optimally the nature of the local geometry of Ã. K_{ij }is normalized into a matrix P_{ij }by one of the following methods:
 1. Graph Laplacian matrix:

 2. LaplaceBeltrami matrix: First, we compute the graph Laplacian matrix

 We repeat this process and get the LaplaceBeltrami matrix
Since P is a symmetric positive semidefinite kernel, it enables the following eigendecomposition: P_{ij}=Σ_{w≧1}^{m}λ_{w}v_{w}({right arrow over (a)}_{i})v_{w}({right arrow over (a)}_{j}) where λ_{w }are the eigenvalues and v_{w }are the eigenvectors. Finally, we build the embedding matrix Ψ of dimension r. We denote the iβ^{th }column of Ψ by Ψ^{i}. For visualization purpose, we can present the embedding in a 3D view by taking r=3 out of r, r≧4, eigenvectors as a selected group of discriminating eigenvectors of the eigendecomposition of P. One possible option is to choose Ψ^{1}=v_{2}, Ψ^{2}=v_{3}, Ψ^{3}=v_{4}.
Embedding by AADB:
We denote the i^{th }i=1, . . . n, row vector in the transposed normalized matrix A^{T }of A by {right arrow over (a)}_{i}^{T}={a_{ki}: k=1, . . . , m}. We compute for A^{T }its pairwise Euclidean distances matrix Ã^{T }whose entries are a_{ij}^{T}={∥{right arrow over (a)}_{i}^{T}−{right arrow over (a)}_{j}^{T}∥: i, j=1, . . . n}. We build a Gaussian kernel
K_{ij}^{T }is normalized into a graph Laplacian matrix by
Since P_{ij}^{T }is a symmetric positive semidefinite kernel, it leads to the following eigendecomposition: P_{ij}^{T}=Σ_{w≧1}^{n}λ_{w}T_{v}^{w}T({right arrow over (a)}_{i}^{T})v_{w}^{T}({right arrow over (a)}_{j}^{T}) where λ_{w}^{T }are the eigenvalues and v_{w}^{T }are the eigenvectors. Finally, we build the embedding matrix Ψ. We project the normalized matrix A on a selected group of r, r≧2, discriminating eigenvectors of the eigendecomposition of P^{T}. For example, one option is to choose Ψ^{1}=A·v_{1}^{T},Ψ^{2}=A·v_{2}^{T},Ψ^{3}=A·v_{3}.
Identifying Abnormal (Intrusion) Points in Embedding Matrix Ψ (Steps 104106 in
Embedding matrix Ψ is now used to identify the abnormal points in the data. We compute the minimum and maximum values for every column i, i=1, . . . , r in Ψ, denoting them by min_{Ψ}_{i}, and max_{Ψ}_{i}, respectively. We take the row vectors from Ψ, denoting the j^{th }row in Ψ by {right arrow over (Ψ)}^{j}={Ψ_{jl}: l=1, . . . , r}, j=1, . . . , m. We compute for each j the number of row vectors which reside in its neighborhood, i=1, . . . , r, j=1, . . . , m, denoting
where δ is a predetermined scale control of the neighborhood of each point. Then, we count all {right arrow over (Ψ)}^{k}, k=1, . . . , m, that satisfy the condition in Eq. 1. Formally,
φ_{j}={{right arrow over (Ψ)}^{k}: k=1, . . . , m that satisfy R_{1k}^{j }and R_{2k}^{j }and R_{3k}^{j}}.
Let φ={φ_{1}, . . . , φ_{j}, . . . , φ_{m}} and Φ={Φ_{1}, . . . , Φ_{j}, . . . , Φ_{m}} where
is the normalized density vector. The maximum value in Φ is denoted by max_{Φ}. We construct an histogram of Φ denoted by hist_{Φ}. This histogram is divided into β bins of size
Since the majority of the points in me data are normal, all the normal points have a higher number of neighbors and their normalized density value is mapped into the upper bins in hist_{Φ}. Conversely, since the abnormal points are a minority, these points have a smaller number of neighbors and their normalized density value is mapped into the smallest bin. Therefore, all the points in the smallest bin are classified as abnormal points. These points are the sought after intrusions in the processed datasets. Formally, {right arrow over (Ψ)}^{j}, j=1, . . . , m, is an intrusion if
Otherwise, {right arrow over (Ψ)}^{j }is a normal point. The output from this process is an embedding matrix Ψ and a decision mechanism that determines whether each data point (row vector) in this matrix is normal or abnormal. The output can be visualized if it is two or three dimensional.
Next we describe OLIDMD and OLIDPL in more detail. Both are described exemplarily, for ease of understanding, with reference to packet traffic over a communication network such as the Internet.
OLIDMD
The OLIDMD process described below uses the metadata in the matrices generated by the traffic analyzer. A major challenge in the OLID algorithms (OLIDMD and OLIDPL) is how to add many arrivals of new data points in a predetermined time interval to the existing embedding matrix and how to classify them efficiently. New data points are constantly produced by the traffic analyzer. As new data streams in, diffusion coordinates are assigned to each newly arrived data point without having to recompute the diffusion maps/bases. This requirement is critical for NIDS to be efficient and operational in practice, because of the increasing bandwidths of existing networks infrastructures. Since the normalization of each newly arrived data point is critical for the analysis, the normalization is done efficiently after previous data points have already been normalized in the training process, without having to recompute the diffusion maps or bases as new data streams in. In addition, the learned features of the normalized data points are added efficiently to the existing embedding matrix to enlarge its coverage for more normal events.
OLIDMD Algorithms
We describe two OLIDMD algorithms: The first is a straightforward algorithm which is somewhat slower than the second one, but which has one advantage: its embedding matrix is more accurate, robust and covers better normal activities in the multidimensional data, since a newly arrived data point is added and is processed against all the data accumulated so far. The second algorithm fits better online (realtime) situations, because it provides fast processing for a newly arrived data point.
 1. Straightforward Algorithm: The statistics of a newly arrived data point from the multidimensional raw data are added to the statistical matrix as done to the raw data processed by OFID. In other words, the normalization process applied in OFID in the training phase is performed here again on a new population which includes the newly arrived data point. In essence, the OFID process is applied to an extended matrix which contains the original multidimensional data plus the newly added data point. This newly arrived data point is normalized with the rest of the existing source data and is then embedded and detected correctly.
 2. Efficient Algorithm: This algorithm has two steps: offline training (step 100 in
FIG. 1 ) done once from using the training data; and online detection and classification of newly arrived data points (102106 inFIG. 1 ). Both steps involve normalization of features extracted from the multidimensional source data. We recall from the OFID process that the normalization process of a statistical matrix there involves the application of RLDM or AADB to this matrix. This step is needed in order to bring all the features in the matrix to a common normalization scale. A newly arrived data point produced by the traffic analyzer is not normalized. Therefore, its values must be brought to the common normalization scale of the statistical matrix produced in the training procedure. All the columns in the matrix were normalized by RLDM or AADB in the training procedure. The geometric harmonics (GH) methodology (“A tool for multiscale outofsample extension of empirical functions”, Applied and Computational Harmonic Analysis, 21(1), 3152, 2006) is exemplarily applied to each newly arrived data point. However, since RLDM or AADB is applied to every column (feature) in the statistical matrix, GH is applied to every value of the newly arrived data point as well. Therefore, this normalization requires the application of GH according to the number of features. A major advantage of this algorithm is that there is no need to apply the RLDM or AADB to the complete current data (training data plus newly arrived data point) from the beginning, as in the straightforward algorithm. Thus, it is more efficient. Moreover, we suggest alternative normalization procedures replacing the one used in the second step in the OFID algorithm described above. After the second normalization, the embedding matrix, determined in the training procedure, is extended efficiently with the new normalized data point via the application of GH. Finally, the newly arrived data point, now normalized, is classified to be either normal or abnormal according to whether it respectively belongs or not to the training cluster generated in the training procedure.
Outline of the OLIDMD Algorithm
The OLIDMD algorithm has two sequential steps:  1. Training (100 in
FIGS. 1 and 200206 inFIG. 2 ): The training step is based on the OFID algorithm described above. The normalization is replaced with new normalization procedures which do not require the reapplication of RLDM or AADB. The remaining steps are the same as in the OFID algorithm. The output of the training step is the embedding matrix, also called a “baseline profile matrix” for an online detection process. The normalization is applied to each newly arrived data point. After a newly arrived data point is normalized, GH is applied to extend the (reduced) embedding baseline profile matrix with the newly arrived data point, resulting in a new embedding for the new data point. Finally, the extended baseline profile matrix is used to classify the newly arrived data point as either normal or abnormal. The training includes, in more detail: a. Application of the traffic analyzer to produce a matrix of statistical data. The selection process for the features vector was described with reference to step 402 in
FIG. 4 .  b. Each column (feature vector) of the matrix is normalized by one of the following methods:
 i. Gaussian normalization (406 in
FIG. 4 ) as follows: 1. Computation of the standard deviation of the column;
 2. Computation of the Gaussian kernel for each value in the column, using the precomputed standard deviation. Each column (feature vector) in the normalized matrix is the output of the Gaussian kernel;
 3. Saving of the computed Gaussian kernel parameters are a baseline for the online detection step.
 ii. Normalization of the normal probability density function as follows:
 1. Computation of the standard deviation and the mean of the column (feature vector);
 2. Computation of a normalization factor using the precomputed standard deviation;
 3. Computation of a normal probability density function kernel for each value in the column, using the precomputed standard deviation, mean and normalization factor. Each column vector in the normalized matrix is the output from the normal probability density function kernel;
 4. Saving of the computed normal probability density function parameters as a baseline for the online detection step.
 i. Gaussian normalization (406 in
 c. The normalized matrix is processed by the application of RLDM or AADB to derive its embedding matrix (described in the training procedure in OFID) as follows:
 i. Computation of pairwise distances in the normalized matrix;
 ii. Analysis of the distances matrix by the application or AADB, which returns a group of r, r≧2, discriminating eigenvectors. This group is the embedding matrix;
 iii. Saving of the computed embedding matrix as a baseline for the online detection step.
 d. Identification of abnormal points using the embedding as follows:
 i. Computation of the density value for each data point in the embedding matrix (the number of points in its neighborhood);
 ii. Generation of a histogram of the density values;
 iii. Classification of all the data points in the smallest bin as abnormal points while all the other data points are classified as normal;
 iv. Classification of all abnormal data points as intrusions;
 v. Saving of the computed density and histogram parameters as a baseline for the online detection step.
 a. Application of the traffic analyzer to produce a matrix of statistical data. The selection process for the features vector was described with reference to step 402 in
 2. Detection (102104 in
FIG. 1 ): Application of automatic unsupervised tools that enable online detections of problems (anomalies and intrusions). This application classifies each newly arrived data point to be either normal or abnormal. The detection includes, in more detail: a. Online application of the traffic analyzer to produce, in every predetermined time slice, a logarithm value of a new arrived sample (row vector) of statistical data;
 b. Normalization of each value (feature) in the newly arrived row vector according to the saved baseline normalization method parameters (508 in
FIG. 5 ) as follows: i. Computation of a normalization kernel using the corresponding baseline normalization kernel parameters;
 ii. Each value in the normalized row vector is the output of the normalization kernel.
 c. The normalized row vector is processed by the application of GH to derive its embedding vector (514 in
FIG. 5 ) as follows: i. Analysis of the row vector using the baseline embedding matrix (computed and saved in the training step). The analysis returns the matrix extension, which is the new embedding vector of the new processed sample.
 d. Classification of the newly arrived sample as normal or abnormal:
 i. Computation of the density value using the baseline embedding matrix and the baseline density parameters (computed and saved in the training step);
 ii. Placement of the density value in the baseline histogram (also computed and saved in the training step);
 iii. Classification of a point mapped to the smallest bin of the baseline histogram as an abnormal point. If the point is not mapped to the smallest bin, it is classified as a normal point.
Detailed and Formal Description of OLIDMD Algorithm
The next section provides finer mathematical details of the description above.
 1. Training:
 a. Processing the raw training data: Let H be a data of raw data (packets). Let C be a matrix of size m×n that is produced from H by the corresponding traffic analyzer:
 i. POTA: m is the number of time slices (can be for example a minute) and n is the number of metadata features that are gathered every time slice;
 ii. COTA: m is the number of tcp connections and n is the number of metadata features that are gathered for every tcp connection.
 b. Normalization of matrix C: The matrix C is normalized by either Gaussian normalization or by normal probability density function normalization.
 i. Gaussian normalization: Let c^{l}={c_{il}: i=1, . . . , m} be the column l, l=1, . . . , n, in C. The normalized standard deviation
 a. Processing the raw training data: Let H be a data of raw data (packets). Let C be a matrix of size m×n that is produced from H by the corresponding traffic analyzer:
 1. Training:



 is computed for this column vector l. We build the Gaussian kernel





 where K^{l }is a column vector. s^{l}=Σ_{i=1}^{m }K_{i}^{l }is computed for this column vector. The normalized column vector A^{l }is computed as





 A^{l }is normalized already since Σ_{i=1}^{m}A_{i}^{l}=1. The normalization parameters δ^{l }and s^{1 }are saved for the online detection step. We repeat it for each l, l=1, . . . , n. At the end of this process, the original data in the matrix C is replaced by the normalized matrix A.
 ii. Normal probability density function normalization: Let c^{l}={c_{il}: i=1, . . . , m} be the column l, l=1, . . . , n, in C. The normalized standard deviation





 is computed for this column vector l. Its normalization factor is β=δ^{l}√{square root over (2π)}. The normal probability density function kernel becomes





 where K^{l }is a column vector. The normalized column vector A^{l }becomes: A_{i}^{l}=K_{i}^{l}·β^{l}, i=1, . . . , m. The normalization parameters δ^{l},
c ^{l }and β^{l }are saved for the online detection step. The normalization is repeated for each l, l=1, . . . , n. At the end of this process, the original data in the matrix C is replaced by normalized matrix A.
 where K^{l }is a column vector. The normalized column vector A^{l }becomes: A_{i}^{l}=K_{i}^{l}·β^{l}, i=1, . . . , m. The normalization parameters δ^{l},
 c. Processing the normalized matrix A: derivation of its embedding matrix Ψ. The dimensionality of the data is reduced from n (number of features) to a smaller number r where usually r<<n. This process applies RLDM as described above re. OFID. The output of this process is the embedding matrix Ψ, which is saved for the online detection step.
 d. Identification of abnormal (intrusion) points in the embedding Ψ: The embedding matrix Ψ is used to identify the abnormal points in the data. We recall that in OFID we performed the following: computed the minimum and maximum values, denoted by min_{Ψ}_{i }and max_{Ψ}_{i}, respectively, for every column i, i=1, . . . r, in Ψ; built the normalized density vector Φ using the norm of the density values ∥φ∥_{2 }and constructed the histogram that is divided into β bins of size




 All were saved for the online detection step.
The outputs from the training step are the normalization parameters (δl—the normalized standard deviation and s^{l}, l=1, . . . , n−the sum of the Gaussian kernel), the 3D embedding matrix (Ψ) and the parameters (min_{Ψ}_{i }and max_{Ψ}_{i}, i=1, . . . r, ∥φ∥_{2 }and γ) for the decision mechanism that determine whether each point in this matrix is normal or abnormal. These outputs are the baseline parameters for the online detection step next.
 All were saved for the online detection step.
 2. Detection:
 a. Online processing of a new sample: Let P be a row vector of size 1×n produced online by the traffic analyzer in every time slice interval, where n is the number of features gathered in every time slice interval. These features are described in POTA.
 b. Online normalization of sample P: We use the baseline normalization parameters δ^{l }and s^{l }saved in the training step. Two methods are described:
 i. Gaussian normalization: Denote P={p^{1}, . . . , p^{n}}. The Gaussian kernel




 is computed using δ^{l }and s^{l}, l=1, . . . , n. The normalized value A^{l }is constructed as follows:





 The kernel computation and normalization is repeated for each l, l=1, . . . , n. At the end of this process, the original row vector P is replaced by the normalized row vector A={A^{1}, . . . , A_{n}}.
 ii. Normal probability density function normalization: We use baseline parameters δ^{i},
c ^{l }(the mean) and β^{l }(the normalization factor), l=1, . . . , n. Denote P={p^{1}, . . . , p^{n}}. We compute the normal probability density function kernel





 and the normalized value A^{l }as follows: A^{l}=K^{l}·β^{l}. The kernel computation and normalization is repeated for each l, l=1, . . . , n. At the end of this process, the original row vector P is replaced by the normalized row vector A={A^{1}, . . . , A^{n}}.
 c. Processing of normalized matrix A—derivation of embedding matrix Ψ: We start with the baseline embedding matrix Ψ, saved in the training step. The dimensionality of A is reduced from n to a smaller dimension r where usually r<<n. This process uses the application of GH to extend the baseline embedding matrix Ψ with the normalized vector A and obtain an extension of the matrix. This extension is the new embedding vector ψ of the new sample.
 d. Online classification of a newly arrived data point as normal or abnormal using the embedding matrix ψ: Baseline embedding matrix Ψ and the baseline identification parameters min_{Ψ}_{i }and max_{Ψ}_{i}, i=1, . . . , r ∥φ∥_{2 }(the norm of the density values) and γ (the size of the bins in the histogram), saved in the training step, are used to classify the newly arrived data point ψ as normal or abnormal using the new embedding vector ψ. Equation 1 is used to compute for ψ the number of row vectors in Ψ that reside in its neighborhood. Then, all the vectors which satisfy the condition in Eq. 1, are counted and denoted by φ_{ψ}. The normalized density value is computed by




 In the MID algorithm, it was shown that the normalized density value of an abnormal point is mapped into the smallest bin. Therefore, all the points in the smallest bin are classified as abnormal points. These are the sought after intrusions points. Therefore, the new sample is classified as abnormal if it is mapped into the smallest bin. Formally, ψ is an intrusion if Φ_{ψ}≦γ. Otherwise, ψ is a normal point.

In summary, the output of the detection step is at least a 2D embedding vector ψ and a decision mechanism that determines whether each newly arriving new point is normal or abnormal.
OLIDPL
The OLIDPL process described below uses the payload matrices generated by the COTA (step 408 in
 High sparsity—these matrices are very sparse, since usually each sample in the payload contains only dozens of bytes. Therefore, there are very few ngram entries in the 256^{n }rows. The length of the payload is too small compared with the total features space size. In our experiments, most of the samples occupy less than 0.1% of the 256^{n }space.
 High dissimilarity—different normal samples can have very different representations in the 256^{n }space. In our experiments, there is a little overlap between most of the different normal samples in row vectors.
 High homogeneity—most of the normal samples use a very limited space in the 256^{n }space. In our experiments, most of the samples use 15% of the 256^{n }space, whereas 9599% of the 256^{n }space are never used.
There are cases in which comparison between the frequency statistics of ngrams against each other is meaningless since there is a high dissimilarity among them. It is also meaningless to compare between frequency statistics of ngrams and a model. This is true in these cases because the training data is too sparse. One way to overcome the above sparsity issue is to generate sufficient frequency statistics of ngrams to build an accurate training distribution. This is impractical since the training dataset become too large to handle. The OLIDPL algorithm addresses these issues.
Outline of the OLIDPL Algorithm
The OLIDLPL algorithm has two sequential steps:
 1. Training (step 100 in
FIG. 1 and steps 200206 inFIG. 2 ): The training step is based on the training step in the OLIDMD algorithm. We add an additional preprocessing step that addresses the issues described above, and which utilizes the sparsity of the ngram data. The preprocessing step reduces the dimension of the multidimensional data, diminishes the sparsity of the data points and increases the similarity between normal samples (steps 510512 inFIG. 5 ). Moreover, it replaces a large number of training data points with a small group of new data points that best represent the training traffic (step 512 inFIG. 5 ). The rest of the steps follow those in the OLIDMD training step. The output is an embedding matrix (“baseline profile embedding matrix”) to be used in OLIDPL. a. Application of the traffic analyzer to produce a matrix of statistical data. The features of the vector process selection from the traffic payload are described in COTA (step 408
FIG. 4 ).  b. This matrix is preprocessed (step 408 in
FIG. 4 ) as follows: i. Generation of a unified ngram training matrix as follows:
 1. Generation of the statistical training matrix, which describes a unified ngram frequency;
 2. Updating of the unified ngram training matrix with the frequency statistics of every training sample;
 3. The unified ngram training matrix contains the frequency statistics of all the training samples;
 4. Computation of the distribution matrix of the unified matrix and saving of the unified matrix as a baseline for the online detection step.
 ii. Coarse graining of the training samples (step 410 in
FIG. 4 ) as follows: 1. Generation of initial random clusters according to a predetermined number of training representatives;
 2. Computation of new cluster centers based on the random clusters;
 3. Reassignment of the samples according to the new centers;
 4. Repetition of this process until all the training representatives of the training clusters reach steady state and form a training representative matrix;
 5. Saving of the matrix of the training representatives as a baseline for the online detection step.
 iii. Dimension reduction and sparsity removal from the training samples representatives by using RP (step 412 in
FIG. 4 ) as follows: 1. Generation of number of 256^{n }random Gaussian vectors. We use the number 100 in experiments described below;
 2. Division of each training representative, which is a 256^{n }row vector, by the training distribution matrix;
 3. Computation of inner products of the output of step 2 with the random matrix that was computed by RP;
 4. Replacement of each training representative by the logarithm of the output from its random projection;
 5. Saving of the random projected representatives from the training matrix and of the random matrix computed by RP as a baseline for the online detection step.
 i. Generation of a unified ngram training matrix as follows:
 c. The random projected representatives from the training matrix are processed by RLDM or AADB to derive its embedding matrix (step 414 in
FIG. 4 ) as follows: i. Computation of pairwise distances in the training matrix to provide a distances matrix;
 ii. Analysis of the distances matrix by the application of RLDM or AADB, which returns a selected group of r, r≧2, discriminating eigenvectors. This group of selected eigenvectors are the embedding matrix;
 iii. Saving of the embedding matrix as a baseline for the online detection.
 d. Identification of abnormal points in the embedding (step 418 in
FIG. 4 ) as follows: i. Computation of the density value for each point in the embedding matrix (the number of points in its neighborhood);
 ii. Generation of a histogram of the density values;
 iii. Classification of all the data points in the smallest bin as abnormal points while all the other data points are classified as normal;
 iv. Classification of all abnormal points as intrusions;
 v. Saving of the computed density and histogram parameters as a baseline for the online detection step.
 a. Application of the traffic analyzer to produce a matrix of statistical data. The features of the vector process selection from the traffic payload are described in COTA (step 408
 2. Detection (102106 in
FIG. 1 ): The detection step is based on the OLIDMD algorithm. After the newly arrived data point is preprocessed, GH is applied to extend the (reduced) embedding baseline profile matrix with the newly arrived data point. The result is a new embedding for the new data point. The baseline profile matrix is then used to classify the newly arrived data point as either normal or abnormal. a. Online application of the traffic analyzer to produce a logarithm value of each newly arrived data point (row vector) of statistical data (step in
FIG. 5 );  b. The newly arrived row vector (data point) is preprocessed according to the baseline preprocessing algorithm (510 in
FIG. 5 ) as follows: i. The new 256′ row vector is divided by the baseline unified training distribution matrix (510 in
FIG. 5 );  ii. Random vectors are generated from the application of RP. An inner product is computed between a random vector from RP and the normalized vector from step i (512 in
FIG. 5 );  iii. The newly arrived row vector is replaced by the logarithm of the output from its random projection;
 iv. The random projected row vector is the output of the preprocessing step.
 i. The new 256′ row vector is divided by the baseline unified training distribution matrix (510 in
 c. The random projected row vector is processed by GH to derive its embedding vector (step 514 in
FIG. 5 ) as follows: i. The row vector is analyzed by the application of the GH using the baseline embedding matrix (computed and saved in the training step). The analysis returns a matrix extension. The matrix extension is the new embedding vector of the new sample.
 d. Classification of the newly arrived data point as normal or abnormal (step 516 in
FIG. 5 ) as follows: i. A density value, defined above, is computed using the baseline embedding matrix and the baseline density parameters (computed and saved in the training step);
 ii. The density value is placed in the baseline histogram (computed and saved in the training step);
 iii. A point that is mapped to the smallest bin of the baseline histogram is classified as an abnormal point. Otherwise, it is classified as a normal point (step 518 in
FIG. 5 ).
Detailed and Formal Description of the OLIDPL Algorithm
 a. Online application of the traffic analyzer to produce a logarithm value of each newly arrived data point (row vector) of statistical data (step in
 1. Training (step 100 in
This section provides finer mathematical details of the description above.
 1. Training:
 a. Processing the raw training data: Let H be a data of raw data (packets). Let C be a matrix of size m×n that is produced from H by the POTA. m is the number of samples and n is the dimension of the ngram space. Since in our examples we use a ngram representation, there are 256^{n }ngrams possibilities;
 b. Preprocessing of matrix C: We preprocess matrix C by first generating a unified ngram matrix, then coarse graining the training samples and reducing the dimensionality of the training sample representatives. There are two options:
 i. Generation of a unified ngram training matrix U: An empty training matrix U of size 256^{n}, which will contain the frequency of its unified ngram, is allocated. Assume r, r=1, . . . , m, is a row in C, denoted by c^{r}={c_{ri}: i=1, . . . , n}. c^{r }is a row vector that can also be reshaped to a matrix {tilde over (c)}^{r }of size √{square root over (n)}√{square root over (n)}(256^{n }in our case). U is updated with the values of the reshaped row r: U=U+{tilde over (c)}^{r}. This is repeated for each r, r=1, . . . , m. Finally, every zero entry in this matrix is replaced by ε. At the end of this process, U contains the ngram frequency of all the m training samples. We build the distribution matrix Ũ of the unified matrix U by dividing the unified matrix by the number of training samples: Ũ=U/m. Ũ is saved as a baseline for the online detection step;
 ii. Coarse graining of training matrix C: The reduced training matrix A of size p×n, p<<m, is generated. A row r in A is denoted by a^{r}={a_{ri}: 1, . . . , n}, r=1, . . . , p, where p is the number of representatives in the reduced matrix. This matrix represents the reduced matrix C. A group G={g^{1}, . . . , g^{p}} of p empty clusters is built. Assume, s, s=1, . . . , m, is a row in C that is denoted by c^{s}={c_{si}: i=1, . . . , n}. Each c^{s }is randomly assigned to one of the p clusters in G. At the end of this process, each row vector from matrix C is assigned to one of the clusters in G. The reduced training matrix A is updated with the new centers (representatives of each cluster). Let a^{i}=
g ^{i}, i=1, . . . , p.g ^{i }is the averaged vector of all the samples assigned to g^{i}. The samples are reassigned to the clusters in G. according to the minimal distance between c^{s}, s=1, . . . , m, to every new center a^{i}=1, . . . , p. This process is repeated (updating A with the new centers and reassigning the samples) several times until A reached steady state;  iii. Dimension reduction of matrix A: Gaussian random vectors w are generated, where w is the desired dimension of the matrix A. A Gaussian random matrix RP of size 256^{n}×w, denoted by rp^{i}={rp_{ji}: j−1, 256^{n}}, i=1, . . . w where the mean of each random vector is
 1. Training:
the standard deviation
and the variance
is generated. A normalized matrix Ã from the training matrix A, which uses the unified training matrix Ũ is, computed. Assume r, r=1, . . . , p, is a row in A denoted by a^{r}={a_{rj}: j=1, . . . , 256^{n}}. The unified training matrix Ũ is reshaped into a row vector {right arrow over (Ũ)}. The normalized training vector a^{r }is computed as follows:
The computation of ã^{r }repeated for each r, r=1, . . . , p. Ã is the normalized training matrix that reflects the frequency of each training vector according to the unified training matrix. A random projected training matrix A^{RP }is generated. Assume r, r=1, . . . , p, is a row in Ã denoted by ã^{r}={ã_{rj}: j=1, . . . , 256^{n}}. A random projected vector a^{RP}^{r }is constructed as follows: a_{i}^{RP}^{r}=log (ã^{r}·rp^{i}), i=1, . . . , w. The computation of a^{RP}^{r }is repeated for each r, r=1, . . . , p. At the end of this process, the training matrix A is replaced by the logarithm of the normalized and reduced matrix A^{RP}. The Gaussian random matrix RP and the random projected training matrix A^{RP }are saved as baseline for the online detection step.
 c. Processing the reduced matrix A^{RP }derivation of its embedding matrix Ψ: The dimensionality of the data was reduced from w (number of random vectors) to a smaller number r where usually r<<w. This process applies RLDM or AADB to output the embedding matrix Ψ, which is saved for the online detection step.
 d. Identifying abnormal (intrusion) points in embedding matrix Ψ: The embedding matrix Ψ is used to identify the abnormal points in the data as described for OFID. We recall that in OFID, we performed the following: computed the minimum and maximum values, denoted by min_{Ψ}_{i }and max_{Ψ}_{i}, respectively, for every column i, i=1, . . . r, in Ψ; built the normalized density vector Φ using the norm of the density values ∥φ∥_{2 }and constructed the histogram that is divided into β bins of size


 All were saved for the online detection step.

The outputs from the training step are the normalization parameters (δ^{1}−the normalized standard deviation and s^{l}, l=1, . . . , n−the sum of the Gaussian kernel), the 3D embedding matrix (Ψ) and the parameters (min_{Ψ}_{i }and max_{Ψ}_{i}, i=1, . . . r, ∥φ∥_{2 }and γ) for the decision mechanism that determine whether each point in this matrix is normal or abnormal. These outputs are the baseline parameters for the online detection step next.
 2. Detection:
 a. Online production of a new sample: Let P be a row vector of size 1×256^{n }produced by the traffic analyzer online, where n is the ngram dimension;
 b. Preprocessing and dimension reduction in online of a new sample P: The baseline preprocessing parameters, which were saved in the training step, are: Ũ (the unified training matrix) and RP (the Gaussian random matrix). RP is of size 256^{n}×w, where w is the desired reduced dimension, and is denoted rp^{i}={rp_{ji}: j=1, . . . , 256^{n}}, i=1, . . . , w. A normalized vector {tilde over (P)} is constructed from the new sample P using the unified training matrix Ũ as follows: the training matrix Ũ is reshaped into a row vector {tilde over ({right arrow over (U)})} and
 2. Detection:


 is computed. {tilde over (P)} represents the frequency of the new sample according to the unified training matrix. A random projected vector P^{RP }is constructed. It is denoted by p^{RP}^{i}={p^{RP}^{1}, . . . , p^{RP}^{w}}, where p^{RP}^{i}=log ({tilde over (P)}·rp^{i}), i=1, . . . , w. At the end of this process, the original row vector P is replaced by the logarithm of the normalized reduced row vector P^{RP};
 c. Processing the reduced vector P^{RP}—derivation of its embedding vector ψ: The baseline embedding matrix Ψ, which was saved in the training step, is used. The dimensionality of P^{RP }is reduced from w (number of random vectors) to a smaller number r where usually r<<n. This process uses the application of GH, to extend the embedding matrix Ψ with the normalized reduced vector P^{RP}. The output of this process is the extension of the matrix. This extension is the new embedding vector ψ of the new sample;
 d. Online classification of a new sample as normal or abnormal by using the embedding vector ψ: The baseline embedding matrix Ψ and the baseline identification parameters min_{Ψ}_{i }and max_{Ψ}_{i}, i=1, . . . , r, ∥φ∥_{2 }(the norm of the density values) and γ (the size of the bins in the histogram), which were saved in the training step, are used to classify the new point ψ to be normal or abnormal. The normalized density value Φ_{ψ }is used. The new sample ψ is classified as abnormal if its normalized density value is mapped into the smallest bin. Otherwise, ψ is classified as a normal point.

The output from of this online process is a rdimensional embedding vector ψ and a decision mechanism that determines whether the new point is classified as normal or abnormal.
The following section is meant to illustrate, using various examples, the wide scope of the practical uses and applications enabled by the methods of the invention. The methods were applied to detect anomalies in different data, both offline and online. It is to be understood that the examples presented in detail below represent only a small fraction of the applications to which the methods of the invention can be applied. We first provide explanations on the nature of the different data, then present experimental results. In some examples related to networks, our experimental results are compared with results obtained by known anomaly detection methods.
Intrusion Detection Evaluation Data (IDED)

 1. DARPA intrusion detection evaluation dataset. These datasets are the most comprehensive evaluation datasets publicly available for evaluating the performance of intrusion detection systems. In other words, they are used as benchmarks for evaluating and developing intrusion detection systems, and appear in many references.
 2. Governmental networks datasets. These datasets include real traffic captured in realtime from governmental networks, e.g. traffic from a network that consists of hundreds of web servers and from several Internet service provider networks that handle hundreds of users.
 3. Academic network datasets. The data in these datasets were captured from the TelAviv University (TAU) main web server (www.tau.ac.il) and from the TelAviv University elearning web server (www.virtual2002.tau.ac.il). Although these datasets were captured from the same academic network, the type of traffic (content) transferred through these two servers is completely different.
 4. ONIDS simulation network datasets. These datasets were generated in a closed simulation network. In this network, different types of profiles, traffic and attacks were simulated:
 a. HTTP attacks data that includes a variety of the latest HTTP attacks against web servers;
 b. SQL injection attacks data that includes a variety of the latest SQL injection attacks against SQL databases;
 c. Worm attack data that include different stages of worm attacks against multiple users. They include exploitation, infection and propagation.
Following are detailed descriptions of these data.
DARPA Intrusion Detection Evaluation Datasets
Recent (1998, 1999) comprehensive evaluations of the performance of intrusion detection systems were performed on DARPA data (see Lincoln Laboratory, MIT, “DARPA Intrusion Detection Evaluation Datasets”). These evaluations included research on intrusion detection systems and on attacks against UNIX and Windows NT systems and against Cisco Routers. They used a relatively simple network architecture and background traffic designed to describe traffic that is similar to an Air Force base. The 1999 evaluation datasets included many novel aspects. Detections and false alarm rates were carefully measured for more than 18 systems. More than 56 attack types included stealthy and novel new attacks (updated and typical to those years) were used to measure detection rates. Weeks of background traffic were used to measure false alarms and misdetections rates. In addition, a unique intrusion detection corpus was created that included weeks of background traffic and hundreds of labeled and documented attacks. This corpus has been widely distributed and is being used as a benchmark for evaluating and developing intrusion detection systems. There are two evaluation datasets:
 1. The 1998 DARPA Intrusion Detection Evaluation Dataset, which includes nine weeks of network based attacks in the midst of normal flow of background data;
 2. The 1999 DARPA Intrusion Detection Evaluation Dataset, which includes five weeks of network based attacks, where two weeks do not contain any attacks and one week contains selected labeled attacks. The last two weeks contain 201 instances of 56 types of labeled attacks distributed throughout these two weeks.
Below, we concentrate on analyzing some of the network protocols in the DARPA evaluation datasets, without dealing with the content of the data. DoS and Probes attacks were detected.
Governmental Networks Datasets
To develop and verify this invention, we were provided full access to several real operational governmental networks, which suffer from thousands of attacks every day. The evaluated data were:
 Web server network data, which contain traffic from a network, consists of hundreds of web servers. These web servers handle thousands of requests every day;
 Internet user network data, which contain traffic from several Internet service providers and handle hundreds of users (government staff who use these networks to access the Internet).
The governmental networks are protected using several network security tools: signaturebased tools, anomalybased tools, firewalls and proxies and VPNs.
Academic Network Datasets
The academic network datasets were created by two network sniffers (tcpdump programs) located outside TelAviv University (TAU) main switch (gateway). One sniffer captured all the packets between TAU main web server (www.tau.ac.il) and the Internet. The other sniffer captured all the packets traffic between TAU elearning web server (www.virtual2002.tau.ac.il) and the Internet. The academic network data contains mainly HTTP traffic between web clients and web servers. Both sniffers captured the whole content of the packets including the full payload. The outputs from these sniffing processes were two groups of unlabeled data:
 Web server: The TAU web server (www.tau.ac.il) is the main web server of TelAviv University. It accepts TCP connections on port 80 (HTTP). Each dataset was collected during a period of 10 minutes every hour, accumulating to a size of about 250 MB per dataset and about 5000 connections per datasets. The result was a corpus of 48 datasets with a total size of about 12 GB.
 The Virtual TAU web server integrates advanced learning technologies through the Internet to supplement academic instructions. Virtual TAU hosts web sites of 1,500 staff members by presenting about 4,000 courses. These web sites contain an information tree that includes supplemental reading material, bibliographies, Weblinks, learning activities, simulations, multimedia presentations, tests and evaluations, etc. In addition, this site facilitates mainly asynchronous communication tools (e.g., forums, online polls) for the students and the instructors. Currently, about 25,000 students are using Virtual TAU.—This web server accepts TCP connections on port 80 (HTTP). The result is a corpus of 288 datasets with a total size of about 45 GB.
ONIDS Simulation Network Datasets
In order to simulate different types of traffic and attacks that do not appear in the datasets above, we developed a system that simulates the traffic between an internal network and the Internet. The goal of this system is to simulate different attacks from the internal network to the Internet (and vice versa) while there is a normal traffic in the background between these networks.
 ONIDS implements the algorithms in the invention. It operates in a sniffing mode and it is completely passive. It does not affect the traffic between the internal network and the Internet.
 Cisco IDS4215 (Cisco IDS 4215 Sensor, Cisco Systems, Inc.) is the latest standalone IDS appliance from Cisco. It is a signaturebased and anomalybased IDS. It runs Cisco'"'"'s latest intrusion prevention system software (IPS 6.0 (see Cisco IPS 6.0, hereinafter CIS60). It is configured to be in passive mode and it does not affect the traffic between the internal network and the Internet.
 Snort IDS (see M. Roesch, “Snort”, hereinafter SNORTM) is an open source signaturebased network intrusion detection and prevention system capable of performing packet logging and realtime traffic analysis for IP networks.
 Apache ModSecurity IDS (hereinafter MODSEC) is an open source signaturebased intrusion detection and prevention engine for web applications. It is classified as a web application firewall. It is configured to be in passive mode and it does not affect the traffic between the internal network and the Internet.
 The Imperva SecureSphere 5.0 IDS (hereinafter IMPERVA) is a signaturebased and anomalybased IDS.
Since all intrusion detection systems machines are in passive mode, the order of their connections is not important.FIG. 8 also shows an “IDS Management” component, which is a standalone PC connected directly to the management port of each IDS device. It is unseen by either the internal network or the Internet. It is used for management goals only.
In order for our system to simulate large networks with multiple machines in each network, we used several techniques and tools. These tools enabled us to implement large logical (virtual) networks with thousands of virtual machines using several physical machines. The following techniques and tools were implemented and used in the physical machines which contained the internal and the Internet networks:
 VMware (VMware, Inc., hereinafter VMWARE;
 BackTrack is used in both internal network machine and Internet machine;
 Honeyd is a daemon that creates virtual hosts on a network. As a result, our virtual Internet network simulated 16,777,216 virtual machines;
 Arpd is a daemon that listens to ARP requests and answers for IP addresses that are unallocated (ARP spoofing).
 Hping is a commandline oriented TCP/IP packet assembler/analyzer. We use Hping to simulate an internal network;
 Netcat is a network utility for reading from and writing to network connections on either TCP or UDP;
 Nmap (“Network Mapper”) is a commandline utility for network exploration and security auditing. We used Nmap on the internal network machine in order to scan the virtual Internet network using various scanning techniques;
 Scapy is an interactive packet manipulation program. We used Scapy on the internal network machine in order to send special crafted packets to the virtual Internet network.
This collection of tools and techniques enabled us to simulate in our system a virtual internal network with dozens of virtual hosts that communicate with a virtual Internet network with thousands of virtual hosts and servers using only a few physical computers. ONIDS is an anomaly detection system configured to run computer executable instructions implementing the methods and techniques of the present invention. The computer executable instructions can be stored on a computerreadable medium, such as a hard disk or another storage medium. When such a program of instructions is to be executed, it is usually loaded into the random access memory of the computer, thereby configuring the computer to act in accordance with the techniques disclosed herein.FIG. 9 displays the logical (virtual) architecture of our simulation network.
Our system was used to simulate different known HTTP attacks: We compiled several wellknown vulnerabilities in the HTTP service in order to generate the attack data with more than 60 variants of HTTP exploits. These vulnerabilities include CGI scripts, shell scripts, SQL injection, XPath injection, buffer overflows, IIS and Apache vulnerabilities, directory traversal, information gathering, system integrity, suspicious content and input validation error. These attacks were based on HTTP requests that were initiated by the HTTP client. The attacks were embedded in different parts of the HTTP request: the HTTP URI, the HTTP headers and the HTTP payload. Worm attack data was used as evaluation data in the simulation system. Three scans and propagation techniques were simulated in the system in order to scan and propagate worms to target hosts. Three network scenarios were simulated in the system:
 1. Normal—several hosts from the internal network communicate with thousands of hosts and servers from the Internet network.
 2. Single worm—a single host from the internal network is infected by a worm that tries to scan and propagate to other hosts. This is the very first stage of worm propagation in a network. In the background there is normal traffic flow between the other hosts (as in the normal scenario).
 3. Multiple worms—multiple hosts from the internal network are infected by a worm that tries to scan and propagate to many other hosts. This is the second stage of worm propagation in a network (exponential propagation). In the background there is normal traffic flow between the other hosts (as in the normal scenario).
Intrusion Detection Traffic Analyzers (IDTA)
In order to analyze different datasets described in IDED above, we developed two different traffic analyzers. Some of the evaluation datasets included raw multidimensional data that was sniffed and captured by the tcpdump program.
In order to analyze different data described in IDED above, we developed two different traffic analyzers: packetoriented traffic analyzer (POTA) and connectionoriented traffic analyzer (COTA). POTA collects data from every arrival of a packet (step 302 in
Detailed descriptions of POTA and COTA are given next. The features (parameters, measured data) extracted are described. Some of the features are not extracted but computed from other extracted features. This way we enlarge the number of features. In this case, we call a computed feature also computed data and the terms will be used interchangebly.
PacketOriented Traffic Analyzer (POTA)
Some of the evaluation data in IDED, like the DARPA data, contain multiprotocol network traffic. Since each protocol has its own header and fields, the traffic analyzer has to parse and handle each protocol separately (step 304 in
Sequential Application of the POTA Process

 1. Get a new arrival of a packet (step 302 in
FIG. 3 );  2. For each packet, the analyzer parses its IP header and checks the protocol field (of the IP header) for the next level protocol used in the data portion of the datagram (step 304 in
FIG. 3 );  3. The corresponding protocol handler is called (the analyzer handles only icmp and tcp protocols);
 4. Each protocol handler parses the packet headers to extract features accordingly and collects several values;
 5. At every predetermined time interval slice (for example, one minute), the analyzer summarizes the values collected during this time slice and saves the statistics of this time slice. The following data (other data is also possible) is gathered and computed every predefined time slice (step 308 in
FIG. 3 ). The parameters (features) are: number of IP packets, number of icmp packets, ratio between the number of icmp packets and IP packets (not an extracted feature by computed data), number of tcp packets, number of tcp packets with different tcp flags (syn, syn_ack, fin, rst), ratio between the number of tcp packets with syn_ack flags and the number of tcp packets with syn flag (computed data), ratio between the number of tcp packets with rst flag and the number of tcp packets with syn flag, number of tcp connections (sessions), number of completed and uncompleted tcp connections, ratio between the number of the completed tcp connections and the number of the uncompleted tcp connections (computed data).
 1. Get a new arrival of a packet (step 302 in
The output from this process is a statistics matrix (step 324 in
The above choice represents one option. Of course, fewer, more and other features with other relations among the features can be chosen. The choice of the features vector is important and should be connected to the understanding of the “physical” nature of the underlying networks data. More features are useful. No bias is introduced by not preferring one feature over others.
ConnectionOriented Traffic Analyzer (COTA) Process
Some of our data, like the academic network and the governmental networks data, contain HTTP network traffic. As mentioned in POTA, analysis of data fields (features) of each packet, in addition to the headers of the protocol, is necessary to detect contentlevel anomalies. Since the traffic in these data is HTTP, most of the attacks in this protocol occur through the data fields. A COTA handles only the headers and the payload of the HTTP protocol (including the underlying TCP protocol). Extending our system to handle other applicationlayer protocols is possible. In order to analyze the content of the applicationlayer protocol (for example, the HTTP messages), The COTA uses the ngram analysis to statistically model the data.
Sequential Application of the COTA Process

 1. Get a new arrival of a packet (step 302 in
FIG. 3 );  2. For each packet, the analyzer parses its IP header and checks the protocol field (of the IP header) for the next level protocol used in the data portion of the datagram. In case the next level protocol is tcp, the tcp protocol handler is called. Otherwise, this packet is ignored (step 304 in
FIG. 3 );  3. The tcp handler parses the tcp header of the packet and checks its source and destination ports (of the tcp header). If none of them is 80 (the standard HTTP port) this packet is ignored (step 306 in
FIG. 19 );  4. The tcp protocol handler manages a map of tcp connections. Each connection is identified by the following quadruplet: IP address and port of the clientside (the initiator) of the connection and IP address and port of the serverside of the connection. The tcp connection is persistent for HTTP version 1.1;
 5. If a corresponding tcp connection entry (according to the source and destination IP and port of the packet) does not exist for each packet then, a new entry is added to the connections map. This entry holds the following statistics of this specific tcp (HTTP) persistent connection (step 308
FIG. 3 ). The following parameters (features) are collected: a. Total duration of the connection measured by the time the first packet, is seen during this connection and the time the last packet is seen during this connection;
 b. Data duration of the connection measured by the time of the first data packet (packet that contains any payload), is seen during this connection and the time of the last data packet seen during this connection;
 c. number of tcp packets with different tcp flags (syn, syn_ack, fin, rst, cwr, urg, psh and ece), number of tcp packets, number of control tcp packets (packets without payload), number of data tcp packets (packets with payload), number of source (client) packets, number of source control packets, number of source data packets, number of source data bytes, number of destination (server) packets, number of destination control packets, number of destination data packets, number of destination data bytes.
 6. Each packet, which contains the payload data, is passed to the HTTP protocol handler for content analysis (step 318 in
FIG. 3 ). The HTTP protocol handler collects two types of content statistics for each HTTP requestresponse messages pair in this specific HTTP persistent connection, multiple pairs of HTTP requestresponse messages can be transferred during a single HTTP persistent connection) as follows: a. The collected features which are the HTTP metadata statistics are:
 i. number of request packets;
 ii. average sizes of the request packets (computed data);
 iii. standard deviation of the sizes of the request packets (computed data);
 iv. number of response packets;
 v. average sizes of the response packets (computed data);
 vi. standard deviation of the sizes of the response packets (computed data);
 vii. number of bytes in the URI of the client request message;
 viii. number of HTTP headers of the client request message;
 ix. average sizes of the HTTP headers of the request message (computed data);
 x. standard deviation of the sizes of the HTTP headers of the request message (computed data);
 xi. number of bytes in the status line of the response message;
 xii. number of HTTP headers of the response message;
 xiii. average sizes of the HTTP headers of the response message (computed data);
 xiv. standard deviation of the sizes of the HTTP headers of the response message (computed data).
 b. Computation of the HTTP payload statistics. The ngram was tested for n=27 and it is described here for n=2. All the entries below are computed and not extracted—step 314 in
FIG. 3 —as follows: i. Frequency matrix of the ngram ascii of the whole HTTP tcp payload of the client request message;
 ii. Frequency matrix of the ngram ascii of the URI of the client request message;
 iii. Frequency matrix of the ngram ascii of the HTTP headers of the client request message;
 iv. Frequency matrix of the ngram ascii of the HTTP body of the client request message;
 v. Frequency matrix of the ngram ascii of the whole HTTP tcp payload of the server response message;
 vi. Frequency matrix of the ngram ascii of the status line of the entire server response;
 vii. Frequency matrix of the ngram ascii of the HTTP headers of the server response message;
 viii. Frequency matrix of the ngram ascii of the HTTP body of the server response message.
 a. The collected features which are the HTTP metadata statistics are:
 7. At the end of each tcp connection, the COTA summarizes the values collected by the tcp and HTTP handlers during this connection. The tcp handler produces a single series of values whereas the HTTP handler produces multiple series of values (a single series for each requestresponse pair). The traffic analyzer saves these statistics (step 340 in
FIG. 3 ).
The outputs from this process are several statistical matrices:  1. A matrix that describes the statistics of tcp connections: 21 features (values) for every tcp connection;
 2. A matrix that describes the statistics of pairs of requestresponse in HTTP metadata: 14 features for every requestresponse pair;
 3. Matrices that describe the statistics of the frequency of the ngram (in all the computation below n=2) ascii in the HTTP content: 256^{n }features for every request. Separate matrices are used for different data units (URI, headers, body and the whole payload);
 4. Matrices that describe the statistics of the frequency of the ngram ascii in HTTP content: 256^{n }features for every response. Separate matrices are used for different data units (status, headers, body and the whole payload).
For example, a typical academic network dataset provides 10 minutes of data. There are 5002,500 tcp connections per dataset and 1,00015,000 HTTP requestresponse pairs per dataset. These output matrices are the inputs for the intrusion detection processes.
 1. Get a new arrival of a packet (step 302 in
The above choice of features is one option. There can be other options to choose features. As the number of features increases so is the dimensionality of the used data for training and detection procedures.
Syntactic Analysis of the Payload
In the section on the sequential application of the COTA process, we described a generic method for payload analysis. We used the ngram analysis to statistically model the payload of the data. This method is not influenced by the payload load of the application. However, in order to increase payload accuracy analysis, the analysis methodology is adapted to payload type (according to the application that generated it).
In this section, we present a syntactic analysis of the payload. We used this method for the analysis of SQL queries, in order to detect SQL injection attacks. However, it is possible to adapt this method to any type of payload or application (as long it is possible to model the syntax of the payload).
To build our syntactic model for SQLquery, the following steps are taken:

 1. The SQL query is transformed to a new domain, where the alphabet is composed of several tokens instead of alphabetical characters;
 2. The statistics in this domain are computed.
 These steps are described below.
Building the Tokens Model:
 These steps are described below.
We transform an SQL query into a new domain. The new domain alphabet is constructed from elements called tokens. Each symbol of the SQL query is transformed and replaced by its associated token. The model was developed according to certain guidelines:
 1. Characters or words with a similar SQL grammatical meaning should be grouped under the same token;
 2. The use of larger number of tokens enables to get a better accurate description of different queries.
The different tokens with their associated symbols are described in Tables 1 and 2. Table 1 describes tokens classification.
In addition, two location tokens were added: ‘begin’—to mark the beginning of a query, and ‘end’—to mark the end of a query. These two tokens enhance the representation accuracy level. This is due to the fact that sometimes the location of an SQL phrase can indicate whether it is valid or suspicious.
Table 2 describes the location tokens.
Following is an example that demonstrates this process:
 SELECT firstName,lastName,Address WHERE Age<50
The tokensbased representation is as follows:  commandidentifierpuncidentifierpuncidentifiersupportidentifiercompliteral
By adding the location tokens, the final representation becomes:  begincommandidentifierpuncidentifierpuncidentifiersupportidentifiercompliteralend
Statistical Analysis of the Model:
 SELECT firstName,lastName,Address WHERE Age<50
In our proposed method, we apply the ngram model to SQL queries that are represented in the tokens domain. For example, by applying a 2gram on the previously given representation
 begincommandidentifierpuncidentifierpuncidentifiersupportidentifiercompliteralend
we get the token frequency in a 2gram model:
 begincommandidentifierpuncidentifierpuncidentifiersupportidentifiercompliteralend
Application of 3gram model to the same representation produces the following token frequency in a 3gram model:
By increasing n, more rare sequences are identified. This may reduce the number of false negative detections (attacks which are not identified). Unfortunately, this process can also add noise to the detection process and increase the number of false positive detections (valid queries which mistakenly identified as attacks).
The output from this syntactic analysis process contain a matrix that describes the statistics of the frequency of the ngram syntax in SQL queries. Since for the SQL syntactic analysis we defined 12 different tokens, the output matrix contains 12^{n }features for every SQL query. This output matrix is the input for the intrusion detection processes (training and detection). The above feature choice is one option. There can be other options to choose features.
Comparison Between POTA and COTA
The POTA and the COTA handle data from various layers of the OSI layers (see International Standardization Organization (ISO), ISO/IEC 74981:1994—Information technology—Open Systems Interconnection—Basic Reference Model: The Basic Model, 1994).
Table 51 summarizes the usage of each traffic analyzer and the major differences among them.
It is important to mention that these traffic analyzers can be extended to handle many other protocols.
Performance of the OFID and OLIDMD and OLIDPL Algorithms to Networks
This section describes the experimental results from the application of the OFID, OLIDMD and OLIDPL algorithms to different types of network data. The outputs are presented after the derivation of the embedding matrix.
The experimental results were derived from the application of both training and detection steps. The outputs are visualized in a picture. Every attack (intrusion, abnormality) is marked. The X, Y and Zcoordinates in the following figures represent the first, second and third eigenvectors of the first, second and third eigenvalue, respectively, as described in the detailed description of each algorithm. The Zcoordinate is only shown in some figures.
In some of the IDED such as DARPA and ONIDS, the attacks (intrusions and abnormalities) were labeled and documented by the creators of these IDEDs. Thus, we know exactly the starting and the ending time of each attack. This enables us to verify the performance of the algorithms.
In all the figures that show anomalies (abnormalities), which deviate from a normal cluster (normal activity), a group of r, r=2, 3 discriminating eigenvectors, which constitute the embedding matrix, was sufficient to describe and visualize the actual activities. Therefore, in these figures, r is either 2 or 3.
Performance of OFID: Experimental Results
Experimental Results from DARPA Data Using OFID
The OFID was tested on two different weeks from the 1999 DARPA that was described in IDED. One week contained various types and instances of attacks and the others did not contain any attacks. Since the attacks were labeled and documented by DARPA, we know exactly the starting and the ending time of each attack. This enabled to validate the output from the OFID algorithm.
All of the results above, obtained with our OFID algorithm, confirm the events in DARPA'"'"'s data.
Performance of OLIDMD: Experimental Results
Experimental Results on DARPA Data
The OLIDMD was tested on two different weeks from the 1999 DARPA data described in IDED. One week contains various types and instances of attacks and the other week does not contain any attacks. Since the attacks were labeled and documented by DARPA, we know exactly the starting and the ending time of each attack. This enables to validate the output from the OLIDMD algorithm. This section presents the experimental results in the following order:
 1. Each output displays: a) Training step: application of the Gaussian normalization and RLDM to derive the embedding matrix ψ that spans the embedded (lower dimension) space. b) Detection step: online application of the Gaussian normalization and the result from the application of outofsample extension (GH) to determine whether a newly arrived data point is normal or abnormal.
 2. The same as 1 above, but the normal behavior manifold (normal cluster) is displayed using a density function of each point. This presentation provides a visual perception of the normal and abnormal points in the manifold. It gives each point a visual score that shows how close is the point to a normal area and whether it is getting closer to an abnormal area. A point in the highest density area means that it is absolutely normal. A lesser density means that the point is normal but located in a less populated area than the denser one. Smallest (least) density means that the point is normal but located in “sparse” density area indicate that a danger is approaching. The range of densities is shown in
FIG. 22 .FIGS. 23 and 24 display respectively results for a Monday and a Friday of the week without attacks. All the newly arrived points are normal and lie in the main manifold. The rest of the days in this week were also detected correctly (not shown).FIGS. 25, 26 and 27 display respectively results for a Tuesday, Wednesday and Thursday in the week with attacks. All the newly arrived points, which are classified as abnormal, lie outside the main manifold (cluster). The attacks on the rest of the days in this week were also detected correctly (not shown). In more detail, the attacks detected on Tuesday were a: portsweep attack, started at time 45, continued at time 46 to 71 and ended just before time 72, and an ipsweep attack, started just after time 305, continued at time 306 to 330 and ended at time 331. The attacks detected on Wednesday were a satan attack, started at time 243, continued at time 244 and ended at time 245 and an ipsweep attack started at time 738, continued at time 738 to 750 and ended just before time 751. The attacks detected on Thursday were a portsweep attack started at time 170, continued at time 171 to 183 and ended at time 184, a neptune attack started at time 185, continued at time 186 to 187 and ended at time 188, and an ipsweep attack started at time 517, continued at time 518 to 520 and ended just before time 521. In each ofFIGS. 2527 , all the normal points lie on the main manifold.
Experimental Results on Data that Contain Worms
Our online worm detection algorithm was tested on seven datasets described in IDED. We used the first dataset as training data for the training step. We used the other six, single and double datasets, as the test data for the detection step. For every data point in the test data and for every time slice, a newly arrived data point is handled online by the application of the detection step in OLIDMD.
Single Worm Data

 a SYNRST scan against targets that listen to the scanned port—starts at time 2440 and ends at time 2441;
 a SYNACKRST scan against targets that listen to the scanned port—starts at time 2452 and ends at time 2453;
 a FIN scan against targets that listen to the scanned port—starts at time 2464 and ends at time 2467;
 a SYNRST scan against targets that do not listen to the scanned port—starts at time 2475 and ends at time 2477;
 a FIN scan against targets that do not listen to the scanned port—starts at time 2489 and ends at time 2490;
The OLIDMD algorithm correctly detected all worms. All the normal points lie on the main manifold while the only significant outliers are the attacks of the worm.
Multiple Worm Data
Performance of OLIDPL: Experimental Results
OLIDPL was tested on academic network data, ONIDS simulation network data and the governmental network data described in IDED. The inputs were the statistics of the payloads of these data, produced by COTA. Here we present some experimental results on the academic network data, SQL injection data (part of the ONIDS simulation network data) and the governmental network data after the application of OLIDPL.
Experimental Results from the Academic Network Data
The OLIDPL algorithm was tested on data captured from the academic networks during 48 hours starting on 12/28/06 at 00:00 AM and ending on 12/30/06 at 00:00 AM. The data from the first day (12/28/06) was used as the training set for the detection step. We chose 4,000 samples from this day as representative samples for the baseline of the normal behavior of the network activities. The data from second day (12/29/06) was used as the testing set for the detection step. We used 5 hours from this testing set in order to test our anomaly detector. Each hour contained injected HTTP attacks (described in IDED).
 1. Experimental results from content analysis of HTTP URI request: This section presents the experimental results of the training step for HTTP URI content and of the detection step for HTTP URI content.
 a. Training step for HTTP URI content—Application of the preprocessing and the RLDM to the first day with the application of the detection step for abnormal points.
FIG. 32 presents the HTTP URI baseline generated from training of the first day (12/28/06). It includes 4,000 samples, where each sample is a single representative from the coarse grained matrix (step 410 inFIG. 4 ). All the points lie on the same manifold (cluster) and there are no outliers in the training data.  b. Detection step of HTTP URI content using GH:
FIG. 33 presents the output from the detection step from all the samples captured on the second day (12/29/06) between 07:00 AM to 08:00 AM, i.e. 10,750 samples, where each sample represents a single HTTP URI. All the abnormalities were detected.
Experimental Results on SQL Injection Data
 a. Training step for HTTP URI content—Application of the preprocessing and the RLDM to the first day with the application of the detection step for abnormal points.
OLIDPL was tested on data obtained over two days from the SQL injection data. The data from the first day was used as a training set for the detection step. We chose 4,000 samples from this day as representative samples for the baseline of the normal behavior of the network activities. The data from the second day was used as test data for the detection step. We used two hours from this test in order to test our anomaly detector. The two hours contained two SQL injection attacks (described in IDED).
Experimental Results on the Governmental Network Data
The OLIDPL algorithm was tested on data obtained over two days from the governmental network data (described in IDED). These data were captured during 48 hours. The data from the first day was used as the training set for the detection step. The training set includes anomalies and attacks (polluted data). We chose 4,000 samples from this day as representative samples to be a baseline of the normal behavior of the network activities. The data from second day was used as test data for the detection step. About 770,000 HTTP requests took place during this time period. These requests include normal traffic, anomalous traffic and attacks. We used these HTTP requests to test our anomaly detector. The governmental security system uses Snort in order to detect and prevent attacks by malicious Internet users. Therefore, we used the same Snort version with the same set of rules in order to label the data and compare the performance of OLIDPL to Snort.
 1. Snort Results: Snort generated about 17,000 alerts with different levels of severity. There were many false alerts of legitimate traffic and we filtered about 99% of these false alerts. Therefore, the total number of true alerts detected by Snort was about 250. These attacks included several types, for example:
 a. Crosssite scripting attacks
 GET /heb“%20onmouseover=”this.style.textdecoration%20=‘none’;
 “%20onmouseout=”this.style.textdecoration%20=‘underline’;”>
 ministry%20of%20foreign<br>affairs%20(hebrew)</a></td>
 </images/pixel_transparent.gif“></td></tr></images/pixel
 _transparent.gif”></td></tr></table><!locate%20mission>
 <script%20language=“javascript”>%20%20%20%20try{var%20opopu
 p%20=%20
 window.createpopup( );opopup.document.createstylesheet( );}catch%20
 (e)
 %20 {%20//alert(‘top.asp%20error%20occurred(505):%20’%20+%20e.
 message
 %20+%20‘.’);%20}function%20richdropdown( ){%20%20%20%20opo
 pup. document
 .body.innerhtml%20=%20ocontexthtml.innerhtml;%20%20%20%20%2
 0opopup
 .document.stylesheets(0).csstext%20=%20document.stylesheet
 b. IIS 5.0 cross site scripting vulnerability
 POST /_vti_bin/shtml.exe/_vti_rpc HTTP/1.1
 c. Directory traversal attacks
 GET/../../Web/main/document.asp?DocumentID=119140&MissionID=
 5&tem=:HTTP/1.1
 d. Access to a potentially vulnerable web application
 GET/peace/empower.html HTTP/1.0
 e. WEBPHP remote include path
 /index.php?_REQUEST=&_REQUEST%5boption%5d=com_content
 &_REQUEST%5b
 Itemid%5d=1&GLOBALS=&mosConfig_absolute_path=http://disqad.
 za.p1/
 evil/mascansearch/shell.txt? HTTP/1.1
 f. Robots
 GET/robots.txt HTTP/1.1
 a. Crosssite scripting attacks
 2. OLIDPL Results
 a. OLIDPL detected all the real attacks detected by Snort including the crosssite scripting attacks, directory traversal attacks, access to a potentially vulnerable web application and WEBPHP remote include path that were described in the Snort results.
 b. OLIDPL detected more than 680 anomalies that were misdetected by Snort.
Examples of HTTP anomalies that were detected by OLIDPL but were undetected by Snort:
 1. Attempt to get into Google accounts
 GET
 /accounts/SetSID?ssdc=1&sidt=tgy%2BJhQBAAA%3D.Cl0ZSHtr00bXUIC7
 IRphg8iKp65zbplJEToPGcKw9z%2BB121uwD394HLf14V16%2FH6tQT8D
 uOGmNpJfX8ve6YVCzA7kA0LHLHx7q29SB5Xk%2Fv3OPcHuSE4imDP
 %2F4EtAKE2IoSeArbD9iuRA4TmGgfdKpvR70QnPLohe%2BKdV7KdrocU
 2ffi58vs0VXhGY1BuLyYX%2B4%2FCjKDhvpB0FdxnJSaL5C3ukeRjW%2
 FsJm%2Fup%2FmUEbzt8dsS25iTnecPjCsmeLHMCDrQMqwqeHBRLromu2
 4V8w%3D%3D.EKCfEaT8H85n3OBcYQxgcA%3D%3D&continue=http%3
 A%2F%2Fmail.google.com%2Fmail%2F%3Fauth%3DDQAAAHUAAAAlT
 RldzU3HIKgJoWTZs4EjF4wloF3_f8PrfsbkqpSAMZyPSgQjz0
 RN2V3wkUHdzdLvykW33ncfY6YgmvcuP2r2bE6YHrVEQxl0M2DUOt0oX
 9y0qTYFrjh7IuLkSiZnB3kzhvJ9TRj
 gj1M7zDVEiLCBK4TnsITmUPpWaDF2KIQ HTTP/1.1
 GET
 /accounts/Logout2?ilo=1&ils=s.NG&ilc=0&continue=https%3A%2F%2Fww
 w.google.com%2Faccounts%2FServiceLogin%3Fservice%3Dmail%26passiv
 e%3Dtrue%26rm%3Dfalse%26continue%3Dhttp%253A%252F%252Fmail.g
 oogle.com%252Fmail%253Fui%253Dhtml%2526zy%253D1%261tmp1%3Dde
 fault%261tmp1cache%3D2%26h1%3Den&zx=−947374576HTTP/1.1
 2. Suspicious webmail request
 GET /webmail.prodigy.net.mx/cgibin/webmail/
 Mariopergoliniteacordas.mp3?ID=I2rASbNeARRTjIgsdpYSwhYUVDcEt7
 UcZxsiw79OInxUIj12sv&Act_View=1&R_Folder=aW5ib3g=&msgID=4740
 &
 Body=2&filename=Mariopergoliniteacordas.mp3 HTTP/1.1
 3. Crawlers
 GET/crawlers/img/stat_m.php?p=0&rn=0.7413160734538396&c=1&wh=800
 x600&px=24&j=Y&s1=1.3&r=http%3A//www.sotovik.ru/catalog/reviews/No
 kia_N76
 prev.html&fr=0&pg=http%3A//www.sotovik.ru/price_new/%3Faction%3Din
 dex%26firm%3D237%26model%3D2669%26city%3D1%26white %3D1%26t
 ariff%3D21%26priceHigh%3D9999 HTTP/1.1
Examples of HTTP attacks that were detected by OLIDPL but were undetected by Snort:
 1. Nimda IIS worm attack
 GET
 /MSOffice/cltreq.asp?UL=1&ACT=4&BUILD=6551&STRMVER=4&CAPR
 EQ=0HTTP/1.1
 2. VocalTec VGW4/8 telephony gateway remote authentication bypass vulnerability
 GET /home.asp” HTTP/1.1
 3. Spam attacks
 GET /?ref=turkeylist.net HTTP/1.1
 4. Remote commands execution exploit attack
 GET/index?/cgiperl/polling/poll.p1 HTTP/1.1
 5. Attempt to gain administrator privileges
 GET
 /_vti_bin/owssvr.dll?UL=1&ACT=4&BUILD=6551&STRMVER=4&CAPR
 EQ=0
 HTTP/1.1
 6. Directory traversal attack
 GET /.../ HTTP/1.1
 7. Backslash attack
 GET /\/\/ HTTP/1.1
Comparison Between OFID, OLIDMD and OLIDPL as Intrusion Detection Algorithms and Commercial Security Tools
Worms and SQL Injection Attacks—Comparison to Cisco IDS4215
We tested the capabilities and the performance of Cisco IDS4215 on two datasets:
 GET /\/\/ HTTP/1.1
 1. Snort Results: Snort generated about 17,000 alerts with different levels of severity. There were many false alerts of legitimate traffic and we filtered about 99% of these false alerts. Therefore, the total number of true alerts detected by Snort was about 250. These attacks included several types, for example:
 1. Worm attack data from IDED to test the anomaly detection capabilities of Cisco IDS4215.
 2. SQL injection attack data from IDED to detect the signature capabilities of Cisco IDS4215.
The anomaly detection engine of Cisco IDS4215 was tested using seven worm datasets described in IDED and tested with OLIDMD. Cisco IDS4215 was tested online while simulating different network scenarios. First, we used the normal dataset as the training data for the training step by setting the anomaly detection sensor of Cisco IDS4215 to learning mode. Then, after the training period was completed, we began the testing step by setting the anomaly detection sensor of Cisco IDS4215 to detection mode. We used the single and multiple worm data (six datasets) as the test data for the detection step. For every simulation datasets, we recorded the logs generated by the Cisco IDS4215 device. At the end of each simulation, we analyzed the anomaly detection engine logs. To summarize the results:  1. Training step (learning mode) using the normal datasets.
 a. Normal data—while the anomaly detection engine was in learning mode, the other signature based engines (Snort, CISCO, Imperva, Apache ModSecurity) produced hundreds of false alerts (for example, TCP SYN Flood). For the rest of the test, we ignored these engines due to the following main reasons:
 i. These engines do not have any learning capabilities therefore, they will continue to produce hundreds of false alerts during the testing step;
 ii. The goal of the test is to compare the capabilities and the performance of the anomaly detection engine of Cisco IDS4215 with OLIDPL and not to test the other static IDS engines that are signatures based.
 a. Normal data—while the anomaly detection engine was in learning mode, the other signature based engines (Snort, CISCO, Imperva, Apache ModSecurity) produced hundreds of false alerts (for example, TCP SYN Flood). For the rest of the test, we ignored these engines due to the following main reasons:
 2. Testing step (detection mode) using the single worm and the multiple worm data. The “detection” results refer to the detection of anomalous activities by the anomaly detection engine of Cisco IDS4215.
Table 6 summarizes the testing results of the anomaly detectors of Cisco IDS4215 and OLIDMD.
Evaluating the Signature Capabilities of Cisco IDS4215 Using SQL Injection Attack Data
The signature detection engine of Cisco IDS4215 was tested using the two categories of SQL injection data described in IDED and tested with OLIDMD. Cisco IDS4215 was tested online while simulating the different SQL injection attacks. We used 5 well known SQL injection attacks and 5 unknown SQL injection attacks for the testing. To summarize the results:
 1. Known attacks: this data contains 5 different known SQL injection attacks—detection of one attack out of the five attacks by the signature engine of Cisco IDS4215;
 2. Unknown attacks: this data contains 5 different unknown SQL injection attacks—zero detection of the attacks by the signature engine of Cisco IDS4215.
Table 7 summarizes the test results of the SQL injection attacks of Cisco IDS4215 and OLIDMD.
Evaluating the Functionalities of Cisco IDS4215
Table 8 summarizes the research of the functionalities and the capabilities of Cisco IDS4215 (see CISCO4215) in comparison with OLIDMD:
SQL Injection Attacks: Comparison to Snort, Apache ModSecurity and Imperva SecureSphere 5.0
Snort (see SNORTM) and Apache ModSecurity (see MODSEC) are signaturebased intrusion detection systems. Imperva SecureSphere 5.0 (see IMPERVA) is a signaturebased and anomalybased intrusion detection system. We tested the performance of Snort, Apache ModSecurity and Imperva SecureSphere 5.0 using the SQL injection attack data in IDED. We trained Imperva SecureSphere 5.0 using a training dataset. We used the two categories of SQL injection data in IDED and tested with OLIDMD. Snort, Apache ModSecurity and Imperva SecureSphere 5.0 were tested online while simulating the different SQL injection attacks. We used 5 well known SQL injection attacks and 5 unknown SQL injection attacks for the testing. To summarize the results:
 1. Known attacks: this data contains 5 different known SQL injection attacks—detection of 4 attacks by Apache ModSecurity and Imperva SecureSphere 5.0 and one attack by Snort;
 2. Unknown attacks: this data contains 5 different unknown SQL injection attacks—zero detection of the attacks by Snort, Apache ModSecurity and Imperva SecureSphere 5.0.
Table 9 summarizes the testing results of the SQL injection attacks of Snort, Apache ModSecurity, Imperva SecureSphere 5.0 and OLIDMD.
HTTP Attacks: Comparison to Snort
The governmental security system (IDED) uses Snort in order to detect and prevent attacks by malicious Internet users. We used the same Snort version with the same set of rules as used by the governmental security system in order to label the data and compare between the performance of OLIDPL and Snort. We used the governmental network HTTP data (from IDED) for comparing between both systems. Table 10 summarizes the testing results of Snort and OLIDPL. For each tested system we present the following parameters:
 1. Number of tested URIs: The total number of URIs that were tested for attacks/anomaly detection;
 2. Number of false alerts: The number of normal and legitimate HTTP URIs that were mistakenly identified by the detecting system as attacks/anomalies (false positive);
 3. Rating false alerts: The rate of false alerts out of the total number of tested URIs;
 4. Number of known attacks: The number of HTTP URIs that were identified by the detection system as attacks and are known as attacks. Since Snort is a signaturebased system, we categorized each true alert that was identified by Snort as a known attack. In other words, in case Snort finds a match to a signature for a true attack, then, this attack must be a known attack. For example, GET /robots.txt HTTP/1.1;
 5. Number of unique known attacks: The number of unique known attacks from the total number of known attacks;
 6. Number of unknown attacks: The number of HTTP URIs that were identified by the detection system as attacks but are unknown to Snort as attacks. However, these attacks were confirmed as true attacks. Since Snort is a signaturebased system, we categorized each such true alert that was not identified by Snort as an unknown attack. In other words, in case Snort has no matched signature for a true attack, this attack must be an unknown attack (for Snort). Each such unknown attack was confirmed as a true attack according to several security websites and forums. For example,
 GET
 /MSOffice/cltreq.asp?UL=1&ACT=4&BUILD=6551&STRMVER=4
 &CAPREQ=0 HTTP/1.1;
 7. Number of unique unknown attacks: The number of unique unknown attacks from the total number of unknown attacks;
 8. Number of anomalies: The number of HTTP URIs that were identified by the detection system as anomalies but were not confirmed as known attacks (according to security websites and forums). For example,
 GET
 /accounts/Logout2?ilo=1&ils=s.NG&ilc=0&continue=https%3A%2F%2Fww
 w.google.com%2Faccounts%2FServiceLogin%3Fservice%3Dmail%26passiv
 e%3Dtrue%26rm%3Dfalse%26continue%3Dhttp%253A%252F%252Fmail.g
 oogle.com%252Fmail%253Fui%253Dhtml%2526zy%253D1%261tmp1%3Dde
 fault%261tmp1cache%3D2%26h1%3Den&zx=−947374576HTTP/1.1
OLIDPL detected most of the attacks detected by Snort. However, OLIDPL generated much less false alerts than Snort (≈98%). Furthermore, OLIDPL detected several serious attacks that were misdetected by Snort since its database does not contain their signatures.
Detection of Anomalies in Spectral Images
OFID was applied to detect abnormalities in a multidimensional hyperspectral (source) image of size 1200×800×180 (also called image cube), where 180 is the number of wavelengths processed to detect abnormalities located in a pixel or subpixel in the image cube. In this example, a data point is a vector including all the values of all the wavelengths of a pixel, and may also be called a “multipixel” or a “spectral signature”. An abnormality is classified as a pixel whose spectral signature is different from its surrounding pixels. The input data does not have to be scaled since the inputs are pixels with the same numeric range. The data is normalized. A traffic analyzer parsed the data and computed its statistics. The distances among the pixels are computed between normalized vectors of multipixels. To find these abnormalities, OFID is applied to reduce the number of wavelengths in the image cube to get an embedded space.
The same procedure was applied to medical imaging. Hyperspectral imaging was performed on a mouse with suspected tumors using the CRi Maestro imaging system (Cambridge Research Instruments, Cambridge, Mass., USA).
Detection of Anomalies (Defects) in Semiconductor Wafers (Semiconductor Process Monitoring)
OFID was applied to detect defects in semiconductor wafers. One way to find features that are based on the relation of a pixel with its neighborhood is described in
Performance Monitoring and Problem Avoidance
This example illustrates the detection of performance bottlenecks through identification of anomalies (as was explained above) in data that changes all the time via monitoring the performance of a computer. Assume there is a large computer system that serves for example a large commercial financial application. The system cannot afford to be down, but crashes because it is loaded with nonstop incoming transactions. A monitor is installed inside this system. This monitor collects each time unit many parameters like CPU utilization, queues of transactions, virtual memory usage, disk usage, communication queues, etc. A vector of these parameters represents a data point. After a long period of time, we get a large volume of data in which each entry is a row of collected numbers per time slice (time unit interval). Each row represents all the recorded numbers that time unit interval. This data has to be analyzed constantly to inform the operator if there is a buildup of a problem such as deadlocking, queues overload, or communication problems that may crash the system.
The training procedure establishes the normal behavior cluster. In time of normal behavior, the embedded space from the application of OLIDMD algorithm on the designated training data has a well established profile (“normal cluster”) as shown in
Human Health Monitoring
Current health monitoring practices provide physiological data on a patient, the data comprised of physiological parameters which can be acquired simultaneously (in a synchronized way in realtime. These parameters include (but are not limited to) pulse, temperature, heart rate, EKG data, EEG data, blood oxygenation, blood sugar and blood pressure. A data point here is a vector of a plurality of any combination of physiological parameters (preferably more than 3 parameters), acquired simultaneously) per time unit (e.g. per second). Therefore, in each time unit, one obtains multidimensional data. Training may be performed to obtain “normal” behavior, i.e. a “healthy” status. Application of the detection step to continuously monitored data can then be used to detect anomalies, which are indicative of a health problem.
Detection of Mastitis
The methods of the invention can be applied to detect abnormalities during cow milking. Training and detection can be performed per cow udder quarter or per udder (four quarters). The data may include such parameters as milk electrical conductivity, milk flow rate, total milk quantity, milk optical parameters such as transparency and reflectivity, spectral data, acoustical data, somatic cell count and the like. A data point here is a vector of a plurality of any combination of parameters (preferably more than 3 parameters), acquired simultaneously (in a synchronized way) per time unit (e.g. per second). Milk spectra may be obtained in realtime using a regular inline spectrometer or the spectral imaging method described in “Snapshot spectral imaging systems and methods”, Michael A. Golub, Menachem Nathan and Amir Averbuch, PCT Application PCT/IL2007/000926. Note that each spectroscopy or spectral imaging measurement can provide tens of parameters (since an intensity value at one wavelength can be a “parameter” in a data point) and can be synchronized with measurements of electrical conductivity, flow rate, milk amount and the like. Therefore, in each time unit, one obtains multidimensional data. Training may be performed to obtain “normal” behavior, i.e. the mastitisfree condition of milk for each cow. Application of the detection step to continuously monitored data can then be used to detect abnormalities, which are indicative of mastitis.
OFID, OLIDMD and OLIDPL anomalybased systems contradict the following assumptions:

 1. Signaturebased systems generate less false alerts than anomalybased systems;
 2. Signaturebased systems detect many more known attacks than anomalybased systems;
 3. Automatic anomalybased systems are not practical in real world since there is no efficient way to learn the traffic in a real environment that includes anomalies (the problem of polluted training set);
 4. Unsupervised anomaly detection is not practical in the real world.
The methods and systems disclosed herein do not:  1. Rely on signatures of threats (yesterday'"'"'s attacks);
 2. Model/compare to RFCs of protocols—model parameters and compare to thresholds;
 3. Define what is right (normal) and what is wrong (abnormal);
 4. Include any bias by preferring some parameters over others;
 5. Define different analysis methods for different protocols/applications.
The invention may be applied to any multidimensional data, offline and online. In particular, it can be applied to provide:  a unified threat manager for network health (external and internal attacks to perform intrusion detection and prevention—such as adaptive intrusion prevention system (IPS), application fire wall, antispam, client protection, database protection, detection of Trojan horses, etc.);
 smart phone protection;
 base station protection;
 detection and prevention of SQL injection;
 risk management in financial transactions;
 fraud detection;
 performance monitoring;
 prediction and tracking of emerging problems and problem avoidance;
 guarantee QoS in networks;
 network behavior analysis (NBA);
 acoustics processing;
 electronic intelligence and cyber threat management;
 automatic generation of user/resource profiling to detect deviations from a profile;
 data leakage prevention;
 protocol/application classification and identification;
 defect detection in wafers;
 hyperspectral processing for medical, civilian and military applications,
 process control;
 mastitis detection.
The various features and steps discussed above, as well as other known equivalents for each such feature or step, can be mixed and matched by one of ordinary skill in this art to perform methods in accordance with principles described herein. Although the disclosure has been provided in the context of certain embodiments and examples, it will be understood by those skilled in the art that the disclosure extends beyond the specifically described embodiments to other alternative embodiments and/or uses and obvious modifications and equivalents thereof. Accordingly, the disclosure is not intended to be limited by the specific disclosures of embodiments herein. For example, any digital computer system can be configured or otherwise programmed to implement the methods disclosed herein, and to the extent that a particular digital computer system is configured to implement the methods of this invention, it is within the scope and spirit of the present invention. Once a digital computer system is programmed to perform particular functions pursuant to computerexecutable instructions from program software that implements the present invention, it in effect becomes a special purpose computer particular to the present invention. The techniques necessary to achieve this are well known to those skilled in the art and thus are not further described herein.
Computer executable instructions implementing the methods and techniques of the present invention can be distributed to users on a computerreadable medium and are often copied onto a hard disk or other storage medium. When such a program of instructions is to be executed, it is usually loaded into the random access memory of the computer, thereby configuring the computer to act in accordance with the techniques disclosed herein. All these operations are well known to those skilled in the art and thus are not further described herein. The term “computerreadable medium” encompasses distribution media, intermediate storage media, execution memory of a computer, and any other medium or device capable of storing for later reading by a computer a computer program implementing the present invention.
Accordingly, drawings, tables, and description disclosed herein illustrate technologies related to the invention, show examples of the invention, and provide examples of using the invention and are not to be construed as limiting the present invention. Known methods, techniques, or systems may be discussed without giving details, so to avoid obscuring the principles of the invention. As it will be appreciated by one of ordinary skill in the art, the present invention can be implemented, modified, or otherwise altered without departing from the principles and spirit of the present invention. Therefore, the scope of the present invention should be determined by the following claims and their legal equivalents.
All patents, patent applications and publications mentioned in this specification are herein incorporated in their entirety by reference into the specification, to the same extent as if each individual patent, patent application or publication was specifically and individually indicated to be incorporated herein by reference. In addition, citation or identification of any reference in this application shall not be construed as an admission that such reference is available as prior art to the present invention.