System and method for determining network application signatures using flow payloads
Abstract
A method for profiling network traffic of a network is presented. The method includes obtaining a cohesive flow-set based on a (port number, transport protocol) pair; identifying a statistically representative training set from the flow-set; identifying a network application associated with the (port number, transport protocol) pair; determining a packet-content-based signature term of the network application from the training set; generating a nondeterministic finite automaton (NFA) that uses the signature terms to represent regular expressions in the training set; matching a portion of a new flow to the NFA in real time and identifying the server attached to the new flow as executing the network application; and generating an alert in response to the match so that the new flow can be blocked before the server completes a task performed using the new flow.
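The NFA matching step in the abstract can be pictured with a small streaming matcher. This is a minimal sketch, assuming that a signature is an ordered list of terms compiled into the regular expression `t1.*t2.*...tn`; the class name `SignatureNFA` and the ordering/gap semantics are hypothetical illustrations, not the patented construction. The NFA keeps a set of active states and consumes payload bytes as they arrive, so a match can complete across packet boundaries.

```python
from typing import List, Set, Tuple

# Hypothetical state: (index of the term being matched, bytes of that term matched so far)
State = Tuple[int, int]


class SignatureNFA:
    """Streaming NFA for the pattern t1.*t2.* ... tn over raw payload bytes.

    Assumption for illustration: terms must appear in order, separated by any gap.
    """

    def __init__(self, terms: List[bytes]) -> None:
        if not terms or any(len(t) == 0 for t in terms):
            raise ValueError("need at least one non-empty signature term")
        self.terms = terms
        self.active: Set[State] = {(0, 0)}  # start state: waiting for term 0
        self.matched = False

    def feed(self, chunk: bytes) -> bool:
        """Consume the next chunk of a flow; return True once every term has matched."""
        for byte in chunk:  # iterating over bytes yields ints
            if self.matched:
                break
            nxt: Set[State] = set()
            for term_idx, pos in self.active:
                term = self.terms[term_idx]
                if pos == 0:
                    # '.*' gap: a state waiting for the start of a term loops on any byte
                    nxt.add((term_idx, 0))
                if term[pos] == byte:
                    if pos + 1 < len(term):
                        nxt.add((term_idx, pos + 1))  # advance inside the current term
                    elif term_idx + 1 < len(self.terms):
                        nxt.add((term_idx + 1, 0))  # term complete, wait for the next one
                    else:
                        self.matched = True  # last term complete: accept
            self.active = nxt
        return self.matched


nfa = SignatureNFA([b"GET ", b"HTTP/1."])
print(nfa.feed(b"xxGET /index.html HT"))  # False: second term not yet complete
print(nfa.feed(b"TP/1.1\r\n"))            # True: the match spans the chunk boundary
```

Because the matcher returns as soon as the final term completes, a caller could raise the alert described in the abstract at that moment, while the rest of the flow is still in transit.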
33 Claims
1. A method for profiling network traffic of a network, comprising:
obtaining, from the network traffic, a plurality of flows generated by a plurality of servers executing one or more network applications in the network, wherein a five tuple comprising a source IP-address, a destination IP-address, a source port, a destination port, and a transport protocol is the same for each of a plurality of packets in a first flow of the plurality of flows; identifying, using a processor of a computer system, a training set from the plurality of flows by: determining that a pair comprising a port number and the transport protocol is the same for each of the plurality of flows; determining a number of servers for the plurality of servers as exceeding a pre-determined server diversity threshold; determining a number of flows for the plurality of flows as exceeding a pre-determined training set size threshold; and determining a statistical deviation in contributions of each of the plurality of servers to the plurality of flows as being less than a pre-determined server contribution deviation threshold, wherein the training set comprises a plurality of captured payloads corresponding to the plurality of flows; identifying, from the one or more network applications based on a pre-determined criterion, a unique network application associated with the port number and the transport protocol, wherein a portion of the plurality of flows associated with at least a first server of the plurality of servers is generated responsive to at least the first server executing the unique network application; determining, using the processor and from the training set that exceeds the pre-determined training set size threshold, a first signature term of the unique network application based on a first pre-determined algorithm; and determining, using the processor, a second server in the network as executing the unique network application by analyzing, based on at least the first signature term, a second flow generated by the second server.
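The four training-set checks of claim 1 can be expressed as a single predicate. This is a minimal sketch under stated assumptions: the claim does not name a deviation measure, so the population standard deviation of each server's share of the flows (`statistics.pstdev`) is used as a stand-in, and "exceeding" is read as strictly greater than. The `Flow` record, the function name `is_training_set`, and all threshold values are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass
from statistics import pstdev
from typing import Sequence


@dataclass(frozen=True)
class Flow:
    server_ip: str
    port: int
    protocol: str  # e.g. "tcp" or "udp"
    payload: bytes  # captured payload bytes


def is_training_set(
    flows: Sequence[Flow],
    server_diversity_threshold: int,
    training_set_size_threshold: int,
    contribution_deviation_threshold: float,
) -> bool:
    """Apply the four training-set checks (the deviation measure is an assumption)."""
    if not flows:
        return False
    # 1. Every flow shares one (port number, transport protocol) pair.
    pair = (flows[0].port, flows[0].protocol)
    if any((f.port, f.protocol) != pair for f in flows):
        return False
    # 2. The number of distinct servers exceeds the server diversity threshold.
    flows_per_server = Counter(f.server_ip for f in flows)
    if len(flows_per_server) <= server_diversity_threshold:
        return False
    # 3. The number of flows exceeds the training set size threshold.
    if len(flows) <= training_set_size_threshold:
        return False
    # 4. No server dominates: spread of per-server shares stays below the threshold.
    shares = [count / len(flows) for count in flows_per_server.values()]
    return pstdev(shares) < contribution_deviation_threshold


balanced = [Flow(ip, 80, "tcp", b"GET / HTTP/1.1") for ip in ("a", "b", "c") for _ in range(2)]
print(is_training_set(balanced, 2, 5, 0.05))  # True: 3 servers, 6 flows, equal shares
```

The deviation check is what keeps one busy server from defining the signature: a set where a single server contributes most of the flows fails check 4 even if it passes the size and diversity checks.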
7. A method for profiling network traffic of a network, comprising:
obtaining, from the network traffic, a plurality of flows associated with a plurality of servers executing one or more network applications in the network, wherein a five tuple comprising a source IP-address, a destination IP-address, a source port, a destination port, and a transport protocol is the same for each of a plurality of packets in a first flow of the plurality of flows; identifying, using a processor of a computer system, a training set from the plurality of flows by: determining that a pair comprising a port number and the transport protocol is the same for each of the plurality of flows; determining a number of servers for the plurality of servers as exceeding a pre-determined server diversity threshold; determining a number of flows for the plurality of flows as exceeding a pre-determined training set size threshold; and determining a statistical deviation in contributions of each of the plurality of servers to the plurality of flows as being less than a pre-determined server contribution deviation threshold, wherein the training set comprises a plurality of captured payloads corresponding to the plurality of flows; identifying, from the one or more network applications based on a pre-determined criterion, a unique network application associated with the port number and the transport protocol, wherein a portion of the plurality of flows associated with at least a first server of the plurality of servers is generated responsive to at least the first server executing the unique network application; determining, using the processor and from the training set, a first signature term of the unique network application based on a first pre-determined algorithm, wherein determining the first signature term of the unique network application comprises: dividing the plurality of captured payloads into a plurality of groups; identifying a first longest common substring of two or more captured payloads in a first group of the plurality of groups using a second pre-determined algorithm; and determining the first longest common substring as the first signature term based on a first probability of occurrence of the first longest common substring in the plurality of captured payloads exceeding a pre-determined noise threshold; and determining, using the processor, a second server in the network as executing the unique network application by analyzing, based on at least the first signature term, a second flow associated with the second server.
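The signature-term step of claim 7 (longest common substring, then a noise filter) can be sketched as follows. This is an illustration under assumptions: the classic dynamic-programming longest-common-substring algorithm stands in for the unnamed "second pre-determined algorithm", only two payloads from the group are compared, and the probability of occurrence is taken as the fraction of all captured payloads that contain the candidate. The function names are hypothetical.

```python
from typing import List, Optional


def longest_common_substring(a: bytes, b: bytes) -> bytes:
    """Classic O(len(a) * len(b)) dynamic program for the longest common substring."""
    best_len, best_end = 0, 0
    prev = [0] * (len(b) + 1)  # prev[j]: common suffix length of a[:i-1] and b[:j]
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                cur[j] = prev[j - 1] + 1
                if cur[j] > best_len:
                    best_len, best_end = cur[j], i
        prev = cur
    return a[best_end - best_len:best_end]


def occurrence_probability(term: bytes, payloads: List[bytes]) -> float:
    """Fraction of captured payloads that contain the term."""
    return sum(term in p for p in payloads) / len(payloads)


def signature_term(group: List[bytes], all_payloads: List[bytes],
                   noise_threshold: float) -> Optional[bytes]:
    """Return the LCS of two payloads in the group if it rises above the noise threshold."""
    if len(group) < 2:
        return None
    candidate = longest_common_substring(group[0], group[1])
    if candidate and occurrence_probability(candidate, all_payloads) > noise_threshold:
        return candidate
    return None


group = [b"GET /a HTTP/1.1", b"GET /bcd HTTP/1.0"]
payloads = group + [b"\x00\x01binary"]
print(signature_term(group, payloads, 0.5))  # b' HTTP/1.' (present in 2 of 3 payloads)
```

The noise threshold is what separates a true protocol marker from a coincidental overlap: a substring shared by two payloads but rare across the whole training set is discarded.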
12. A system for profiling network traffic of a network, comprising:
a data collector configured to obtain, from the network traffic, a plurality of flows generated by a plurality of servers executing one or more network applications in the network, wherein a five tuple comprising a source IP-address, a destination IP-address, a source port, a destination port, and a transport protocol is the same for each of a plurality of packets in a first flow of the plurality of flows; a statistical analyzer configured to identify a training set from the plurality of flows by: determining that a pair comprising a port number and the transport protocol is the same for each of the plurality of flows; determining a number of servers for the plurality of servers as exceeding a pre-determined server diversity threshold; determining a number of flows for the plurality of flows as exceeding a pre-determined training set size threshold; and determining a statistical deviation in contributions of each of the plurality of servers to the plurality of flows as being less than a pre-determined server contribution deviation threshold, wherein the training set comprises a plurality of captured payloads corresponding to the plurality of flows; a signature generator configured to extract, from the training set that exceeds the pre-determined training set size threshold, a first signature term based on a first pre-determined algorithm; and a processor and memory storing instructions when executed by the processor comprising functionalities to: identify, from the one or more network applications based on a pre-determined criterion, a unique network application associated with the port number and the transport protocol, wherein a portion of the plurality of flows associated with at least a first server of the plurality of servers is generated responsive to at least the first server executing the unique network application; and determine a second server in the network as executing the unique network application by analyzing, based on at least the first signature term, a second flow generated by the second server.
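The data collector of claim 12 groups packets into flows keyed by the five tuple. A minimal sketch follows, assuming that flows are unidirectional (so each direction has its own five tuple) and that a "captured payload" is the first `capture_bytes` bytes of a flow's concatenated payload; the capture limit, the record types, and the name `collect_flows` are hypothetical choices for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, NamedTuple


class FiveTuple(NamedTuple):
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    protocol: str


class Packet(NamedTuple):
    key: FiveTuple
    payload: bytes


def collect_flows(packets: Iterable[Packet], capture_bytes: int = 64) -> Dict[FiveTuple, bytes]:
    """Group packets into flows by five tuple, keeping the first capture_bytes of payload."""
    buffers: Dict[FiveTuple, bytearray] = defaultdict(bytearray)
    for pkt in packets:
        buf = buffers[pkt.key]  # every packet of a flow shares the same five tuple
        room = capture_bytes - len(buf)
        if room > 0:
            buf.extend(pkt.payload[:room])  # stop growing once the capture limit is hit
    return {key: bytes(buf) for key, buf in buffers.items()}


ft = FiveTuple("10.0.0.5", "10.0.0.9", 80, 51234, "tcp")
other = FiveTuple("10.0.0.6", "10.0.0.9", 80, 51240, "tcp")
flows = collect_flows([Packet(ft, b"HEL"), Packet(other, b"X"), Packet(ft, b"LO WORLD")],
                      capture_bytes=5)
print(flows[ft])  # b'HELLO'
```

Capping the captured bytes keeps memory bounded per flow; application-layer markers such as banners and request lines usually sit near the start of a flow, though that is a design assumption rather than something the claim states.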
18. A system for profiling network traffic of a network, comprising:
a data collector configured to obtain, from the network traffic, a plurality of flows associated with a plurality of servers executing one or more network applications in the network, wherein a five tuple comprising a source IP-address, a destination IP-address, a source port, a destination port, and a transport protocol is the same for each of a plurality of packets in a first flow of the plurality of flows; a statistical analyzer configured to identify a training set from the plurality of flows by: determining that a pair comprising a port number and the transport protocol is the same for each of the plurality of flows; determining a number of servers for the plurality of servers as exceeding a pre-determined server diversity threshold; determining a number of flows for the plurality of flows as exceeding a pre-determined training set size threshold; and determining a statistical deviation in contributions of each of the plurality of servers to the plurality of flows as being less than a pre-determined server contribution deviation threshold, wherein the training set comprises a plurality of captured payloads corresponding to the plurality of flows; a signature generator configured to determine, from the training set, a first signature term based on a first pre-determined algorithm, wherein determining the first signature term comprises: dividing the plurality of captured payloads into a plurality of groups; identifying a first longest common substring of two or more captured payloads in a first group of the plurality of groups using a pre-determined algorithm; and determining the first longest common substring as the first signature term based on a first probability of occurrence of the first longest common substring in the plurality of captured payloads exceeding a pre-determined noise threshold; and a processor and memory storing instructions when executed by the processor comprising functionalities to: identify, from the one or more network applications based on a pre-determined criterion, a unique network application associated with the port number and the transport protocol, wherein a portion of the plurality of flows associated with at least a first server of the plurality of servers is generated responsive to at least the first server executing the unique network application; and determine a second server in the network as executing the unique network application by analyzing, based on at least the first signature term, a second flow associated with the second server.
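The final step of claim 18, labeling another server as running the application by analyzing its flows against the signature, might look like the sketch below. The claim leaves the analysis open; here a flow "matches" when its payload contains every signature term, and a server is labeled when at least `min_match_fraction` of its observed flows match. That voting rule, its default value, and the name `label_servers` are all hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Sequence, Set, Tuple


def label_servers(flows: Iterable[Tuple[str, bytes]], terms: Sequence[bytes],
                  min_match_fraction: float = 0.5) -> Set[str]:
    """Return servers whose flows contain all signature terms often enough.

    flows: (server IP, captured payload) pairs. The voting rule is an assumption.
    """
    total: Dict[str, int] = defaultdict(int)
    hits: Dict[str, int] = defaultdict(int)
    for server, payload in flows:
        total[server] += 1
        if all(term in payload for term in terms):
            hits[server] += 1
    return {s for s in total if hits[s] / total[s] >= min_match_fraction}


observed = [
    ("10.0.0.5", b"SSH-2.0-OpenSSH_8.9"),
    ("10.0.0.5", b"SSH-2.0-OpenSSH_9.0"),
    ("10.0.0.9", b"GET / HTTP/1.1"),
    ("10.0.0.7", b"SSH-2.0-dropbear"),
    ("10.0.0.7", b"garbage"),
]
print(sorted(label_servers(observed, [b"SSH-2.0-"])))  # ['10.0.0.5', '10.0.0.7']
```

Requiring a fraction of matching flows, rather than a single hit, trades a little detection latency for robustness against one stray payload that happens to contain the term.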
23. A non-transitory computer readable medium embodying instructions for profiling network traffic of a network, the instructions when executed by a processor comprising functionality for:
obtaining, from the network traffic, a plurality of flows generated by a plurality of servers executing one or more network applications in the network, wherein a five tuple comprising a source IP-address, a destination IP-address, a source port, a destination port, and a transport protocol is the same for each of a plurality of packets in a first flow of the plurality of flows; identifying a training set from the plurality of flows based on a first pre-determined algorithm, wherein the training set comprises a plurality of captured payloads corresponding to the plurality of flows; identifying, from the one or more network applications based on a pre-determined criterion, a unique network application associated with the port number and the transport protocol, wherein a portion of the plurality of flows associated with at least a first server of the plurality of servers is generated responsive to at least the first server executing the unique network application; determining, from the training set that exceeds the pre-determined training set size threshold, a first signature term of the unique network application by: dividing the plurality of captured payloads into a plurality of groups; identifying a first longest common substring of two or more captured payloads in a first group of the plurality of groups using a second pre-determined algorithm; and determining the first longest common substring as the first signature term based on a first probability of occurrence of the first longest common substring in the plurality of captured payloads exceeding a pre-determined noise threshold; and determining a second server in the network as executing the unique network application by analyzing, based on at least the first signature term, a second flow generated by the second server.
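Claim 23 divides the captured payloads into groups and finds a longest common substring of two or more payloads in a group. The sketch below assumes contiguous, fixed-size groups and computes the longest substring shared by every payload in a group by brute force over substrings of the shortest payload, longest first. The grouping rule, the helper names, and the brute-force search are illustrative assumptions; a suffix-automaton or suffix-array method would scale better on long payloads.

```python
from typing import List, Sequence


def divide_into_groups(payloads: Sequence[bytes], group_size: int) -> List[List[bytes]]:
    """Split payloads into contiguous groups of group_size (the last group may be smaller)."""
    if group_size < 2:
        raise ValueError("a group needs room for at least two payloads")
    return [list(payloads[i:i + group_size]) for i in range(0, len(payloads), group_size)]


def longest_common_substring_all(group: Sequence[bytes]) -> bytes:
    """Longest substring present in every payload of the group (brute force)."""
    if not group:
        return b""
    shortest = min(group, key=len)
    # Any common substring must fit inside the shortest payload; try longest first.
    for length in range(len(shortest), 0, -1):
        for start in range(len(shortest) - length + 1):
            candidate = shortest[start:start + length]
            if all(candidate in p for p in group):
                return candidate
    return b""


groups = divide_into_groups([b"USER alice\r\n", b"USER bob\r\n", b"USER carol\r\n"], 3)
print(longest_common_substring_all(groups[0]))  # b'USER '
```

Grouping bounds the cost of the substring search and lets several groups each propose a candidate, which the noise-threshold test then checks against the full training set.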
30. A non-transitory computer readable medium embodying instructions for profiling network traffic of a network, the instructions when executed by a processor comprising functionality for:
obtaining, from the network traffic, a plurality of flows that are generated by a plurality of servers executing one or more network applications in the network, wherein a five tuple comprising a source IP-address, a destination IP-address, a source port, a destination port, and a transport protocol is the same for each of a plurality of packets in a first flow of the plurality of flows; identifying a training set from the plurality of flows based on a first pre-determined algorithm, wherein the training set comprises a plurality of captured payloads corresponding to the plurality of flows; identifying, from the one or more network applications based on a pre-determined criterion, a unique network application associated with the port number and the transport protocol, wherein a portion of the plurality of flows associated with at least a first server of the plurality of servers is generated responsive to at least the first server executing the unique network application; determining, from the training set, a first signature term of the unique network application by: dividing the plurality of captured payloads into a plurality of groups; obtaining a plurality of longest common substrings by identifying one or more longest common substrings of each pair of captured payloads in each of the plurality of groups; identifying a first longest common substring of two or more captured payloads in a first group of the plurality of groups using a second pre-determined algorithm, wherein the plurality of longest common substrings comprises the first longest common substring identified from the first group and a second longest common substring identified from a second group of the plurality of groups; and determining the first longest common substring as the first signature term based on a first probability of occurrence of the first longest common substring in the plurality of captured payloads exceeding a pre-determined noise threshold; and determining a second server in the network as executing the unique network application by analyzing, based on at least the first signature term, a second flow associated with the second server.
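Claim 30 collects the longest common substring of every pair of payloads in every group, then keeps the candidates whose probability of occurrence across all captured payloads exceeds the noise threshold. A sketch under assumptions follows: it uses the standard library's `difflib.SequenceMatcher.find_longest_match` (with `autojunk=False`, so no bytes are treated as junk) as the pairwise longest-common-substring routine, keeps one substring per pair, and sorts the survivors longest first. The helper names and the sort order are hypothetical.

```python
from difflib import SequenceMatcher
from itertools import combinations
from typing import List, Set


def pairwise_lcs(a: bytes, b: bytes) -> bytes:
    """Longest common substring of two payloads via difflib (no junk heuristic)."""
    m = SequenceMatcher(None, a, b, autojunk=False).find_longest_match(0, len(a), 0, len(b))
    return a[m.a:m.a + m.size]


def candidate_terms(groups: List[List[bytes]]) -> Set[bytes]:
    """LCS of each pair of payloads in each group, pooled across all groups."""
    candidates: Set[bytes] = set()
    for group in groups:
        for a, b in combinations(group, 2):
            lcs = pairwise_lcs(a, b)
            if lcs:
                candidates.add(lcs)
    return candidates


def select_signature_terms(groups: List[List[bytes]], noise_threshold: float) -> List[bytes]:
    """Keep candidates whose occurrence probability over all payloads exceeds the threshold."""
    all_payloads = [p for group in groups for p in group]
    selected = []
    for term in candidate_terms(groups):
        probability = sum(term in p for p in all_payloads) / len(all_payloads)
        if probability > noise_threshold:
            selected.append(term)
    return sorted(selected, key=lambda t: (-len(t), t))  # longest first, then bytewise


groups = [[b"GET /x HTTP/1.1", b"GET /yy HTTP/1.1"], [b"POST /z HTTP/1.0", b"PUT /w HTTP/1.0"]]
print(select_signature_terms(groups, 0.4))  # [b' HTTP/1.0', b' HTTP/1.1']
```

Pooling candidates from several groups, as the claim recites, means a term that only one group surfaces can still qualify if it is common across the full training set.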
Specification