Methods and Systems for Estimating Entropy

0Associated
Cases 
0Associated
Defendants 
0Accused
Products 
15Forward
Citations 
0
Petitions 
1
Assignment
First Claim
1. A method for estimating entropy, comprising:
 setting an allowable error to deciding a sample number according to a total of packets in a stream data;
randomly sampling a plurality of locations in the stream data;
applying to a count sketch algorithm to each packet for performing the count and update;
applying the counter sketch algorithm to the each packet for performing the query according to flow IDs of the plurality of locations;
recording each query result of the each packet onto an entropy table;
obtaining an average of a counter used in each same row in the entropy table;
selecting a median to be an estimation value from the averages; and
obtaining an estimation entropy value according to the estimation value.
1 Assignment
0 Petitions
Accused Products
Abstract
It is a challenge task to conduct Entropy computation on the attributes of packet header in highspeed networks. Motivated by Ashwin Lall et al., we present a streambased scheme to estimate to the entropy norm based on Count Sketch algorithm. The system is implemented on a NetFPGA10G platform. It is capable of processing IP packets and computing the entropy in 30 Gbps line rate.
20 Citations
View as Search Results
ANONYMIZATION OF TRAFFIC PATTERNS OVER COMMUNICATION NETWORKS  
Patent #
US 20170104675A1
Filed 10/07/2015

Current Assignee
International Business Machines Corporation

Sponsoring Entity
International Business Machines Corporation

Anonymization of traffic patterns over communication networks  
Patent #
US 9,866,532 B2
Filed 10/07/2015

Current Assignee
International Business Machines Corporation

Sponsoring Entity
International Business Machines Corporation

Automated forensics of computer systems using behavioral intelligence  
Patent #
US 9,979,739 B2
Filed 01/15/2014

Current Assignee
Palo Alto Networks UK Ltd.

Sponsoring Entity
Palo Alto Networks UK Ltd.

Identifying anomalous messages  
Patent #
US 9,979,742 B2
Filed 10/06/2016

Current Assignee
Palo Alto Networks UK Ltd.

Sponsoring Entity
Palo Alto Networks UK Ltd.

Detecting denial of service attacks on communication networks  
Patent #
US 10,027,694 B1
Filed 03/28/2016

Current Assignee
Amazon Technologies

Sponsoring Entity
Amazon Technologies

Anonymization of traffic patterns over communication networks  
Patent #
US 10,057,146 B2
Filed 04/11/2017

Current Assignee
International Business Machines Corporation

Sponsoring Entity
International Business Machines Corporation

Detection of anomalous administrative actions  
Patent #
US 10,075,461 B2
Filed 05/31/2015

Current Assignee
Palo Alto Networks UK Ltd.

Sponsoring Entity
Palo Alto Networks UK Ltd.

Anonymization of traffic patterns over communication networks  
Patent #
US 10,178,004 B2
Filed 11/28/2017

Current Assignee
International Business Machines Corporation

Sponsoring Entity
International Business Machines Corporation

Anonymization of traffic patterns over communication networks  
Patent #
US 10,193,776 B2
Filed 11/28/2017

Current Assignee
International Business Machines Corporation

Sponsoring Entity
International Business Machines Corporation

Anonymization of traffic patterns over communication networks  
Patent #
US 10,250,472 B2
Filed 11/28/2017

Current Assignee
International Business Machines Corporation

Sponsoring Entity
International Business Machines Corporation

Anonymization of traffic patterns over communication networks  
Patent #
US 10,277,486 B2
Filed 11/28/2017

Current Assignee
International Business Machines Corporation

Sponsoring Entity
International Business Machines Corporation

Anonymization of traffic patterns over communication networks  
Patent #
US 10,298,473 B2
Filed 11/28/2017

Current Assignee
International Business Machines Corporation

Sponsoring Entity
International Business Machines Corporation

Detecting anomaly action within a computer network  
Patent #
US 10,356,106 B2
Filed 03/21/2016

Current Assignee
Palo Alto Networks UK Ltd.

Sponsoring Entity
Palo Alto Networks UK Ltd.

Systems and methods for log and snort synchronized threat detection  
Patent #
US 10,462,170 B1
Filed 11/21/2017

Current Assignee
Alert Logic Incorporated

Sponsoring Entity
Alert Logic Incorporated

Identifying changes in use of user credentials  
Patent #
US 10,686,829 B2
Filed 09/04/2017

Current Assignee
Light Cyber Ltd.

Sponsoring Entity
Palo Alto Networks UK Ltd.

SYSTEM AND METHOD FOR ANALYZING STREAMS AND COUNTING STREAM ITEMS ON MULTICORE PROCESSORS  
Patent #
US 20090031175A1
Filed 07/26/2007

Current Assignee
International Business Machines Corporation

Sponsoring Entity
International Business Machines Corporation

ESTIMATING ORIGINDESTINATION FLOW ENTROPY  
Patent #
US 20090285117A1
Filed 05/16/2008

Current Assignee
ATT Intellectual Property I LP

Sponsoring Entity
ATT Intellectual Property I LP

Compressed prefix trees and estDec+ method for finding frequent itemsets over data streams  
Patent #
US 20070198548A1
Filed 11/27/2006

Current Assignee
Industry Academic Cooperation Foundation

Sponsoring Entity
Industry Academic Cooperation Foundation

SCHEDULING EVENT STREAMS DEPENDING ON CONTENT INFORMATION DATA  
Patent #
US 20140165068A1
Filed 12/05/2013

Current Assignee
International Business Machines Corporation

Sponsoring Entity
International Business Machines Corporation

METHODS, SYSTEMS, AND COMPUTER READABLE MEDIA FOR RAPID FILTERING OF OPAQUE DATA TRAFFIC  
Patent #
US 20150052601A1
Filed 03/13/2013

Current Assignee
University of North Carolina At Chapel Hill

Sponsoring Entity
University of North Carolina At Chapel Hill, Board of Regents of the University of Michigan, SRI International Inc.

20 Claims
 1. A method for estimating entropy, comprising:
setting an allowable error to deciding a sample number according to a total of packets in a stream data; randomly sampling a plurality of locations in the stream data; applying to a count sketch algorithm to each packet for performing the count and update; applying the counter sketch algorithm to the each packet for performing the query according to flow IDs of the plurality of locations; recording each query result of the each packet onto an entropy table; obtaining an average of a counter used in each same row in the entropy table; selecting a median to be an estimation value from the averages; and obtaining an estimation entropy value according to the estimation value.  View Dependent Claims (2, 3, 4, 5, 6, 7)
 8. A system for estimating entropy, comprising:
a measuring module, setting an allowable error to deciding a sample number according to a total of packets in a stream data; a sampling module, coupled to the measuring module, randomly sampling a plurality of locations in the stream data; a calculation and update module, coupled to the sampling module, applying to a count sketch algorithm to each packet for performing the count and update; a query module, coupled to the calculation and update module, applying the counter sketch algorithm to the each packet for performing the query according to flow IDs of the plurality of locations; and a storage module, coupled to the query module and the calculation and update module, obtaining an average of a counter used in each same row in the entropy table, and selecting a median to be an estimation value from the averages, and obtaining an estimation entropy value according to the estimation value.  View Dependent Claims (9, 10, 11, 12, 13, 14)
 15. The method for estimating entropy, comprising:
applying a data streaming algorithm for estimating entropy for deciding a sampling number and randomly sampling a plurality of locations in a stream data; and applying a count sketch algorithm to obtain a estimation entropy value.  View Dependent Claims (16, 17, 18, 19)
 20. The method for estimating entropy, wherein the plurality of locations is an array having g×
 z locations, and the entropy table is an array table, wherein the g and z are selected by a equation, and the equation is as follows;
 z locations, and the entropy table is an array table, wherein the g and z are selected by a equation, and the equation is as follows;
1 Specification
1. Field of the Invention
This invention generally relates to a method and system for estimating entropy, and more particularly, to a method and system for entropy which is adopted to flow analyze of highspeed network.
2. Description of the Prior Art
In recent year, the network has developed very fast and the transmission rate is growing up every day, and therefore, the cyber attack and the unusual behavior are becoming more prevalent, for example, worm, port scan, distributed denial of service (DDoS), address scan, etc. These cyber attacks and the unusual behaviors may affect the normal network environment and the user. At network observation, we may analyze and count the packet header information to observe whether the abnormal state in the network happens.
Entropy, which measures the degree of concentration and dispersion of a given feature space, is utilized as important indicator of changes of network traffic behavior. The higher entropy value indicates the degree of dispersion, and the lower entropy value indicates the degree of concentration. The entropy value is analyzed according to the packet header information so as to understand the change of the network flow distribution and search out whether the cyber attack or the unusual behavior exists. Take, for example, DDoS, many IP addresses from various source are transmitted to the same destination IP address meanwhile the kind of source IP addresses are increased and the attacked packets of destination IP address are increased. Thereby, distribution of network flow will be changed. These special distributions will change the entropy value, and if the distributions focus on some specific flow ID to low the entropy value, but if the flow ID are averagely dispersed so as to increase the entropy value. Thereby, the analyze of entropy value is widely applied in the associated application analyze of the network security for searching out the distribution feature of network flow so as to conclude whether the cyber attack or the unusual behavior occurs.
In highspeed network, a mass of data is needed to analyze and count in short time. If the mass of data are processed by the software, it needs to spend a long operation time and a hung storage space. Moreover, in the information theory, the entropy value is used to measure the concentration degree of data and count all the packet header information right time, and it must spend much time and storage source to calculate the entropy value. The tradition entropy equation is represented as follow (1), wherein m is the total number of packets, m_{i }is the packet number of flow ID iε[n], n is the kind of packets.
For calculating the entropy value in the high speed network environment, Ashwin Lall offer data streaming algorithms for estimating entropy of network traffic to improve the equation of entropy value as follows (2). Then, the S value is deduced according to the equation (2), and the deduced equation of S value is represented as follows (3). Ashwin Lall believes that the entropy value H is calculated according to the estimation of S value. So the S value is estimated by the data streaming algorithms for estimating entropy, and then the S value is imported into the equation (4) to obtain the last estimation entropy value.
The data streaming algorithms for estimating entropy being a spinoff of the algorithm according to AMS algorithm is represented at Table 1. The data streaming algorithms is mainly divided to three phases, at the first phase, g×z locations are randomly sampled in the stream data, the packet number m must be known and set the allowable error before sampling, so as to determine the number of sample, and regarding to the select equation of g and z, please refer equation (5). The second phase is divided to two portions: update and sample. The flow IDs of all packets are compared in the stream data in update portion. The counter c if someone flow ID was sampled.
The algorithm of Table 2 is simply concluded according to the algorithm of Table 1:
In the environment of highspeed network, the software manner must spend much operation time caused that it can not detect the unusual immediately, so S. Nagalakshmi uses FPGA hardware to implant the data streaming algorithm for estimating entropy disclosed by Ashwin Lall. The S. Nagalakshmi suggests decreasing the number of counters, and proves the error rate of calculation result still to be maintained in the predefine error range. Although S. Nagalakshmi uses 112 set calculation modules to process the packet number in parallel, it still needs to spend much time to perform the comparison and update of memory access. Once it has mass packet and locates in the highspeed network environment, the manner disclosed by S. Nagalakshmi is still satisfied to the requirement of wire speed so as to increase the error rate of estimating entropy.
For the reason that the conventional system and method for estimating entropy could not process the mass packets in the highspeed network environment, a need has arisen to propose a novel scheme that may adaptively process the mass packets in the highspeed network environment so as to decrease the hardware source operated and operation time.
In view of the foregoing, it is an object of the present invention to provide a method for estimating entropy. The method for estimating entropy comprises: setting an allowable error to deciding a sample number according to a total of packets in a stream data; randomly sampling a plurality of locations in the stream data; applying to a count sketch algorithm to each packet for performing the count and update; applying the counter sketch algorithm to the each packet for performing the query according to flow IDs of the plurality of locations; recording each query result of the each packet onto an entropy table; obtaining an average of a counter used in each same row in the entropy table; selecting a median to be an estimation value from the averages; and obtaining an estimation entropy value according to the estimation value.
Based on one object of the present invention, a system for estimating entropy is disclosed according to one embodiment of the present invention. The system for estimating entropy value comprises: a measuring module, setting an allowable error to deciding a sample number according to a total of packets in a stream data; a sampling module, coupled to the measuring module, randomly sampling a plurality of locations in the stream data; a calculation and update module, coupled to the sampling module, applying to a count sketch algorithm to each packet for performing the count and update; a query module, coupled to the calculation and update module, applying the counter sketch algorithm to the each packet for performing the query according to flow IDs of the plurality of locations; and a storage module, coupled to the query module and the calculation and update module, obtaining an average of a counter used in each same row in the entropy table, and selecting a median to be an estimation value from the averages, and obtaining an estimation entropy value according to the estimation value.
Based on one object of the present invention, a method for estimating entropy is disclosed according to one embodiment of the present invention. The method for estimating entropy value comprises: applying a data streaming algorithm for estimating entropy for deciding a sampling number and randomly sampling a plurality of locations in a stream data; and applying a count sketch algorithm to obtain a estimation entropy value.
The accompanying drawings incorporated in and forming a part of the specification illustrate several aspects of the present invention, and together with the description serve to explain the principles of the disclosure. In the drawings:
Some embodiments of the present invention will now be described in greater detail. Nevertheless, it should be noted that the present invention can be practiced in a wide range of other embodiments besides those explicitly described, and the scope of the present invention is expressly not limited except as specified in the accompanying claims.
Moreover, some irrelevant details are not drawn in order to make the illustrations concise and to provide a clear description for easily understanding the present invention.
Referring to
Additionally, applying to a count sketch algorithm to each packet for performing the count and update (the step S103), and applying the counter sketch algorithm to the each packet for performing the query according to flow IDs of the plurality of locations (the step S104), and recording each query result of the each packet onto an entropy table (the step S105), wherein the entropy table is an array table.
Next, obtaining an average of a counter used in each same row in the entropy table (the step S106), and selecting a median to be an estimation value from the averages (the step S107), and obtaining an estimation entropy value according to the estimation value (the step S108). Specifically, the step of obtaining an average of a counter used in each same row in the entropy table further comprises: obtaining a calculation value by calculating the counter value used in the each same row in the entropy table according to a equation, and obtaining the average of the counter of each row according to the calculation value, wherein the equation is represents as follows:
X=m(c log(c)−(c−1)log(c−1)).
Specifically, the step of obtaining an estimation entropy value according to the estimation value further comprises: substituting the estimation value into a equation to obtain a estimation entropy value, and wherein the equation being represented as follow,
Referring to
The measuring module 201 is used for setting an allowable error to deciding a sample number according to a total of packets in a stream data. The sampling module 202 being coupled to the measuring module 201 is used for applying to a count sketch algorithm to each packet for performing the count and update, wherein the plurality of locations is an array having g×z locations according to an equation as follows:
The calculation and update module 203 being coupled to the sampling module 202 is used for applying to a count sketch algorithm to each packet for performing the count and update. The calculation and update module 203 obtains a calculation value by calculating the counter value used in the each same row in the entropy table according to a equation, and obtains the average of the counter of each row according to the calculation value, and the equation is represents as follows:
X=m(c log(c)−(c−1)log(c−1)).
The calculation and update module substitutes the estimation value into a equation to obtain a estimation entropy value, and wherein the equation being represented as follow,
The query module 204 being coupled to the calculation and update module 203 is used for applying the counter sketch algorithm to the each packet for performing the query according to flow IDs of the plurality of locations. The storage module 205 being coupled to the query module 204 and the calculation and update module 203 is used for obtaining an average of a counter used in each same row in the entropy table, and selecting a median to be an estimation value from the averages, and obtaining an estimation entropy value according to the estimation value.
The Count Sketch only utilizes the less memory to perform the fast summary process for a mass data, maps the data to the fixed storage location via the hash function, and query, update the statistical data of each flow ID. The Count Sketch has the feature of linear aggregation, and therefore, the Count Sketch can combine different the observation points or the observation ranges by using the same parameter.
The Count Sketch being an estimation algorithm of (ε, δ) is used to estimate the occurrence frequency of each item in the stream data. The count sketch algorithm uses t sets array counters, and each set array comprises b counters and each counter is 32 bits counter, the parameters t and b are selected by equation (7).
In Count Sketch algorithm, every set counter array has two universal hash function, h_{i}(x) and s_{i}(x), and the mapping ranges of h_{i}(x) and s_{i}(x) respectively is {0, 1, 2, . . . , b−1} and {+1, −1}. The algorithm operation flow is divided to Update and Query.
Referring to
C[i,h_{i}(IP)]=C[i,h_{i}(IP)]+s_{i}(IP),i=1, . . . ,t (8)
When the count is over, the calculation value h_{i}(x) generated from the hash function h_{i}( ) and flow ID x is to be as the address, the statistical value read from the t onedimensional array is multiplied by the hash function s_{i}(x), the calculation equation is showed in (9). A median is captured from the values queried by the t array, and the median is the packet estimation number of flow ID code x.
ƒ_{i}=C[i,h_{i}(x)]×s_{i}(x),i=1, . . . ,t (9)
The present invention improves the data streaming algorithms for estimating entropy of network traffic disclosed by Ashwin Lall et al., and the accuracy packet count number of Ashwin Lall is replaced with packet occurrence time of estimating by the Count Sketch algorithm, as shown in Table 3. When starting to perform the measure, the system updates every packet count number, and then, performs the packet query of Count Sketch according to flow IDs of g×z locations selected. Next, the query result is recorded onto the entropy array Est[ ]. When the observation operation is ended, all of the counter value at same raw is used to calculate X value according to the equation (6). It obtains the average avg[i] of every raw after calculating, and then, selects a median to be as estimation value of S from every raw value. The estimation value H is calculated by equation (4). Therefore, in the allowable error, the system and method for estimating entropy of the present invention is better than A. Lall because the present invention can decrease the usage of memory and operation time of algorithm. For example, the algorithm disclosed by A. Lall, the flow ID is operated by the hash function to search out the associated locations when someone packet is sampled, and then, the counter number is added for the sample packet, and the flow ID of packet is stored into the estimating entropy table and the counter is set as 1. If the same locations are sampled, it must perform the serial connection update. The memory access time is 2+i according to different the collision time i. However, the present invention combines the Count Sketch algorithm with the data streaming algorithm to perform the packet count query of flow ID, and then the query value is recorded onto the estimating entropy table in sequence, the memory access time is 2.
The algorithm of Table 4 is simply concluded according to the algorithm of Table 3.
At update portion, the data streaming algorithm for estimating entropy value disclosed by A. Lall performs the hash function operation to the flow IDs to search the associated locations and performs the comparison of the flow IDs. The algorithm performs the comparison of 1+i times and i times for the read of serial connection index according to different collision times i. if the number of same flow IDs is j, the memory access must be performs 2×j times, and the memory access of the total (i+2)+i+(2×j) times is performed. However, the present invention uses the Count Sketch algorithm combined with the data streaming algorithm for estimating entropy to perform the memory access, it only needs the memory access (read/write) of 2 times. The memory access times as shown in Table 5.
Moreover, the preset invention uses the same memory size (400 k bytes) and the less count sketch array (112 k bytes) to compare the estimation error of entropy. The comparison result is represented to Table 6. The present invention uses the Count Sketch algorithm to replace with the accuracy count of data streaming algorithm disclosed by the A. Lall, and thereby, at Table 6, to take the Count Sketch array 4 k for example, the present invention only uses near a quarter of memory size to limit the error rate in 5% so as to decrease the operation time of system.
Additionally, the present invention applies the NetFPGA10G platform to implant two modules, one is the measure of the data streaming algorithm for estimating entropy adapted to the highspeed network, another one is the module of estimating entropy. The NetFPGA platform being Fieldprogrammable gate array is developed by the Stanford University of US. The designer may use Verilog language (VHDL) to develop the network system, for example, the switch, router and the like.
The present invention uses the network card of NetFPGA10G to be as reference NIC, and adds the Histogram Module based on packet update of Count Sketch algorithm into the reference NIC, so as to implant the Hash function, counter array and reservoir sampling. This purpose of adding the NetFPGA10G is to accelerate the counter update speed of the statistical table and estimating entropy table. The measure level mainly has two parts: the packet update and packet sample.
At packet update part, the Count Sketch algorithm is used to perform the packet number count, the count packet parameter is set to three sets, every set has 28 k number counter array with 32 bits (t=3, b=28 k), and for hardware part, three BRAMs are used to packet update in right time. When the packet is entered to the measure module, the access location operation of Count Sketch is performed by using 2Universal hash function. When the hash function is operated, it also performs the mod operation. Thereby, it needs a mass source and time to implant the hardware system. As a result, the present invention uses CWtrick or Mersenne number to perform the operations of three hash function in parallel.
Referring to
Reservoir sampling algorithm may solve the data sample question of unknown length, as shown in Table 7. When data streaming number (n) is smaller than sample number (m), every data e_{n }is sampled; when the data streaming number (n) is larger than sample number (n), it determines whether e_{n }is selected according to the probability of
if the e_{n }is selected, the original sample e_{n }is replaced with the new one which is randomly selected from the sample data (R).
The present invention performs the sample by using the method of reservoir sample algorithm. When the packet is sampled, the count packet number of three set count sketch can be queried according to flow IDs of the packet sampled, and the median is selected to be as the estimation number of the flow ID, and the packet number estimated is into the array of estimating entropy algorithm. The reservoir sampling algorithm is implanted by the hardware manner, the random algorithm uses 2Universal hash function to determine whether the packet is selected so as to improve the reservoir sample algorithm to decrease the hardware source and operation time. The algorithm disclosed by the original inventor set the packet random sample probability of
wherein m is the sample number, n is data steaming number. When the algorithm is implanted by the hardware manner, the original sample probability is changed to
wherein
In other words, when
is a multiple of 2, and then replaces with p. Therefore, it uses the movement to replace with the multiplier to reduce the operation time of hardware system so as to solve the question is that it must perform mod operation when the hash function is operated.
Referring to
When the observation range is ended, the estimation entropy module is implanted by software emulation read out the estimation entropy table implanted by hardware emulation through PCI bus by using the moving window register, so as to perform the operation of entropy value, and the system operation structure is shown in
Additionally, because the space size is different, it will cause that the difference of the entropy measure result when using the Count Sketch algorithm to estimate the entropy value. Therefore, in the present embodiment, it will discuss that the accuracy of estimating entropy is affected by array size of different Count Sketch algorithm, and performs the psychical system test according the flow of the usual network behavior, so as to determine whether the network is abnormal and verify the system function.
For testing the accuracy of the algorithm and the actual efficacy of system, the embodiment respectively selects the different flow files to perform the test, the flow files are respectively introduced as follows:
The flow test file 1: the packet information is captured from the equinixsanjose highspeed internet backbone. The flow test file 1 only remains the packet header information, and contains 30,801,712 IP packets, 448,809 IP source addresses, and the flow file length is 1 minutes.
The flow test file 2: the flow test file 2 is provided by the MAW. The flow test file 2 only remains the packet header information, and contains 3514656 IP packets, 51779 IP source addresses, and the total length of the flow test file 2 is 15 minutes.
The flow test file 3: the flow test file 3 is provided by the MAWI. The flow test file 3 only remains the packet header information, and contains 50,544,023 IP packets, 6364302 IP source addresses, and the total length of the flow test file 3 is 15 minutes.
Open source network tester (OSNT) is an open source network test system for realizing on the NetFPGA10G platform. OSNT not only performs the test of high speed flow, but also saves the cost of buying the flow generation equipments. The ONST has two functions of OSNT Traffic Generator and OSNT Traffic Monitor, and provides the user the graphic user interface (GUI). The ONST flow generator may load the PCAP flow file on the SRAM, and repeat broadcast many times according to the different requirements, and the flow file size is limited according to the SRAM size. The OSNT also provide two control functions of the delay module and the rate limiter. The OSNT flow observer may capture the packets, count the packet number and measure the speed, and analyze the kind of packets and filter the specific packets.
For verifying whether the OSNT generates the flow of line speed, the present invention respectively enables two computers with NetFPGA10G for testing whether the flow achieves 40 G of line speed, wherein one has the OSNT flow generator, another has flow observer, and the two computer are connected each other. Because the OSNT only provide two ports, and each port only receives 10G of line speed at present, the present embodiment uses two ONST flow generator to test whether the system achieves 40 G of line speed.
In the present embodiment, we uses two experiments to test whether the system for estimating entropy achieves the line speed. The effect of the biggest factor in the system is the update portion of the count sketch, so it is important to verify whether accurately count the packet in the line speed. The first test is to use the flow file generated from the IXIA, and the flow file comprises a hundreds IP locations (192. 168.0.*) from different source terminals, and repeat to transmit 10 million times from four terminals. The value of Count Sketch counter in the measure system is read out after transmitting, and the result represents forty millions packet numbers read out from 100 counters. The second test is to use the smallest packet from the same flow IDs to perform the test, and uses different terminals to receive the packet at the same time. Lastly, the result of Count Sketch counter in the measure system is read out, the result is shown in Table 8. From the Table 8, it represents the measure system implanted by the hardware can correctly perform the count in the wire speed of 30G.
Referring to
The present invention uses the same flow files and same parameters to implant the hardware and software system when discussing the measure result of using the system implanted by the hardware and the software simulation, so as to verify whether the entropy value estimated by hardware and software are same. At hardware test, the present invention uses the host having the network card with the transmission rate of 10 G bit/s (Myricom 10G NIC) having two SPF+ fiber network ports, and uses TCPREPLAY to broadcast the test flow file to the system for estimating entropy so as to determine whether the test system can correctly calculate the entropy value, and compare the entropy value generated from the hardware with the entropy value generated from the software. The result is shown in Table 9. As shown in Table 9, it shows the test result from the hardware is same to the software, and therefore, the system function implanted by the hardware is correct.
In the present embodiment, the proposed scheme is verified using synthetic data sets that are generated using a Zipfian distribution (no shown in). In the Zipfian distribution, the simulation is based on a Zipf parameter from 0.1 to 2.5. The synthetic data set comprises 1,000,000 packets with kε[2^{12}]. A simulation based on the CAIDA antonymous Internet trace is also carried out. The trace, which is collected from CAIDA'"'"'s equinixsanjose highspeed Internet backbone, contains over 30 million packets and 448, 809 unique IPv4 source addresses. The total size of the memory that is used in the system for Count Sketch (384 k bytes) and the entropy norm estimator array C_{ij }(40 k bytes) is 424 k bytes. The Count Sketch is composed of three counter arrays, each with 32 k entries. The estimator counter array C_{ij }has 1×5120 counters. Each counter entry has 32bits words. The proposed scheme can estimate the entropy value with limited error on synthetic data of various Zipf parameters. The Count Sketch approach has a higher estimated error than does the method of Lall et al. With a Zipf parameter of less than 1.25 mainly because the point query error of Count Sketch in a lightly skew distribution. The point query error of Count Sketch can be reduced by adding more entries to the counter. As present in
The present invention improves the data streaming algorithms for estimating entropy of network traffic disclosed by Ashwin Lall et al., we present a streambased scheme to estimate to the entropy to the entropy norm based on count sketch algorithm. The system is confirmed using a NetFPGA10G platform and applies the Verilog HDL language to implement the method for estimating entropy adapted to high speed core network. The present invention uses the actual network flow files to emulate and verify the accuracy of the estimating entropy algorithm, and use the OSNT to test the hardware system so as to make sure the result emulated by the emulator match the entropy value calculated by the measuring system according to the flow. Lastly, the present invention performs graphical interface (GUI) for allowing the network manager to observe the change of entropy value of the flow head information easily so as to verify the unusual situation of the present network.
It is a challenge task to conduct entropy computation on the attributes of packet header in highspeed networks. Motivated by Ashwin Lall et al., we present a streambased scheme to estimate to the entropy to the entropy norm based on count sketch algorithm. The system is implemented on a NetFPGA10G platform. It is capable of processing IP packets and computing the entropy in 30 Gbps line rate.
Although specific embodiments have been illustrated and described, it will be obvious to those skilled in the art that various modifications may be made without departing from what is intended to be limited solely by the appended claims.