System and method to sample a large data set of network traffic records
- US 10,313,209 B2
- Filed: 12/30/2016
- Issued: 06/04/2019
- Est. Priority Date: 12/30/2016
- Status: Active Grant
First Claim
1. A computer-implemented method to sample a large data set of traffic records, the traffic records corresponding to network traffic flows associated with at least one particular address, the method comprising:
- processing multiple iterations associated with respective traffic records of the large data set that satisfy particular criteria, processing an iteration of the multiple iterations comprising:
receiving a traffic record from a source of a large data set of traffic records, the traffic record corresponding to a traffic flow and identifying a pair of addresses exchanging communications included in the traffic flow and including a traffic size value that indicates the size of communications included in the traffic flow;
receiving a flow counter and a total traffic size, the flow counter representing the number of traffic flows received for one of the addresses of the pair identified, the number of traffic flows representing previously received traffic records associated with the address, the total traffic size representing a sum of traffic sizes associated with all previously received traffic records, the previously received traffic records having been received during previous iterations of the multiple iterations;
incrementing the flow counter;
adding the traffic size associated with the received traffic record to the total traffic size;
if the flow counter is less than a predetermined sampling threshold, then storing a traffic record sample associated with the traffic record;
if the flow counter is more than the predetermined sampling threshold, then determining whether or not to sample the received traffic record by applying an exponentially decreasing probability function; and
storing the traffic record sample as sampled data associated with the traffic record only if the determination is to sample the received traffic record.
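The iteration recited in claim 1 can be sketched in Python as follows. This is a minimal, non-authoritative sketch under several assumptions that the claim itself does not settle: the exponentially decreasing probability function is assumed to be p = exp(-decay * (flow_counter - threshold)) with a hypothetical `decay` parameter; the "particular criteria" are assumed to be that a record involves the address of interest; and the stored "traffic record sample" is assumed to be the record itself. `TrafficRecord`, `AddressState`, `process_iteration` and `sample_records` are hypothetical names. The claim leaves the case flow counter == threshold open; with this choice of p that record is always kept, because exp(0) = 1.

```python
import math
import random
from dataclasses import dataclass, field


@dataclass
class TrafficRecord:
    """Hypothetical flow record: the address pair and a traffic size value."""
    src: str
    dst: str
    size: int  # e.g. bytes exchanged in the flow (unit is an assumption)


@dataclass
class AddressState:
    """Per-address state carried between iterations (hypothetical layout)."""
    flow_counter: int = 0
    total_traffic_size: int = 0
    samples: list = field(default_factory=list)


def process_iteration(record, state, threshold, decay, rng):
    """Run one iteration for `record`; return True if it was sampled."""
    # Increment the flow counter and add this record's size to the total.
    state.flow_counter += 1
    state.total_traffic_size += record.size

    if state.flow_counter < threshold:
        # Below the sampling threshold: always store the record.
        keep = True
    else:
        # At or above the threshold: store with a probability that decays
        # exponentially in the number of flows past the threshold
        # (assumed functional form; p == 1.0 exactly at the threshold).
        p = math.exp(-decay * (state.flow_counter - threshold))
        keep = rng.random() < p

    if keep:
        state.samples.append(record)  # store the sample only if selected
    return keep


def sample_records(records, address, threshold, decay, seed=None):
    """Process every record that involves `address` (assumed criteria)."""
    rng = random.Random(seed)
    state = AddressState()
    for record in records:
        if address not in (record.src, record.dst):
            continue  # criteria not met: no iteration for this record
        process_iteration(record, state, threshold, decay, rng)
    return state


if __name__ == "__main__":
    records = [TrafficRecord("10.0.0.1", f"10.0.1.{i}", 1500) for i in range(1000)]
    state = sample_records(records, "10.0.0.1", threshold=50, decay=0.1, seed=42)
    # flow_counter is 1000 and total_traffic_size is 1500000; the number of
    # stored samples depends on the random draws.
    print(state.flow_counter, state.total_traffic_size, len(state.samples))
```

Note that the flow counter and total traffic size still account for every processed record, while the number of stored samples for a heavy-hitter address stays bounded in expectation; the claim does not say how the total traffic size is used, so the sketch only maintains it.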
2 Assignments
0 Petitions
Abstract
A computer-implemented method to sample a large data set of traffic records includes receiving a traffic record associated with a traffic flow from a source of a large data set of traffic records; incrementing a flow counter representing a number of traffic flows received for one address of a pair of addresses identified by the traffic record; and adding a traffic size of the traffic flow associated with the received traffic record to a total traffic size of all flows received in previous iterations. If the flow counter is less than a predetermined sampling threshold, a traffic record sample associated with the traffic record is stored. If the flow counter is more than the predetermined sampling threshold, an exponentially decreasing probability function is applied to determine whether or not to sample the received traffic record. The traffic record sample is stored as sampled data associated with the traffic record only if the determination is to sample the received traffic record.
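The abstract does not specify the exponentially decreasing probability function, but a short worked bound shows why this shape limits the sample size per address. Assume, purely for illustration, that records with flow counter n < T are always stored and that a record with n >= T is stored with probability p(n) = e^{-λ(n-T)}, where T is the predetermined sampling threshold and λ > 0 is a hypothetical decay rate. The expected number of stored samples S_N for an address after N >= T flows is then a geometric sum:

```latex
\begin{aligned}
\mathbb{E}[S_N] &= (T-1) + \sum_{n=T}^{N} e^{-\lambda (n-T)}
                 = (T-1) + \sum_{k=0}^{N-T} e^{-\lambda k} \\
                &= (T-1) + \frac{1 - e^{-\lambda (N-T+1)}}{1 - e^{-\lambda}}
                 \;<\; T - 1 + \frac{1}{1 - e^{-\lambda}}.
\end{aligned}
```

For example, with λ = ln 2 the last term is 1/(1 - 1/2) = 2, so E[S_N] < T + 1; with T = 100 an address contributes fewer than 101 samples in expectation no matter how many flows it generates, while addresses with fewer than T flows are kept in full.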
20 Claims
1. (Reproduced in full under First Claim above.) Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14.
15. A system to sample a large data set of traffic records, the traffic records corresponding to network traffic flows associated with at least one particular address, the system comprising:
a memory configured to store instructions;
a processor disposed in communication with the memory, wherein the processor upon execution of the instructions is configured to:
- process, in multiple iterations associated with respective traffic records of the large data set that satisfy particular criteria, processing an iteration of the multiple iterations comprising:
receiving a traffic record from a source of a large data set of traffic records, the traffic record corresponding to a traffic flow and identifying a pair of addresses exchanging communications included in the traffic flow and including a traffic size value that indicates the size of communications included in the traffic flow;
receiving a flow counter and a total traffic size, the flow counter representing the number of traffic flows received for one of the addresses of the pair identified, the number of traffic flows representing previously received traffic records associated with the address, the total traffic size representing a sum of traffic sizes associated with all previously received traffic records, the previously received traffic records having been received during previous iterations of the multiple iterations;
incrementing the flow counter;
adding the traffic size associated with the received traffic record to the total traffic size;
if the flow counter is less than a predetermined sampling threshold, then storing a traffic record sample associated with the traffic record;
if the flow counter is more than the predetermined sampling threshold, then determining whether or not to sample the received traffic record by applying an exponentially decreasing probability function; and
storing the traffic record sample as sampled data associated with the traffic record only if the determination is to sample the received traffic record.
Dependent claims: 16, 17, 18.
19. A non-transitory computer readable storage medium and one or more computer programs embedded therein, the computer programs comprising instructions, which when executed by a computer system, cause the computer system to:
- process multiple iterations associated with respective traffic records of the large data set that satisfy particular criteria, processing an iteration of the multiple iterations comprising:
receiving a traffic record from a source of a large data set of traffic records, the traffic record corresponding to a traffic flow, the traffic record further identifying a pair of addresses of devices that exchange communications included in the traffic flow and including a traffic size value that indicates the size of communications included in the traffic flow;
receiving a flow counter and a total traffic size, the flow counter representing the number of traffic flows received for one of the addresses of the pair identified, the number of traffic flows representing previously received traffic records associated with the address, the total traffic size representing a sum of traffic sizes associated with all previously received traffic records, the previously received traffic records having been received during previous iterations of the multiple iterations;
incrementing the flow counter;
adding the traffic size associated with the received traffic record to the total traffic size;
if the flow counter is less than a predetermined sampling threshold, then storing a traffic record sample associated with the traffic record;
if the flow counter is more than the predetermined sampling threshold, then determining whether or not to sample the received traffic record by applying an exponentially decreasing probability function; and
storing the traffic record sample as sampled data associated with the traffic record only if the determination is to sample the received traffic record.
Dependent claim: 20.
Specification