Efficient access to sparse packets in large repositories of stored network traffic
Abstract
A secondary indexing technique cooperates with primary indices of an indexing arrangement to enable efficient storage and access of metadata used to retrieve packets persistently stored in data files of a data repository. Efficient storage and access of the metadata used to retrieve the persistently stored packets may be based on a target value of the packets over a search time window. The metadata is illustratively organized as a metadata repository of primary index files that store the primary indices containing hash values of network flows of the packets, as well as offsets and paths to those packets stored in the data files. The technique includes one or more secondary indices having a plurality of present bits arranged in a binary format (i.e., a bit array) to indicate the presence of the target value in one or more packets stored in the data files over the search time window. Notably, the present bits may be used to reduce (i.e., “prune”) a relatively large search space of the stored packets (e.g., defined by the hash values) to a pruned search space of only those data files in which packets having the target value are stored.
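The present-bit idea in the abstract can be illustrated with a minimal sketch. It assumes one bit per data file within the search time window and uses a Python integer as the bit array; the class name `SecondaryIndex` and its methods are hypothetical and do not come from the patent.

```python
class SecondaryIndex:
    """Hypothetical sketch: one bit array per target value.

    Bit i is set when data file i (one file per predetermined interval)
    holds at least one packet carrying the target value.
    """

    def __init__(self, num_files):
        self.num_files = num_files
        self.bits = {}  # target value -> int used as a bit array

    def mark_present(self, target, file_no):
        """Set the present bit for `target` in data file `file_no`."""
        if not 0 <= file_no < self.num_files:
            raise IndexError("file number outside the search time window")
        self.bits[target] = self.bits.get(target, 0) | (1 << file_no)

    def candidate_files(self, target):
        """Return the pruned search space: files whose present bit is set."""
        mask = self.bits.get(target, 0)
        return [i for i in range(self.num_files) if (mask >> i) & 1]


index = SecondaryIndex(num_files=8)
index.mark_present("10.0.0.5", 2)
index.mark_present("10.0.0.5", 5)
pruned = index.candidate_files("10.0.0.5")
```

Only the files listed in `pruned` would then need their primary index files consulted; a target value that was never marked yields an empty list, so no data file is opened at all.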
145 Citations
23 Claims
1. A method comprising:
- capturing a first packet from a network;
- annotating the first packet with a time stamp specifying an arrival time of the first packet at a packet capture system coupled to the network;
- storing the first packet in a first data file of a set of data files associated with predetermined intervals, the first data file corresponding to a first predetermined interval based on the time stamp;
- creating a first primary index for the first packet, the first primary index containing a location to the first packet stored in the first data file on a storage device of the packet capture system;
- storing the first primary index for the first packet in a first primary index file associated with the first data file corresponding to the first predetermined interval; and
- creating a secondary index for the first packet, the secondary index having an ordered sequence of present indicators, wherein a first present indicator corresponds to the first primary index and the first data file of the first predetermined interval.
Dependent claims: 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14
2. The method of claim 1, wherein the first present indicator denotes an element of a network flow present in the first packet over a search time window.
3. The method of claim 2, wherein the element is one of an internet protocol address and a port number.
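The steps recited in claims 1 to 3 can be sketched in memory as follows. This is an illustration under stated assumptions, not the patented implementation: the 60-second interval, the `CaptureSystem` and `IntervalStore` names, and the use of `io.BytesIO` in place of on-disk data files are all hypothetical.

```python
import io
from dataclasses import dataclass, field

INTERVAL_SECONDS = 60  # assumed length of one predetermined interval


@dataclass
class IntervalStore:
    """Stand-in for one data file plus its associated primary index file."""
    data: io.BytesIO = field(default_factory=io.BytesIO)
    primary_index: list = field(default_factory=list)  # (offset, length) entries


class CaptureSystem:
    def __init__(self, window_start, num_intervals):
        self.window_start = window_start
        self.num_intervals = num_intervals
        self.intervals = {}  # interval slot -> IntervalStore
        self.present = {}    # flow element -> ordered list of present indicators

    def ingest(self, packet, arrival_time, elements):
        """Store a time-stamped packet and update both index levels."""
        # Pick the data file from the time stamp.
        slot = int((arrival_time - self.window_start) // INTERVAL_SECONDS)
        if not 0 <= slot < self.num_intervals:
            raise ValueError("time stamp outside the search time window")
        store = self.intervals.setdefault(slot, IntervalStore())
        # Append the packet and record its location as the primary index.
        offset = store.data.seek(0, io.SEEK_END)
        store.data.write(packet)
        store.primary_index.append((offset, len(packet)))
        # Secondary index: one present indicator per interval, per element
        # (for example an IP address or a port number, as in claim 3).
        for element in elements:
            indicators = self.present.setdefault(
                element, [False] * self.num_intervals)
            indicators[slot] = True
        return slot, offset


system = CaptureSystem(window_start=0.0, num_intervals=4)
system.ingest(b"abc", 5.0, ["10.0.0.1", 443])
system.ingest(b"xy", 130.0, [443])
```

After these two calls, the indicator sequence for port 443 marks the first and third intervals, so a later search for that port would skip the second and fourth data files.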
15. A method comprising:
- capturing a packet from a network;
- annotating the packet with a time stamp specifying an arrival time of the packet at a packet capture system coupled to the network;
- storing the packet in a data file of a set of data files associated with predetermined intervals, the data file corresponding to a predetermined interval based on the time stamp;
- creating a first primary index for the packet, the first primary index containing a location to the packet stored in the data file on a storage device of the packet capture system;
- storing the first primary index for the packet in a first primary index file associated with the data file corresponding to the predetermined interval;
- creating a secondary index for the packet, the secondary index having an ordered sequence of indicators, wherein a first indicator corresponds to the first primary index and the data file of the predetermined interval, wherein the indicator denotes presence of a target value in the packet;
- organizing the secondary index file as a trie based on the target value; and
- in response to a request to retrieve the packet having the target value, traversing the trie until reaching a leaf node of the trie containing the secondary index for the packet.
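The trie organization recited in claim 15 might look like the sketch below. It assumes the target value is a dotted IPv4 address and that each trie level consumes one octet, a keying choice the claim does not specify; `TrieNode` and `SecondaryIndexTrie` are hypothetical names.

```python
class TrieNode:
    __slots__ = ("children", "present")

    def __init__(self):
        self.children = {}   # octet -> TrieNode
        self.present = None  # bit array (int), populated on leaf nodes only


class SecondaryIndexTrie:
    """Hypothetical trie keyed by the octets of an IPv4 target value."""

    def __init__(self):
        self.root = TrieNode()

    @staticmethod
    def _key(ip):
        return [int(part) for part in ip.split(".")]

    def mark_present(self, ip, file_no):
        """Walk or create the path for `ip`, then set its present bit."""
        node = self.root
        for octet in self._key(ip):
            node = node.children.setdefault(octet, TrieNode())
        node.present = (node.present or 0) | (1 << file_no)

    def lookup(self, ip):
        """Traverse to the leaf; return its bit array, or 0 if no path exists."""
        node = self.root
        for octet in self._key(ip):
            node = node.children.get(octet)
            if node is None:
                return 0
        return node.present or 0


trie = SecondaryIndexTrie()
trie.mark_present("10.0.0.5", 1)
trie.mark_present("10.0.0.5", 3)
mask = trie.lookup("10.0.0.5")
```

Here `mask` has bits 1 and 3 set, identifying the two data files to search. Addresses that share a prefix share the upper levels of the trie, which is one plausible reason to prefer a trie over a flat table of bit arrays.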
16. A system comprising:
- one or more processors coupled to a network;
- a plurality of storage repositories coupled to the one or more processors, the storage repositories including a data repository having data files configured to store packets captured from the network and a metadata repository having primary and secondary index files configured to store primary and secondary indices, the primary indices having hash values along with locations to the captured packets stored in the data files, the hash values calculated from a hash function applied to network flows of the captured packets, the secondary indices having a plurality of indicators denoting presence of a target value in one or more of the captured packets; and
- a memory coupled to the one or more processors and configured to store one or more processes of an operating system, the one or more processes executable by the one or more processors to use the indicators of the secondary indices to prune a search space of the captured packets as defined by the hash values to a pruned search space of only the data files of the data repository storing captured packets having the target value, the one or more processes further executable to use the locations of the primary indices having the hash values defined by the pruned search space to retrieve the captured packets having the target value from the data repository.
Dependent claims: 17, 18, 19
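The two-stage retrieval described in claim 16 (prune with the secondary indices, then follow primary index locations) can be sketched as below. The claim does not name a hash function, so the BLAKE2b digest over a 5-tuple string is an assumption, as are the function names and the in-memory `bytes` objects standing in for data files.

```python
import hashlib


def flow_hash(src_ip, src_port, dst_ip, dst_port, proto):
    """Assumed flow hash: 64-bit BLAKE2b digest of the 5-tuple."""
    key = f"{src_ip}:{src_port}-{dst_ip}:{dst_port}-{proto}".encode()
    return int.from_bytes(hashlib.blake2b(key, digest_size=8).digest(), "big")


def retrieve(data_files, primary_indices, present_bits, target, wanted_hash):
    """Return packets of one flow, reading only files that hold `target`.

    data_files[i]      -- bytes of data file i
    primary_indices[i] -- list of (hash, offset, length) for data file i
    present_bits       -- target value -> int bit array over the data files
    """
    mask = present_bits.get(target, 0)
    packets = []
    for file_no, index in enumerate(primary_indices):
        if not (mask >> file_no) & 1:
            continue  # pruned: this data file has no packet with the target
        for entry_hash, offset, length in index:
            if entry_hash == wanted_hash:
                packets.append(data_files[file_no][offset:offset + length])
    return packets


h1 = flow_hash("10.0.0.1", 1234, "10.0.0.2", 80, "tcp")
h2 = flow_hash("10.0.0.3", 5555, "10.0.0.2", 80, "tcp")
data_files = [b"aaabbb", b"cc", b"dddd"]
primary_indices = [[(h1, 0, 3), (h2, 3, 3)], [(h2, 0, 2)], [(h1, 0, 4)]]
present_bits = {"10.0.0.1": 0b101}
found = retrieve(data_files, primary_indices, present_bits, "10.0.0.1", h1)
```

In this example the middle data file is never scanned because its present bit for `10.0.0.1` is clear; only the first and third primary indices are consulted for matching hash values.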
20. A method comprising:
- capturing a first packet from a network;
- annotating the first packet with a time stamp specifying an arrival time of the first packet at a packet capture system coupled to the network;
- storing the first packet in a first data storage container corresponding to a first predetermined interval based on the time stamp;
- creating a first primary index for the first packet, the first primary index containing a location to the first packet stored in the first data storage container on a storage device of the packet capture system;
- storing the first primary index for the first packet in a first primary index storage container associated with the first data storage container corresponding to the first predetermined interval; and
- creating a secondary index for the first packet, the secondary index having an ordered sequence of present indicators, wherein a first present indicator corresponds to the first primary index and the first data storage container of the first predetermined interval.
Dependent claims: 21, 22
23. A non-transitory computer readable medium including program instructions for execution on one or more processors, the program instructions configured to:
- capture a packet from a network;
- annotate the packet with a time stamp specifying an arrival time of the packet;
- store the packet in a data file of a set of data files associated with predetermined intervals, the data file corresponding to a predetermined interval based on the time stamp;
- create a primary index for the packet, the primary index containing a location to the packet stored in the data file;
- store the primary index for the packet in a primary index file associated with the data file corresponding to the predetermined interval; and
- create a secondary index for the packet, the secondary index having an ordered sequence of indicators, wherein an indicator corresponds to the primary index and the data file of the predetermined interval, and wherein the indicator denotes presence of a target value in the packet stored in the data file over a search time window.
Specification