Active learning framework for automatic field extraction from network traffic
Abstract
An active learning framework is provided to extract information from particular fields of a variety of protocols. Extraction is performed on traffic of an unknown protocol: the user presents the system with a small number of labeled instances of a field, and the system then automatically generates an abundance of features and negative examples. A boosting approach is then used for feature selection and classifier combination. The system displays its results so that the user can correct them and/or add new examples. The process can be iterated until the user is satisfied with the extraction performance of the classifiers generated by the system.
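The abstract names boosting for both feature selection and classifier combination but does not fix a variant. The sketch below assumes discrete AdaBoost in which each weak learner is a decision stump over a single binary feature. Under that assumption every boosting round selects one feature, and the final alpha-weighted vote combines the selected stumps. Examples are represented as sets of feature strings. The functions `stump`, `adaboost`, and `score` are hypothetical illustrations, not the patented implementation.

```python
import math


def stump(feature, polarity, feats):
    """Weak learner: vote `polarity` if `feature` is present, otherwise -polarity."""
    return polarity if feature in feats else -polarity


def adaboost(examples, labels, rounds=10):
    """Discrete AdaBoost over single-feature stumps (an assumed variant).

    examples: list of feature sets; labels: +1 (field) or -1 (non-field).
    Returns the selected stumps as (feature, polarity, alpha) triples.
    """
    n = len(examples)
    weights = [1.0 / n] * n
    pool = sorted(set().union(*examples))  # every candidate feature
    model = []
    for _ in range(rounds):
        best_err, best_f, best_pol = None, None, None
        # Feature selection: keep the stump with the lowest weighted error.
        for f in pool:
            for pol in (1, -1):
                err = sum(w for w, x, y in zip(weights, examples, labels)
                          if stump(f, pol, x) != y)
                if best_err is None or err < best_err:
                    best_err, best_f, best_pol = err, f, pol
        if best_f is None or best_err >= 0.5:
            break  # no stump beats chance
        clamped = max(best_err, 1e-10)  # avoid dividing by zero on a perfect stump
        alpha = 0.5 * math.log((1 - clamped) / clamped)
        model.append((best_f, best_pol, alpha))
        if best_err == 0:
            break  # training examples already separated
        # Reweight so the next round concentrates on misclassified examples.
        weights = [w * math.exp(-alpha * y * stump(best_f, best_pol, x))
                   for w, x, y in zip(weights, examples, labels)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return model


def score(model, feats):
    """Classifier combination: alpha-weighted vote of the selected stumps."""
    return sum(alpha * stump(f, pol, feats) for f, pol, alpha in model)


if __name__ == "__main__":
    examples = [{"prev=Host: ", "printable"}, {"prev=Host: ", "printable"},
                {"prev=User: ", "printable"}, {"prev=\r\n", "printable"}]
    labels = [1, 1, -1, -1]
    model = adaboost(examples, labels)
    print([f for f, _, _ in model])  # ['prev=Host: ']
```

In this toy run the uninformative `printable` feature is never selected, which is the feature-selection effect the abstract refers to. Using single-feature stumps also keeps each selected feature human-readable, which is convenient when a user reviews results; that is a design assumption, not something the abstract states.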
20 Claims
1. A method for extracting at least one data field from a stream of data received from a network, comprising:
inputting at least one positive example of an instance of the at least one data field; and
based on the at least one positive example, analyzing the stream of data received from the network to determine a first result set including at least one candidate for the at least one data field in the stream, wherein the analyzing is based on no knowledge or only partial knowledge of any protocol represented by the stream received from the network.
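Claim 1 does not prescribe how the first result set is computed. One minimal protocol-agnostic heuristic, assumed here purely for illustration, works in three steps. It locates the positive example in the stream and treats the few bytes before it as an anchor and the byte after it as a terminator. It then collects every span that sits between the same anchor and terminator. No protocol grammar is consulted. `candidates_from_example` and its `anchor_len` parameter are hypothetical names.

```python
def candidates_from_example(stream: bytes, example: bytes, anchor_len: int = 6) -> list[bytes]:
    """Hypothetical heuristic: reuse the example's byte context to find similar fields.

    Only the bytes surrounding the positive example are used; no protocol
    specification is needed.
    """
    pos = stream.find(example)
    if pos < 0:
        return []  # the example does not occur in this stream
    anchor = stream[max(0, pos - anchor_len):pos]  # bytes preceding the example
    end = pos + len(example)
    if not anchor or end >= len(stream):
        return [example]  # no usable context on one side of the example
    terminator = stream[end:end + 1]  # single byte following the example
    results = []
    i = stream.find(anchor)
    while i >= 0:
        start = i + len(anchor)
        stop = stream.find(terminator, start)
        if stop < 0:
            stop = len(stream)  # unterminated candidate runs to the end
        results.append(stream[start:stop])
        i = stream.find(anchor, i + 1)
    return results
```

On a stream holding two HTTP requests, supplying one `Host` value as the positive example yields both `Host` values as candidates. A fixed anchor length is a simplification; a learned system would presumably weigh many such contexts rather than trust one.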
11. A method for automatically generating features for classifying data in a data stream, comprising:
specifying an item of interest in example data of the data stream; and
automatically generating a plurality of features from the specified item, wherein the item and the plurality of features are used to form classifiers for classifying data in the data stream.
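Claim 11 leaves the feature set open. A minimal sketch can assume byte-level feature templates chosen here for illustration only: neighboring context bytes, item length, first and last bytes, and coarse character classes. Each template turns one labeled item into several binary features. `generate_features` and its `context` parameter are hypothetical.

```python
def generate_features(data: bytes, start: int, end: int, context: int = 3) -> set[str]:
    """Generate hypothetical binary features for the item data[start:end]."""
    item = data[start:end]
    feats = {f"len={len(item)}"}
    # Bytes immediately before the item (prev1 is the closest).
    for k in range(1, context + 1):
        i = start - k
        if i >= 0:
            feats.add(f"prev{k}={data[i]:02x}")
    # Bytes immediately after the item (next1 is the closest).
    for k in range(1, context + 1):
        i = end + k - 1
        if i < len(data):
            feats.add(f"next{k}={data[i]:02x}")
    if item:
        feats.add(f"first={item[0]:02x}")
        feats.add(f"last={item[-1]:02x}")
        # Coarse content classes.
        if all(0x30 <= b <= 0x39 for b in item):
            feats.add("class=digits")
        if all(0x20 <= b <= 0x7E for b in item):
            feats.add("class=printable")
    return feats
```

For the item `1234` in `b"ID:1234;"`, this yields features such as `prev1=3a` (the colon), `next1=3b` (the semicolon), `len=4`, and `class=digits`. Generating many cheap features and letting a selection step discard most of them is one plausible reading of "automatically generating a plurality of features".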
16. A computing device for automatically extracting data of interest from a binary stream of data received or stored by the computing device without reference to full knowledge of the structure of any protocol of the binary stream of data, comprising:
an analysis engine for analyzing the binary stream based on at least one classifier formed from at least one positive example of the data of interest provided to the analysis engine to determine a result set including at least one candidate from the binary stream of data as a potential match for the data of interest; and
a user interface for outputting the result set and for receiving either at least one additional positive example or a designation of at least one candidate of the result set as an incorrect match for the data of interest, or both,
wherein the analysis engine re-analyzes the binary stream based on the at least one additional positive example, or the at least one designated candidate, or both, to revise the at least one classifier and improve the accuracy of the result set.
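The feedback cycle of claim 16 can be sketched as a loop that trains a classifier, analyzes the stream, and folds the user's corrections back into the training set. Several simplifications keep the example short, and all are assumptions. A perceptron over two hypothetical context features stands in for the boosted classifiers described in the abstract. Candidates are taken to be runs of visible ASCII bytes. The user is simulated by an `oracle` callback. All function names are illustrative.

```python
from collections import defaultdict


def tokens(stream: bytes) -> list[tuple[int, int]]:
    """Assumed candidate generator: maximal runs of visible ASCII bytes (0x21-0x7E)."""
    spans, start = [], None
    for i, b in enumerate(stream):
        visible = 0x21 <= b <= 0x7E
        if visible and start is None:
            start = i
        elif not visible and start is not None:
            spans.append((start, i))
            start = None
    if start is not None:
        spans.append((start, len(stream)))
    return spans


def features(stream: bytes, span: tuple[int, int], k: int = 6) -> set[str]:
    """Two hypothetical features: the k preceding bytes and the span length."""
    s, e = span
    return {"prev=" + stream[max(0, s - k):s].hex(), f"len={e - s}"}


def train(stream, positives, negatives, epochs=20):
    """Perceptron stand-in for the boosted classifiers, retrained from scratch."""
    w = defaultdict(float)
    data = ([(features(stream, p), 1) for p in positives]
            + [(features(stream, n), -1) for n in negatives])
    for _ in range(epochs):
        mistakes = 0
        for feats, y in data:
            pred = 1 if sum(w[f] for f in feats) > 0 else -1
            if pred != y:
                for f in feats:
                    w[f] += y  # move weights toward the correct label
                mistakes += 1
        if mistakes == 0:
            break
    return w


def analyze(stream, w):
    """Result set: every candidate span the current classifier scores positive."""
    return [sp for sp in tokens(stream) if sum(w[f] for f in features(stream, sp)) > 0]


def refine(stream, positives, oracle, max_rounds=5):
    """Train, analyze, and apply the user's corrections until none remain."""
    positives, negatives, results = list(positives), [], []
    for _ in range(max_rounds):
        results = analyze(stream, train(stream, positives, negatives))
        wrong, missed = oracle(results)  # incorrect matches / new positive examples
        if not wrong and not missed:
            break
        negatives += wrong
        positives += missed
    return results


if __name__ == "__main__":
    stream = b"Host: a.com\r\nUser: bob\r\nHost: b.org\r\n"
    truth = {(6, 11), (30, 35)}  # the simulated user's notion of the field

    def oracle(results):
        wrong = [r for r in results if r not in truth]
        missed = [t for t in sorted(truth) if t not in results]
        return wrong, missed

    final = refine(stream, [(6, 11)], oracle)
    print([stream[s:e] for s, e in final])  # [b'a.com', b'b.org']
```

Here the first pass over-matches and returns every five-byte token. The simulated user marks the three incorrect ones, and the second pass returns only the two `Host` values. Retraining from scratch on each round keeps the sketch simple; an incremental update would be an equally valid design.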
Specification