Object classification in a capture system
Abstract
Objects can be extracted from data flows captured by a capture device. Each captured object can then be classified according to content. In one embodiment, the present invention includes determining whether a captured object is binary or textual in nature, and classifying the captured object as one of a plurality of textual content types based on tokens found in the captured object if the captured object is determined to be textual in nature.
23 Claims
1. A method for classifying an object according to content comprising:
determining whether the object is binary or textual in nature, wherein the object is captured and the captured object is a plurality of packets that are broken down by a capture system and then reassembled;
classifying the object as one of a plurality of textual content types based on tokens found in the object if the object is determined to be textual in nature, wherein each of the tokens found in the object have an associated weight and the weights of the tokens are used to determine the content type of the object, and wherein a confidence level is assigned for classifying the content type of the object; and
inserting the content type into a content field of a tag that indexes the object in a storage location and contains a plurality of fields to describe the object, wherein the capture system is configured to allow a document that includes the captured object to be forwarded from the capture system to its intended destination at a network node unless a capture rule prohibits forwarding the document based on the document including the captured object; and
further classifying the object as an encrypted document based on a statistical characteristic of the object if the object is determined to be binary in nature, wherein the statistical characteristic of the object comprises a byte distribution of the object.
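As an illustration only, the binary/textual determination and the byte-distribution test recited in claim 1 might be sketched as follows. The claim does not say how either decision is made, so the printable-byte ratio, the Shannon-entropy measure, both thresholds, and all function names here are assumptions chosen for the example.

```python
import math
from collections import Counter

# Hypothetical thresholds; the claim does not specify any values.
TEXT_RATIO_THRESHOLD = 0.90    # share of printable bytes needed to call an object textual
ENCRYPTED_ENTROPY_BITS = 7.5   # bits per byte above which the distribution looks near-uniform

# Printable ASCII plus tab, line feed, and carriage return.
PRINTABLE = set(range(0x20, 0x7F)) | {0x09, 0x0A, 0x0D}


def is_textual(data: bytes) -> bool:
    """Guess whether a reassembled object is textual (assumed heuristic)."""
    if not data:
        return True  # convention for this sketch: an empty object counts as textual
    printable = sum(1 for b in data if b in PRINTABLE)
    return printable / len(data) >= TEXT_RATIO_THRESHOLD


def byte_entropy(data: bytes) -> float:
    """Shannon entropy of the byte distribution, in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    total = len(data)
    return -sum((c / total) * math.log2(c / total) for c in Counter(data).values())


def classify_binary(data: bytes) -> str:
    """Label a binary object 'encrypted' when its byte distribution is near-uniform."""
    return "encrypted" if byte_entropy(data) >= ENCRYPTED_ENTROPY_BITS else "binary"
```

A near-uniform byte distribution is also typical of compressed data, so a practical classifier would likely need further checks (for example, known file signatures) before settling on "encrypted".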
7. A method comprising:
intercepting a flow;
classifying the flow according to transmission protocol;
extracting one or more objects from the classified flow using a protocol handler corresponding with the transmission protocol, wherein the objects are respective pluralities of packets that are captured and broken down by a capture system and then reassembled;
classifying the one or more objects based on content type by statistically analyzing the object, wherein tokens found in the object have an associated weight and the weights of the tokens are used to determine the content type of the object, and wherein a confidence level is assigned for classifying the content type of the object; and
inserting the content type into a content field of a tag that indexes the object in a storage location and contains a plurality of fields to describe the object, wherein a capture system that receives the captured object, which is part of a document, is configured to allow the document to be forwarded from the capture system to its intended destination at a network node unless a capture rule prohibits forwarding the document based on the document including one or more objects; and
further classifying the object as an encrypted document based on a statistical characteristic of the object if the object is determined to be binary in nature, wherein statistically analyzing the object comprises determining a byte distribution.
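The first three steps of claim 7 (intercepting a flow, classifying it by transmission protocol, and extracting objects with a matching protocol handler) could be sketched roughly as below. The `Packet` and `Flow` types, the prefix- and port-based protocol guess, and the single HTTP handler are all hypothetical simplifications; the claim does not describe how protocols are recognized or how handlers parse a stream.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Packet:
    seq: int         # position of the payload within the flow (assumed field)
    payload: bytes


@dataclass
class Flow:
    dst_port: int
    packets: List[Packet]


def reassemble(flow: Flow) -> bytes:
    """Rebuild the byte stream by ordering packets on their sequence number."""
    return b"".join(p.payload for p in sorted(flow.packets, key=lambda p: p.seq))


def classify_protocol(flow: Flow) -> str:
    """Assumed heuristic: inspect the stream prefix, then fall back to the port."""
    stream = reassemble(flow)
    if stream.startswith((b"GET ", b"POST ", b"HTTP/")):
        return "http"
    if flow.dst_port == 25:
        return "smtp"
    return "unknown"


def http_handler(stream: bytes) -> List[bytes]:
    """Treat the message body after the header block as the extracted object."""
    _head, sep, body = stream.partition(b"\r\n\r\n")
    return [body] if sep and body else []


# Hypothetical handler registry keyed by protocol name.
HANDLERS: Dict[str, Callable[[bytes], List[bytes]]] = {"http": http_handler}


def extract_objects(flow: Flow) -> List[bytes]:
    """Dispatch the reassembled stream to the handler for its protocol, if any."""
    handler = HANDLERS.get(classify_protocol(flow))
    return handler(reassemble(flow)) if handler else []
```

Keeping handlers in a registry means supporting another protocol only requires adding one entry, which mirrors the claim's pairing of each transmission protocol with its own handler.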
10. An apparatus comprising:
an object statistics module to determining whether an object is binary or textual in nature, wherein the object is captured and the captured object is a plurality of packets that are broken down by a capture system and then reassembled;
a token database to store a plurality of tokens, each token being associated with a textual content type, wherein each of the tokens found in the object have an associated weight and the weights of the tokens are used to determine the content type of the object, and wherein a confidence level is assigned for classifying the content type of the object; and
a token analyzer to classify the object as one of the plurality of textual content types by accessing the token database if the object is determined to be textual in nature, wherein the content type is inserted into a content field of a tag that indexes the object in a storage location and contains a plurality of fields to describe the object, wherein the capture system is configured to allow a document that includes the captured object to be forwarded from the capture system to its intended destination at a network node unless a capture rule prohibits forwarding the document based on the document including the captured object; and
wherein the object statistics module is configured to classify the object as an encrypted document based on a statistical characteristic of the object if the object is binary in nature, wherein the statistical characteristic of the object comprises a byte distribution of the object.
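To make the token database and token analyzer of claim 10 more concrete, here is a minimal sketch assuming a small in-memory table. The tokens, weights, content-type names, whitespace tokenization, and the confidence formula (the winning type's share of the total matched weight) are all illustrative assumptions; the claim only requires that tokens carry weights and that a confidence level is assigned.

```python
from collections import defaultdict
from typing import Dict, Tuple

# Hypothetical token database: token -> (textual content type, weight).
TOKEN_DB: Dict[str, Tuple[str, float]] = {
    "#include": ("c_source", 3.0),
    "printf": ("c_source", 1.0),
    "def": ("python_source", 2.0),
    "import": ("python_source", 1.5),
    "<html>": ("html", 4.0),
    "<div": ("html", 1.0),
}


def classify_text(text: str, token_db: Dict[str, Tuple[str, float]] = TOKEN_DB) -> Tuple[str, float]:
    """Return (content type, confidence) from the weights of tokens found in the text."""
    scores: Dict[str, float] = defaultdict(float)
    for word in text.split():  # naive whitespace tokenization, for illustration only
        entry = token_db.get(word.lower())
        if entry:
            content_type, weight = entry
            scores[content_type] += weight
    if not scores:
        return ("unknown", 0.0)
    best = max(scores, key=scores.get)
    # Assumed confidence measure: winning type's share of all matched weight.
    return (best, scores[best] / sum(scores.values()))
```

For example, `classify_text("def f(): import os")` matches only Python tokens and so returns `python_source` with confidence 1.0, while text mixing tokens from several types yields a lower confidence.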
15. A non-transitory computer storage medium having stored thereon data representing instructions that, when executed by a processor of a capture system, cause the processor to perform operations comprising:
determining whether a captured object is binary or textual in nature, wherein the object is captured and the captured object is a plurality of packets that are broken down by a capture system and then reassembled;
classifying the captured object as one of a plurality of textual content types based on tokens found in the captured object if the captured object is determined to be textual in nature, wherein each of the tokens found in the object have an associated weight and the weights of the tokens are used to determine the content type of the object, and wherein a confidence level is assigned for classifying the content type of the object; and
inserting the content type into a content field of a tag that indexes the object in a storage location and contains a plurality of fields to describe the object, wherein the capture system is configured to allow a document that includes the captured object to be forwarded from the capture system to its intended destination at a network node unless a capture rule prohibits forwarding the document based on the document including the captured object; and
wherein the instructions further cause the processor to classify the captured object as an encrypted document based on a statistical characteristic of the captured object if the captured object is determined to be binary in nature, wherein the statistical characteristic of the captured object comprises a byte distribution of the captured object.
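The tagging and forwarding limitations of claim 15 might look something like the following sketch. Only the content field is named in the claim; the other `Tag` fields, the SHA-256 object identifier, the in-memory `ObjectStore`, and the representation of a capture rule as a predicate are assumptions made for illustration.

```python
import hashlib
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List


@dataclass
class Tag:
    object_id: str     # assumed index key into the storage location
    content: str       # the content field recited in the claim
    confidence: float  # assumed extra field
    size: int          # assumed extra field


class ObjectStore:
    """Hypothetical in-memory storage location indexed by tags."""

    def __init__(self) -> None:
        self.objects: Dict[str, bytes] = {}
        self.tags: Dict[str, Tag] = {}

    def store(self, data: bytes, content_type: str, confidence: float) -> Tag:
        object_id = hashlib.sha256(data).hexdigest()
        self.objects[object_id] = data
        tag = Tag(object_id, content_type, confidence, len(data))
        self.tags[object_id] = tag
        return tag


# A capture rule is modeled as a predicate that returns True when it prohibits forwarding.
CaptureRule = Callable[[Tag], bool]


def may_forward(document_tags: Iterable[Tag], rules: List[CaptureRule]) -> bool:
    """Forward the document unless some rule prohibits one of its captured objects."""
    return not any(rule(tag) for tag in document_tags for rule in rules)
```

Forwarding is the default here: a document with no tagged objects, or with no matching rule, is allowed through, which follows the claim's "unless a capture rule prohibits" wording.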
21. A non-transitory computer storage medium having stored thereon data representing instructions that, when executed by a processor of a capture system, cause the processor to perform operations comprising:
intercepting a flow;
classifying the flow according to transmission protocol;
extracting one or more objects from the classified flow using a protocol handler corresponding with the transmission protocol, wherein the objects are respective pluralities of packets that are captured and broken down by a capture system and then reassembled;
classifying the one or more objects based on content type by statistically analyzing the object, wherein tokens found in the object have an associated weight and the weights of the tokens are used to determine the content type of the object, and wherein a confidence level is assigned for classifying the content type of the object; and
inserting the content type into a content field of a tag that indexes the object in a storage location and contains a plurality of fields to describe the object, wherein the capture system that receives the captured objects, which are part of a document, is configured to allow the document to be forwarded from the capture system to its intended destination at a network node unless a capture rule prohibits forwarding the document based on the document including the objects; and
wherein the instructions further cause the processor to classify the captured object as an encrypted document based on a statistical characteristic of the captured object if the captured object is determined to be binary in nature, wherein statistically analyzing the object comprises determining a byte distribution.
Specification