Partial document content matching using sectional analysis

  • US 7,954,151 B1
  • Filed: 09/24/2004
  • Issued: 05/31/2011
  • Est. Priority Date: 10/28/2003
  • Status: Active Grant
First Claim

1. A method comprising:

    computing a set of reference sectional fingerprints corresponding to a reference document having a classification selected from a group comprising at least a public classification and a private classification, wherein successful access to the reference document by discovery agent software results in the reference document having the public classification and otherwise having a classification other than the public classification, and at least one of the reference sectional fingerprints is based at least in part on two or more tokens of the reference document, the tokens being selected from remaining words of the reference document after exclusion of a set of language-dependent words of the reference document;

    associating the reference sectional fingerprints with the classification of the reference document;

    after the associating, monitoring network traffic via a network interface;

    computing a set of traffic sectional fingerprints corresponding to the monitored network traffic, wherein at least one of the traffic sectional fingerprints is based at least in part on two or more tokens of the monitored network traffic;

    determining that at least one of the traffic sectional fingerprints matches at least one of the reference sectional fingerprints;

    for each respective traffic sectional fingerprint matching at least one of the reference sectional fingerprints associated with the public classification, classifying the respective traffic sectional fingerprint as the public classification;

    for each respective traffic sectional fingerprint matching none of the reference sectional fingerprints associated with the public classification and matching at least one of the reference sectional fingerprints associated with the private classification, classifying the respective traffic sectional fingerprint as the private classification;

    wherein the act of associating, the acts of computing, and the act of determining are at least in part via one or more central processing units enabled to execute software;

    wherein the reference sectional fingerprints and the traffic sectional fingerprints are sliding sectional fingerprints;

    wherein the reference document is interpreted as groups of contiguous token strings and each reference sliding sectional fingerprint corresponds to one of the groups of reference document contiguous token strings; and

    wherein the monitored network traffic is interpreted as groups of contiguous token strings and each traffic sliding sectional fingerprint corresponds to one of the groups of monitored traffic contiguous token strings.
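
The steps recited in claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation. The stop-word list (standing in for the "language-dependent words"), the three-token window, the use of SHA-1 as the fingerprint function, and all function names are assumptions made for illustration. The discovery agent step is not modeled: each reference document is simply supplied with a "public" or "private" label. The sketch also assumes that traffic is tokenized the same way as reference documents so that fingerprints are comparable.

```python
import hashlib
import re

# Assumption: a tiny stop-word list standing in for the claim's
# "language-dependent words"; a real list would be far larger.
STOP_WORDS = frozenset({"a", "an", "the", "of", "and", "or", "to", "in", "is", "by"})
WINDOW = 3  # Assumption: tokens per sliding section (the claim requires two or more).


def tokenize(text):
    """Lowercase the text, split it into words, and drop stop words."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return [w for w in words if w not in STOP_WORDS]


def sliding_fingerprints(text, window=WINDOW):
    """Hash every group of `window` contiguous tokens; return the set of hashes."""
    tokens = tokenize(text)
    return {
        hashlib.sha1(" ".join(tokens[i:i + window]).encode("utf-8")).hexdigest()
        for i in range(len(tokens) - window + 1)
    }


def build_index(reference_docs, window=WINDOW):
    """Associate each reference fingerprint with its document's classification.

    `reference_docs` is an iterable of (text, classification) pairs.
    """
    index = {}
    for text, classification in reference_docs:
        for fp in sliding_fingerprints(text, window):
            index.setdefault(fp, set()).add(classification)
    return index


def classify_traffic(payload, index, window=WINDOW):
    """Classify each traffic fingerprint that matches the reference index.

    A match with any public reference fingerprint yields "public";
    otherwise a match with a private reference fingerprint yields "private".
    Fingerprints that match nothing are omitted from the result.
    """
    results = {}
    for fp in sliding_fingerprints(payload, window):
        classes = index.get(fp)
        if not classes:
            continue
        if "public" in classes:
            results[fp] = "public"
        elif "private" in classes:
            results[fp] = "private"
    return results
```

Checking the public classification first mirrors the ordering in the claim: a section of traffic that also appears in a publicly accessible document is treated as public even if the same section occurs in a private document.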
