
Methods and systems to fingerprint textual information using word runs

  • US 8,286,171 B2
  • Filed: 07/21/2008
  • Issued: 10/09/2012
  • Est. Priority Date: 07/21/2008
  • Status: Active Grant
First Claim

1. A computer implemented method for preventing unauthorized disclosure of secure information, the computer implemented method comprising:

  • receiving information including a first text, by a computer system having at least a processor for executing instructions, said first text including a plurality of words;

    normalizing, by said computer system, said first text into a first canonical text expression, said first canonical text expression including a plurality of normalized words;

    generating, at said computer system, a first word hash list for said first canonical text expression, where said first word hash list is generated at a word level;

    generating, at said computer system, a first set of fingerprints for said first word hash list;

    wherein generating said first word hash list includes converting said plurality of normalized words into a plurality of word-value hashes, each specific one of said word-value hashes representing a specific normalized word; and

    wherein said generating said first set of fingerprints includes;

    assigning a sliding window of size W, wherein said sliding window is used for reading a W number of said word-value hashes from said first word hash list;

    using said sliding window to read said W number of said word-level hashes from said first word hash list;

    designating said word-value hash with a distinct value within said sliding window as an anchor; and

    generating a fingerprint using a fingerprint hash function, wherein said fingerprint hash function is applied over all said word-value hashes contained within a start of said sliding window to where said anchor resides in said sliding window.
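The steps recited in the claim (normalize, hash each word, slide a window of size W, pick an anchor, hash from the window start through the anchor) can be illustrated with a minimal Python sketch. This is not the patent's implementation. The function names, the choice of CRC32 for word-value hashes, SHA-1 as the fingerprint hash function, and the default window size are all assumptions made for illustration. In particular, the claim only says the anchor is the hash "with a distinct value"; the sketch assumes that means the smallest hash in the window.

```python
import hashlib
import re
import zlib


def normalize(text):
    """Lowercase the text and keep only alphanumeric word tokens (assumed canonical form)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def word_hashes(words):
    """Map each normalized word to a 32-bit word-value hash (CRC32 is an assumed choice)."""
    return [zlib.crc32(w.encode("utf-8")) for w in words]


def fingerprints(hashes, w):
    """Slide a window of w word hashes and fingerprint the run from window start to anchor."""
    prints = set()
    for start in range(max(len(hashes) - w + 1, 0)):
        window = hashes[start:start + w]
        # Assumption: the "distinct value" anchor is the smallest hash in the window.
        anchor = window.index(min(window))
        # The fingerprint covers every word hash from the window start through the anchor.
        run = window[:anchor + 1]
        data = b"".join(h.to_bytes(4, "big") for h in run)
        prints.add(hashlib.sha1(data).hexdigest())
    return prints


def fingerprint_text(text, w=4):
    """Run the full pipeline: normalize, hash words, generate fingerprints."""
    return fingerprints(word_hashes(normalize(text)), w)


if __name__ == "__main__":
    secure = fingerprint_text("alpha beta gamma delta epsilon")
    outgoing = fingerprint_text("zeta Alpha, beta GAMMA delta")
    # A shared run of W words yields at least one common fingerprint.
    print(len(secure & outgoing) > 0)
```

Because hashing happens at the word level after normalization, differences in case and punctuation do not change the fingerprints, and any two texts that share W consecutive normalized words produce at least one identical fingerprint under these assumptions.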
