
METHODS AND SYSTEMS TO FINGERPRINT TEXTUAL INFORMATION USING WORD RUNS

  • US 20130074198A1
  • Filed: 09/14/2012
  • Published: 03/21/2013
  • Est. Priority Date: 07/21/2008
  • Status: Abandoned Application
First Claim

1. A system to prevent unauthorized disclosure of secure information, the system comprising:

  • a processor;

    a memory;

    a processing component configured to:

    receive information including a first text, wherein the first text includes a plurality of words;

    normalize the first text into a first canonical text expression, the first canonical text expression including a plurality of normalized words;

    generate a first word hash list for the first canonical text expression, wherein the first word hash list is generated at a word level; and

    generate one or more fingerprints for the first word hash list, wherein the generation of one or more fingerprints includes:

    assigning a sliding window of size W, wherein W specifies a number of word-value hashes to read from the first word hash list;

    using the sliding window to read the W word-value hashes from the first word hash list;

    designating an anchor word-value hash for the sliding window by selecting a distinct-valued word-value hash among the W word-value hashes; and

    applying a fingerprint hash function to all words starting from a first word-value hash to the anchor word-value hash, wherein applying the fingerprint hash function generates the one or more fingerprints.
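
The steps recited in claim 1 can be illustrated with a minimal Python sketch. It is not the applicant's implementation, and several details are assumptions because the claim does not fix them: the normalization rule (lowercasing and dropping punctuation), the word-level hash (the first 4 bytes of MD5), the anchor rule (the smallest hash value in the window stands in for the "distinct-valued" hash), the fingerprint hash (SHA-1), and the policy of restarting the window immediately after the anchor. The names normalize, word_hash, and fingerprints are hypothetical.

```python
import hashlib
import re


def normalize(text):
    # Assumed canonical form: lowercase alphanumeric words, punctuation dropped.
    return re.findall(r"[a-z0-9]+", text.lower())


def word_hash(word):
    # Assumed word-level hash: first 4 bytes of MD5 as an unsigned integer.
    return int.from_bytes(hashlib.md5(word.encode("utf-8")).digest()[:4], "big")


def fingerprints(text, window=4):
    # Build the word hash list from the canonical text expression.
    hashes = [word_hash(w) for w in normalize(text)]
    result = []
    start = 0
    while start < len(hashes):
        # Read up to `window` (W) word-value hashes.
        win = hashes[start:start + window]
        # Assumed anchor rule: the minimum-valued hash in the window.
        anchor_offset = win.index(min(win))
        # Fingerprint the run from the first hash through the anchor.
        run = hashes[start:start + anchor_offset + 1]
        data = b"".join(h.to_bytes(4, "big") for h in run)
        result.append(hashlib.sha1(data).hexdigest())
        # Assumed policy: the next window begins right after the anchor.
        start += anchor_offset + 1
    return result


if __name__ == "__main__":
    for fp in fingerprints("The quick brown fox jumps over the lazy dog."):
        print(fp)
```

Because the anchor is chosen from the hash values rather than from a fixed position, run boundaries in this sketch depend on the words themselves, so texts that differ only in case or punctuation produce the same fingerprints.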
