
Identifying whether electronic data under test includes particular information from a database

  • US 8,365,247 B1
  • Filed: 06/30/2009
  • Issued: 01/29/2013
  • Est. Priority Date: 06/30/2009
  • Status: Active Grant
First Claim

1. A method of identifying whether electronic data under test includes particular information from a database, the method comprising:

  • deriving a set of sample tokens from the electronic data under test;

    forming a set of sample fingerprints from the set of sample tokens, each sample fingerprint being based on a sample token of the set of sample tokens; and

    outputting a result signal based on a comparison between the set of sample fingerprints and a set of database fingerprints generated from records of the database, the result signal providing an indication of whether the electronic data under test includes the particular information from the database;

    wherein deriving the set of sample tokens includes:

    parsing the electronic data under test into a series of un-normalized words, and

    removing predefined characters from the series of un-normalized words to form, as the set of sample tokens, a series of normalized words, each normalized word including a string of actual characters;

    wherein forming the set of sample fingerprints from the set of sample tokens includes:

    applying a hashing function to the string of actual characters of each normalized word of the series of normalized words to generate, as the set of sample fingerprints, hash results corresponding to the series of normalized words;

    wherein outputting the result signal based on the comparison between the set of sample fingerprints and the set of database fingerprints includes:

    searching the set of database fingerprints for the hash results and determining whether, for any record of the database, a predetermined number of database fingerprints corresponding to that record is found to match the hash results, and

    providing the result signal with one of (i) a first control value when, for any record of the database, the predetermined number of database fingerprints corresponding to that record is found to match the hash results, and (ii) a second control value when, for each record of the database, less than the predetermined number of database fingerprints corresponding to that record is found to match the hash results, the first value being different than the second value;

    wherein the method further comprises:

    blocking access to the electronic data under test when the result signal has the first control value, and permitting access to the electronic data under test when the result signal has the second control value;

    wherein searching and determining includes:

    receiving a set of fingerprint matching rules,

    providing a set of intermediate search results based on searching the set of database fingerprints for the hash results, and

    applying the set of fingerprint matching rules to the set of intermediate search results to identify whether, for any record of the database, the predetermined number of database fingerprints corresponding to that record is found to match the hash results;

    wherein the set of database fingerprints resides in a database fingerprint structure which increases monotonically based on the database fingerprints of the set of database fingerprints; and

    wherein providing the set of intermediate search results based on searching the set of database fingerprints for the hash results includes:

    carrying out a binary search of the database fingerprint structure for the hash results.
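
The steps recited in claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation. Every name in it (`derive_tokens`, `fingerprint`, `build_index`, `check`), the choice of truncated SHA-256 as the hashing function, the rule that non-alphanumeric characters are the "predefined characters", and the single matching rule (a per-record count of distinct matching fingerprints) are assumptions made for illustration only.

```python
import bisect
import hashlib
import re
from collections import defaultdict

# Assumption: the "predefined characters" are everything except letters and digits.
PREDEFINED_CHARS = re.compile(r"[^0-9A-Za-z]")


def derive_tokens(text):
    """Parse text into un-normalized words, then strip the predefined characters."""
    words = text.split()                                  # un-normalized words
    normalized = (PREDEFINED_CHARS.sub("", w) for w in words)
    return [w for w in normalized if w]                   # drop words left empty


def fingerprint(token):
    """Hash a normalized word to a 64-bit integer (truncated SHA-256, an assumption)."""
    digest = hashlib.sha256(token.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def build_index(records):
    """Build a list of (fingerprint, record_id) pairs sorted by fingerprint.

    `records` maps a record id to a list of field strings (hypothetical layout).
    Sorting gives the monotonically increasing structure that binary search needs.
    """
    pairs = {
        (fingerprint(tok), rid)
        for rid, fields in records.items()
        for field in fields
        for tok in derive_tokens(field)
    }
    return sorted(pairs)


def check(text, index, threshold):
    """Return "BLOCK" if any record has at least `threshold` matching fingerprints."""
    keys = [fp for fp, _ in index]                        # sorted fingerprint column
    matches = defaultdict(set)                            # intermediate search results
    for fp in {fingerprint(t) for t in derive_tokens(text)}:
        lo = bisect.bisect_left(keys, fp)                 # binary search, lower bound
        hi = bisect.bisect_right(keys, fp)                # binary search, upper bound
        for _, rid in index[lo:hi]:
            matches[rid].add(fp)
    # Matching rule (assumed): count distinct matching fingerprints per record.
    hit = any(len(fps) >= threshold for fps in matches.values())
    return "BLOCK" if hit else "PERMIT"
```

In this sketch the strings `"BLOCK"` and `"PERMIT"` stand in for the first and second control values. The threshold is applied per record, so tokens that match different records do not add up toward a single record's count.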

  • 14 Assignments