Identifying whether electronic data under test includes particular information from a database
First Claim
1. A method of identifying whether electronic data under test includes particular information from a database, the method comprising:
deriving a set of sample tokens from the electronic data under test;
forming a set of sample fingerprints from the set of sample tokens, each sample fingerprint being based on a sample token of the set of sample tokens; and
outputting a result signal based on a comparison between the set of sample fingerprints and a set of database fingerprints generated from records of the database, the result signal providing an indication of whether the electronic data under test includes the particular information from the database;
wherein deriving the set of sample tokens includes:
parsing the electronic data under test into a series of un-normalized words, and
removing predefined characters from the series of un-normalized words to form, as the set of sample tokens, a series of normalized words, each normalized word including a string of actual characters;
wherein forming the set of sample fingerprints from the set of sample tokens includes:
applying a hashing function to the string of actual characters of each normalized word of the series of normalized words to generate, as the set of sample fingerprints, hash results corresponding to the series of normalized words;
wherein outputting the result signal based on the comparison between the set of sample fingerprints and the set of database fingerprints includes:
searching the set of database fingerprints for the hash results and determining whether, for any record of the database, a predetermined number of database fingerprints corresponding to that record is found to match the hash results, and
providing the result signal with one of (i) a first control value when, for any record of the database, the predetermined number of database fingerprints corresponding to that record is found to match the hash results, and (ii) a second control value when, for each record of the database, less than the predetermined number of database fingerprints corresponding to that record is found to match the hash results, the first value being different than the second value;
wherein the method further comprises:
blocking access to the electronic data under test when the result signal has the first control value, and permitting access to the electronic data under test when the result signal has the second control value;
wherein searching and determining includes:
receiving a set of fingerprint matching rules,
providing a set of intermediate search results based on searching the set of database fingerprints for the hash results, and
applying the set of fingerprint matching rules to the set of intermediate search results to identify whether, for any record of the database, the predetermined number of database fingerprints corresponding to that record is found to match the hash results;
wherein the set of database fingerprints resides in a database fingerprint structure which increases monotonically based on the database fingerprints of the set of database fingerprints; and
wherein providing the set of intermediate search results based on searching the set of database fingerprints for the hash results includes:
carrying out a binary search of the database fingerprint structure for the hash results.
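The steps of claim 1 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the claim does not say which characters are "predefined", which hashing function is used, or what form the matching rules take. Here the predefined characters are assumed to be ASCII punctuation, the hash is a truncated SHA-256, the only matching rule is the per-record threshold, and all function and constant names are hypothetical.

```python
import bisect
import hashlib
import string
from collections import Counter

# Assumption: the "predefined characters" are ASCII punctuation.
STRIP_TABLE = str.maketrans("", "", string.punctuation)
FIRST_CONTROL = "block"    # hypothetical encoding of the first control value
SECOND_CONTROL = "permit"  # hypothetical encoding of the second control value


def normalize(word):
    """Remove predefined characters, leaving the string of actual characters."""
    return word.translate(STRIP_TABLE)


def fingerprint(word):
    """Hash a normalized word to a 64-bit integer (assumed: SHA-256 prefix)."""
    digest = hashlib.sha256(word.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def sample_fingerprints(text):
    """Parse text into words, normalize them, and hash each non-empty result."""
    normalized = (normalize(w) for w in text.split())
    return {fingerprint(w) for w in normalized if w}


def build_structure(records):
    """Return (fingerprint, record_id) pairs sorted by increasing fingerprint."""
    pairs = set()
    for record_id, values in records.items():
        for value in values:
            word = normalize(value)
            if word:
                pairs.add((fingerprint(word), record_id))
    return sorted(pairs)


def result_signal(text, structure, threshold):
    """Binary-search the structure for each hash result, then apply the threshold."""
    keys = [fp for fp, _ in structure]
    hits = Counter()  # intermediate search results: matches per record
    for fp in sample_fingerprints(text):
        i = bisect.bisect_left(keys, fp)
        while i < len(keys) and keys[i] == fp:
            hits[structure[i][1]] += 1
            i += 1
    matched = any(count >= threshold for count in hits.values())
    return FIRST_CONTROL if matched else SECOND_CONTROL


records = {1: ["Jane", "Doe", "123-45-6789"], 2: ["John", "Smith", "987-65-4321"]}
structure = build_structure(records)
print(result_signal("Please bill Jane Doe, SSN 123-45-6789.", structure, 2))
print(result_signal("Hello Jane and John", structure, 2))
```

Keeping the structure sorted by fingerprint is what makes the binary search valid. A caller would block access to the data on the first control value and permit access on the second.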
Abstract
Electronic circuitry includes an input/output (I/O) interface, memory which stores a set of database fingerprints generated from records of a database, and an analyzing circuit coupled to the I/O interface and the memory. The analyzing circuit is constructed and arranged to derive a set of sample tokens from electronic data under test (e.g., an email, an electronic document, etc.), and form a set of sample fingerprints from the set of sample tokens. Each sample fingerprint is based on a sample token of the set of sample tokens. The analyzing circuit is further constructed and arranged to output a result signal based on a comparison between the set of sample fingerprints and the set of database fingerprints. The result signal provides an indication of whether the electronic data under test includes particular information from the database.
15 Claims
-
-
10. Electronic circuitry, comprising:
an input/output (I/O) interface;
memory which stores a set of database fingerprints generated from records of a database; and
an analyzing circuit coupled to the I/O interface and the memory, the analyzing circuit being constructed and arranged to:
derive a set of sample tokens from electronic data under test,
form a set of sample fingerprints from the set of sample tokens, each sample fingerprint being based on a sample token of the set of sample tokens, and
output a result signal based on a comparison between the set of sample fingerprints and the set of database fingerprints, the result signal providing an indication of whether the electronic data under test includes particular information from the database;
wherein the analyzing circuit, when deriving the set of sample tokens, is constructed and arranged to:
parse the electronic data under test into a series of un-normalized words, and
remove predefined characters from the series of un-normalized words to form, as the set of sample tokens, a series of normalized words, each normalized word including a string of actual characters;
wherein the analyzing circuit, when forming the set of sample fingerprints from the set of sample tokens, is constructed and arranged to:
apply a hashing function to the string of actual characters of each normalized word of the series of normalized words to generate, as the set of sample fingerprints, hash results corresponding to the series of normalized words;
wherein the analyzing circuit, when outputting the result signal based on the comparison between the set of sample fingerprints and the set of database fingerprints, is constructed and arranged to:
search the set of database fingerprints for the hash results and determining whether, for any record of the database, a predetermined number of database fingerprints corresponding to that record is found to match the hash results, and
provide the result signal with one of (i) a first control value when, for any record of the database, the predetermined number of database fingerprints corresponding to that record is found to match the hash results, and (ii) a second control value when, for each record of the database, less than the predetermined number of database fingerprints corresponding to that record is found to match the hash results, the first value being different than the second value;
wherein the analyzing circuit is further constructed and arranged to:
block access to the electronic data under test when the result signal has the first control value, and permitting access to the electronic data under test when the result signal has the second control value;
wherein the analyzing circuit, when searching and determining, is constructed and arranged to:
receive a set of fingerprint matching rules,
provide a set of intermediate search results based on searching the set of database fingerprints for the hash results, and
apply the set of fingerprint matching rules to the set of intermediate search results to identify whether, for any record of the database, the predetermined number of database fingerprints corresponding to that record is found to match the hash results;
wherein the set of database fingerprints resides in a database fingerprint structure which increases monotonically based on the database fingerprints of the set of database fingerprints; and
wherein the analyzing circuit, when providing the set of intermediate search results based on searching the set of database fingerprints for the hash results, is constructed and arranged to:
carry out a binary search of the database fingerprint structure for the hash results.
-
-
14. A method of identifying whether electronic data under test includes particular information from a database, the method comprising:
deriving a set of sample tokens from the electronic data under test;
forming a set of sample fingerprints from the set of sample tokens, each sample fingerprint being based on a sample token of the set of sample tokens; and
outputting a result signal based on a comparison between the set of sample fingerprints and a set of database fingerprints generated from records of the database, the result signal providing an indication of whether the electronic data under test includes the particular information from the database;
wherein the method further comprises:
querying the records of the database to create a list of query results, and
generating the set of database fingerprints from the list of query results;
wherein each query result of the list of query results includes a set of un-normalized words corresponding to a record of the database;
wherein generating the set of database fingerprints from the list of query results includes, for each query result, (i) removing predefined characters from the set of un-normalized words of that query result to form a set of normalized words, each normalized word including a string of actual characters, (ii) applying a hashing function to the string of actual characters of each normalized word to provide a set of hash results as the set of database fingerprints, and (iii) storing the hash results in a database fingerprint structure;
wherein the list of query results includes a set of data categories, each data category of the set of data categories corresponding to a field in the database;
wherein the method further comprises:
receiving a set of fingerprint matching rules, the set of fingerprint matching rules being based on the set of data categories, and
providing the set of fingerprint rules to the database fingerprint structure; and
wherein each query result of the list of query results includes a set of cells, each cell of the set of cells including an un-normalized word of the set of un-normalized words and being associated with a data category of the set of data categories, each un-normalized word of the set of un-normalized words being a value of the data category associated with the cell;
wherein removing predefined characters from the set of un-normalized words of that query result to form a set of normalized words includes:
for each cell of the set of cells of that query result, deleting the predefined characters from the un-normalized word of the cell to form a normalized word associated with the cell; and
wherein receiving the set of fingerprint matching rules includes:
obtaining at least one matching rule which labels certain database fingerprints of the set of database fingerprints for required matching, a database fingerprint being labeled for required matching being based on whether the database fingerprint is a hash result resulting from an application of the hashing function to a set of actual characters of a normalized word associated with a cell to which a particular data category of the set of data categories is associated.
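The database-side steps of claim 14 (query, normalize each cell, hash, store, and label fingerprints of a particular data category for required matching) might look like the sketch below. It rests on assumptions the claim leaves open: query results are modeled as dictionaries mapping a data category to a cell value, the predefined characters are ASCII punctuation, the hash is a truncated SHA-256, and the way a "required" label is combined with a match count in `record_matches` is purely illustrative. All names are hypothetical.

```python
import hashlib
import string

# Assumption: the "predefined characters" are ASCII punctuation.
STRIP_TABLE = str.maketrans("", "", string.punctuation)


def normalize(word):
    """Delete predefined characters from a cell's un-normalized word."""
    return word.translate(STRIP_TABLE)


def fingerprint(word):
    """Hash a normalized word to a 64-bit integer (assumed: SHA-256 prefix)."""
    digest = hashlib.sha256(word.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def fingerprint_query_results(query_results, required_categories):
    """Build fingerprint entries from query results.

    Each query result maps a data category (database field) to a cell value.
    A fingerprint is labeled "required" when its cell belongs to one of the
    categories named by the matching rule.
    """
    entries = []
    for record_id, row in enumerate(query_results):
        for category, raw_value in row.items():
            word = normalize(raw_value)
            if not word:
                continue
            entries.append({
                "fingerprint": fingerprint(word),
                "record": record_id,
                "category": category,
                "required": category in required_categories,
            })
    return entries


def record_matches(entries, record_id, sample_fps, threshold):
    """Illustrative rule: every required fingerprint must match, and at least
    `threshold` fingerprints of the record must match overall."""
    mine = [e for e in entries if e["record"] == record_id]
    found = [e for e in mine if e["fingerprint"] in sample_fps]
    required_ok = all(e["fingerprint"] in sample_fps for e in mine if e["required"])
    return required_ok and len(found) >= threshold


rows = [{"first_name": "Jane", "last_name": "Doe", "ssn": "123-45-6789"}]
entries = fingerprint_query_results(rows, required_categories={"ssn"})
sample = {fingerprint("Jane"), fingerprint("Doe")}
print(record_matches(entries, 0, sample, threshold=2))  # name alone, no SSN
```

Labeling by category lets a rule insist that a distinctive field, such as an identifier, be present before a record counts as found, so that common words like first names do not trigger a match on their own.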
-
-
15. Electronic circuitry, comprising:
an input/output (I/O) interface;
memory which stores a set of database fingerprints generated from records of a database; and
an analyzing circuit coupled to the I/O interface and the memory, the analyzing circuit being constructed and arranged to:
derive a set of sample tokens from electronic data under test,
form a set of sample fingerprints from the set of sample tokens, each sample fingerprint being based on a sample token of the set of sample tokens, and
output a result signal based on a comparison between the set of sample fingerprints and the set of database fingerprints, the result signal providing an indication of whether the electronic data under test includes particular information from the database;
wherein the analyzing circuit is further constructed and arranged to:
query the records of the database to create a list of query results, and
generating the set of database fingerprints from the list of query results;
wherein each query result of the list of query results includes a set of un-normalized words corresponding to a record of the database;
wherein the analyzing circuit, when generating the set of database fingerprints from the list of query results, is constructed and arranged to, for each query result, (i) remove predefined characters from the set of un-normalized words of that query result to form a set of normalized words, each normalized word including a string of actual characters, (ii) apply a hashing function to the string of actual characters of each normalized word to provide a set of hash results as the set of database fingerprints, and (iii) store the hash results in a database fingerprint structure;
wherein the list of query results includes a set of data categories, each data category of the set of data categories corresponding to a field in the database;
wherein the analyzing circuit is further constructed and arranged to:
receive a set of fingerprint matching rules, the set of fingerprint matching rules being based on the set of data categories, and
provide the set of fingerprint rules to the database fingerprint structure; and
wherein each query result of the list of query results includes a set of cells, each cell of the set of cells including an un-normalized word of the set of un-normalized words and being associated with a data category of the set of data categories, each un-normalized word of the set of un-normalized words being a value of the data category associated with the cell;
wherein the analyzing circuit, when removing predefined characters from the set of un-normalized words of that query result to form a set of normalized words, is constructed and arranged to:
for each cell of the set of cells of that query result, delete the predefined characters from the un-normalized word of the cell to form a normalized word associated with the cell; and
wherein the analyzing circuit, when receiving the set of fingerprint matching rules, is constructed and arranged to:
obtain at least one matching rule which labels certain database fingerprints of the set of database fingerprints for required matching, a database fingerprint being labeled for required matching being based on whether the database fingerprint is a hash result resulting from an application of the hashing function to a set of actual characters of a normalized word associated with a cell to which a particular data category of the set of data categories is associated.
-
Specification