System and method for protecting specified data combinations
First Claim
1. A method to be executed by a processor in a network environment, comprising:
extracting a plurality of data elements from a record of a data file;
tokenizing the plurality of data elements into a plurality of tokens;
storing the plurality of tokens in a first tuple of a registration list;
selecting one of the plurality of tokens as a token key for indexing the first tuple, wherein a total number of occurrences of the token key is less than a total number of occurrences of each other token of the first tuple, wherein the total number of occurrences of the token key and the total number of occurrences of each other token are determined based on a plurality of tuples in the registration list; and
generating an index table with a plurality of indexes each corresponding to a unique token key, the index table including a first index corresponding to the token key of the first tuple, wherein the generating the index table includes forcing the unique token keys into a boundary of memory with modulus, wherein the boundary is defined by a prime number, wherein, when two or more tuples of the registration list are indexed by the token key, the first index includes two or more unique offsets indicating respective locations in the registration list of the two or more tuples, wherein each of the two or more tuples includes a respective plurality of tokens and each of the respective plurality of tokens includes the token key.
Abstract
A method in one example implementation includes extracting a plurality of data elements from a record of a data file, tokenizing the data elements into tokens, and storing the tokens in a first tuple of a registration list. The method further includes selecting one of the tokens as a token key for the first tuple, where the token is selected because it occurs less frequently in the registration list than each of the other tokens in the first tuple. In specific embodiments, at least one data element is an expression element having a character pattern matching a predefined expression pattern that represents at least two words and a separator between the words. In other embodiments, at least one data element is a word defined by a character pattern of one or more consecutive essential characters. Other specific embodiments include determining an end of the record by recognizing a predefined delimiter.
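The registration steps summarized in the abstract can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: the newline record delimiter, the alphanumeric definition of a "word", and the CRC32-based `tokenize` function are all assumptions made for the example, as are the function names. Where several tokens in a tuple are equally rare, this sketch simply picks the first one.

```python
import re
import zlib
from collections import Counter

RECORD_DELIMITER = "\n"                      # assumed predefined delimiter ending a record
WORD_PATTERN = re.compile(r"[A-Za-z0-9]+")   # assumed "essential characters" for a word


def tokenize(element: str) -> int:
    """Hypothetical tokenizer: map a normalized data element to a 32-bit number."""
    return zlib.crc32(element.lower().encode("utf-8"))


def build_registration_list(data_file_text: str) -> list[list[int]]:
    """Extract the data elements of each record and store their tokens as one tuple."""
    registration_list = []
    for record in data_file_text.split(RECORD_DELIMITER):
        elements = WORD_PATTERN.findall(record)
        if elements:
            registration_list.append([tokenize(e) for e in elements])
    return registration_list


def select_token_keys(registration_list: list[list[int]]) -> list[int]:
    """For each tuple, pick the token that occurs least often across all tuples."""
    counts = Counter(tok for tup in registration_list for tok in tup)
    # min() returns the first least-frequent token, so ties are broken by position
    return [min(tup, key=lambda tok: counts[tok]) for tup in registration_list]


if __name__ == "__main__":
    text = "Carol Smith 1234\nCarol Jones 5678\nBob Smith 9999"
    reg = build_registration_list(text)
    for tup, key in zip(reg, select_token_keys(reg)):
        print(tup, "-> token key", key)
```

In the sample data, "Carol" and "Smith" each appear in two records, so the rarer account number is chosen as the key for the first record.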
44 Claims
1. A method to be executed by a processor in a network environment, comprising:
extracting a plurality of data elements from a record of a data file;
tokenizing the plurality of data elements into a plurality of tokens;
storing the plurality of tokens in a first tuple of a registration list;
selecting one of the plurality of tokens as a token key for indexing the first tuple, wherein a total number of occurrences of the token key is less than a total number of occurrences of each other token of the first tuple, wherein the total number of occurrences of the token key and the total number of occurrences of each other token are determined based on a plurality of tuples in the registration list; and
generating an index table with a plurality of indexes each corresponding to a unique token key, the index table including a first index corresponding to the token key of the first tuple, wherein the generating the index table includes forcing the unique token keys into a boundary of memory with modulus, wherein the boundary is defined by a prime number, wherein, when two or more tuples of the registration list are indexed by the token key, the first index includes two or more unique offsets indicating respective locations in the registration list of the two or more tuples, wherein each of the two or more tuples includes a respective plurality of tokens and each of the respective plurality of tokens includes the token key.
Dependent claims: 2, 3, 4, 5, 6, 7, 8
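The index table recited in claim 1 might look something like the sketch below. It assumes numeric tokens, a small prime table size of 7, offsets measured as token positions in a flattened registration list, and chaining to separate distinct token keys that land in the same slot. None of these details are specified in the claim; the names `build_index_table` and `lookup` are hypothetical.

```python
from collections import Counter

TABLE_SIZE = 7  # assumed prime number bounding the index table


def build_index_table(registration_list, table_size=TABLE_SIZE):
    """Index each tuple by its least-frequent token, using key % prime as the slot."""
    counts = Counter(tok for tup in registration_list for tok in tup)
    table = [[] for _ in range(table_size)]   # each slot holds zero or more indexes
    offset = 0                                # assumed: position of the tuple's first token
    for tup in registration_list:
        key = min(tup, key=lambda tok: counts[tok])
        slot = table[key % table_size]        # force the key into the bounded table
        for index in slot:
            if index["key"] == key:
                index["offsets"].append(offset)   # same key indexes another tuple
                break
        else:
            slot.append({"key": key, "offsets": [offset]})
        offset += len(tup)
    return table


def lookup(table, key):
    """Return the offsets of all tuples indexed by the given token key."""
    for index in table[key % len(table)]:
        if index["key"] == key:
            return index["offsets"]
    return []


if __name__ == "__main__":
    reg = [[5, 1, 2], [5, 1, 2], [1, 2, 12]]
    table = build_index_table(reg)
    print(lookup(table, 5))    # [0, 3]: two tuples share token key 5
    print(lookup(table, 12))   # [6]: 12 % 7 == 5, same slot as key 5, separate index
```

A prime table size tends to spread keys more evenly under the modulus operation than a size sharing factors with common key patterns, which is one plausible reason for the prime boundary in the claim.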
9. An apparatus, comprising:
one or more registration modules operable to generate a registration list having a plurality of tuples, each tuple representing a specified combination of data elements; and
a processor operable to execute instructions associated with the one or more registration modules, including:
extracting a plurality of data elements from a record of a data file;
tokenizing the plurality of data elements into a plurality of tokens;
storing the plurality of tokens in a first tuple of the registration list;
selecting one of the plurality of tokens as a token key for indexing the first tuple, wherein a total number of occurrences of the token key is less than a total number of occurrences of each other token of the first tuple, wherein the total number of occurrences of the token key and the total number of occurrences of each other token are determined based on a plurality of tuples in the registration list; and
generating an index table with a plurality of indexes each corresponding to a unique token key, the index table including a first index corresponding to the token key of the first tuple, wherein the generating the index table includes forcing the unique token keys into a boundary of memory with modulus, wherein the boundary is defined by a prime number, wherein, when two or more tuples of the registration list are indexed by the token key, the first index includes two or more unique offsets indicating respective locations in the registration list of the two or more tuples, wherein each of the two or more tuples includes a respective plurality of tokens and each of the respective plurality of tokens includes the token key.
Dependent claims: 10, 11, 12, 13
14. A method to be executed by a processor in a network environment, comprising:
identifying a start of a first data element in a record of a data file;
determining the first data element is an expression element if a first string of characters beginning at the start of the first data element matches a predefined expression pattern, the predefined expression pattern representing at least two words and a separator between the two words;
extracting the expression element;
tokenizing the expression element into a first token;
storing the first token in a first tuple of a registration list;
selecting the first token as a token key for indexing the first tuple, wherein a total number of occurrences of the token key is less than a total number of occurrences of each one of one or more other tokens of the first tuple, wherein the total number of occurrences of the token key and the total number of occurrences of each one of the one or more other tokens are determined based on a plurality of tuples in the registration list; and
generating an index table with a plurality of indexes each corresponding to a unique token key, the index table including a first index corresponding to the token key of the first tuple, wherein the generating the index table includes forcing the unique token keys into a boundary of memory with modulus, wherein the boundary is defined by a prime number.
Dependent claims: 15, 16, 17, 18, 19, 20, 21, 22, 23, 24
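The expression-element test in claim 14 could be approximated as in the following sketch. The two regular expressions (a number written as three digit groups separated by hyphens, and a seven-digit number with one separator) are hypothetical stand-ins for "predefined expression patterns", and the alphanumeric word pattern is likewise an assumption. At each element start, the expression patterns are tried first, and the element falls back to a plain word only if none matches.

```python
import re

# Hypothetical predefined expression patterns: at least two words plus a separator
EXPRESSION_PATTERNS = [
    re.compile(r"\d{3}-\d{2}-\d{4}"),   # e.g. 123-45-6789
    re.compile(r"\d{3}[ .-]\d{4}"),     # e.g. 555-1234 or 555 1234
]
WORD_PATTERN = re.compile(r"[A-Za-z0-9]+")  # assumed definition of a word


def extract_elements(record: str) -> list[tuple[str, str]]:
    """Scan a record and label each data element as an expression or a word."""
    elements = []
    pos = 0
    while pos < len(record):
        word = WORD_PATTERN.match(record, pos)
        if word is None:
            pos += 1                     # not the start of a data element; skip it
            continue
        for pattern in EXPRESSION_PATTERNS:
            expr = pattern.match(record, pos)
            if expr:                     # string at this start matches an expression
                elements.append(("expression", expr.group()))
                pos = expr.end()
                break
        else:
            elements.append(("word", word.group()))
            pos = word.end()
    return elements


if __name__ == "__main__":
    print(extract_elements("Carol Smith 123-45-6789"))
```

Trying expression patterns before the word pattern keeps a multi-part value together as one element instead of splitting it into separate words at each separator.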
25. An apparatus, comprising:
a registration list module; and
a processor operable to execute instructions associated with the registration list module, including:
identifying a start of a first data element in a record of a data file;
determining the first data element is an expression element if a first string of characters beginning at the start of the first data element matches a predefined expression pattern, the predefined expression pattern representing at least two words and a separator between the two words;
extracting the expression element;
tokenizing the expression element into a first token;
storing the first token in a first tuple of the registration list;
selecting the first token as a token key for indexing the first tuple, wherein a total number of occurrences of the token key is less than a total number of occurrences of each one of one or more other tokens of the first tuple, wherein the total number of occurrences of the token key and the total number of occurrences of each one of the one or more other tokens are determined based on a plurality of tuples in the registration list; and
generating an index table with a plurality of indexes each corresponding to a unique token key, the index table including a first index corresponding to the token key of the first tuple, wherein the generating the index table includes forcing the unique token keys into a boundary of memory with modulus, wherein the boundary is defined by a prime number.
Dependent claims: 26, 27, 28, 29, 30
31. At least one non-transitory machine readable medium having instructions stored thereon that, when executed, cause one or more processors to:
extract a plurality of data elements from a record of a data file;
tokenize the plurality of data elements into a plurality of tokens;
store the plurality of tokens in a first tuple of a registration list;
select one of the plurality of tokens as a token key for indexing the first tuple, wherein a total number of occurrences of the token key is less than a total number of occurrences of each other token of the first tuple, wherein the total number of occurrences of the token key and the total number of occurrences of each other token are determined based on a plurality of tuples in the registration list; and
generate an index table with a plurality of indexes each corresponding to a unique token key, the index table including a first index corresponding to the token key of the first tuple, wherein the generating the index table includes forcing the unique token keys into a boundary of memory with modulus, wherein the boundary is defined by a prime number, wherein, when two or more tuples of the registration list are indexed by the token key, the first index includes two or more unique offsets indicating respective locations in the registration list of the two or more tuples, wherein each of the two or more tuples includes a respective plurality of tokens and each of the respective plurality of tokens includes the token key.
Dependent claims: 32, 33, 34, 35, 36, 37
38. At least one non-transitory machine readable medium having instructions stored thereon that, when executed, cause one or more processors to:
identify a start of a first data element in a record of a data file;
determine the first data element is an expression element if a first string of characters beginning at the start of the first data element matches a predefined expression pattern, the predefined expression pattern representing at least two words and a separator between the two words;
extract the expression element;
tokenize the expression element into a first token;
store the first token in a first tuple of a registration list;
select the first token as a token key for indexing the first tuple, wherein a total number of occurrences of the token key is less than a total number of occurrences of each one of one or more other tokens of the first tuple, wherein the total number of occurrences of the token key and the total number of occurrences of each one of the one or more other tokens are determined based on a plurality of tuples in the registration list; and
generate an index table with a plurality of indexes each corresponding to a unique token key, the index table including a first index corresponding to the token key of the first tuple, wherein the generating the index table includes forcing the unique token keys into a boundary of memory with modulus, wherein the boundary is defined by a prime number.
Dependent claims: 39, 40, 41, 42, 43, 44
Specification