Method of searching for a data element in a data structure
First Claim
1. In an information handling device having a table of data elements, a method for searching for a received data element in the table of data elements:
- a) selecting a like portion of each data element as an index value for differentiating between the data elements in the table, the portion comprising a contiguous string of bits located at like bit positions in each data element in the table such that the same bit positions from each data element form a logical column in the table;
b) measuring the information content for each logical column;
c) ranking each logical column according to the amount of information content associated with each logical column;
d) selecting a set of logical columns in the portion having a rank at or above a desired rank;
e) receiving a data element at the information handling device;
f) identifying a set of bit positions in a portion of the received data element corresponding to the selected set of columns;
g) applying a mask to the set of bit positions in the portion of the received data element to create a reduced set of bit positions in the portion of the received data element;
h) providing the content of each bit position in the reduced set of bit positions in the portion of the received data element as one of a plurality of keys to be input to a hashing function, the hashing function providing the index value for a data element to be searched for in the table corresponding to the received data element; and
i) rotating the mask;
j) repeating steps g) through i) according to the plurality of keys desired, the mask being rotated in such a manner as to minimize correlation among the plurality of keys;
k) performing hashing functions to obtain a plurality of index values, each hashing function utilizing a unique one of the plurality of keys;
l) searching the table for the index value to locate the received data element.
4 Assignments
0 Petitions
Accused Products
Abstract
A method for searching for a record in a table in a memory of a computer system. A table of records is organized into a group of arrays. A hashing algorithm locates a record in the table. Multiple hashing functions are executed concurrently, according to the number of arrays in the group, such that the record can be located relatively quickly in one of the arrays in the group. The table is analyzed to determine the information content of each bit in a string of bits comprising an index value associated with the table, according to Shannon'"'"'s formula for information-theoretic entropy. The entropy associated with each bit in the string of bits provides a basis for selecting a subset of bits in the string of bits from which to obtain the seed values utilized in the hashing functions. A rotating mask, based on Neumann'"'"'s code, is applied to the subset of bits to obtain different seed values for each of the hashing functions, thereby minimizing the correlation of the keys provided by the hashing functions.
149 Citations
25 Claims
-
1. In an information handling device having a table of data elements, a method for searching for a received data element in the table of data elements:
-
a) selecting a like portion of each data element as an index value for differentiating between the data elements in the table, the portion comprising a contiguous string of bits located at like bit positions in each data element in the table such that the same bit positions from each data element form a logical column in the table;
b) measuring the information content for each logical column;
c) ranking each logical column according to the amount of information content associated with each logical column;
d) selecting a set of logical columns in the portion having a rank at or above a desired rank;
e) receiving a data element at the information handling device;
f) identifying a set of bit positions in a portion of the received data element corresponding to the selected set of columns;
g) applying a mask to the set of bit positions in the portion of the received data element to create a reduced set of bit positions in the portion of the received data element;
h) providing the content of each bit position in the reduced set of bit positions in the portion of the received data element as one of a plurality of keys to be input to a hashing function, the hashing function providing the index value for a data element to be searched for in the table corresponding to the received data element; and
i) rotating the mask;
j) repeating steps g) through i) according to the plurality of keys desired, the mask being rotated in such a manner as to minimize correlation among the plurality of keys;
k) performing hashing functions to obtain a plurality of index values, each hashing function utilizing a unique one of the plurality of keys;
l) searching the table for the index value to locate the received data element. - View Dependent Claims (2, 12, 13, 14, 15)
-
-
3. A method for providing a seed value for input to a hashing function, the hashing function providing a key for searching for a data element in a table of data elements, comprising:
-
a. identifying a subset of bit positions in an index field associated with each of the data elements in the table;
b. receiving a data element for which to search in the table of data elements;
c. selecting a subset of bits in the received data element corresponding to the identified subset of bit positions in the index field;
d. applying a mask to reduce the selected subset of bits in the received data element; and
e. providing the reduced subset of bits from the received data element as the seed value input to the hashing function. - View Dependent Claims (4, 5, 6, 7, 8, 16, 17, 18, 19, 20)
a) measuring the information content of each of the bit positions in the index field; and
b) ranking the bit positions in the index field based on the measure of information content of each of the bit positions in the index field; and
b) identifying a subset of bit positions in the index field having a higher relative measure of information content.
-
-
18. The method of claim 17, wherein measuring the information content of each of the bit positions in the index field comprises measuring the entropic value of each of the bit positions in the index field.
-
19. The method of claim 18, wherein measuring the entropic value of each of the bit positions in the index field comprises measuring the entropic value of each of the bit positions in the index field in accordance with an information-theoretic entropy algorithm.
-
20. The method of claim 19, wherein measuring the entropic value of each of the bit positions in the index field in accordance with an information-theoretic entropy algorithm comprises measuring the entropic value of each of the bit positions in the index field in accordance with Claude Shannon'"'"'s information-theoretic entropy algorithm.
-
9. A method for providing a seed value for input to a hashing function, the hashing function providing a key for searching for a data element in a table of data elements, comprising:
-
a. identifying a subset of bit positions in an index value associated with each of the data elements in the table;
b. receiving a data element for which to search in the table of data elements;
c. selecting a subset of bits in that portion of the received data element corresponding to the identified subset of bit positions in the index value; and
d. repeating, to provide as many seed values as desired for input to a like number of iterations of the hashing function;
i. applying a mask to reduce the subset of bits in the received data element corresponding to the identified subset of bit positions in the index value;
ii. providing the reduced subset of bits from the received data element as the seed value input to the hashing function; and
ii. rotating the mask. - View Dependent Claims (10, 11, 21, 22)
-
-
23. In an information handling device having a table of data elements, a means for searching for a received data element in the table of data elements, comprising:
-
means for selecting a portion of each data element as an index value for differentiating between the data elements in the table, the portion comprising a string of bits located at bit positions in each data element in the table such that the bit positions from each data element form a logical column in the table;
means for measuring the information content for each logical column;
means for ranking each logical column according to the information content associated with each logical column;
means for selecting a set of logical columns in the portion based on their ranking;
means for receiving a data element at the information handling device;
means for identifying a set of bit positions in a portion of the received data element corresponding to the selected set of columns;
means for applying a mask to the set of bit positions in the portion of the received data element to create a reduced set of bit positions in the portion of the received data element;
means for providing the content of each bit position in the reduced set of bit positions in the portion of the received data element as one of a plurality of keys to be input to a hashing function, the hashing function providing the index value for a data element to be searched for in the table corresponding to the received data element; and
means for performing the hashing function to obtain an index value utilizing the one of the plurality of keys;
means for searching the table for the index value to locate the received data element.
-
-
24. An article of manufacture comprising a machine readable medium having a plurality of machine readable instructions stored thereon, wherein the instructions, when executed by a processor, cause the processor to:
-
identify a subset of bit positions in an index field associated with each of the data elements in the table;
receive a data element for which to search in the table of data elements;
select a subset of bits in the received data element corresponding to the identified subset of bit positions in the index field;
apply a mask to reduce the selected subset of bits in the received data element; and
provide the reduced subset of bits from the received data element as the seed value input to the hashing function. - View Dependent Claims (25)
-
Specification