Hashed indexing
First Claim
1. In a computing environment including a data store, the data store including data items stored in records in the data store, a method of indexing data items into an index which indexes the data items by indexing hashes of the data items rather than the data items themselves, the method comprising:
- performing a parse phase to parse through the data items in a data store to calculate hashes for the data items in the data store;
correlating calculated hashes to the hashed data items;
maintaining the correlation between calculated hashes and the hashed data items such that when a data items is encountered a subsequent time, when either parsing the data store or searching the data store by referencing an index, a given hash correlated to a given data item does not need to be recalculated, but rather can be obtained from the maintained correlation;
a computing system that includes one or more processors identifying a data item stored in a data store record of the data store, the data store record having a data store address;
the computing system identifying a hash of the data item by referencing the maintained correlation between calculated hashes and data items generated as a result of the parse phase;
the computing system indexing the data store address for the data store record for the data item to at least an entry for the hash in a hash keyed index such that the hash keyed index comprises records,wherein each record is at an index record physical memory address, and the records comprise hash keys, comprising at least a portion of a hash of a data item, correlated to data store records such that at least a portion of a hash of a given data item is indexed to data store records comprising the given data item,wherein the hash keyed index comprises a linked list such that records in the hash key index comprise a reference to an index record physical memory address of a preceding hash keyed index entry and a reference to an index record physical memory address of a subsequent hash keyed index entry, wherein indexing the data store address for the data store record for the data item comprises updating the linked list when new entries are added to the hash keyed index between existing entries in the hash keyed index;
wherein indexing the data store address for the data store record for the data item comprises attempting to locate an index record physical memory address to add a data item record reference by referencing an offset correlation table that indexes index record physical addresses and jumps to index records indexing physical memory address to a defined number of top bits of hashes of data items, such that by knowing a number of top bits of a hash of a data item, entry can be made into the linked list at an index record physical memory address when a physical memory address is indexed to a defined top number of bits of a hash and a determination can be made that a data item has not been previously indexed when a negative number and a jump distance is indexed to a defined top number of bits of a hash; and
updating the offset correlation table to reflect the entry of the data store item into the hash keyed index when a negative number is encountered when referencing the offset correlation table to attempt to locate an index record physical memory address, wherein updating the offset correlation table includes replacing the encountered negative number with the physical memory address of the hash keyed index entry for the data item.
3 Assignments
0 Petitions
Accused Products
Abstract
Indexing data items into an index. A method includes identifying a parameter pattern for a data item stored in a data store record of a data store. The data store record has a data store location identifier. The method further includes identifying a hash of the parameter pattern. The data store location identifier is correlated to at least a portion of the hash in the index. The index includes index entries where each index entry includes at least a portion of a hash and one or more references to data store records by reference to data store location identifiers.
-
Citations
10 Claims
-
1. In a computing environment including a data store, the data store including data items stored in records in the data store, a method of indexing data items into an index which indexes the data items by indexing hashes of the data items rather than the data items themselves, the method comprising:
-
performing a parse phase to parse through the data items in a data store to calculate hashes for the data items in the data store; correlating calculated hashes to the hashed data items; maintaining the correlation between calculated hashes and the hashed data items such that when a data items is encountered a subsequent time, when either parsing the data store or searching the data store by referencing an index, a given hash correlated to a given data item does not need to be recalculated, but rather can be obtained from the maintained correlation; a computing system that includes one or more processors identifying a data item stored in a data store record of the data store, the data store record having a data store address; the computing system identifying a hash of the data item by referencing the maintained correlation between calculated hashes and data items generated as a result of the parse phase; the computing system indexing the data store address for the data store record for the data item to at least an entry for the hash in a hash keyed index such that the hash keyed index comprises records, wherein each record is at an index record physical memory address, and the records comprise hash keys, comprising at least a portion of a hash of a data item, correlated to data store records such that at least a portion of a hash of a given data item is indexed to data store records comprising the given data item, wherein the hash keyed index comprises a linked list such that records in the hash key index comprise a reference to an index record physical memory address of a preceding hash keyed index entry and a reference to an index record physical memory address of a subsequent hash keyed index entry, wherein indexing the data store address for the data store record for the data item comprises updating the linked list when new entries are added to the hash keyed index between existing entries in the hash keyed index; wherein indexing the data store address for the data store record for the data item comprises attempting to locate an index record physical memory address to add a data item record reference by referencing an offset correlation table that indexes index record physical addresses and jumps to index records indexing physical memory address to a defined number of top bits of hashes of data items, such that by knowing a number of top bits of a hash of a data item, entry can be made into the linked list at an index record physical memory address when a physical memory address is indexed to a defined top number of bits of a hash and a determination can be made that a data item has not been previously indexed when a negative number and a jump distance is indexed to a defined top number of bits of a hash; and updating the offset correlation table to reflect the entry of the data store item into the hash keyed index when a negative number is encountered when referencing the offset correlation table to attempt to locate an index record physical memory address, wherein updating the offset correlation table includes replacing the encountered negative number with the physical memory address of the hash keyed index entry for the data item. - View Dependent Claims (2, 3, 4, 5, 6, 7, 8)
-
-
9. In a computing environment including a data store, the data store including data items stored in records in the data store, a non-transitory storage medium storing computer executable instructions that when executed by one or more cause the following acts to be performed:
-
performing a parse phase to parse through data items in a data store to calculate hashes for the data items in the data store; correlating calculated hashes to the hashed data items; maintaining the correlation between calculated hashes and the hashed data items such that when a data items is encountered a subsequent time, when either parsing the data store or searching the data store by referencing an index, a given hash correlated to a given data item does not need to be recalculated, but rather can be obtained from the maintained correlation; a computing system that includes one or more processors identifying a data item stored in a data store record of the data store, the data store record having a data store address; the computing system identifying a hash of the data item by referencing the maintained correlation between calculated hashes and data items generated as a result of the parse phase; the computing system indexing the data store address for the data store record for the data item to at least an entry for the hash in a hash keyed index such that the hash keyed index comprises records, wherein each record is at an index record physical memory address, and the records comprise hash keys, comprising at least a portion of a hash of a data item, correlated to data store records such that at least a portion of a hash of a given data item is indexed to data store records comprising the given data item, wherein the hash keyed index comprises a linked list such that records in the hash key index comprise a reference to an index record physical memory address of a preceding hash keyed index entry and a reference to an index record physical memory address of a subsequent hash keyed index entry, wherein indexing the data store address for the data store record for the data item comprises updating the linked list when new entries are added to the hash keyed index between existing entries in the hash keyed index; wherein indexing the data store address for the data store record for the data item comprises attempting to locate an index record physical memory address to add a data item record reference by referencing an offset correlation table that indexes index record physical addresses and jumps to index records indexing physical memory address to a defined number of top bits of hashes of data items, such that by knowing a number of top bits of a hash of a data item, entry can be made into the linked list at an index record physical memory address when a physical memory address is indexed to a defined top number of bits of a hash and a determination can be made that a data item has not been previously indexed when a negative number and a jump distance is indexed to a defined top number of bits of a hash; and updating the offset correlation table to reflect the entry of the data store item into the hash keyed index when a negative number is encountered when referencing the offset correlation table to attempt to locate an index record physical memory address, wherein updating the offset correlation table includes replacing the encountered negative number with the physical memory address of the hash keyed index entry for the data item.
-
-
10. In a computing environment including a data store, the data store including data items stored in records in the data store, a system for indexing data items into an index, system comprising:
-
one or more processors; a computer readable memory coupled to the one or more processors, the computer readable memory storing computer executable instructions that when executed by the one or more processors cause the following acts to be performed; performing a parse phase to parse through data items in a data store to calculate hashes for the data items in the data store; correlating calculated hashes to the hashed data items; maintaining the correlation between calculated hashes and the hashed data items such that when a data items is encountered a subsequent time, when either parsing the data store or searching the data store by referencing an index, a given hash correlated to a given data item does not need to be recalculated, but rather can be obtained from the maintained correlation; identifying a data item stored in a data store record of the data store, the data store record having a data store address; identifying a hash of the data item by referencing the maintained correlation between calculated hashes and data items generated as a result of the parse phase; indexing the data store address for the data store record for the data item to at least an entry for the hash in a hash keyed index such that the hash keyed index comprises records, wherein each record is at an index record physical memory address, and the records comprise hash keys, comprising at least a portion of a hash of a data item, correlated to data store records such that at least a portion of a hash of a given data item is indexed to data store records comprising the given data item, wherein the hash keyed index comprises a linked list such that records in the hash key index comprise a reference to an index record physical memory address of a preceding hash keyed index entry and a reference to an index record physical memory address of a subsequent hash keyed index entry, wherein indexing the data store address for the data store record for the data item comprises updating the linked list when new entries are added to the hash keyed index between existing entries in the hash keyed index; wherein indexing the data store address for the data store record for the data item comprises attempting to locate an index record physical memory address to add a data item record reference by referencing an offset correlation table that indexes index record physical addresses and jumps to index records indexing physical memory address to a defined number of top bits of hashes of data items, such that by knowing a number of top bits of a hash of a data item, entry can be made into the linked list at an index record physical memory address when a physical memory address is indexed to a defined top number of bits of a hash and a determination can be made that a data item has not been previously indexed when a negative number and a jump distance is indexed to a defined top number of bits of a hash; and updating the offset correlation table to reflect the entry of the data store item into the hash keyed index when a negative number is encountered when referencing the offset correlation table to attempt to locate an index record physical memory address, wherein updating the offset correlation table includes replacing the encountered negative number with the physical memory address of the hash keyed index entry for the data item.
-
Specification