×

Hashed indexing

  • US 8,266,152 B2
  • Filed: 08/30/2007
  • Issued: 09/11/2012
  • Est. Priority Date: 03/03/2006
  • Status: Active Grant
First Claim
Patent Images

1. In a computing environment including a data store, the data store including data items stored in records in the data store, a method of indexing data items into an index which indexes the data items by indexing hashes of the data items rather than the data items themselves, the method comprising:

  • performing a parse phase to parse through the data items in a data store to calculate hashes for the data items in the data store;

    correlating calculated hashes to the hashed data items;

    maintaining the correlation between calculated hashes and the hashed data items such that when a data items is encountered a subsequent time, when either parsing the data store or searching the data store by referencing an index, a given hash correlated to a given data item does not need to be recalculated, but rather can be obtained from the maintained correlation;

    a computing system that includes one or more processors identifying a data item stored in a data store record of the data store, the data store record having a data store address;

    the computing system identifying a hash of the data item by referencing the maintained correlation between calculated hashes and data items generated as a result of the parse phase;

    the computing system indexing the data store address for the data store record for the data item to at least an entry for the hash in a hash keyed index such that the hash keyed index comprises records,wherein each record is at an index record physical memory address, and the records comprise hash keys, comprising at least a portion of a hash of a data item, correlated to data store records such that at least a portion of a hash of a given data item is indexed to data store records comprising the given data item,wherein the hash keyed index comprises a linked list such that records in the hash key index comprise a reference to an index record physical memory address of a preceding hash keyed index entry and a reference to an index record physical memory address of a subsequent hash keyed index entry, wherein indexing the data store address for the data store record for the data item comprises updating the linked list when new entries are added to the hash keyed index between existing entries in the hash keyed index;

    wherein indexing the data store address for the data store record for the data item comprises attempting to locate an index record physical memory address to add a data item record reference by referencing an offset correlation table that indexes index record physical addresses and jumps to index records indexing physical memory address to a defined number of top bits of hashes of data items, such that by knowing a number of top bits of a hash of a data item, entry can be made into the linked list at an index record physical memory address when a physical memory address is indexed to a defined top number of bits of a hash and a determination can be made that a data item has not been previously indexed when a negative number and a jump distance is indexed to a defined top number of bits of a hash; and

    updating the offset correlation table to reflect the entry of the data store item into the hash keyed index when a negative number is encountered when referencing the offset correlation table to attempt to locate an index record physical memory address, wherein updating the offset correlation table includes replacing the encountered negative number with the physical memory address of the hash keyed index entry for the data item.

View all claims
  • 3 Assignments
Timeline View
Assignment View
    ×
    ×