HYBRID HASH TABLES

US 20100217953A1
Filed: 06/15/2009
Published: 08/26/2010
Est. Priority Date: 02/23/2009
Status: Active Grant

Abstract

A hash table system having a first hash table and a second hash table is provided. The first hash table may be in-memory and the second hash table may be on-disk. Inserting an entry to the hash table system comprises inserting the entry into the first hash table, and, when the first hash table reaches a threshold load factor, flushing entries into the second hash table. Flushing the first hash table into the second hash table may comprise sequentially flushing the first hash table segments into corresponding second hash table segments. When looking up a key/value pair corresponding to a selected key in the hash table system, the system checks both the first and second hash tables for values corresponding to the selected key. The first and second hash tables may be divided into hash table segments and collision policies may be implemented within the hash table segments.
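
The flow described in the abstract can be sketched in Python. This is a minimal illustration under stated assumptions, not the patented implementation: the class name `HybridHashTable`, the parameters `memory_slots` and `threshold`, and the use of ordinary dicts in place of an array-backed in-memory table and a real on-disk table are all hypothetical choices made for the example.

```python
class HybridHashTable:
    """Toy model: a small first table that flushes into a larger second table."""

    def __init__(self, memory_slots=8, threshold=0.75):
        self.memory_slots = memory_slots  # assumed capacity of the first table
        self.threshold = threshold        # load factor that triggers a flush
        self.memory = {}                  # first hash table (in-memory)
        self.disk = {}                    # dict stand-in for the on-disk table

    def load_factor(self):
        # Fraction of the first table's capacity taken up by distinct keys.
        return len(self.memory) / self.memory_slots

    def insert(self, key, value):
        # New entries always land in the first table.
        self.memory.setdefault(key, []).append(value)
        if self.load_factor() >= self.threshold:
            self.flush()

    def flush(self):
        # Copy every entry into the second table, then empty the first.
        for key, values in self.memory.items():
            self.disk.setdefault(key, []).extend(values)
        self.memory.clear()

    def lookup(self, key):
        # Both tables are checked, since a key may have values in each.
        return self.memory.get(key, []) + self.disk.get(key, [])
```

With `memory_slots=4` and `threshold=0.75`, the third distinct key triggers a flush. A key inserted again afterwards has one value in each table, and `lookup` returns both.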

43 Claims

1. A computer-implemented method for inserting a new entry in a hash table system in a data storage system, the method comprising:
- providing the hash table system having a first hash table and a second hash table;
  
  computing a first index for the new entry, the first index corresponding to a first element in the first hash table;
  
  computing a second index for the new entry, the second index corresponding to a second element in the second hash table;
  
  inserting a first entry corresponding to the new entry into the first element in the first hash table; and
  
  when the first hash table reaches a threshold load factor, flushing the first entry from the first hash table, wherein flushing comprises inserting a value associated with the new entry into the second element in the second hash table, and removing the first entry from the first element in the first hash table.
  - 2. The method of claim 1, wherein the first hash table is an in-memory hash table, and wherein the second hash table is an on-disk hash table.
  - 3. The method of claim 1, wherein the first entry is the new entry.
  - 4. The method of claim 1, wherein the first entry comprises the second index and the value associated with the new entry, and wherein inserting the value associated with the new entry into the second element comprises using the second index to locate the second element in the second hash table.
  - 5. The method of claim 1, wherein the first entry comprises the value associated with the new entry, and wherein inserting the value associated with the new entry into the second element comprises using the value associated with the new entry to identify a key associated with the new entry, re-computing the second index for the new entry by performing a hash function on the key associated with the new entry, and using the re-computed second index to locate the second element in the second hash table.
  - 6. The method of claim 1, wherein computing the second index comprises performing a hash function on a key associated with the new entry, and wherein computing the first index comprises dividing the second index by a ratio of a size of the second hash table to a size of the first hash table, wherein the ratio is rounded to an integer.
  - 7. The method of claim 6, wherein computing the second index further comprises computing a first modulus of the hashed key and the size of the second hash table, and wherein computing the first index further comprises computing a second modulus of the divided second index and the size of the second hash table.
  - 8. The method of claim 7, wherein the key associated with the new entry is a unique identifier of a data object stored in the data storage system.
  - 9. The method of claim 1, wherein inserting the first entry corresponding to the new entry into the first element in the first hash table further comprises checking whether the first element contains a previously inserted entry, and when the first element contains the previously inserted entry, executing a collision policy to find an unused element, and substituting the unused element for the first element.
  - 10. The method of claim 9, wherein the collision policy is a linear probing method.
  - 11. The method of claim 9, wherein the first hash table is divided into a plurality of first hash table segments, and wherein the collision policy is executed within a first hash table segment such that the unused element is found within the first hash table segment.
  - 12. The method of claim 11, wherein the collision policy is a linear probing method, and wherein executing the collision policy within the first hash table segment comprises executing the linear probing method from a beginning of the first hash table segment.
  - 13. The method of claim 1, wherein the second hash table is divided into a plurality of second hash table segments, wherein inserting the value entry associated with the new entry into the second element in the second hash table further comprises checking whether the second element contains a previously inserted entry, and when the second element contains the previously inserted entry, executing a collision policy to find an unused element, and substituting the unused element for the second element.
  - 14. The method of claim 13, wherein the collision policy is a linear probing method and is executed within a second hash table segment such that the unused element is found within the second hash table segment, and wherein executing the collision policy within the second hash table segment comprises executing the linear probing method from a beginning of the second hash table segment.
  - 15. The method of claim 1, wherein the first hash table is divided into a plurality of first hash table segments, wherein the second hash table is divided into a plurality of second hash table segments, each of the second hash table segments corresponding to one of the plurality of first hash table segments, and wherein flushing further comprises flushing all entries stored in one of the first hash table segments into its corresponding second hash table segment before flushing all entries stored in another of the first hash table segments into its corresponding second hash table segment.
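
The index derivation of claims 6 and 7 and the segment-confined linear probing of claims 11 and 12 can be illustrated with a short Python sketch. The function names, the choice of SHA-256 as the hash function, and the table sizes are assumptions for the example. Claim 7 recites a modulus with the size of the second hash table when computing the first index; the sketch takes the modulus with the first table's size instead so that the result is always a valid slot, which is a deliberate simplification. Probing here starts at the computed slot and wraps to the beginning of the segment, which is one possible reading of claim 12.

```python
import hashlib


def second_index(key: bytes, disk_size: int) -> int:
    """Hash the key and reduce it modulo the size of the second table."""
    digest = hashlib.sha256(key).digest()  # hash function chosen for illustration
    return int.from_bytes(digest[:8], "big") % disk_size


def first_index(idx2: int, disk_size: int, mem_size: int) -> int:
    """Divide the second index by the integer-rounded ratio of table sizes."""
    ratio = max(1, round(disk_size / mem_size))
    return (idx2 // ratio) % mem_size  # assumption: modulus by first-table size


def probe_in_segment(table, start, segment_size):
    """Return a free slot in the segment containing `start`, or None if full."""
    seg_begin = (start // segment_size) * segment_size
    seg_end = seg_begin + segment_size
    # Scan from the computed slot to the segment end, then wrap to its beginning.
    for i in list(range(start, seg_end)) + list(range(seg_begin, start)):
        if table[i] is None:
            return i
    return None
```

Because the first index is the second index divided by the size ratio, neighbouring slots in the second table map to the same slot in the first, so entries that are adjacent in memory tend to stay adjacent when flushed.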

16. A computer-implemented method for identifying a value associated with a selected key, the method comprising:
- providing a computer system having a processor, an in-memory hash table, and an on-disk hash table;
  
  checking, by the processor, the in-memory hash table for an entry corresponding to the selected key, wherein the entry corresponding to the selected key identifies the value associated with the selected key, and when the entry corresponding to the selected key is not found in the in-memory hash table, checking, by the processor, the on-disk hash table for the entry corresponding to the selected key.
  - 17. The method of claim 16, wherein the computer system comprises a data storage system, wherein the selected key is a unique identifier of a data object stored in the data storage system, and wherein the value indicates a location in the data storage system where the data object is stored.
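
The memory-first lookup of claims 16 and 17 might look like the following Python sketch, which uses a real temporary file as the on-disk table. Everything about the file layout is an assumption for illustration: the fixed 16-byte slot format, the convention that object id 0 marks an empty slot, the table size `DISK_SLOTS`, and the helper names `disk_put`, `disk_get`, and `find_location`.

```python
import struct
import tempfile

SLOT = struct.Struct("<QQ")  # (object_id, location); object_id 0 means empty
DISK_SLOTS = 16              # assumed size of the on-disk table


def disk_put(f, object_id, location):
    """Store a record in the file, probing linearly past occupied slots."""
    i = object_id % DISK_SLOTS
    for _ in range(DISK_SLOTS):
        f.seek(i * SLOT.size)
        stored_id, _ = SLOT.unpack(f.read(SLOT.size))
        if stored_id in (0, object_id):
            f.seek(i * SLOT.size)
            f.write(SLOT.pack(object_id, location))
            return
        i = (i + 1) % DISK_SLOTS
    raise RuntimeError("disk table full")


def disk_get(f, object_id):
    """Return the stored location, or None if an empty slot is reached first."""
    i = object_id % DISK_SLOTS
    for _ in range(DISK_SLOTS):
        f.seek(i * SLOT.size)
        stored_id, location = SLOT.unpack(f.read(SLOT.size))
        if stored_id == 0:
            return None
        if stored_id == object_id:
            return location
        i = (i + 1) % DISK_SLOTS
    return None


def find_location(memory, f, object_id):
    """Check the in-memory table first; fall back to disk only on a miss."""
    if object_id in memory:
        return memory[object_id]
    return disk_get(f, object_id)


def demo():
    with tempfile.TemporaryFile() as f:
        f.write(bytes(SLOT.size * DISK_SLOTS))  # zero-filled: every slot empty
        disk_put(f, 3, 4096)
        disk_put(f, 19, 8192)  # 19 % 16 == 3, so this record probes to slot 4
        memory = {7: 123}      # one entry that has not been flushed yet
        return [find_location(memory, f, k) for k in (7, 3, 19, 5)]
```

Calling `demo()` returns `[123, 4096, 8192, None]`: the first key is answered from memory without touching the file, the next two are read from disk, and the last is absent from both tables.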

18. A computer-implemented method for identifying a set of values associated with a selected key, the method comprising:
- providing a computer system having a processor, an in-memory hash table, and an on-disk hash table;
  
  computing, by the processor, a first index for the selected key;
  
  selecting first candidate elements from the in-memory hash table using the first index, the first candidate elements having associated first candidate keys;
  
  computing, by the processor, a second index for the selected key;
  
  selecting second candidate elements from the on-disk hash table using the second index, the second candidate elements having associated second candidate keys;
  
  examining, by the processor, the first candidate elements to determine whether a first candidate key matches the selected key, and, when the first candidate key matches the selected key, identifying, as a member of the set of values, a first value associated with a first candidate element that is associated with the first candidate key; and
  
  examining, by the processor, the second candidate elements to determine whether a second candidate key matches the selected key, and, when the second candidate key matches the selected key, identifying, as a member of the set of values, a second value associated with a second candidate element that is associated with the second candidate key.
  - 19. The method of claim 18, wherein selecting the first candidate elements comprises selecting an element corresponding to the first index in the in-memory hash table, and when a next in-memory element in the in-memory hash table has a previously inserted entry, selecting the next in-memory element; and wherein selecting second candidate elements comprises selecting an element corresponding to the second index in the on-disk hash table, and when a next on-disk element in the on-disk hash table has a previously stored entry, selecting the next on-disk element.
  - 20. The method of claim 19, wherein the in-memory hash table is divided into a plurality of in-memory hash table segments, wherein the next in-memory element is found within a first of the in-memory hash table segments, wherein the on-disk hash table is divided into a plurality of on-disk hash table segments, and wherein the next on-disk element is found within a first of the on-disk hash table segments.
  - 21. The method of claim 18, wherein the selected key is associated with a selected discriminator value, wherein selecting the first candidate elements comprises rejecting elements in the in-memory hash table containing an entry having a first discriminator value that is different than the selected discriminator value, and wherein selecting the second candidate elements comprises rejecting elements in the on-disk hash table containing an entry having a second discriminator value that is different than the selected discriminator value.
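
The candidate selection of claims 18, 19, and 21 can be sketched as follows. The `Entry` layout, the representation of the discriminator as a small integer stored with each entry, and the function names are assumptions for the example; the claims do not specify how a discriminator is derived or stored.

```python
from typing import NamedTuple


class Entry(NamedTuple):
    discriminator: int  # assumed: a small tag that allows cheap rejection
    key: str
    value: str


def candidates(table, index, discriminator):
    """Walk consecutive occupied slots from `index`, keeping matching tags."""
    picked = []
    i = index
    for _ in range(len(table)):
        entry = table[i]
        if entry is None:  # an empty slot ends the run of candidates
            break
        if entry.discriminator == discriminator:
            picked.append(entry)
        i = (i + 1) % len(table)
    return picked


def lookup_all(mem, disk, key, idx1, idx2, discriminator):
    """Collect values for `key` from both tables, in-memory results first."""
    values = []
    for table, index in ((mem, idx1), (disk, idx2)):
        for entry in candidates(table, index, discriminator):
            if entry.key == key:  # full key comparison on surviving candidates
                values.append(entry.value)
    return values
```

Rejecting on the discriminator before comparing keys means that most non-matching entries are discarded without a full key comparison, which matters most when keys are long.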

22. A data storage system comprising:
- a hash table system comprising a first hash table and a second hash table;
  
  an insertion module for inserting a new entry in the hash table system, wherein inserting comprises computing a first index for the new entry, the first index corresponding to a first element in the first hash table;
  
  computing a second index for the new entry, the second index corresponding to a second element in the second hash table;
  
  inserting a first entry corresponding to the new entry into the first element in the first hash table; and
  
  an entry flushing module for flushing the first entry from the first hash table when the first hash table reaches a threshold load factor, wherein flushing comprises inserting a value associated with the new entry into the second element in the second hash table, and removing the first entry from the first element in the first hash table.
  - 23. The data storage system of claim 22, wherein the first hash table is an in-memory hash table, and wherein the second hash table is an on-disk hash table.
  - 24. The data storage system of claim 22, wherein the first entry comprises the second index and the value associated with the new entry, and wherein inserting the value associated with the new entry into the second element comprises using the second index to locate the second element in the second hash table.
  - 25. The data storage system of claim 22, wherein the first hash table is divided into a plurality of first hash table segments, and wherein inserting the first entry corresponding to the new entry into the first element in the first hash table further comprises checking whether the first element contains a previously inserted entry, when the first element contains the previously inserted entry, executing a collision policy within a first hash table segment such that an unused element is found within the first hash table segment, and substituting the unused element for the first element.
  - 26. The data storage system of claim 22, wherein the second hash table is divided into a plurality of second hash table segments, wherein inserting the value entry associated with the new entry into the second element in the second hash table further comprises checking whether the second element contains a previously inserted entry, and when the second element contains the previously inserted entry, executing a collision policy within a second hash table segment such that an unused element is found within the second hash table segment, and substituting the unused element for the second element.
  - 27. The data storage system of claim 22, wherein the first hash table is divided into a plurality of first hash table segments, wherein the second hash table is divided into a plurality of second hash table segments, each of the second hash table segments corresponding to one of the plurality of first hash table segments, and wherein flushing further comprises flushing all entries stored in one of the first hash table segments into its corresponding second hash table segment before flushing all entries stored in another of the first hash table segments into its corresponding second hash table segment.
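
The segment-by-segment flush of claim 27, combined with the stored second index of claim 24, can be sketched in Python. The list-based tables, the `(second_index, value)` entry layout, and the function name `flush_by_segment` are assumptions for the example, and the sketch presumes each stored second index already lies inside the disk segment that corresponds to its memory segment.

```python
def flush_by_segment(mem, disk, mem_seg, disk_seg):
    """Empty memory segment k into disk segment k before starting k + 1.

    Returns the segment number of each flushed entry, in flush order.
    """
    touched = []
    for k in range(len(mem) // mem_seg):
        begin, end = k * disk_seg, (k + 1) * disk_seg
        for i in range(k * mem_seg, (k + 1) * mem_seg):
            if mem[i] is None:
                continue
            idx2, value = mem[i]  # the entry carries its second index
            # Probe from idx2 to the segment end, then wrap to its beginning.
            for j in list(range(idx2, end)) + list(range(begin, idx2)):
                if disk[j] is None:
                    disk[j] = value
                    break
            else:
                raise RuntimeError("disk segment full")
            mem[i] = None  # remove the entry from the first table
            touched.append(k)
    return touched
```

One plausible benefit of this ordering is that all writes for a segment fall within a single contiguous region of the second table before the next region is touched, which would suit sequential disk access.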

28. A data storage system comprising:
- a hash table system comprising an in-memory hash table and an on-disk hash table; and
  
  an identification module for identifying a value associated with a selected key, wherein identifying comprises checking the in-memory hash table for an entry corresponding to the selected key, wherein the entry corresponding to the selected key identifies the value associated with the selected key, and when the entry corresponding to the selected key is not found in the in-memory hash table, checking the on-disk hash table for the entry corresponding to the selected key.

29. A data storage system comprising:
- a hash table system comprising a first hash table and a second hash table; and
  
  an identification module for identifying a set of values associated with a selected key, wherein identifying comprises computing a first index for the selected key;
  
  selecting first candidate elements from the in-memory hash table using the first index, the first candidate elements having associated first candidate keys;
  
  computing a second index for the selected key;
  
  selecting second candidate elements from the on-disk hash table using the second index, the second candidate elements having associated second candidate keys;
  
  examining the first candidate elements to determine whether a first candidate key matches the selected key, and when the first candidate key matches the selected key, identifying, as a first member of the set of values, a first value associated with a first candidate element that is associated with the first candidate key; and
  
  examining the second candidate elements to determine whether a second candidate key matches the selected key, and when the second candidate key matches the selected key, identifying, as a second member of the set of values, a second value associated with a second candidate element that is associated with the second candidate key.
  - 30. The data storage system of claim 29, wherein selecting the first candidate elements comprises selecting an element corresponding to the first index in the in-memory hash table, and when a next in-memory element in the in-memory hash table has a previously inserted entry, selecting the next in-memory element; and wherein selecting second candidate elements comprises selecting an element corresponding to the second index in the on-disk hash table, and when a next on-disk element in the on-disk hash table has a previously stored entry, selecting the next on-disk element.
  - 31. The data storage system of claim 30, wherein the in-memory hash table is divided into a plurality of in-memory hash table segments, wherein the next in-memory element is found within a first of the in-memory hash table segments, wherein the on-disk hash table is divided into a plurality of on-disk hash table segments, and wherein the next on-disk element is found within a first of the on-disk hash table segments.
  - 32. The data storage system of claim 29, wherein the selected key is associated with a selected discriminator value, wherein selecting the first candidate elements comprises rejecting elements in the in-memory hash table containing an entry having a first discriminator value that is different than the selected discriminator value, and wherein selecting the second candidate elements comprises rejecting elements in the on-disk hash table containing an entry having a second discriminator value that is different than the selected discriminator value.

33. A computer program product comprising a computer usable medium having computer-executable instructions that control a computer to perform a method for inserting a new entry in a hash table system, the method comprising the computer-implemented steps of:
- providing the hash table system comprising a first hash table and a second hash table;
  
  computing a first index for the new entry, the first index corresponding to a first element in the first hash table;
  
  computing a second index for the new entry, the second index corresponding to a second element in the second hash table;
  
  inserting a first entry corresponding to the new entry into the first element in the first hash table; and
  
  when the first hash table reaches a threshold load factor, flushing the first entry from the first hash table, wherein flushing comprises inserting a value associated with the new entry into the second element in the second hash table, and removing the first entry from the first element in the first hash table.
  - 34. The computer program product of claim 33, wherein the first hash table is an in-memory hash table, and wherein the second hash table is an on-disk hash table.
  - 35. The computer program product of claim 33, wherein the first entry comprises the second index and the value associated with the new entry, and wherein inserting the value associated with the new entry into the second element comprises using the second index to locate the second element in the second hash table.
  - 36. The computer program product of claim 33, wherein the first hash table is divided into a plurality of first hash table segments, and wherein inserting the first entry corresponding to the new entry into the first element in the first hash table further comprises checking whether the first element contains a previously inserted entry, when the first element contains the previously inserted entry, executing a collision policy within a first hash table segment such that an unused element is found within the first hash table segment, and substituting the unused element for the first element.
  - 37. The computer program product of claim 33, wherein the second hash table is divided into a plurality of second hash table segments, wherein inserting the value entry associated with the new entry into the second element in the second hash table further comprises checking whether the second element contains a previously inserted entry, and when the second element contains the previously inserted entry, executing a collision policy within a second hash table segment such that an unused element is found within the second hash table segment, and substituting the unused element for the second element.
  - 38. The computer program product of claim 33, wherein the first hash table is divided into a plurality of first hash table segments, wherein the second hash table is divided into a plurality of second hash table segments, each of the second hash table segments corresponding to one of the plurality of first hash table segments, and wherein flushing further comprises flushing all entries stored in one of the first hash table segments into its corresponding second hash table segment before flushing all entries stored in another of the first hash table segments into its corresponding second hash table segment.

39. A computer program product comprising a computer usable medium having computer-executable instructions that control a computer to perform a method for identifying a value associated with a selected key, the method comprising the computer-implemented steps of:
- providing a hash table system comprising an in-memory hash table and an on-disk hash table;
  
  checking the in-memory hash table for an entry corresponding to the selected key, wherein the entry corresponding to the selected key identifies the value associated with the selected key, and when the entry corresponding to the selected key is not found in the in-memory hash table, checking the on-disk hash table for the entry corresponding to the selected key.

40. A computer program product comprising a computer usable medium having computer-executable instructions that control a computer to perform a method for identifying a set of values associated with a selected key, the method comprising the computer-implemented steps of:
- providing a hash table system comprising an in-memory hash table and an on-disk hash table;
  
  computing a first index for the selected key;
  
  selecting first candidate elements from the in-memory hash table using the first index, the first candidate elements having associated first candidate keys;
  
  computing a second index for the selected key;
  
  selecting second candidate elements from the on-disk hash table using the second index, the second candidate elements having associated second candidate keys;
  
  examining the first candidate elements to determine whether a first candidate key matches the selected key, and, when the first candidate key matches the selected key, identifying, as a member of the set of values, a first value associated with a first candidate element that is associated with the first candidate key; and
  
  examining the second candidate elements to determine whether a second candidate key matches the selected key, and, when the second candidate key matches the selected key, identifying, as a member of the set of values, a second value associated with a second candidate element that is associated with the second candidate key.
  - 41. The computer program product of claim 40, wherein selecting the first candidate elements comprises selecting an element corresponding to the first index in the in-memory hash table, and when a next in-memory element in the in-memory hash table has a previously inserted entry, selecting the next in-memory element; and wherein selecting second candidate elements comprises selecting an element corresponding to the second index in the on-disk hash table, and when a next on-disk element in the on-disk hash table has a previously stored entry, selecting the next on-disk element.
  - 42. The computer program product of claim 41, wherein the in-memory hash table is divided into a plurality of in-memory hash table segments, wherein the next in-memory element is found within a first of the in-memory hash table segments, wherein the on-disk hash table is divided into a plurality of on-disk hash table segments, and wherein the next on-disk element is found within a first of the on-disk hash table segments.
  - 43. The computer program product of claim 42, wherein the selected key is associated with a selected discriminator value, wherein selecting the first candidate elements comprises rejecting elements in the in-memory hash table containing an entry having a first discriminator value that is different than the selected discriminator value, and wherein selecting the second candidate elements comprises rejecting elements in the on-disk hash table containing an entry having a second discriminator value that is different than the selected discriminator value.

Current Assignee: Autonomy Incorporated (HP Inc.)
Original Assignee: Autonomy Incorporated (HP Inc.)
Inventors: Newson, Robert S.; Beaman, Peter D.; Tran, Tuyen M.
Granted Patent: US 8,397,051 B2
US Class Current: 711/216
CPC Class Codes: G06F 16/2255 Hash tables