Hash-based system and method with primary and secondary hash functions for rapidly identifying the existence and location of an item in a file
First Claim
1. A system for rapidly identifying the existence of an item in a file, comprising:
- master file storage means for storing a plurality of items; and
hash table means, coupled to the master file storage means, for storing a plurality of hash buckets, each hash bucket identified by a primary hash key, each hash bucket comprising at least one hash entry, each hash entry comprising;
pointing means for pointing to an item in the master file, for identifying the location of the item; and
storing means for storing a secondary hash key obtained by applying a secondary hash function.
0 Assignments
0 Petitions
Accused Products
Abstract
A system and method for rapidly identifying the existence and location of an item in a file using an improved hash table architecture. A hash table is constructed having a plurality of hash buckets, each identified by a primary hash key. Each hash entry in each hash bucket contains a pointer to a record in a master file, as well as a secondary hash key independent of the primary hash key. A search for a particular item is performed by identifying the appropriate hash bucket by obtaining a primary hash key for the search term. Individual hash entries within the hash bucket are checked for matches by comparing the stored secondary keys with the secondary key for the search term. Potentially matching records can be identified or ruled out without necessitating repeated reads of the master file. The improved hash table system and method is employed in a contextual text searching application for determining the intersection of a text search with a hierarchical categorization scheme.
79 Citations
43 Claims
-
1. A system for rapidly identifying the existence of an item in a file, comprising:
-
master file storage means for storing a plurality of items; and
hash table means, coupled to the master file storage means, for storing a plurality of hash buckets, each hash bucket identified by a primary hash key, each hash bucket comprising at least one hash entry, each hash entry comprising;
pointing means for pointing to an item in the master file, for identifying the location of the item; and
storing means for storing a secondary hash key obtained by applying a secondary hash function. - View Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11)
search term input means for obtaining a search term;
primary hash function application means coupled to the search term input means, for applying a primary hash function to obtain a primary hash key for the search term;
secondary hash function application means coupled to the search term input means, for applying the secondary hash function to obtain a secondary hash key for the search term; and
hash key comparison means coupled to the hash table means, for cornparing the secondary hash key for the search term with at least one secondary hash key for a hash entry in an identified hash bucket;
wherein the identified hash bucket is identified by a primary hash key matching the obtained primary hash key for the search term.
-
-
3. The system of claim 2, wherein the search term comprises a plurality of characters, and the primary hash function application means applies a primary hash function by performing successive exclusive-OR operations on successive characters of the search term.
-
4. The system of claim 2, wherein the search term comprises a plurality of segments, and the secondary hash function application means applies a secondary hash function by encoding each of at least one of the plurality of segments according to a predetermined encoding scheme.
-
5. The system of claim 2, further comprising:
item comparison means, coupled to the master file storage means and to the hash table means, for, responsive to the hash key comparison means indicating at least one hash entry having a matching secondary hash key, comparing the search term with an item in the master file storage means identified by the pointing means in the hash entry.
-
6. The system of claim 5, further comprising:
output means coupled to the item comparison means, for outputting the results of the item comparison performed by the item comparison means.
-
7. The system of claim 6, wherein, responsive to the results of the item comparison indicating a match, the output means outputs the location of the matching item in the master file storage means.
-
8. The system of claim 2, further comprising:
output means coupled to the hash key comparison means, for outputting the results of the hash key comparison performed by the hash key comparison means.
-
9. The system of claim 8, wherein the output means, responsive to the results of the hash key comparison indicating a match, outputs the location of the matching item in the master file storage means.
-
10. The system of claim 1, wherein:
-
the master file storage means is implemented as a database in a computer system; and
the hash table means is implemented as a database in a computer system.
-
-
11. The system of claim 1, wherein:
-
the master file storage means stores a plurality of descriptor means, each descriptor means for representing a document associated with a particular subject category; and
the item represents a result of a text search.
-
-
12. A system for identifying the existence of an item in a file, comprising:
-
primary hash function means, for applying a primary hash function to obtain a primary hash key for a search term;
hash bucket identification means, coupled to the primary hash function means, for identifying a hash bucket having a primary hash key corresponding to the obtained primary hash key, the hash bucket comprising at least one hash entry, each hash entry comprising a value and a secondary hash key;
secondary hash function means, for applying a secondary hash function to obtain a secondary hash key for the search term; and
comparing means, coupled to the secondary hash function means and to the hash bucket identification means, for comparing the secondary hash key for the search term with the secondary hash key for at least one hash entry in the identified hash bucket. - View Dependent Claims (13, 14, 15, 16)
output means, coupled to the comparing means, for outputting the results of the comparing means.
-
-
14. The system of claim 12, wherein the value in each hash entry represents a pointer to a record location in a master file storage means, the system further comprising:
-
retrieving means, for, responsive to the comparing means indicating at least one match, retrieving a record in the master file storage means having a location corresponding to the value in the means matching hash entry; and
second comparing means, coupled to the retrieving means, for comparing the search term with the retrieved record.
-
-
15. The system of claim 14, further comprising:
second output means, coupled to the second comparing means, for, responsive to the second comparing means indicating a match, outputting the pointer for the matching record.
-
16. The system of claim 14, further comprising:
second output means, coupled to the second comparing means, for, responsive to the second comparing means indicating a match, outputting the matching record.
-
17. A computer-implemented method for rapidly identifying the existence of an item in a file, comprising:
-
a) accessing a master file for storing a plurality of items; and
b) accessing a hash table, comprising a plurality of hash buckets, each hash bucket identified by a primary hash key, each hash bucket comprising at least one hash entry, each hash entry comprising;
a pointer to an item in the master file, for identifying the location of the item; and
a secondary hash key obtained by applying a secondary hash function. - View Dependent Claims (18, 19, 20, 21, 22, 23, 24, 25, 26, 27)
c) obtaining a search term;
d) applying a primary hash function to obtain a primary hash key for the search term;
e) applying the secondary hash function to obtain a secondary hash key for the search term; and
(f) comparing the secondary hash key for the search term with at least one secondary hash key for a hash entry in an identified hash bucket;
wherein the identified hash bucket is identified by a primary hash key matching the obtained primary hash key for the search term.
-
-
19. The method of claim 18, wherein the search term comprises a plurality of characters, and d) comprises applying a primary hash function by performing successive exclusive-OR operations on successive characters of the search term.
-
20. The method of claim 18, wherein the search term comprises a plurality of segments, and d) comprises applying a secondary hash function by encoding each of at least one of the plurality of segments according to a predetermined encoding scheme.
-
21. The method of claim 18, further comprising:
g) responsive to f) indicating at least one hash entry having a matching secondary hash key, comparing the search term with an item in the master file identified by the pointer in the hash entry.
-
22. The method of claim 21, further comprising:
h) outputting the results of the item comparison performed in g).
-
23. The method of claim 22, wherein h) comprises, responsive to the results of g) indicating a match, outputting the location of the matching item in the master file.
-
24. The method of claim 18, further comprising:
g) outputting the results of f).
-
25. The method of claim 24, wherein g) comprises, responsive to the results of the f) indicating a match, outputting the location of the matching item in the master file.
-
26. The method of claim 17, wherein:
-
the master file is a database in a computer system; and
the hash table is a database in a computer system.
-
-
27. The method of claim 17, wherein:
-
the master file stores a plurality of descriptors, each descriptor representing a document associated with a particular subject category; and
the item represents a result of a text search.
-
-
28. A computer program product comprising a computer-usable medium having computer-readable code embodied therein for rapidly identifying the existence of an item in a file, comprising:
-
computer-readable program code devices configured to cause a computer to access a master file for storing a plurality of items; and
computer-readable program code devices configured to cause a computer to access a hash table, comprising a plurality of hash buckets, each hash bucket identified by a primary hash key, each hash bucket comprising at least one hash entry, each hash entry comprising;
a pointer to an item in the master file, for identifying the location of the item; and
a secondary hash key obtained by applying a secondary hash function. - View Dependent Claims (29, 30, 31, 32, 33, 34, 35, 36, 37, 38)
computer-readable program code devices configured to cause a computer to obtain a search term;
computer-readable program code devices configured to cause a computer to apply a primary hash function to obtain a primary hash key for the search term;
computer-readable program code devices configured to cause a computer to apply the secondary hash function to obtain a secondary hash key for the search term; and
computer-readable program code devices configured to cause a computer to compare the secondary hash key for the search term with at least one secondary hash key for a hash entry in an identified hash bucket;
wherein the identified hash bucket is identified by a primary hash key matching the obtained primary hash key for the search term.
-
-
30. The computer program product of claim 29, wherein the search term comprises a plurality of characters, and the computer-readable program code devices configured to cause a computer to apply a primary hash function comprise computer-readable program code devices configured to cause a computer to apply a primary hash function by performing successive exclusive-OR operations on successive characters of the search term.
-
31. The computer program product of claim 29, wherein the search term comprises a plurality of segments, and the computer-readable program code devices configured to cause a computer to apply a primary hash function comprise computer-readable program code devices configured to cause a computer to apply a secondary hash function by encoding each of at least one of the plurality of segments according to a predetermined encoding scheme.
-
32. The computer program product of claim 29, further comprising:
computer-readable program code devices configured to cause a computer, responsive to the computer-readable program code devices configured to cause a computer to compare the secondary hash key for the search term with at least one secondary hash key for a hash entry in an identified hash bucket indicating at least one hash entry having a matching secondary hash key, to compare the search term with an item in the master file identified by the pointer in the hash entry.
-
33. The computer program product of claim 32, further comprising:
computer-readable program code devices configured to cause a computer to output the results of the item comparison.
-
34. The computer program product of claim 33, wherein the computer-readable program code devices configured to cause a computer to output the results comprise computer-readable program code devices configured, responsive to the results of the item comparison indicating a match, to cause a computer to output the location of the matching item in the master file.
-
35. The computer program product of claim 29, further comprising:
computer-readable program code devices configured to cause a computer to output the results of the computer-readable program code devices configured to cause a computer to compare the secondary hash key for the search term with at least one secondary hash key for a hash entry in an identified hash bucket.
-
36. The computer program product of claim 35, wherein the computer-readable program code devices configured to cause a computer to output the results comprise computer-readable program code devices configured to, responsive to the results of the comparison indicating a match, output the location of the matching item in the master file.
-
37. The computer program product of claim 28, wherein:
-
the master file is a database in a computer system; and
the hash table is a database in a computer system.
-
-
38. The computer program product of claim 28, wherein:
-
the master file stores a plurality of descriptors, each descriptor representing a document associated with a particular subject category; and
the item represents a result of a text search.
-
-
39. A computer program product comprising a computer-usable medium having computer-readable code embodied therein for identifying the existence of an item in a file, comprising:
-
computer-readable program code devices configured to cause a computer to apply a primary hash function to obtain a primary hash key for a search term;
computer-readable program code devices configured to cause a computer to identify a hash bucket having a primary hash key corresponding to the obtained primary hash key, the hash bucket comprising at least one hash entry, each hash entry comprising a value and a secondary hash key;
computer-readable program code devices configured to cause a computer to apply a secondary hash function to obtain a secondary hash key for the search term; and
computer-readable program code devices configured to cause a computer to compare the secondary hash key for the search term with the secondary hash key for at least one hash entry in the identified hash bucket. - View Dependent Claims (40, 41, 42, 43)
computer-readable program code devices configured to cause a computer to output the results of the comparison.
-
-
41. The computer program product of claim 39, wherein the value in each hash entry represents a pointer to a record location in a master file, the computer program product further comprising:
-
computer-readable program code devices configured to cause a computer to, responsive to the comparison indicating at least one match, retrieve a record in the master file having a location corresponding to the value in the matching hash entry; and
computer-readable program code devices configured to cause a computer to compare the search term with the retrieved record.
-
-
42. The computer program product of claim 41, further comprising:
computer-readable program code devices configured to cause a computer, responsive to the computer-readable program code devices configured to cause a computer to compare the search term with the retrieved record indicating a match, to output the pointer for the matching record.
-
43. The computer program product of claim 41, further comprising:
computer-readable program code devices configured to cause a computer, responsive to the computer-readable program code devices configured to cause a computer to compare the search term with the retrieved record indicating a match, to output the matching record.
Specification