System and Method for Providing a Trustworthy Inverted Index to Enable Searching of Records
First Claim
1. A processor-implemented method of providing an inverted index to enable searching of records, the method comprising:
- processing the records to identify features for indexing in the inverted index;
generating a plurality of posting lists from the records, wherein each of the posting lists corresponds to at least one of the identified features;
maintaining, in a storage cache, a tail of at least one of the posting lists to minimize random I/Os to the inverted index;
determining a desired number of the posting lists based on a desired level of any of an insertion performance, a query performance, and a size of the storage cache;
receiving a query that includes a search feature; and
reading a posting list corresponding to the search feature in the query, in order to identify records that include the search feature.
2 Assignments
0 Petitions
Accused Products
Abstract
A trustworthy inverted index system processes records to identify features for indexing, generates posting lists corresponding to features in a dictionary, maintains in a storage cache a tail of at least one of the posting lists to minimize random I/Os to the index, determines a desired number of the posting lists based on a desired level of insertion performance, a query performance, or a size of the storage cache, and reads a posting list corresponding to a search feature in a query to identify records that comprise the search feature. The system maps the features in the dictionary to the desired number of posting lists. The system uses a jump pointer to point from one entry to the next in the posting lists based on increasing values of entries in the posting lists.
58 Citations
38 Claims
-
1. A processor-implemented method of providing an inverted index to enable searching of records, the method comprising:
-
processing the records to identify features for indexing in the inverted index; generating a plurality of posting lists from the records, wherein each of the posting lists corresponds to at least one of the identified features; maintaining, in a storage cache, a tail of at least one of the posting lists to minimize random I/Os to the inverted index; determining a desired number of the posting lists based on a desired level of any of an insertion performance, a query performance, and a size of the storage cache; receiving a query that includes a search feature; and reading a posting list corresponding to the search feature in the query, in order to identify records that include the search feature. - View Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11)
-
-
12. A method of providing an index to enable searching of records, the method comprising:
-
generating an index structure for an increasing sequence of identifiers; inserting an identifier for a new record into the index structure beginning at a root node of the index structure; wherein inserting the identifier of the new record comprises comparing the identifier of the new record and a reference identifier at the root node of the index structure; and upon a determination that inserting the identifier of the new record is unsuccessful, identifying a target node in the index structure based on a jump pointer that points to the target node, and recursively repeating the step of inserting the identifier of the new record starting at the target node, until the identifier of the new record is successfully inserted into the index structure. - View Dependent Claims (13, 14, 15, 16, 17, 18)
-
-
19. A processor-implemented method of providing an inverted index to enable searching of records, the method comprising:
-
a record ingest module for processing the records to identify features for indexing in the inverted index; an index insert module for generating a plurality of posting lists from the records, wherein each of the posting lists corresponds to at least one of the identified features; a storage cache for maintaining a tail of at least one of the posting lists to minimize random I/Os to the inverted index; a strategy module for determining a desired number of the posting lists based on a desired level of any of an insertion performance, a query performance, and a size of the storage cache; a query interface for receiving a query that includes a search feature; and a query runtime module for reading a posting list corresponding to the search feature in the query, in order to identify records that include the search feature. - View Dependent Claims (20, 21, 22, 23, 24, 25, 26, 27, 28, 29)
-
-
30. A processor-implemented system for providing an index to enable searching of records, the system comprising:
-
an index insert module for generating an index structure for an increasing sequence of identifiers; the index insert module inserting an identifier for a new record into the index structure beginning at a root node of the index structure by comparing the identifier of the new record and a reference identifier at the root node of the index structure; and upon a determination by the index insert module that inserting the identifier of the new record is unsuccessful, the index insert module identifies a target node in the index structure based on a jump pointer that points to a target node, and recursively repeats the insertion of the identifier of the new record starting at the target node, until the identifier of the new record is successfully inserted into the index structure. - View Dependent Claims (31, 32, 33, 34, 35, 36)
-
-
37. A computer product including a plurality of executable instruction codes stored on a computer-usable medium, for providing an inverted index to enable searching of records, the system comprising:
-
a set of instruction codes for processing the records to identify features for indexing in the inverted index; a set of instruction codes for generating a plurality of posting lists from the records, wherein each of the posting lists corresponds to at least one of the identified features; a storage cache for maintaining a tail of at least one of the posting lists to minimize random I/Os to the inverted index; a set of instruction codes for determining a desired number of the posting lists based on a desired level of any of an insertion performance, a query performance, and a size of the storage cache; a set of instruction codes for receiving a query that includes a search feature; and a set of instruction codes for reading a posting list corresponding to the search feature in the query, in order to identify records that include the search feature.
-
-
38. A computer product including a plurality of executable instruction codes stored on a computer-usable medium, for providing an index to enable searching of records, the system comprising:
-
a set of instruction codes for generating an index structure for an increasing sequence of identifiers; a set of instruction codes for inserting an identifier for a new record into the index structure beginning at a root node of the index structure by comparing the identifier of the new record and a reference identifier at the root node of the index structure; and upon a determination that inserting the identifier of the new record is unsuccessful, a set of instruction codes identifies a target node in the index structure based on a jump pointer that points to the target node, and recursively repeats the insertion of the identifier of the new record starting at the target node, until the identifier of the new record is successfully inserted into the index structure.
-
Specification