System and method for providing a trustworthy inverted index to enable searching of records
First Claim
1. A processor-implemented method of providing an inverted index to enable searching of records, the method comprising:
- processing the records to identify features for indexing in the inverted index;
generating a plurality of posting lists from the records, wherein each of the plurality of posting lists corresponds to at least one of the identified features;
maintaining, in a storage cache, a tail block of at least one of the plurality of posting lists to minimize random Input/Output to the inverted index;
creating a merged posting list by moving posting lists smaller than one cache block into a combined posting list for reducing the size of the inverted index with a reduction in the plurality of posting lists;
storing the plurality of posting lists in a write-once-read-many (WORM) storage;
storing tail blocks removed from the storage cache into one of the plurality of posting lists in the WORM storage;
maintaining an encoding of the identified features for the inverted index and a record identifier in each entry of the merged posting list;
determining a desired number of the plurality of posting lists based on a desired level of any of an insertion performance, a query performance, and a size of the storage cache;
receiving a query that includes a search feature;
ranking records in the plurality of posting lists for answering the query to the plurality of posting lists;
reading a posting list corresponding to the search feature in the query, in order to identify records that include the search feature; and
displaying the plurality of posting lists on a display device.
2 Assignments
0 Petitions
Accused Products
Abstract
A trustworthy inverted index system processes records to identify features for indexing, generates posting lists corresponding to features in a dictionary, maintains in a storage cache a tail of at least one of the posting lists to minimize random I/Os to the index, determines a desired number of the posting lists based on a desired level of insertion performance, a query performance, or a size of the storage cache, and reads a posting list corresponding to a search feature in a query to identify records that comprise the search feature. The system maps the features in the dictionary to the desired number of posting lists. The system uses a jump pointer to point from one entry to the next in the posting lists based on increasing values of entries in the posting lists.
-
Citations
38 Claims
-
1. A processor-implemented method of providing an inverted index to enable searching of records, the method comprising:
-
processing the records to identify features for indexing in the inverted index; generating a plurality of posting lists from the records, wherein each of the plurality of posting lists corresponds to at least one of the identified features; maintaining, in a storage cache, a tail block of at least one of the plurality of posting lists to minimize random Input/Output to the inverted index; creating a merged posting list by moving posting lists smaller than one cache block into a combined posting list for reducing the size of the inverted index with a reduction in the plurality of posting lists; storing the plurality of posting lists in a write-once-read-many (WORM) storage; storing tail blocks removed from the storage cache into one of the plurality of posting lists in the WORM storage; maintaining an encoding of the identified features for the inverted index and a record identifier in each entry of the merged posting list; determining a desired number of the plurality of posting lists based on a desired level of any of an insertion performance, a query performance, and a size of the storage cache; receiving a query that includes a search feature; ranking records in the plurality of posting lists for answering the query to the plurality of posting lists; reading a posting list corresponding to the search feature in the query, in order to identify records that include the search feature; and displaying the plurality of posting lists on a display device. - View Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11)
-
-
12. A computer-implemented method of providing an index to enable searching of records, the method comprising:
-
generating an index structure for an increasing sequence of record identifiers and index features; generating posting lists with the record identifiers and frequency of the index features; creating a merged posting list by moving posting lists smaller than one block into combined posting lists for reducing the size of the index with a reduction in the number of the posting lists; searching the combined posting lists in response to a query searching the records; inserting an identifier for a new record into the index structure beginning at a root node of the index structure; wherein inserting the identifier of the new record comprises comparing the identifier of the new record and a reference identifier at the root node of the index structure; and upon a determination that inserting the identifier of the new record is unsuccessful, identifying a target node in the index structure based on a jump pointer that points to the target node, and recursively repeating the step of inserting the identifier of the new record starting at the target node, until the identifier of the new record is successfully inserted into the index structure; and displaying the combined posting lists on a display device. - View Dependent Claims (13, 14, 15, 16, 17, 18)
-
-
19. A processor-implemented system for providing an inverted index to enable searching of records, the system comprising:
-
a record ingest module for processing the records to identify features for indexing in the inverted index; an index insert module for generating a plurality of posting lists from the records, wherein each of the plurality of posting lists corresponds to at least one of the identified features; a storage cache for maintaining a tail block of at least one of the plurality of posting lists to minimize random Input/Output to the inverted index; a merge module for creating a merged posting list by moving posting lists smaller than one cache block into a combined posting list for reducing the size of the inverted index with a reduction in the plurality of posting lists, and for maintaining an encoding of the identified features for the inverted index and a record identifier in each entry of the merged posting list; a write-once-read-many (WORM) storage for storing the plurality of posting lists, and storing tail blocks removed from the storage cache into one of the plurality of posting lists; a strategy module for determining a desired number of the plurality of posting lists based on a desired level of any of an insertion performance, a query performance, and a size of the storage cache; a query interface for receiving a query that includes a search feature; a ranking module for ranking records in the plurality of posting lists for answering the query to the plurality of posting lists; a query runtime module for reading a posting list corresponding to the search feature in the query, in order to identify records that include the search feature; and a display module for displaying the plurality of posting lists on a display device. - View Dependent Claims (20, 21, 22, 23, 24, 25, 26, 27, 28, 29)
-
-
30. A computer-implemented system for providing an index to enable searching of records, the system comprising:
-
an index insert module including a computer processor for generating an index structure for an increasing sequence of record identifiers and index features; a generation module for generating posting lists with the record identifiers and frequency of the index features; a merge module for creating a merged posting list by moving posting lists smaller than one block into combined posting lists for reducing the size of the index with a reduction in the number of the posting lists; a search module for searching the combined posting lists in response to a query searching the records; the index insert module inserting an identifier for a new record into the index structure beginning at a root node of the index structure by comparing the identifier of the new record and a reference identifier at the root node of the index structure; upon a determination by the index insert module that inserting the identifier of the new record is unsuccessful, the index insert module identifies a target node in the index structure based on a jump pointer that points to a target node, and recursively repeats the insertion of the identifier of the new record starting at the target node, until the identifier of the new record is successfully inserted into the index structure; and a display module for displaying the combined posting lists on a display device. - View Dependent Claims (31, 32, 33, 34, 35, 36)
-
-
37. A computer product for providing an inverted index to enable searching of records, the computer program product comprising:
-
a computer readable storage medium having computer readable program code embodied therewith, the computer readable program code comprising; computer readable program code configured to process the records to identify features for indexing in the inverted index; computer readable program code configured to generate a plurality of posting lists from the records, wherein each of the posting lists corresponds to at least one of the identified features; computer readable program code configured to maintain in a storage cache a tail block of at least one of the plurality of posting lists to minimize random Input/Output to the inverted index; computer readable program code configured to create a merged posting list by moving posting lists smaller than one cache block into a combined posting list for reducing the size of the inverted index with a reduction in the plurality of posting lists; computer readable program code configured to store the plurality of posting lists in a write-once-read-many (WORM) storage; computer readable program code configured to store tail blocks removed from the storage cache into one of the plurality of posting lists in the WORM storage; computer readable program code configured to maintain an encoding of the identified features for the inverted index and a record identifier in each entry of the merged posting list; computer readable program code configured to determine a desired number of the plurality of posting lists based on a desired level of any of an insertion performance, a query performance, and a size of the storage cache; computer readable program code configured to receive a query that includes a search feature; computer readable program code configured to rank records in the plurality of posting lists for answering the query to the plurality of posting lists; computer readable program code configured to read a posting list corresponding to the search feature in the query, in order to identify records that include the search feature; and computer readable program code configured to display the plurality of posting lists on a display device.
-
-
38. A computer product for providing an index to enable searching of records, the computer program product comprising:
-
a computer readable storage medium having computer readable program code embodied therewith, the computer readable program code comprising; computer readable program code configured to generate an index structure for an increasing sequence of record identifiers and index features; computer readable program code configured to generate posting lists with the record identifiers and frequency of the index features; computer readable program code configured to create a merged posting list by moving posting lists smaller than one block into combined posting lists for reducing the size of the index with a reduction in the number of the posting lists; computer readable program code configured to search the combined posting lists in response to a query searching the records; computer readable program code configured to insert an identifier for a new record into the index structure beginning at a root node of the index structure by comparing the identifier of the new record and a reference identifier at the root node of the index structure; and upon a determination that inserting the identifier of the new record is unsuccessful, computer readable program code identifies a target node in the index structure based on a jump pointer that points to the target node, and recursively repeats the insertion of the identifier of the new record starting at the target node, until the identifier of the new record is successfully inserted into the index structure; and computer readable program code configured to display the combined posting lists on a display device.
-
Specification