Methods and systems for indexing references to documents of a database and for locating documents in the database
First Claim
1. A method for locating documents of a database that contain search terms, the method comprising:
- receiving a search term, at the database, from a client, the search term being associated with a posting list, the posting list being arranged in blocks, each block comprising a header and M truncated references, the M being an integer, each block having been compressed into a compressed block by encoding content of each block using an encoding pattern, the encoding pattern for each block having been determined based on values of the M truncated references in each block;
reading a pointer from a header of a current block of the posting list;
using the pointer to extract a decoding protocol from a decoding protocol table, wherein the decoding protocol defines the encoding pattern for the current block, the encoding pattern of the current block comprises;
a base length b of M truncated references in the current block;
a number n of patches in the current block;
if n>
0, one or more patch values vk of the current block, wherein k is in a range from 1 to n;
if n>
0, one or more patch positions pk in the current block, wherein pk is in a range of 0 to M−
1;
decompressing the current block based on the decoding protocol by;
reading, in the current block, b·
M bits comprising the M truncated references, wherein M−
n of the M truncated references are database references; and
if n>
0, for each patch from k=1 to n, calculating an expanded patch value as vk·
2b and adding the expanded patch value to a pkth of the M truncated reference numbered from 0 to M−
1 for providing n additional database references; and
the decompressing comprises identifying M database references based on the decoding protocol and the M truncated references within the current block.
3 Assignments
0 Petitions
Accused Products
Abstract
Methods and systems allow indexing references to documents of a database according to database reference profiles. Documents may then be located in the database using decoding protocols based on the database reference profiles. To this end, the documents are stored in the database and searchable terms extracted therefrom are associated with posting lists. Each posting list is divided into blocks of M database references. The blocks are encoded according to a pattern that depends on the M database references. A corresponding pointer to a table of encoding patterns is appended to each block. When a query is received for a searchable term, blocks are extracted from a posting list corresponding to the searchable term and a pointer for each block is used to extract a decoding protocol related to an encoding pattern for the block.
18 Citations
10 Claims
-
1. A method for locating documents of a database that contain search terms, the method comprising:
-
receiving a search term, at the database, from a client, the search term being associated with a posting list, the posting list being arranged in blocks, each block comprising a header and M truncated references, the M being an integer, each block having been compressed into a compressed block by encoding content of each block using an encoding pattern, the encoding pattern for each block having been determined based on values of the M truncated references in each block; reading a pointer from a header of a current block of the posting list; using the pointer to extract a decoding protocol from a decoding protocol table, wherein the decoding protocol defines the encoding pattern for the current block, the encoding pattern of the current block comprises; a base length b of M truncated references in the current block; a number n of patches in the current block; if n>
0, one or more patch values vk of the current block, wherein k is in a range from 1 to n;if n>
0, one or more patch positions pk in the current block, wherein pk is in a range of 0 to M−
1;decompressing the current block based on the decoding protocol by; reading, in the current block, b·
M bits comprising the M truncated references, wherein M−
n of the M truncated references are database references; andif n>
0, for each patch from k=1 to n, calculating an expanded patch value as vk·
2b and adding the expanded patch value to a pkth of the M truncated reference numbered from 0 to M−
1 for providing n additional database references; andthe decompressing comprises identifying M database references based on the decoding protocol and the M truncated references within the current block. - View Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9)
-
-
10. A database system for locating documents that contain search terms, the system comprising:
-
a database for storing a plurality of documents; a memory containing; a decoding protocol table; and an inverted index comprising a plurality of posting lists, each posting list being arranged in blocks, each block comprising a header and M truncated binary references, M being an integer, each block having been compressed into a compressed block by encoding content of each block using an encoding pattern, during the encoding of the content of each block M binary database references having been truncated such that to become corresponding M truncated binary references, each one of the M truncated binary references being a portion of a corresponding one of the M binary database references, each one of the M binary database references being a binary representation of one of (i) a document number of a corresponding document of the database and (ii) a delta reference of the corresponding document of the database, the encoding pattern for each block having been determined based on values of the M truncated binary references in each block; an input for receiving a search term from a client; and a processor, operably connected to the input, to the database and to the memory, for; selecting in the memory a posting list associated with the search term; reading a pointer from a header of a current block of the posting list; and using the pointer to extract a decoding protocol from the decoding protocol table, wherein the decoding protocol defines the encoding pattern for the current block, the encoding pattern for the current block having been determined based on values of the M truncated binary references within the current block, the encoding pattern of the current block comprises; a base length b of M binary truncated references in the current block; a number n of patches in the current block; if n>
0, one or more patch values vk of the current block, wherein k is in a range from 1 to n;if n>
0, one or more patch positions pk in the current block, wherein pk is in a range of 0 to M−
1 ; anddecompressing the current block based on the decoding protocol by; reading, in the current block, b·
M bits comprising the M binary truncated references, wherein M−
n of the M binary truncated references are database references; andif n>
0, for each patch from k=1 to n, calculating an expanded patch value as vk·
2b and adding the expanded patch value to a pkth of the M binary truncated reference numbered from 0 to M−
1 for providing n additional database references,the decompressing comprises identifying M corresponding binary database references based on the decoding protocol and the M truncated binary references within the current block.
-
Specification