Method for storing inverted index, method for on-line updating the same and inverted index mechanism
First Claim
1. A method for storing an inverted index based on an inverted file, the method comprising:
- creating an inverted file in a storage medium for storing the inverted index, the inverted file includes a plurality of fixed-size index blocks, at least one of which includes a plurality of fixed-size index units, wherein each index unit is used to store one piece of index information; and
sequentially storing the index information related to each index item into the created inverted file, wherein the index information related to the same index item is stored in continuous blocks, and the index units in each index block are only used for storing the index information related to the same index item.
1 Assignment
0 Petitions
Accused Products
Abstract
The invention provides a method for storing inverted index based on an inverted file, the method comprising: creating an inverted file in a storage medium for storing the inverted index, the inverted file including a plurality of fixed-size index blocks, each of them including a plurality of fixed-size index units, wherein each index unit is used to store one piece of index information; and sequentially storing the index information related to each index item into the created inverted file, wherein the index information related to the same index item is stored in continuous blocks and the index units in each index block are only for storing index information related to the same index item. Since each index block is used only for storing index information related to the same index item, when performing operations on the index information in an index block, other index items are not affected, therefore, it is possible to on-line update index information in any index block.
57 Citations
9 Claims
-
1. A method for storing an inverted index based on an inverted file, the method comprising:
-
creating an inverted file in a storage medium for storing the inverted index, the inverted file includes a plurality of fixed-size index blocks, at least one of which includes a plurality of fixed-size index units, wherein each index unit is used to store one piece of index information; and
sequentially storing the index information related to each index item into the created inverted file, wherein the index information related to the same index item is stored in continuous blocks, and the index units in each index block are only used for storing the index information related to the same index item. - View Dependent Claims (2)
-
-
3. A method for on-line inserting a new piece of index information in an inverted file, wherein said inverted file includes:
- a plurality of fixed-size index blocks, each of which includes a plurality of fixed-size index units, each index unit being used to store one piece of index information, wherein the index information related to the same index item is stored in continuous index blocks and the index units in each index block are used only for storing the index information related to the same index item, the method comprising the steps of;
extracting a corresponding index item from a new piece of index information to be inserted, and copying index blocks corresponding to the index item into the memory;
setting the on-line updating flag for the index item;
checking whether there is any empty index unit in the index block corresponding to the index item;
if there is, writing the piece of index information into the found empty index unit, otherwise creating a new index block at the end of the inverted file, and writing the piece of index information into the newly created index block and updating information in the block header of the present index block; and
resetting the on-line updating flag for the index item.
- a plurality of fixed-size index blocks, each of which includes a plurality of fixed-size index units, each index unit being used to store one piece of index information, wherein the index information related to the same index item is stored in continuous index blocks and the index units in each index block are used only for storing the index information related to the same index item, the method comprising the steps of;
-
4. A method for on-line deleting a piece of index information in an inverted file, wherein said inverted file includes:
- a plurality of fixed-size index blocks, each of said blocks includes a plurality of fixed-size index units, each index unit is used to store one piece of index information, wherein the index information related to the same index item is stored in continuous index blocks and the index units in each index block are used only for storing the index information related to the same index item, the method comprising the steps of;
extracting a corresponding index item from the piece of index information to be deleted, and copying all index blocks corresponding to the index item into the memory;
setting the on-line updating flag for the index item;
finding the index unit that stores the piece of index information from the index blocks corresponding to the index item, setting the flag bit of the index unit to indicate that the index unit is empty; and
resetting the on-line updating flag for the index item.
- a plurality of fixed-size index blocks, each of said blocks includes a plurality of fixed-size index units, each index unit is used to store one piece of index information, wherein the index information related to the same index item is stored in continuous index blocks and the index units in each index block are used only for storing the index information related to the same index item, the method comprising the steps of;
-
5. A method for on-line defragmenting an inverted file, wherein said inverted file includes:
- a plurality of fixed-size index blocks, at least one said blocks including a plurality of fixed-size index units, each index unit storing one piece of index information, wherein the index information related to the same index item is stored in continuous index blocks and the index units in each index block are used only for storing the index information related to the same index item, the method comprising the steps of;
creating a new inverted file in a storage medium, which has the same format as that of the old inverted file mentioned above;
sequentially processing each index item;
copying all index blocks related to the index item from the old inverted file to the memory;
setting the on-line defragment flag of the index item;
sequentially writing the index blocks related to the index item into the newly created inverted file; and
resetting the on-line defragment flag of the index item; and
stopping the searching service on the old inverted file and beginning the searching service on the new inverted file.
- a plurality of fixed-size index blocks, at least one said blocks including a plurality of fixed-size index units, each index unit storing one piece of index information, wherein the index information related to the same index item is stored in continuous index blocks and the index units in each index block are used only for storing the index information related to the same index item, the method comprising the steps of;
-
6. An inverted index mechanism adapted for on-line updating, the inverted index mechanism comprising:
-
an inverted file, including;
a plurality of fixed-size index blocks, each block including a plurality of fixed-size index units, each index unit being used for storing one piece of index information, wherein, index information related to the same index item is stored in continuous index blocks, and the index units in each index block are only used for storing index information related to the same index item;
a retrieval unit for retrieving documents, based on the keyword input, by means of the inverted file, evaluating the correlation degree between the documents and the query, ranking the results to be output, and returning the searching results to the user; and
an on-line updating unit for on-line inserting/deleting index information into/from the inverted file. - View Dependent Claims (7)
-
-
8. A program product comprising a signal-bearing media tangibly embodying a program of machine-readable instructions executable by a digital processing apparatus to perform a method for storing an inverted index based on an inverted file, the method comprising:
-
creating an inverted file in a storage medium for storing the inverted index, the inverted file includes a plurality of fixed-size index blocks, at least one of which includes a plurality of fixed-size index units, wherein each index unit is used to store one piece of index information; and
sequentially storing the index information related to each index item into the created inverted file, wherein the index information related to the same index item is stored in continuous blocks, and the index units in each index block are only used for storing the index information related to the same index item. - View Dependent Claims (9)
-
Specification