Compressed document surrogates
First Claim
1. A method for maintaining information about documents in a database, the method comprising:
(a) creating a compressed document surrogate corresponding to each document that is part of a collection of documents of interest in said database, so as to identify terms occurring in said each document that is part of said collection;
(b) inserting in said compressed document surrogate information about terms which occur in said each document that is part of said collection, and,
(c) creating at least one inverted term list corresponding to at least one selected term of interest which occurs in said database, said at least one inverted term list including;
at least one document identifier number to uniquely identify a selected document in said database, and corresponding information indicating how often said selected term of interest occurs in said selected document.
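The two structures recited in steps (a) through (c) can be illustrated with a minimal Python sketch. It is not the patented implementation: the whitespace tokenizer, the use of zlib over a sorted term list as the "compression", and the names `make_surrogate`, `surrogate_terms`, and `build_index` are all assumptions made for illustration.

```python
import zlib
from collections import Counter, defaultdict


def tokenize(text):
    # Assumption: terms are lowercase, whitespace-separated words.
    return text.lower().split()


def make_surrogate(text):
    """Return compressed bytes naming the distinct terms in one document."""
    terms = sorted(set(tokenize(text)))
    # Assumption: zlib stands in for whatever compression scheme is used.
    return zlib.compress("\n".join(terms).encode("utf-8"))


def surrogate_terms(surrogate):
    """Recover the term list from a surrogate without the original document."""
    data = zlib.decompress(surrogate).decode("utf-8")
    return data.split("\n") if data else []


def build_index(docs):
    """docs maps a document identifier number to the document text."""
    surrogates = {}
    inverted = defaultdict(dict)  # term -> {document id: occurrence count}
    for doc_id, text in docs.items():
        surrogates[doc_id] = make_surrogate(text)
        for term, count in Counter(tokenize(text)).items():
            inverted[term][doc_id] = count
    return surrogates, inverted
```

Each inverted term list here pairs a document identifier with an occurrence count, which mirrors the final element of the claim; a production system would likely use a more compact encoding for both structures.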
Abstract
Disclosed is a method and device for storing information about Web documents such as pages or sites in a manner which may be used in conjunction with inverted term lists to facilitate the retrieval of documents of interest from the Web. The method involves constructing compressed surrogates for documents, such that various operations may be performed without the need to retrieve a copy of the document from the Web. The method permits the efficient updating of inverted term lists when documents on the Web have been modified or deleted, and also permits the efficient processing of search queries in a variety of circumstances.
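To make the updating step concrete, here is a hedged Python sketch of how a stored surrogate could drive the removal or replacement of a document's entries in the inverted term lists without fetching the old page again. The helper names (`pack`, `unpack`, `remove_document`, `update_document`), the zlib encoding, and the dictionary layout are assumptions for illustration, not details taken from the patent.

```python
import zlib
from collections import Counter


def pack(terms):
    # Assumption: a surrogate is a zlib-compressed, sorted list of terms.
    return zlib.compress("\n".join(sorted(terms)).encode("utf-8"))


def unpack(surrogate):
    data = zlib.decompress(surrogate).decode("utf-8")
    return data.split("\n") if data else []


def remove_document(doc_id, surrogates, inverted):
    """Drop doc_id from every term list named by its stored surrogate."""
    surrogate = surrogates.pop(doc_id, None)
    if surrogate is None:
        return  # unknown document, nothing to clean up
    for term in unpack(surrogate):
        postings = inverted.get(term)
        if postings is not None:
            postings.pop(doc_id, None)
            if not postings:
                del inverted[term]  # no documents left for this term


def update_document(doc_id, new_text, surrogates, inverted):
    """Replace a document's entries after the document has been modified."""
    remove_document(doc_id, surrogates, inverted)
    counts = Counter(new_text.lower().split())
    surrogates[doc_id] = pack(counts)
    for term, count in counts.items():
        inverted.setdefault(term, {})[doc_id] = count
```

Only the term lists named in the old surrogate are touched, so the cost of a deletion depends on the number of distinct terms in that document rather than on the size of the whole index.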
30 Claims
1. A method for maintaining information about documents in a database, the method comprising:
(a) creating a compressed document surrogate corresponding to each document that is part of a collection of documents of interest in said database, so as to identify terms occurring in said each document that is part of said collection;
(b) inserting in said compressed document surrogate information about terms which occur in said each document that is part of said collection, and,
(c) creating at least one inverted term list corresponding to at least one selected term of interest which occurs in said database, said at least one inverted term list including;
at least one document identifier number to uniquely identify a selected document in said database, and corresponding information indicating how often said selected term of interest occurs in said selected document.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15
16. A device for maintaining information about documents in a database, the device comprising:
(a) means for creating a compressed document surrogate corresponding to each document that is part of a collection of documents of interest in said database, so as to identify terms occurring in said each document that is part of said collection;
(b) means for inserting in said compressed document surrogate information about terms which occur in said each document that is part of said collection, and,
(c) means for creating at least one inverted term list corresponding to at least one selected term of interest which occurs in said database, said at least one inverted term list including;
at least one document identifier number to uniquely identify a selected document in said database, and corresponding information indicating how often said selected term of interest occurs in said selected document.
Dependent claims: 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30
Specification