Infrequent word index for document indexes
Abstract
A document indexing system utilizes two indexes. An infrequent word index is maintained separately from a frequent word index to map the locations of words that occur infrequently in the indexed documents. The infrequent word index may be stored and partitioned differently than the frequent word index to promote efficiency.
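The abstract says only that the two indexes may be partitioned differently; it does not say how. As one hypothetical illustration, the sketch below partitions the infrequent word index by word, so a rare-word lookup touches a single partition, and partitions the frequent word index by document, so long posting lists are spread across partitions. The shard count, the CRC32 hashing, the integer document IDs, and all function names are assumptions made for this example, not details taken from the patent.

```python
import zlib

NUM_SHARDS = 4  # hypothetical partition count

# One dict per partition, each mapping word -> list of document IDs.
infrequent_shards = [{} for _ in range(NUM_SHARDS)]
frequent_shards = [{} for _ in range(NUM_SHARDS)]


def word_shard(word):
    """Pick a partition from the word itself (stable across runs)."""
    return zlib.crc32(word.encode("utf-8")) % NUM_SHARDS


def doc_shard(doc_id):
    """Pick a partition from the document ID."""
    return doc_id % NUM_SHARDS


def add_infrequent(word, doc_ids):
    # Assumed layout: the whole (short) posting list lives in one partition.
    infrequent_shards[word_shard(word)][word] = sorted(doc_ids)


def add_frequent(word, doc_ids):
    # Assumed layout: each partition holds only its own documents.
    for doc_id in doc_ids:
        frequent_shards[doc_shard(doc_id)].setdefault(word, []).append(doc_id)


def find_infrequent(word):
    # A rare word is looked up in exactly one partition.
    return infrequent_shards[word_shard(word)].get(word, [])


def find_frequent(word):
    # A frequent word is gathered from every partition.
    hits = []
    for shard in frequent_shards:
        hits.extend(shard.get(word, []))
    return sorted(hits)


add_infrequent("quokka", [7])
add_frequent("the", [0, 1, 2, 5])
rare_hits = find_infrequent("quokka")
common_hits = find_frequent("the")
```

Under these assumptions, a query for a rare word costs one partition lookup, while a query for a frequent word fans out to all partitions.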
27 Claims
1. For use with a search engine that processes user queries, a system that locates documents containing words corresponding to a user query comprising:
an infrequent word identifier that identifies infrequent words that occur in less than a threshold number of documents;
a frequent word index that maps the location of documents that contain words that occur in more than the threshold number of documents;
an infrequent word index, maintained separately from the frequent word index, that maps the location of documents that contain infrequent words;
an index scanning component that, in response to a query containing an infrequent word, scans the infrequent word index to find the location of documents containing the infrequent word.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 23
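The components recited in claim 1 can be sketched roughly as follows. This is a minimal in-memory illustration, not the patented implementation: the function names, whitespace tokenization, and lowercasing are assumptions, and because the claim does not say where a word occurring in exactly the threshold number of documents belongs, the sketch places it in the frequent word index.

```python
from collections import defaultdict


def build_indexes(docs, threshold):
    """docs maps a document location (e.g. a path) to its text."""
    postings = defaultdict(set)
    for location, text in docs.items():
        # Count each word once per document (document frequency).
        for word in set(text.lower().split()):
            postings[word].add(location)

    infrequent_index, frequent_index = {}, {}
    for word, locations in postings.items():
        # Infrequent word identifier: fewer documents than the threshold.
        # Assumption: a word at exactly the threshold is treated as frequent.
        if len(locations) < threshold:
            infrequent_index[word] = sorted(locations)
        else:
            frequent_index[word] = sorted(locations)
    return infrequent_index, frequent_index


def lookup(word, infrequent_index, frequent_index):
    """Index scanning: try the infrequent word index first."""
    word = word.lower()
    if word in infrequent_index:
        return infrequent_index[word]
    return frequent_index.get(word, [])


docs = {
    "doc1": "the quick brown fox",
    "doc2": "the lazy dog",
    "doc3": "the quick zebra",
}
infrequent, frequent = build_indexes(docs, threshold=2)
zebra_hits = lookup("zebra", infrequent, frequent)
quick_hits = lookup("quick", infrequent, frequent)
```

With a threshold of 2, "zebra" appears in one document and lands in the infrequent word index, while "the" and "quick" appear in at least two documents and land in the frequent word index.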
13. For use with a search engine that processes user queries, a method that searches a set of documents for documents containing terms found in a user query comprising:
scanning the set of documents and gathering infrequent words that occur a number of times that is less than a threshold amount;
constructing an infrequent word index that maps infrequent words to locations of documents that contain the words;
constructing a frequent word index, separately maintained from the infrequent word index, that maps frequent words that occur a number of times that is greater than the threshold amount to locations of documents that contain the words; and
examining the terms in the user query to identify any terms that are infrequent words; and
searching the infrequent word index for the terms that are identified as infrequent words.
Dependent claims: 14, 15, 16, 17, 18
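The method steps of claim 13 can be sketched in the same spirit. Unlike claim 1, this claim counts how many times a word occurs rather than how many documents contain it, so the sketch uses total occurrence counts. The function names and whitespace tokenization are hypothetical, and a word occurring exactly the threshold number of times is placed in the frequent word index as an assumption, since the claim leaves that case open.

```python
from collections import Counter, defaultdict


def gather_counts(docs):
    """Scan the document set and count every occurrence of every word."""
    counts = Counter()
    for text in docs.values():
        counts.update(text.lower().split())
    return counts


def construct_indexes(docs, threshold):
    """Build two separately maintained word -> locations indexes."""
    counts = gather_counts(docs)
    infrequent_index = defaultdict(set)
    frequent_index = defaultdict(set)
    for location, text in docs.items():
        for word in text.lower().split():
            # Assumption: a count equal to the threshold is treated as frequent.
            if counts[word] < threshold:
                infrequent_index[word].add(location)
            else:
                frequent_index[word].add(location)
    return dict(infrequent_index), dict(frequent_index)


def search_infrequent_terms(query, infrequent_index):
    """Examine the query terms and search only those that are infrequent."""
    terms = query.lower().split()
    rare_terms = [term for term in terms if term in infrequent_index]
    return {term: sorted(infrequent_index[term]) for term in rare_terms}


docs = {
    "a.txt": "data data data index",
    "b.txt": "data index index quokka",
}
infrequent, frequent = construct_indexes(docs, threshold=3)
results = search_infrequent_terms("data quokka", infrequent)
```

Here "quokka" occurs once and is indexed as infrequent, while "data" (four occurrences) and "index" (three occurrences) go to the frequent word index, so only "quokka" is searched in the infrequent word index.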
19. For use with a search engine that processes user queries, a computer readable medium comprising computer-executable instructions for locating documents containing words corresponding to a user query by:
identifying infrequent words that occur in less than a threshold number of documents;
mapping the location of documents that contain words that occur in more than the threshold number of documents in a frequent word index;
maintaining, separately from the frequent word index, an infrequent word index that maps the location of documents that contain infrequent words;
in response to a query containing an infrequent word, scanning the infrequent word index to find the location of documents containing the infrequent word.
Dependent claims: 20, 21, 22, 24, 25, 26
27. For use with a search engine that processes user queries, an apparatus for searching a set of documents for documents containing terms found in a user query comprising:
means for scanning the set of documents and gathering infrequent words that occur a number of times that is less than a threshold amount;
means for constructing an infrequent word index that maps infrequent words to locations of documents that contain the words;
means for constructing a frequent word index, separately maintained from the infrequent word index, that maps frequent words that occur a number of times that is greater than the threshold amount to locations of documents that contain the words; and
means for examining the terms in the user query to identify any terms that are infrequent words; and
means for searching the infrequent word index for the terms that are identified as infrequent words.
Specification