Fast ranked full-text searching
First Claim
1. A method for use in a computing system containing a database of available documents that are to be searched using one or more key terms, the method searching the database to identify a plurality of documents for subsequent presentation in a ranked order, and comprising:
- using a ranking algorithm on documents identified by an inverted index as containing one or more key terms to create a special list for at least one key term, the special list for the at least one key term ranking a predetermined number of highest ranked documents identified in the inverted index as containing the at least one key term, wherein the ranking algorithm ranks documents in the special list according to a form consisting of Rank=(tf/K+tf), where K=dl/avdl, and wherein tf is occurrence of a key term in a document, dl is document length, and avdl is an average document length;
appending the special list to the inverted index and storing the inverted index on one or more computer readable storage medium, the inverted index having separate portions for each key term, and the portion for the at least one key term including;
a listing of documents that contain the at least one key term;
within the listing of documents, occurrence data for each of those documents; and
the special list for the at least one key term, the special list being separate from the listing of documents;
at a server computing device, receiving a user request for a search of the at least one key term, the user request also including a speed and thoroughness level for searching for the at least one key term;
at the server computing device, accessing the inverted index and the portion of the inverted index corresponding to the at least one key term;
at the server computing device, searching one or both of the inverted index and the special list for documents containing the at least one key term, such that;
when the speed and thoroughness level is a minimum thoroughness, only the special list is searched;
when the speed and thoroughness level is a maximum thoroughness, all relevant entries in the listing of documents in the inverted index are searched; and
when the speed and thoroughness level is between the minimum and maximum thoroughness, the special list is searched along with only a portion of relevant entries in the listing of documents within the inverted index.
2 Assignments
0 Petitions
Accused Products
Abstract
Special lists can be used to perform fast ranked searching of documents containing key terms. The special lists are distinguished from basic inverted indices because they contain a ranking of only a predetermined number of documents that may be identified in the index. During a search, search engines can utilize the special lists to perform fast ranked searching without having to redundantly search through the entire corpus or index of documents available to the search engine. Rather, the search engine can search only the documents listed in the special list, thereby saving the time and resources required to perform the search. The search engine can also be configured to search a combination of the special lists and the index to provide users selective control over the balance between the accuracy and speed of the search.
9 Citations
24 Claims
-
1. A method for use in a computing system containing a database of available documents that are to be searched using one or more key terms, the method searching the database to identify a plurality of documents for subsequent presentation in a ranked order, and comprising:
-
using a ranking algorithm on documents identified by an inverted index as containing one or more key terms to create a special list for at least one key term, the special list for the at least one key term ranking a predetermined number of highest ranked documents identified in the inverted index as containing the at least one key term, wherein the ranking algorithm ranks documents in the special list according to a form consisting of Rank=(tf/K+tf), where K=dl/avdl, and wherein tf is occurrence of a key term in a document, dl is document length, and avdl is an average document length; appending the special list to the inverted index and storing the inverted index on one or more computer readable storage medium, the inverted index having separate portions for each key term, and the portion for the at least one key term including; a listing of documents that contain the at least one key term; within the listing of documents, occurrence data for each of those documents; and the special list for the at least one key term, the special list being separate from the listing of documents; at a server computing device, receiving a user request for a search of the at least one key term, the user request also including a speed and thoroughness level for searching for the at least one key term; at the server computing device, accessing the inverted index and the portion of the inverted index corresponding to the at least one key term; at the server computing device, searching one or both of the inverted index and the special list for documents containing the at least one key term, such that; when the speed and thoroughness level is a minimum thoroughness, only the special list is searched; when the speed and thoroughness level is a maximum thoroughness, all relevant entries in the listing of documents in the inverted index are searched; and when the speed and thoroughness level is between the minimum and maximum thoroughness, the special list is searched along with only a portion of relevant entries in the listing of documents within the inverted index. - View Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16)
-
-
17. A computer program product for use in a computing system containing a database of available documents that is to be searched using one or more key terms, the computer program product enabling the computing system to search the database to identify a plurality of documents for subsequent presentation in a ranked order, the computer program product comprising:
one or more computer-readable storage media having computer-executable instructions encoded thereon for causing a computing system to; apply a ranking algorithm on documents identified by an inverted index as containing one or more key terms to create a special list for at least one key term, the special list for the at least one key term ranking a predetermined number of highest ranked documents identified in the inverted index as containing the at least one key term, wherein the ranking algorithm ranks documents in the special list according to a form consisting of;
Rank=(tf/K+tf), where K=dl/avdl, and wherein tf is occurrence of a key term in a document, dl is document length, and avdl is an average document length;append the special list to the inverted index, the inverted index having separate portions for each key term and the portion for the at least one key term including; a listing of documents that contain the at least one key term; within the listing of documents, occurrence data for each of those documents; and the special list for the at least one key term, the special list being separate from the listing of documents; receive a user request for a search of the at least one key term, the user request also including a speed and thoroughness level for searching for the at least one key term; access the inverted index and the portion of the inverted index corresponding to the at least one key term; search one or both of the inverted index and the special list for documents containing the at least one key term, such that; when the speed and thoroughness level is a minimum thoroughness, only the special list is searched; when the speed and thoroughness level is a maximum thoroughness, all relevant entries in the listing of documents in the inverted index are searched; and when the speed and thoroughness level is between the minimum and maximum thoroughness, the special list is searched along with only a portion of relevant entries in the listing of documents in the inverted index. - View Dependent Claims (18, 19, 20, 21, 22, 23)
-
24. A method for searching a database of available documents for later presentation to a user in a ranked order, the method comprising:
-
at a web-based search engine on a server computing device, receiving a user-initiated search request that identifies one or more key terms to be searched for in the database of available documents; at the web-based search engine on a server computing device, receiving a user indication of a speed and thoroughness level for searching the database of available documents for the one or more key terms; accessing an inverted index stored on computer-readable storage media, the inverted index having terms found within documents in the database, the inverted index having separate portions for each term and each portion of a respective term including; a listing of those documents that contain the respective term; within the listing of those documents that contain the respective term, occurrence data for each of those documents; and a special list specific to the respective term and appended to the inverted index apart from the listing of those documents that contain the respective term, the special list identifying a predetermined number of highest ranked documents from the listing, the special list ranking corresponding documents according to a form consisting of Rank=(tf/K+tf), where K=dl/avdl, and wherein tf is occurrence of a key term in a document, dl is document length, and avdl is an average document length; and accessing inverted index portions for terms corresponding to each of the one or more key terms, and; when the user indication of the speed and thoroughness level is a first thoroughness, searching only the special lists corresponding to the one or more key terms; when the user indication of the speed and thoroughness level is a second thoroughness, searching all relevant entries in the inverted index; and when the user indication of the speed and thoroughness level is a third thoroughness between the fist thoroughness and the second thoroughness, searching the special lists corresponding to the one or more key terms and only a portion of other relevant entries in the inverted index.
-
Specification