Method and system for retrieving relevant documents from a database
Abstract
A method for processing a search query uses the results of a search performed on a high-quality, controlled database to assess the relevance of documents retrieved from a search of an uncontrolled public database having documents of highly variable quality. The method includes the steps of parsing the search query and then searching the authoritative database to generate authoritative database results. The search query is also used to search the public database, thereby generating public database results. The quality or relevance of the public database results is then quantified on the basis of the authoritative database results, thereby generating a quality index. The results from both the authoritative and the public databases are then ranked on the basis of this quality index.
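The abstract does not say how the quality index is computed. As a rough illustration only, the sketch below assumes each search returns hits with a raw relevance score and defines the index as a hit's score divided by the best authoritative score. The `Hit` class, the `quality_index` formula, and the sample scores are all hypothetical and are not taken from the patent.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Hit:
    doc_id: str
    source: str   # "authoritative" or "public"
    score: float  # raw relevance score from whichever search produced the hit


def quality_index(hit: Hit, authoritative_hits: Sequence[Hit]) -> float:
    """Assumed index: the hit's score relative to the best authoritative score."""
    best = max((h.score for h in authoritative_hits), default=0.0)
    if best <= 0.0:
        return 0.0  # nothing authoritative to calibrate against
    return hit.score / best


def rank_by_quality(authoritative_hits: Sequence[Hit],
                    public_hits: Sequence[Hit]) -> List[Hit]:
    """Merge both result sets and sort them by descending quality index."""
    merged = list(authoritative_hits) + list(public_hits)
    return sorted(merged,
                  key=lambda h: quality_index(h, authoritative_hits),
                  reverse=True)


# Hypothetical scores, for illustration only.
authoritative = [Hit("A1", "authoritative", 0.80), Hit("A2", "authoritative", 0.40)]
public = [Hit("P1", "public", 0.60), Hit("P2", "public", 0.10)]
ranked_ids = [h.doc_id for h in rank_by_quality(authoritative, public)]
```

Normalizing by the best authoritative score is only one way to let the controlled database calibrate the public results; a mean or a percentile of the authoritative scores would fit the abstract's wording equally well.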
47 Claims
1. A method for ranking a plurality of candidate documents on the basis of the similarity of each of the plurality of candidate documents to a user-query, said method comprising the steps of:
parsing the user-query, thereby generating a query-word and a distribution of the query-word in the user-query;
assessing an importance of the query-word on the basis of the frequency with which the query-word occurs in the plurality of candidate documents, and the distribution of the query-word in the user-query;
evaluating the similarity of a candidate document to the user-query on the basis of a distribution of the query-word in the candidate document, the candidate document having at least one candidate document sentence;
evaluating the similarity of the at least one candidate document sentence to the user-query on the basis of the frequency with which the query-word occurs in the at least one candidate document sentence;
ranking the candidate document relative to the plurality of candidate documents on the basis of the similarity of the candidate document to the user-query and the similarity of the at least one candidate document sentence to the user-query.
7. A computer-readable medium containing software for ranking a plurality of candidate documents on the basis of the similarity of each of the plurality of candidate documents to a user-query, said software comprising instructions for executing the steps of:
parsing the user-query, thereby generating a query-word and a distribution of the query-word in the user-query;
assessing an importance of the query-word on the basis of the frequency with which the query-word occurs in the plurality of candidate documents, and the distribution of the query-word in the user-query;
evaluating the similarity of a candidate document to the user-query on the basis of a distribution of the query-word in the candidate document, the candidate document having at least one candidate document sentence;
evaluating the similarity of the at least one candidate document sentence to the user-query on the basis of the frequency with which the query-word occurs in the at least one candidate document sentence;
ranking the candidate document relative to the plurality of candidate documents on the basis of the similarity of the candidate document to the user-query and the similarity of the at least one candidate document sentence to the user-query.
20. A computer-implemented method for generating a list of candidate documents, said method comprising the steps of:
evaluating, on the basis of a search query, a relevance of an authoritative document selected from an authoritative database;
evaluating, on the basis of the search query, a relevance of a public document from a public database; and
including the public document in the list of candidate documents if the relevance of the public document exceeds a relevance threshold selected on the basis of the relevance of the authoritative document.
34. A computer-readable medium containing software for generating a list of candidate documents, the software comprising instructions for executing the steps of:
evaluating, on the basis of a search query, a relevance of an authoritative document selected from an authoritative database;
evaluating, on the basis of the search query, a relevance of a public document; and
including the public document in the list of candidate documents if the relevance of the public document exceeds a relevance threshold selected on the basis of the relevance of the authoritative document.
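The steps recited in claims 1 and 7 can be read as a weighted term-counting pipeline. The sketch below is one possible reading, not the patented formulas: the claims do not fix a tokenizer, an importance formula, or a way to combine the document and sentence similarities. The smoothed inverse-document-frequency weight, the `sentence_weight` parameter, and every function name here are assumptions made for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def split_sentences(text):
    """Naive sentence splitter on ., ! and ? (an assumption)."""
    return [s for s in re.split(r"[.!?]+", text) if s.strip()]


def word_importance(query_counts, documents):
    """Weight each query word by its count in the query and its rarity across documents."""
    n_docs = len(documents)
    doc_tokens = [set(tokenize(d)) for d in documents]
    weights = {}
    for word, count_in_query in query_counts.items():
        docs_with_word = sum(1 for tokens in doc_tokens if word in tokens)
        # Smoothed inverse document frequency: an assumed choice of formula.
        idf = math.log((1 + n_docs) / (1 + docs_with_word)) + 1.0
        weights[word] = count_in_query * idf
    return weights


def weighted_score(text, weights):
    """Sum of importance-weighted occurrences of the query words in the text."""
    counts = Counter(tokenize(text))
    return sum(weight * counts[word] for word, weight in weights.items())


def rank_documents(query, documents, sentence_weight=0.5):
    """Return document indices, best match first."""
    query_counts = Counter(tokenize(query))  # distribution of words in the query
    weights = word_importance(query_counts, documents)
    scored = []
    for index, doc in enumerate(documents):
        doc_score = weighted_score(doc, weights)
        # Use the best-matching sentence as the sentence-level similarity.
        sentence_score = max(
            (weighted_score(s, weights) for s in split_sentences(doc)),
            default=0.0,
        )
        scored.append((doc_score + sentence_weight * sentence_score, index))
    # Sort by descending score; ties keep the original document order.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [index for _, index in scored]


docs = [
    "Cats sleep all day. Dogs bark.",
    "Patent search ranks documents. A patent query finds documents.",
    "The weather is nice.",
]
order = rank_documents("patent documents", docs)
```

Taking the best-matching sentence rewards documents in which the query words cluster together rather than being scattered through the text, which is one plausible reason for scoring sentences separately from the whole document.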
Specification