Word deletion for searches
First Claim
1. In a computerized search system in which queries are submitted by users who receive, in response, a list of documents selected from a corpus of documents, wherein the list comprises documents deemed responsive to a user's query, a method of preprocessing the query comprising:
obtaining a base query from a user, wherein the base query comprises a plurality of words;
determining a base distribution of nodes of a taxonomy that have non-zero probabilities of being relevant to the base query, wherein the taxonomy is a taxonomy of topics into which documents of the corpus of documents might be assigned;
modifying the base query to form a truncated query when it is determined that the base query will return no results, wherein modifying the base query to form the truncated query comprises:
  identifying word pairs in the base query,
  determining pair distributions for word pairs over the taxonomy,
  selecting a desired word pair based in part on the pair distributions,
  generating a first query by omitting from the base query a first word of the desired word pair,
  generating a second query by omitting from the base query a second word from the desired word pair,
  determining a first count of documents corresponding to the first query,
  determining a second count of documents corresponding to the second query, and
  determining at least one word to remove from the base query based on the first and second counts such that the truncated query comprises a portion of the base query from which the at least one word is removed;
running the truncated query against the corpus of documents to obtain a results list of one or more documents in the document corpus deemed responsive to the truncated query; and
outputting the results list as the list comprising documents deemed responsive to the user's query.
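The steps of claim 1 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation. The toy corpus, the AND-matching `search` helper, the pooled per-word `base_distribution` (a stand-in for the categorizer), the lowest-score pair selection, and the larger-count rule for choosing which word to drop are all hypothetical choices. The claim itself says only that the pair is selected based in part on the pair distributions and that the word to remove is determined from the two counts.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy corpus: doc id -> (taxonomy node, text). Not from the patent.
CORPUS = {
    1: ("phones", "red apple phone case"),
    2: ("phones", "apple phone charger"),
    3: ("fruit", "red apple basket"),
    4: ("phones", "leather phone case"),
    5: ("phones", "fast apple phone charger"),
}

def search(words):
    """Ids of documents containing every query word (AND semantics, assumed)."""
    return [doc_id for doc_id, (_, text) in CORPUS.items()
            if set(words) <= set(text.split())]

def node_counts(words):
    """Number of matching documents per taxonomy node."""
    return Counter(CORPUS[doc_id][0] for doc_id in search(words))

def base_distribution(words):
    """Assumed stand-in for the categorizer: pool per-word node counts, normalize."""
    pooled = Counter()
    for word in words:
        pooled.update(node_counts([word]))
    total = sum(pooled.values())
    return {node: c / total for node, c in pooled.items()} if total else {}

def truncate(base_query):
    words = base_query.split()
    if len(words) < 2 or search(words):
        return words  # nothing to do: too short, or the base query has results
    base = base_distribution(words)

    def score(pair):
        # Pair distribution over the taxonomy, weighted by the base distribution.
        return sum(base.get(node, 0.0) * c for node, c in node_counts(pair).items())

    # Assumption: the lowest-scoring pair is the likeliest cause of the empty result.
    pair = min(combinations(words, 2), key=score)
    first = [w for w in words if w != pair[0]]    # first query: omit first word
    second = [w for w in words if w != pair[1]]   # second query: omit second word
    # Assumption: keep whichever candidate matches more documents.
    return first if len(search(first)) >= len(search(second)) else second

query = truncate("red apple phone charger")
print(query, search(query))  # ['apple', 'phone', 'charger'] [2, 5]
```

In this toy data the pair ("red", "charger") never co-occurs, so it scores zero and is selected; dropping "red" leaves two matching documents while dropping "charger" leaves one, so "red" is the word removed.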
Abstract
A searcher can be configured to improve search results through the use of intelligent word deletion. A search auto categorizer (SAC) operates on the original query and returns a list of leaf categories and a distribution of probabilities among the leaf categories. The original query is parsed into word pairs and each word pair is run through the search engine. The search results for each word pair are weighted by the leaf category probabilities. A word pair is selected from the results and one of the two words is deleted from the original query. The searcher can perform exhaustive deletion where multiple truncated queries are generated from the original query and the results list from one truncated query is returned as the results list. The searcher can build up a truncated query from the original query by iteratively appending a word selected from the original query to the truncated query.
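The two alternative strategies mentioned at the end of the abstract, exhaustive deletion and building up a truncated query word by word, can be sketched as below. The abstract does not say how the winning truncated query or the next word to append is chosen, so the "most matching documents" criterion here is an assumption, as are the toy documents and the AND-matching `search` helper.

```python
# Hypothetical toy documents: doc id -> text. Not from the patent.
DOCS = {
    1: "red apple phone case",
    2: "apple phone charger",
    3: "red apple basket",
    4: "leather phone case",
    5: "fast apple phone charger",
}

def search(words):
    """Ids of documents containing every query word (AND semantics, assumed)."""
    return [doc_id for doc_id, text in DOCS.items()
            if set(words) <= set(text.split())]

def exhaustive_deletion(words):
    """Generate every single-word deletion and return the one with the most hits."""
    candidates = [words[:i] + words[i + 1:] for i in range(len(words))]
    best = max(candidates, key=lambda q: len(search(q)))
    return best, search(best)

def build_up(words):
    """Greedily append the word that keeps the most hits; stop before hits vanish."""
    query, remaining = [], list(words)
    while remaining:
        best = max(remaining, key=lambda w: len(search(query + [w])))
        if not search(query + [best]):
            break  # every remaining word would empty the results
        query.append(best)
        remaining.remove(best)
    return query, (search(query) if query else [])

words = "red apple phone charger".split()
print(exhaustive_deletion(words))  # (['apple', 'phone', 'charger'], [2, 5])
print(build_up(words))             # (['apple', 'phone', 'charger'], [2, 5])
```

Both strategies reach the same truncated query on this data, but they need not in general: exhaustive deletion as sketched removes exactly one word, while the build-up loop can leave out several.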
5 Claims
5. In a computerized search system in which queries are submitted by users who receive, in response, a list of documents selected from a corpus of documents, wherein the list comprises documents deemed responsive to a user's query, a method of preprocessing the query comprising:
obtaining a base query from a user input, wherein the base query comprises a plurality of words;
determining a base distribution of nodes of a taxonomy of the computerized search system having non-zero probabilities of being relevant to the base query;
forming a plurality of truncated queries each comprising a distinct pair of words from the plurality of words;
running the truncated queries at the base distribution of nodes to obtain a results list comprising a count of documents deemed responsive to each truncated query in one or more categories of the taxonomy;
selecting one of the truncated queries based on the count of the documents in its corresponding results list;
modifying the base query by removing from the base query a first word of the pair of words of the selected truncated query and running the modified base query to obtain a first results list;
modifying the base query by removing from the base query a second word of the pair of words of the selected truncated query and running the modified base query to obtain a second results list; and
outputting the first results list or the second results list as the list comprising documents deemed responsive to the user's query based on an optimization policy of the computerized search system.
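Claim 5 differs from claim 1 mainly in that the pair queries are run only within the nodes of the base distribution and the final choice between the two results lists is left to an optimization policy. A rough sketch follows, with the policy passed in as a function. The corpus, the `search` helper, the lowest-count pair selection, and the example `most_results` policy are all assumptions made for illustration; the claim does not define the policy.

```python
from itertools import combinations

# Hypothetical toy corpus: doc id -> (taxonomy node, text). Not from the patent.
CORPUS = {
    1: ("phones", "red apple phone case"),
    2: ("phones", "apple phone charger"),
    3: ("fruit", "red apple basket"),
    4: ("phones", "leather phone case"),
    5: ("phones", "fast apple phone charger"),
}

def search(words, nodes=None):
    """AND-match the words; optionally restrict to documents filed under `nodes`."""
    return [doc_id for doc_id, (node, text) in CORPUS.items()
            if set(words) <= set(text.split())
            and (nodes is None or node in nodes)]

def most_results(first, second):
    """Example optimization policy (an assumption): prefer the longer list."""
    return first if len(first) >= len(second) else second

def preprocess(base_query, base_nodes, policy=most_results):
    """`base_nodes` is the set of taxonomy nodes with non-zero probability."""
    words = base_query.split()
    # Each truncated query is a distinct word pair, counted within the base nodes.
    pair_counts = {pair: len(search(pair, base_nodes))
                   for pair in combinations(words, 2)}
    # Assumption: select the pair with the fewest matching documents.
    selected = min(pair_counts, key=pair_counts.get)
    first_results = search([w for w in words if w != selected[0]])
    second_results = search([w for w in words if w != selected[1]])
    return policy(first_results, second_results)

print(preprocess("red apple phone charger", {"phones"}))  # [2, 5]
```

Swapping in a different policy changes only the last step; for example, `policy=lambda first, second: second` would return the list obtained by dropping the second word of the selected pair.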
Specification