Query phrasification
First Claim
1. A method comprising:
decomposing a search query that includes three or more words into a plurality of candidate phrasifications, each candidate phrasification having a different grouping of component phrases, a component phrase including a non-zero quantity of the words and each candidate phrasification including all of the words of the search query;
scoring one or more candidate phrasifications, a candidate phrasification being scored by applying a scoring model, the scoring model based on a number of component phrases in the phrasification, a probability of occurrence of each of the component phrases in the candidate phrasification from a valid phrase table, and parameters for adjusting precision and recall of the candidate phrasification, the parameters including a first parameter to adjust precision of candidate phrasifications and a second parameter to adjust bias against obtaining too many phrases;
selecting at least one highest scoring candidate phrasification; and
executing the at least one highest scoring candidate phrasification against an index that includes posting lists for phrases, the executing identifying documents associated with each of the component phrases of the highest scoring candidate phrasification.
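The four steps of claim 1 (decompose, score, select, execute) can be illustrated with a minimal Python sketch. Everything concrete in it is an assumption made for illustration only: the claim names the inputs to the scoring model but not its formula, so the log-linear score below, the parameter names `alpha` and `beta`, the `floor` probability for unknown phrases, the sample phrase table and index, and the reading of "executing" as an intersection of posting lists are all hypothetical and not taken from the patent.

```python
import math
from itertools import combinations


def phrasifications(words):
    """Yield every split of `words` into contiguous component phrases.

    Each candidate uses all of the words, and each phrase has at least one word.
    """
    n = len(words)
    for k in range(n):  # k = number of break points between words
        for breaks in combinations(range(1, n), k):
            bounds = (0, *breaks, n)
            yield [" ".join(words[a:b]) for a, b in zip(bounds, bounds[1:])]


def score(candidate, phrase_table, alpha=1.0, beta=0.5, floor=1e-9):
    """Hypothetical scoring model (an assumed formula, not the patented one).

    alpha weights the phrase probabilities (the precision knob);
    beta penalizes each extra component phrase (bias against too many phrases);
    floor is an assumed probability for phrases missing from the table.
    """
    log_prob = sum(math.log(phrase_table.get(p, floor)) for p in candidate)
    return alpha * log_prob - beta * len(candidate)


def execute(candidate, index):
    """Return documents found in the posting list of every component phrase."""
    postings = [index.get(p, set()) for p in candidate]
    return set.intersection(*postings) if postings else set()


# Made-up valid phrase table: phrase -> probability of occurrence.
PHRASE_TABLE = {
    "new": 0.05, "york": 0.01, "pizza": 0.03,
    "new york": 0.02, "york pizza": 1e-6, "new york pizza": 1e-5,
}
# Made-up index: phrase -> posting list of document ids.
INDEX = {"new york": {1, 2, 5}, "pizza": {2, 5, 9}}

query = "new york pizza".split()
best = max(phrasifications(query), key=lambda c: score(c, PHRASE_TABLE))
results = execute(best, INDEX)
```

With these made-up probabilities, the grouping "new york" + "pizza" outscores the other three candidates, and only documents that appear in both posting lists are returned. A query of n words has 2^(n-1) candidate groupings, so exhaustive enumeration like this is only practical for short queries.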
Abstract
An information retrieval system uses phrases to index, retrieve, organize and describe documents. Phrases are extracted from the document collection. Documents are then indexed according to their included phrases, using phrase posting lists. The phrase posting lists are stored in a cluster of index servers. The phrase posting lists can be tiered into groups and sharded into partitions. Phrases in a query are identified based on possible phrasifications. A query schedule is created from the phrases, and then optimized to reduce query processing and communication costs. The execution of the query schedule is managed to further reduce or eliminate query processing operations at various ones of the index servers.
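As a rough illustration of sharded phrase posting lists, the sketch below partitions a toy index across several in-memory "index servers". The abstract does not say how shards are assigned, so sharding by document id modulo a shard count is an assumption, as are the names `build_shards` and `search`, the shard count, and the sample documents. Tiering and query-schedule optimization are not modeled here.

```python
from collections import defaultdict

NUM_SHARDS = 3  # hypothetical number of index-server partitions


def build_shards(docs, num_shards=NUM_SHARDS):
    """Build one phrase -> doc-id posting map per shard.

    `docs` maps a document id to the phrases it contains. A document is
    assigned to a shard by doc_id % num_shards (an assumed scheme).
    """
    shards = [defaultdict(set) for _ in range(num_shards)]
    for doc_id, phrases in docs.items():
        shard = shards[doc_id % num_shards]
        for phrase in phrases:
            shard[phrase].add(doc_id)
    return shards


def search(shards, phrases):
    """Intersect posting lists inside each shard, then merge the shard results."""
    hits = set()
    for shard in shards:
        postings = [shard.get(p, set()) for p in phrases]
        if postings:
            hits |= set.intersection(*postings)
    return hits


# Made-up document collection: doc id -> extracted phrases.
DOCS = {
    1: ["new york", "bagel"],
    2: ["new york", "pizza"],
    5: ["new york", "pizza"],
    9: ["pizza"],
}
shards = build_shards(DOCS)
matches = search(shards, ["new york", "pizza"])
```

Because each document lives in exactly one shard under this scheme, every shard can intersect its own posting lists locally and send back only matching ids, which is one simple way to keep communication between servers small.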
12 Claims
1. A method comprising:
decomposing a search query that includes three or more words into a plurality of candidate phrasifications, each candidate phrasification having a different grouping of component phrases, a component phrase including a non-zero quantity of the words and each candidate phrasification including all of the words of the search query;
scoring one or more candidate phrasifications, a candidate phrasification being scored by applying a scoring model, the scoring model based on a number of component phrases in the phrasification, a probability of occurrence of each of the component phrases in the candidate phrasification from a valid phrase table, and parameters for adjusting precision and recall of the candidate phrasification, the parameters including a first parameter to adjust precision of candidate phrasifications and a second parameter to adjust bias against obtaining too many phrases;
selecting at least one highest scoring candidate phrasification; and
executing the at least one highest scoring candidate phrasification against an index that includes posting lists for phrases, the executing identifying documents associated with each of the component phrases of the highest scoring candidate phrasification.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12
Specification