Query phrasification
First Claim
1. A computer-implemented method comprising:
- decomposing, by at least one processor of a computer system, a search query that includes three or more words into a plurality of candidate phrasifications, including different groupings of words of the search query, each candidate phrasification comprising a disjoint union of component phrases, and each component phrase including at least one word or related word of the search query;
- scoring, by at least one of the processors of the computer system, at least two of the candidate phrasifications, wherein the candidate phrasifications include one or more component phrases, and wherein the scoring is based on a probability of occurrence of each of the candidate phrasification's component phrases, and is based on the number of component phrases constituting the candidate phrasification, wherein candidate phrasifications having relatively fewer component phrases are weighted higher than candidate phrasifications having relatively more component phrases;
- selecting, by at least one of the processors of the computer system and based on scores of the candidate phrasifications, a subset of the candidate phrasification; and
- executing a query of a document indexing, by at least one of the processors of the computer system, using the selected subset of candidate phrasifications, wherein the query comprises the component phrases of each selected phrasification.
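The decompose, score, and select steps recited in the claim can be illustrated with a minimal sketch. It is not the patented implementation: it assumes that component phrases are contiguous runs of query words, that phrase probabilities come from a small hypothetical table (`PHRASE_PROB`), and that the preference for fewer component phrases is modeled as a fixed per-phrase penalty (`penalty`). All names and numbers below are illustrative assumptions.

```python
from itertools import combinations
from math import log

# Hypothetical phrase probabilities; a real system would estimate these
# from a document collection. The values are made up for illustration.
PHRASE_PROB = {
    "new york pizza": 0.001,
    "new york": 0.02,
    "york pizza": 0.0001,
    "new": 0.05,
    "york": 0.01,
    "pizza": 0.03,
}
UNSEEN_PROB = 1e-9  # assumed floor for phrases missing from the table


def phrasifications(words):
    """Yield every split of `words` into contiguous, non-overlapping phrases."""
    n = len(words)
    for k in range(n):  # k is the number of cut points between words
        for cuts in combinations(range(1, n), k):
            bounds = (0, *cuts, n)
            yield tuple(" ".join(words[a:b]) for a, b in zip(bounds, bounds[1:]))


def score(phrasification, penalty=2.0):
    """Sum of phrase log-probabilities minus an assumed per-phrase penalty.

    The penalty term weights phrasifications with fewer component
    phrases higher than those with more component phrases.
    """
    log_prob = sum(log(PHRASE_PROB.get(p, UNSEEN_PROB)) for p in phrasification)
    return log_prob - penalty * len(phrasification)


def select(query, top_k=2):
    """Return the `top_k` highest-scoring phrasifications of `query`."""
    candidates = phrasifications(query.split())
    return sorted(candidates, key=score, reverse=True)[:top_k]


if __name__ == "__main__":
    for candidate in select("new york pizza"):
        print(candidate, round(score(candidate), 3))
```

Under these assumed probabilities, the single-phrase reading and the "new york" plus "pizza" reading rank first and second for the query "new york pizza". The component phrases of each selected phrasification would then be passed to the index as the executed query.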
Abstract
An information retrieval system uses phrases to index, retrieve, organize and describe documents. Phrases are extracted from the document collection. Documents are then indexed according to their included phrases, using phrase posting lists. The phrase posting lists are stored in a cluster of index servers. The phrase posting lists can be tiered into groups, and sharded into partitions. Phrases in a query are identified based on possible phrasifications. A query schedule is created from the phrases, and then optimized to reduce query processing and communication costs. The execution of the query schedule is managed to further reduce or eliminate query processing operations at various ones of the index servers.
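As a rough illustration of phrase posting lists split across partitions, the sketch below indexes a toy collection by phrase and assigns each posting to a shard. The abstract does not say how postings are partitioned, so sharding by document id modulo `NUM_SHARDS` is an assumption, as are the function names, the whitespace-based phrase matching, and the sample data. Tiering and query scheduling are not modeled.

```python
from collections import defaultdict

NUM_SHARDS = 2  # assumed number of index partitions


def build_index(docs, phrases):
    """Return one {phrase: [doc ids]} posting map per shard.

    `docs` maps integer doc ids to text. A document's postings go to the
    shard chosen by doc_id % NUM_SHARDS (an assumed partitioning rule).
    """
    shards = [defaultdict(list) for _ in range(NUM_SHARDS)]
    for doc_id in sorted(docs):
        text = f" {docs[doc_id].lower()} "
        for phrase in phrases:
            if f" {phrase} " in text:  # naive whole-word phrase match
                shards[doc_id % NUM_SHARDS][phrase].append(doc_id)
    return shards


def lookup(shards, query_phrases):
    """Intersect the posting lists on each shard, then merge the shard results."""
    hits = []
    for shard in shards:
        postings = [set(shard.get(p, ())) for p in query_phrases]
        if postings:
            hits.extend(set.intersection(*postings))
    return sorted(hits)


# Hypothetical sample collection and phrase list.
docs = {1: "new york pizza guide", 2: "pizza in new york", 3: "york minster"}
shards = build_index(docs, ["new york", "pizza", "york"])
```

Because each document lives on exactly one shard in this sketch, every shard can intersect its own posting lists independently and only the matching doc ids need to be sent back for merging.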
40 Claims
1. A computer-implemented method comprising:
- decomposing, by at least one processor of a computer system, a search query that includes three or more words into a plurality of candidate phrasifications, including different groupings of words of the search query, each candidate phrasification comprising a disjoint union of component phrases, and each component phrase including at least one word or related word of the search query;
- scoring, by at least one of the processors of the computer system, at least two of the candidate phrasifications, wherein the candidate phrasifications include one or more component phrases, and wherein the scoring is based on a probability of occurrence of each of the candidate phrasification's component phrases, and is based on the number of component phrases constituting the candidate phrasification, wherein candidate phrasifications having relatively fewer component phrases are weighted higher than candidate phrasifications having relatively more component phrases;
- selecting, by at least one of the processors of the computer system and based on scores of the candidate phrasifications, a subset of the candidate phrasification; and
- executing a query of a document indexing, by at least one of the processors of the computer system, using the selected subset of candidate phrasifications, wherein the query comprises the component phrases of each selected phrasification.
Dependent claims: 2-20
21. A system comprising:
- a computer program product stored on a tangible computer readable medium and comprising instructions that when executed cause a computer system to:
  - decompose a search query that includes three or more words into a plurality of candidate phrasifications, including different groupings of words of the search query, each candidate phrasification comprising a disjoint union of component phrases, and each component phrase including at least one word or related word of the search query;
  - score at least two of the candidate phrasifications, wherein the candidate phrasifications include one or more component phrases, and wherein the scoring is based on a probability of occurrence of each of the candidate phrasification's component phrases, and is based on the number of component phrases constituting the candidate phrasification, wherein candidate phrasifications having relatively fewer component phrases are weighted higher than candidate phrasifications having relatively more component phrases;
  - select, based on the scores of the candidate phrasifications, at least one candidate phrasification, select, based on the scores of the candidate phrasifications, a subset of the candidate phrasifications; and
  - execute a query of a document indexing using the selected subset of candidate phrasifications, wherein the query comprises the component phrases of each selected phrasification; and
- one or more processors configured for executing the instructions.
Dependent claims: 22-40
Specification