Predictive Stemming for Web Search with Statistical Machine Translation Models
Abstract
Techniques for determining when and how to transform words in a query to return the most relevant search results while minimizing computational overhead are provided. A dictionary is generated based upon words used in a specified number of previous most frequent search queries and comprises lists of transformations that may include variants based upon the stems of words, synonyms, and abbreviation expansions. When a query is received from a user, candidate queries are generated based upon replacing particular words in the query with a transformation of the particular words. Candidate queries are selected that have a high probability of returning relevant results by computing values of the query using language model scoring and translation scoring. The selected candidate queries and the original query are executed to return search results. The search results are displayed to the user with the words in the original query and the transformed words in bold.
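The abstract names two ingredients for valuing a candidate query, language model scoring and translation scoring, without giving formulas. The following is only a minimal sketch of how the two could be combined, assuming an add-one-smoothed bigram model trained on a query log and a table of word-to-variant probabilities. The query log, the probability table, the fallback constant, and all function names are hypothetical and are not taken from the patent.

```python
import math
from collections import Counter

# Hypothetical toy query log; a real system would train on a large log.
QUERY_LOG = [
    "cheap hotels in paris",
    "cheap hotel deals",
    "hotels in rome",
    "hotel booking",
]

# Hypothetical translation table: P(variant | original word).
TRANSLATION_PROB = {("hotel", "hotels"): 0.6, ("hotels", "hotel"): 0.3}
UNSEEN_PROB = 1e-6  # assumed fallback for word pairs missing from the table


def train_bigram_model(queries):
    """Count bigrams and their left contexts over the query log."""
    contexts, bigrams, vocab = Counter(), Counter(), set()
    for query in queries:
        tokens = ["<s>"] + query.split() + ["</s>"]
        contexts.update(tokens[:-1])
        bigrams.update(zip(tokens, tokens[1:]))
        vocab.update(tokens[1:])
    return contexts, bigrams, len(vocab)


def language_model_score(query, model):
    """Log-probability of the query under an add-one-smoothed bigram model."""
    contexts, bigrams, vocab_size = model
    tokens = ["<s>"] + query.split() + ["</s>"]
    return sum(
        math.log((bigrams[(prev, cur)] + 1) / (contexts[prev] + vocab_size))
        for prev, cur in zip(tokens, tokens[1:])
    )


def translation_score(original, candidate):
    """Sum of log P(variant | original) over the positions that changed."""
    score = 0.0
    for src, tgt in zip(original.split(), candidate.split()):
        if src != tgt:
            score += math.log(TRANSLATION_PROB.get((src, tgt), UNSEEN_PROB))
    return score


def candidate_value(original, candidate, model):
    """Combined value: fluency of the candidate plus fidelity to the original."""
    return language_model_score(candidate, model) + translation_score(original, candidate)
```

Adding the two log scores is one simple way to combine them. A weighted sum, or any other combination, would fit the abstract's description equally well.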
36 Claims
1. A method, comprising:
- receiving a particular query comprising a plurality of words;
- determining stems to at least one of the words in the particular query;
- based on the stems of the plurality words in the particular query, determining whether one or more stems of particular words in the particular query occurs in a dictionary comprising one or more transformations based upon stems of words;
- selecting, from the dictionary, one or more transformations of the one or more stems of the particular words;
- generating at least one candidate query that includes a transformation of one or more particular words;
- computing a value for each candidate query;
- selecting at least one candidate query to execute based upon the computed value for each candidate query;
- executing the particular query and the at least one selected candidate query to generate search results across a plurality of documents; and
- displaying at least a portion of the search results.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32
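The steps of claim 1 up to candidate selection (stemming, dictionary lookup, candidate generation, valuing, selecting) can be illustrated with a short sketch. It assumes a crude suffix-stripping function in place of a real stemmer and a tiny hand-written dictionary keyed by stem. Every name and data value here is hypothetical. The value function is passed in as a parameter because the claim does not fix how the value is computed.

```python
from itertools import product

# Hypothetical dictionary: stem -> word forms sharing that stem.
STEM_DICTIONARY = {
    "hotel": ["hotel", "hotels"],
    "book": ["book", "books", "booking"],
}


def naive_stem(word):
    """Crude suffix stripper; a placeholder for a real stemming algorithm."""
    for suffix in ("ing", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def generate_candidates(query):
    """Build every query obtained by swapping words for dictionary variants."""
    options = []
    for word in query.split():
        variants = STEM_DICTIONARY.get(naive_stem(word), [])
        # Keep the original word first, then any variants that differ from it.
        options.append([word] + [v for v in variants if v != word])
    candidates = (" ".join(combo) for combo in product(*options))
    return [c for c in candidates if c != query]


def select_candidates(query, value_fn, top_k=1):
    """Rank candidates by the supplied value function and keep the best few."""
    ranked = sorted(generate_candidates(query), key=value_fn, reverse=True)
    return ranked[:top_k]
```

In this sketch the selected candidates would then be executed alongside the original query. The execution and display steps depend on the search backend and are left out.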
15. A method, comprising:
- obtaining a vocabulary of all words used in a specified number of most frequently entered search queries submitted by users;
- generating, based on the vocabulary, a dictionary comprising one or more transformations of words;
- receiving a particular query from a user;
- based on the particular query, determining whether one or more particular words in the particular query occur in the dictionary;
- selecting, from the dictionary, a different form of a particular word that is indicated by a transformation in which the particular word occurs;
- generating search results across a plurality of documents based on executing a version of the particular query that includes both the original and the different form of the one or more particular words; and
- displaying at least a portion of the search results.
Dependent claims: 16, 33, 34
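The first two steps of claim 15, obtaining a vocabulary from the most frequent queries and generating a dictionary from it, might look roughly like the sketch below. It assumes that transformations are simply word forms sharing a stem, and it uses a crude suffix-stripping placeholder instead of a real stemmer. The function names and the sample log are hypothetical.

```python
from collections import Counter, defaultdict


def naive_stem(word):
    """Crude suffix stripper; a placeholder for a real stemming algorithm."""
    for suffix in ("ing", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def build_dictionary(query_log, top_n):
    """Group the vocabulary of the top_n most frequent queries by stem."""
    top_queries = [query for query, _ in Counter(query_log).most_common(top_n)]
    vocabulary = {word for query in top_queries for word in query.split()}

    forms_by_stem = defaultdict(set)
    for word in vocabulary:
        forms_by_stem[naive_stem(word)].add(word)

    # Only stems with at least two distinct forms yield a usable transformation.
    return {
        stem: sorted(forms)
        for stem, forms in forms_by_stem.items()
        if len(forms) > 1
    }


# Hypothetical query log with repeated entries standing in for frequency.
sample_log = ["cheap hotels"] * 3 + ["hotel booking"] * 2 + ["rare books"]
dictionary = build_dictionary(sample_log, top_n=2)
```

Restricting the vocabulary to the most frequent queries keeps the dictionary small, in line with the stated goal of minimizing computational overhead.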
17. A method, comprising:
- receiving a particular query from a user;
- based on the particular query, determining whether one or more particular words in the particular query is able to be transformed;
- determining one or more transformed forms of the one or more particular words;
- determining whether using the one or more transformed forms of the one or more particular words has a high probability to produce relevant search results;
- in response to determining that transforming the one or more particular words has a high probability to produce relevant search results, using a particular word and a transformed word for the particular word within a version of the particular query; and
- generating search results across a plurality of documents based on executing the particular query and the version of the particular query; and
- displaying at least a portion of the search results.
Dependent claims: 18, 35, 36
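Claim 17 gates each transformation on whether it has a high probability of producing relevant results, then places the original word and its transformed form together in a version of the query. The claim defines neither "high probability" nor the syntax of that version, so the sketch below makes assumptions on both points: a fixed threshold and a parenthesized OR group. The probability function, the threshold, and the OR syntax are all hypothetical.

```python
def expand_query(query, transformations, relevance_prob, threshold=0.5):
    """Return a query version pairing words with their accepted variants.

    transformations: dict mapping a word to its possible transformed forms.
    relevance_prob:  callable (query, word, variant) -> probability in [0, 1];
                     a stand-in for whatever estimator a real system uses.
    """
    expanded = []
    for word in query.split():
        accepted = [
            variant
            for variant in transformations.get(word, [])
            if relevance_prob(query, word, variant) >= threshold
        ]
        if accepted:
            # Keep the original word and add the accepted variants beside it.
            expanded.append("(" + " OR ".join([word] + accepted) + ")")
        else:
            expanded.append(word)
    return " ".join(expanded)


# Hypothetical inputs for illustration only.
variants = {"hotel": ["hotels", "hostel"]}
probabilities = {("hotel", "hotels"): 0.9, ("hotel", "hostel"): 0.1}
version = expand_query(
    "cheap hotel", variants, lambda q, w, v: probabilities[(w, v)]
)
```

When no variant clears the threshold the function returns the query unchanged, which corresponds to running only the original query.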
Specification