Model generation for ranking documents based on large data sets
Abstract
A system ranks documents based, at least in part, on a ranking model. The ranking model may be generated to predict the likelihood that a document will be selected. The system may receive a search query and identify documents relating to the search query. The system may then rank the documents based, at least in part, on the ranking model and form search results for the search query from the ranked documents.
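The ranking step described in the abstract can be illustrated with a minimal sketch. It assumes, purely for illustration, that the model is a list of weighted rules over binary features and that the selection probability is a logistic function of the summed weights of matching rules; the source does not specify this form. The feature names, weights, and the functions `selection_probability` and `rank` are hypothetical.

```python
import math

# Hypothetical model: each rule pairs a condition (a set of binary features
# that must all be present) with a weight. Names and weights are invented.
MODEL = [
    (frozenset({"query:jaguar", "doc:cars.example"}), 1.2),
    (frozenset({"doc:cars.example"}), -0.3),
    (frozenset({"query:jaguar", "doc:zoo.example"}), 0.4),
]

def selection_probability(features, model):
    """Assumed logistic form: sigmoid of the summed weights of matching rules."""
    total = sum(weight for condition, weight in model if condition <= features)
    return 1.0 / (1.0 + math.exp(-total))

def rank(query, candidate_docs, model):
    """Order candidate documents by predicted selection probability, highest first."""
    scored = []
    for doc in candidate_docs:
        features = {f"query:{query}", f"doc:{doc}"}
        scored.append((selection_probability(features, model), doc))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in scored]

results = rank("jaguar", ["zoo.example", "cars.example", "other.example"], MODEL)
print(results)
```

A document matched by no rule receives a neutral probability of 0.5 in this sketch, so it ranks below documents with positive evidence and above those with negative evidence.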
19 Claims
1. A system, comprising:
    a repository to store training data that includes a plurality of features; and
    one or more devices to:
        select a candidate condition that includes one or more of the features,
        estimate, after selection of the candidate condition, a weight for the candidate condition,
        form a new rule from the candidate condition and the weight,
        compare a likelihood of occurrence of the training data given a model with the new rule and a likelihood of occurrence of the training data given the model without the new rule,
        determine, based on the comparison, that a difference of the likelihood of occurrence of the training data in the repository given the model with the new rule and the likelihood of occurrence of the data in the repository given the model without the new rule is greater than a cost, and
        add the new rule to the model upon determining that the difference of the likelihood of occurrence of the training data in the repository given the new rule and the likelihood of occurrence of the training data in the repository given the model without the new rule is greater than the cost.
    Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 18
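The select, estimate, compare, and add sequence recited in claim 1 can be sketched as follows. This is an illustration under stated assumptions, not the claimed implementation: it assumes a logistic model, measures the "difference of the likelihood" as a difference of log-likelihoods, and fits the new weight by simple gradient ascent with the existing rules held fixed. All function names and the toy data are hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def score(features, rules):
    """Sum the weights of all rules whose condition is satisfied."""
    return sum(weight for condition, weight in rules if condition <= features)

def log_likelihood(data, rules):
    """Log-likelihood of the observed outcomes under the assumed logistic model."""
    total = 0.0
    for features, selected in data:
        p = sigmoid(score(features, rules))
        total += math.log(p if selected else 1.0 - p)
    return total

def estimate_weight(data, rules, condition, steps=200, lr=1.0):
    """Fit only the new rule's weight by gradient ascent; other rules stay fixed."""
    matches = [(f, y) for f, y in data if condition <= f]
    if not matches:
        return 0.0
    w = 0.0
    for _ in range(steps):
        grad = sum((1.0 if y else 0.0) - sigmoid(score(f, rules) + w)
                   for f, y in matches)
        w += lr * grad / len(matches)  # averaged gradient keeps the step stable
    return w

def try_add_rule(data, rules, condition, cost):
    """Add (condition, weight) only if the log-likelihood gain exceeds the cost."""
    weight = estimate_weight(data, rules, condition)
    candidate = rules + [(condition, weight)]
    gain = log_likelihood(data, candidate) - log_likelihood(data, rules)
    return (candidate, gain) if gain > cost else (rules, gain)

# Toy training data: (feature set, was the document selected?)
DATA = [({"a"}, True)] * 3 + [({"a"}, False)] + [({"b"}, False)] * 2
COND = frozenset({"a"})
accepted, gain = try_add_rule(DATA, [], COND, cost=0.1)
rejected, _ = try_add_rule(DATA, [], COND, cost=1.0)
```

In this toy run the same candidate rule is accepted under the low cost and rejected under the high one, which shows the role the cost plays as a threshold on the likelihood improvement.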
13. A method, comprising:
    selecting, by one or more processors of one or more computing devices, a candidate condition from a set of training data;
    estimating, by one or more processors of the one or more computing devices, a weight for the selected candidate condition;
    forming, by one or more processors of the one or more computing devices, a new rule from the selected candidate condition and the estimated weight;
    comparing, by one or more processors of the one or more computing devices, a likelihood of occurrence of the training data given a model with the new rule and a likelihood of occurrence of the training data given the model without the new rule;
    determining, by one or more processors of the one or more computing devices, based on the comparison, that a difference of the likelihood of occurrence of the training data in the repository given the model with the new rule and the likelihood of occurrence of the training data in the repository given the model without the new rule is greater than a cost, and
    adding, by one or more processors of the one or more computing devices, the new rule to the model upon determining that the difference of the likelihood of occurrence of the training data in the repository given the new rule and the likelihood of occurrence of the training data in the repository given the model without the new rule is greater than the cost.
14. A system, comprising:
    a repository to store data corresponding to a plurality of prior searches, the data including information regarding users, information regarding query data provided by the users, and information regarding documents retrieved based, at least in part, on searches performed using the query data; and
    one or more devices to:
        select a candidate condition based, at least in part, on the data in the repository,
        estimate a weight for the candidate condition,
        form a new rule from the candidate condition and the weight,
        compare a likelihood of occurrence of the data in the repository given a model with the new rule and a likelihood of occurrence of the data in the repository given the model without the new rule,
        determine, based on the comparison, that a difference of the likelihood of occurrence of the data in the repository given the model with the new rule and the likelihood of occurrence of the data in the repository given the model without the new rule is greater than a cost, and
        add the new rule to the model upon determining that the difference of the likelihood of occurrence of the data in the repository given the new rule and the likelihood of occurrence of the data in the repository given the model without the new rule is greater than the cost.
    Dependent claims: 15, 16
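One way to picture the repository of prior searches in claim 14, and how candidate conditions might be drawn from it, is the sketch below. The claim only says the candidate condition is selected "based, at least in part, on the data in the repository"; the record fields, the feature naming scheme, and the frequency threshold used here are all assumptions made for illustration.

```python
from collections import Counter
from dataclasses import dataclass
from itertools import combinations

@dataclass(frozen=True)
class SearchInstance:
    """One logged search result (hypothetical schema)."""
    user_country: str   # stands in for "information regarding users"
    query: str          # query data provided by the user
    document: str       # a document retrieved for the query
    selected: bool      # whether the user selected the document

def extract_features(inst):
    """Turn a logged instance into a set of binary features."""
    feats = {f"country:{inst.user_country}", f"doc:{inst.document}"}
    feats.update(f"term:{t}" for t in inst.query.lower().split())
    return frozenset(feats)

def candidate_conditions(instances, max_size=2, min_count=2):
    """Assumed heuristic: propose feature conjunctions seen at least min_count times."""
    counts = Counter()
    for inst in instances:
        feats = sorted(extract_features(inst))
        for size in range(1, max_size + 1):
            for combo in combinations(feats, size):
                counts[frozenset(combo)] += 1
    return [cond for cond, n in counts.most_common() if n >= min_count]

LOG = [
    SearchInstance("US", "jaguar price", "cars.example", True),
    SearchInstance("US", "jaguar", "cars.example", True),
    SearchInstance("BR", "jaguar habitat", "zoo.example", True),
    SearchInstance("BR", "jaguar", "cars.example", False),
]
conditions = candidate_conditions(LOG)
```

Each returned condition could then be passed to a weight-estimation and likelihood-comparison step of the kind the claim recites; a frequency cutoff is only one of many possible selection criteria.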
17. A method, comprising:
    storing, by one or more processors of one or more computing devices, a plurality of instances, each of the instances including information regarding a first user who previously requested a search, information regarding first query data provided by the first user in requesting the search, and information regarding one or more documents retrieved based, at least in part, on the search performed using the first query data or which of the one or more documents were selected by the first user;
    creating, by one or more processors of the one or more computing devices, rules for a model based, at least in part, on the instances and features extracted for the instances;
    receiving, by one or more processors of one or more server devices, second query data from a second user;
    identifying, by one or more processors of the one or more server devices, documents related to the second query data;
    assigning, by one or more processors of the one or more server devices, a score to each of the identified documents based, at least in part, on the model, the score for one of the identified documents reflecting a prediction of whether the second user will select the one of the identified documents when the second user provides the second query data; and
    outputting, by one or more processors of the one or more server devices, information regarding one or more of the identified documents based, at least in part, on the score for the one or more of the identified documents,
    where the creating the rules for the model includes:
        selecting a candidate condition based, at least in part, on the instances,
        estimating a weight for the candidate condition,
        forming a new rule from the candidate condition and the weight,
        comparing a likelihood of occurrence of the instances given the model with the new rule and a likelihood of occurrence of the instances given the model without the new rule,
        determining, based on the comparison, that a difference of the likelihood of occurrence of the instances given the model with the new rule and the likelihood of occurrence of the instances given the model without the new rule is greater than a cost, and
        adding the new rule to the model upon determining that the difference of the likelihood of occurrence of the instances given the new rule and the likelihood of occurrence of the instances given the model without the new rule is greater than the cost.
19. A computer-readable memory device storing instructions executable by one or more processors, the instructions comprising:
    one or more instructions for selecting a candidate condition from a set of training data;
    one or more instructions for estimating a weight for the selected candidate condition;
    one or more instructions for forming a new rule from the selected candidate condition and the estimated weight;
    one or more instructions for comparing a likelihood of occurrence of the training data given a model with the new rule and a likelihood of occurrence of the training data given the model without the new rule;
    one or more instructions for determining, based on the comparison, that a difference of the likelihood of occurrence of the training data in the repository given the model with the new rule and the likelihood of occurrence of the training data in the repository given the model without the new rule is greater than a cost, and
    one or more instructions for adding the new rule to the model upon determining that the difference of the likelihood of occurrence of the training data in the repository given the new rule and the likelihood of occurrence of the training data in the repository given the model without the new rule is greater than the cost.
Specification