Ranking documents based on large data sets

US 7,231,399 B1
Filed: 11/14/2003
Issued: 06/12/2007
Est. Priority Date: 11/14/2003
Status: Active Grant

First Claim

1. A computer implemented method, comprising:

creating a ranking model that predicts a likelihood that a document will be selected by:

storing information associated with a plurality of prior searches,

determining a prior probability of selection based, at least in part, on the information associated with the prior searches, and

generating the ranking model based, at least in part, on the prior probability of selection;

training the ranking model using a data set that includes approximately tens of millions of instances;

identifying documents relating to a search query;

scoring the documents based, at least in part, on the ranking model;

forming search results for the search query from the scored documents; and

outputting the search results.

2 Assignments

0 Petitions

Abstract

A system ranks documents based, at least in part, on a ranking model. The ranking model may be generated to predict the likelihood that a document will be selected. The system may receive a search query and identify documents relating to the search query. The system may then rank the documents based, at least in part, on the ranking model and form search results for the search query from the ranked documents.
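
The flow in the abstract (receive a query, identify related documents, rank them with a model that predicts selection, form results) can be sketched in Python. This is a minimal illustration under stated assumptions, not the patented system: the term-overlap retrieval step and the names `search`, `identify_documents`, and `RankingModel` are hypothetical, and the model is any function returning a predicted probability of selection.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical model interface: (query, doc_id) -> predicted probability of selection.
RankingModel = Callable[[str, str], float]


def identify_documents(query: str, corpus: Dict[str, str]) -> List[str]:
    """Naive retrieval stand-in: ids of documents sharing a term with the query."""
    terms = set(query.lower().split())
    return [doc_id for doc_id, text in corpus.items()
            if terms & set(text.lower().split())]


def search(query: str, corpus: Dict[str, str], model: RankingModel,
           k: int = 10) -> List[Tuple[str, float]]:
    """Identify, score, and rank documents, returning the top k as results."""
    scored = [(doc_id, model(query, doc_id))
              for doc_id in identify_documents(query, corpus)]
    # Highest predicted selection likelihood first; ties broken by id.
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored[:k]
```

Any callable with that signature can be plugged in as the model, including one learned from prior searches.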

25 Claims

1. A computer implemented method, comprising:
- creating a ranking model that predicts a likelihood that a document will be selected by:
  
  storing information associated with a plurality of prior searches,
  
  determining a prior probability of selection based, at least in part, on the information associated with the prior searches, and
  
  generating the ranking model based, at least in part, on the prior probability of selection;
  
  training the ranking model using a data set that includes approximately tens of millions of instances;
  
  identifying documents relating to a search query;
  
  scoring the documents based, at least in part, on the ranking model;
  
  forming search results for the search query from the scored documents; and
  
  outputting the search results.
- Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14)
  - 2. The method of claim 1, wherein the information associated with the prior searches includes, for each of a plurality of documents associated with the prior searches, at least one of a position occupied by the document within prior search results, a score assigned to the document, or a number of documents listed above the document in the prior search results that were selected.
  - 3. The method of claim 1, wherein the creating a ranking model includes:
    - storing training data,
    - extracting features from the training data, and
    - generating conditions that include one or more of the extracted features.
  - 4. The method of claim 3, wherein the creating a ranking model further includes:
    - selecting one of the conditions as a candidate condition,
    - estimating a weight for the candidate condition,
    - forming a new rule from the candidate condition and the estimated weight,
    - comparing a likelihood of the training data between a current model with the new rule and the current model without the new rule, and
    - selectively adding the new rule to the current model based, at least in part, on a result of the comparison.
  - 5. The method of claim 4, wherein the selecting one of the conditions as a candidate condition includes at least one of:
    - creating the candidate condition from combinations of features or complements of features in the training data,
    - randomly selecting one of the conditions as the candidate condition,
    - selecting one of the conditions that includes a single one of the features as the candidate condition, or
    - augmenting one of the conditions by adding one or more features to the one condition to form the candidate condition.
  - 6. The method of claim 4, wherein the estimating a weight includes determining a weight that maximizes a likelihood of the training data given the model.
  - 7. The method of claim 4, wherein the selectively adding the new rule to the current model includes adding the new rule to the current model when a likelihood of the training data occurring when the current model includes the new rule is greater than when the current model does not include the new rule.
  - 8. The method of claim 4, wherein the selectively adding the new rule to the current model further includes:
    - associating a cost with each of the conditions, and
    - determining whether to add the new rule to the current model based, at least in part, on the cost associated with the candidate condition.
  - 9. The method of claim 4, further comprising:
    - performing a number of iterations including estimating the weight, forming the new rule, and comparing the likelihood of the training data.
  - 10. The method of claim 1, wherein the data set also includes approximately millions of features.
  - 11. The method of claim 1, wherein the scoring the documents includes:
    - forming an instance that corresponds to the search query and one of the documents,
    - extracting features associated with the instance,
    - identifying rules in the ranking model that apply based, at least in part, on the extracted features, each of the identified rules including a weight, and
    - combining the weights of the identified rules with a prior probability of selection corresponding to the instance to generate a score for the one document.
  - 12. The method of claim 11, wherein the instance includes user information corresponding to a user who provided the search query, query data corresponding to the search query, and document information corresponding to the one document.
  - 13. The method of claim 1, wherein the scoring the documents includes:
    - determining a prior probability of selection corresponding to the search query and one of the documents, and
    - generating a score for the one document based, at least in part, on the determined prior probability of selection.
  - 14. The method of claim 13, wherein the generating a score for the one document includes using the determined prior probability of selection as one of a plurality of factors in determining the score for the one document.
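
Claims 3 through 11 describe learning rules (a condition over features plus a weight) and then scoring by combining matching rule weights with a prior. The sketch below is one hypothetical reading, not the patent's implementation. It assumes: the prior and rule weights are combined by adding weights to the prior's log-odds (a logistic model); a condition is a set of features that must all be present; each new weight is fit by Newton's method with a small L2 penalty (not in the claims, added only to keep weights finite); and each condition costs a fixed amount per feature. Priors must lie strictly between 0 and 1. All names are illustrative.

```python
import math
from typing import Dict, FrozenSet, List, Tuple

# A training instance: (active features, prior probability of selection, was selected).
Instance = Tuple[FrozenSet[str], float, bool]
# Rules map a condition (features that must all be present) to a weight.
Rules = Dict[FrozenSet[str], float]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def margin(rules: Rules, features: FrozenSet[str], prior: float) -> float:
    """Prior log-odds plus the weights of every rule whose condition holds."""
    return math.log(prior / (1.0 - prior)) + sum(
        w for cond, w in rules.items() if cond <= features)


def predict(rules: Rules, features: FrozenSet[str], prior: float) -> float:
    """Predicted probability that the document in this instance is selected."""
    return sigmoid(margin(rules, features, prior))


def log_likelihood(rules: Rules, data: List[Instance]) -> float:
    total = 0.0
    for features, prior, selected in data:
        # Clamp to avoid log(0) from floating-point saturation.
        p = min(max(predict(rules, features, prior), 1e-12), 1.0 - 1e-12)
        total += math.log(p if selected else 1.0 - p)
    return total


def estimate_weight(rules: Rules, cond: FrozenSet[str], data: List[Instance],
                    l2: float = 1.0, steps: int = 25) -> float:
    """Fit the new rule's weight by Newton's method, other rules held fixed."""
    matching = [(margin(rules, f, p), y) for f, p, y in data if cond <= f]
    w = 0.0
    for _ in range(steps):
        probs = [sigmoid(z + w) for z, _ in matching]
        grad = sum(float(y) - q for (_, y), q in zip(matching, probs)) - l2 * w
        hess = sum(q * (1.0 - q) for q in probs) + l2
        w += grad / hess
    return w


def candidate_conditions(rules: Rules, data: List[Instance]) -> List[FrozenSet[str]]:
    """Single-feature conditions, plus existing conditions augmented by one feature."""
    features = sorted({f for feats, _, _ in data for f in feats})
    cands = [frozenset([f]) for f in features]
    for cond in list(rules):
        cands.extend(cond | {f} for f in features if f not in cond)
    # Deduplicate while keeping order, and skip conditions already in the model.
    return [c for c in dict.fromkeys(cands) if c not in rules]


def train(data: List[Instance], iterations: int = 3,
          cost_per_feature: float = 1.0) -> Rules:
    """Greedily add each candidate rule whose likelihood gain exceeds its cost."""
    rules: Rules = {}
    for _ in range(iterations):
        for cond in candidate_conditions(rules, data):
            w = estimate_weight(rules, cond, data)
            with_rule = {**rules, cond: w}
            gain = log_likelihood(with_rule, data) - log_likelihood(rules, data)
            if gain > cost_per_feature * len(cond):
                rules = with_rule
    return rules
```

On large logs the candidate set grows quickly; the random candidate selection listed in claim 5 is one way to keep each pass tractable.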

15. A system implemented within one or more computer devices, comprising:
- means for receiving a search query;
  
  means for identifying documents relating to the search query;
  
  means for ranking the documents based, at least in part, on a ranking model trained on a large data set that includes approximately millions of features, the means for ranking including:
  
  means for determining a prior probability of selection corresponding to the search query and one of the identified documents, and
  
  means for determining a rank for the one document based, at least in part, on the determined prior probability of selection;
  
  means for forming search results for the search query from the ranked documents; and
  
  means for outputting the search results.

16. A system, comprising:
- a repository configured to store information corresponding to a plurality of prior searches; and
  
  a server configured to:
  
  receive a search query from a user,
  
  identify documents corresponding to the search query,
  
  rank the identified documents based, at least in part, on a ranking model that includes rules that maximize a likelihood of the repository, when ranking the identified documents, the server is configured to:
  
  determine a prior probability of selection corresponding to the search query and one of the identified documents, and
  
  determine a rank for the one document based, at least in part, on the determined prior probability of selection, and
  
  output the ranked documents.
- Dependent Claims (21, 22)
  - 21. The system of claim 16, wherein when determining a rank for the one document, the server is configured to use the determined prior probability of selection as one of a plurality of factors in determining the rank for the one document.
  - 22. The system of claim 16, wherein the repository stores approximately tens of millions of instances and approximately millions of features associated with the plurality of prior searches.
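
Claims 14 and 21 treat the prior probability of selection as one factor among several. One plausible combination, offered as an assumption rather than the patent's formula, is a weighted sum in log-odds space, where $P_{\text{prior}}(q,d)$ is the prior probability of selection for query $q$ and document $d$, the $f_k$ are hypothetical additional ranking signals, and the $\lambda_k$ are hypothetical mixing weights:

```latex
\mathrm{score}(q,d) = \lambda_0 \log\frac{P_{\text{prior}}(q,d)}{1 - P_{\text{prior}}(q,d)} + \sum_{k=1}^{K} \lambda_k f_k(q,d)
```

With $\lambda_0 = 1$ and the $f_k$ taken as the weights of the rules that apply to $(q,d)$, this reduces to one reading of the prior-plus-rule-weights combination in claims 11 and 19.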

17. A system, comprising:
- a repository configured to store information corresponding to a plurality of prior searches; and
  
  a server configured to:
  
  receive a search query from a user,
  
  identify documents corresponding to the search query,
  
  rank the identified documents based, at least in part, on a ranking model that includes rules that maximize a likelihood of the repository, when ranking the identified documents, the server is configured to:
  
  determine a prior probability of selection corresponding to the search query and one of the identified documents, and
  
  determine a rank for the one document based, at least in part, on the determined prior probability of selection, and
  
  output the ranked documents.
- Dependent Claims (18, 19, 20)
  - 18. The system of claim 17, wherein the repository is further configured to store a plurality of features associated with the instances.
  - 19. The system of claim 18, wherein when ranking the documents, the server is configured to:
    - identify one of the instances that corresponds to the search query and one of the identified documents,
    - determine features associated with the identified instance,
    - identify rules in the ranking model that apply based, at least in part, on the determined features, each of the identified rules including a weight, and
    - combine the weights of the identified rules with a prior probability of selection corresponding to the identified instance to determine a rank for the one document.
  - 20. The system of claim 19, wherein the user information of the identified instance includes information corresponding to the user who provided the search query, the query data of the identified instance includes information corresponding to the search query, and the document information of the identified instance includes information corresponding to the one document.
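
Claims 12, 19, and 20 describe an instance as user information, query data, and document information, from which features are extracted. A minimal sketch follows; the specific fields (language, country, URL, title), the class name `SearchInstance`, and the feature-string format are hypothetical choices, not taken from the patent.

```python
from dataclasses import dataclass
from typing import FrozenSet
from urllib.parse import urlparse


@dataclass(frozen=True)
class SearchInstance:
    # Hypothetical user information.
    user_language: str
    user_country: str
    # Query data.
    query: str
    # Hypothetical document information.
    doc_url: str
    doc_title: str


def extract_features(inst: SearchInstance) -> FrozenSet[str]:
    """Map one (user, query, document) instance to sparse string features."""
    feats = {
        f"user_lang={inst.user_language}",
        f"user_country={inst.user_country}",
        f"doc_host={urlparse(inst.doc_url).netloc}",
    }
    title_terms = set(inst.doc_title.lower().split())
    for term in inst.query.lower().split():
        feats.add(f"query_term={term}")
        # Cross feature: the query term also appears in the document title.
        if term in title_terms:
            feats.add(f"term_in_title={term}")
    return frozenset(feats)
```

Storing only the features that are present in each instance keeps the representation small even when the overall vocabulary reaches the millions of features mentioned in claims 10 and 22.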

23. A method, comprising:
- receiving a search query;
  
  identifying documents relating to the search query;
  
  determining prior probabilities of selecting each of the documents, where the prior probability of selecting one of the documents is determined based, at least in part, on data regarding at least one of a position of the document within search results, a prior score assigned to the document, or a number of documents above the document in the search results that were selected;
  
  determining a score for each of the documents based, at least in part, on the prior probability of selecting the document;
  
  generating search results for the search query from the scored documents, and
  
  outputting the search results.
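
Claims 23 and 25 base the prior probability of selection on a document's position in earlier results, its prior score, or how many documents above it were selected. The sketch below estimates a prior from hypothetical click logs keyed by position and selected-above count, with additive smoothing toward the overall selection rate (a smoothing choice the claims do not specify). The prior score is left out for brevity; it could be bucketed into the key the same way. The names `LogEntry`, `estimate_priors`, and `strength` are illustrative.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# One logged impression: (position, number of results above it that were selected, was selected).
LogEntry = Tuple[int, int, bool]


def estimate_priors(log: Iterable[LogEntry],
                    strength: float = 10.0) -> Dict[Tuple[int, int], float]:
    """Smoothed selection rate for each (position, selected_above) bucket."""
    shown: Dict[Tuple[int, int], int] = defaultdict(int)
    picked: Dict[Tuple[int, int], int] = defaultdict(int)
    for position, selected_above, selected in log:
        key = (position, selected_above)
        shown[key] += 1
        picked[key] += int(selected)
    total_shown = sum(shown.values())
    if total_shown == 0:
        return {}
    base_rate = sum(picked.values()) / total_shown
    # Shrink sparsely observed buckets toward the global selection rate.
    return {key: (picked[key] + strength * base_rate) / (shown[key] + strength)
            for key in shown}
```

A larger `strength` trusts the global rate more, which matters for deep positions that are rarely shown.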

24. A method, comprising:
- creating a ranking model that predicts a likelihood that a document will be selected by:
  
  storing information associated with a plurality of prior searches,
  
  determining a prior probability of selection based, at least in part, on the information associated with the prior searches, and
  
  generating the ranking model based, at least in part, on the prior probability of selection;
  
  identifying documents relating to a search query;
  
  scoring the documents based, at least in part, on the ranking model;
  
  forming search results for the search query from the scored documents; and
  
  outputting the search results.

25. A method, comprising:
- receiving a search query;
  
  identifying documents relating to the search query;
  
  determining a prior probability of selecting one of the documents, the prior probability of selecting the one document is determined based, at least in part, on data regarding at least one of a position of the one document within search results, a prior score assigned to the one document, or a number of documents above the one document in the search results that were selected;
  
  determining a score for the one document based, at least in part, on the prior probability of selecting the one document;
  
  generating a list of search results that includes the one document based on the determined score; and
  
  outputting the list of search results.

Current Assignee
Google LLC (Alphabet Inc.)
Original Assignee
Google Inc. (Alphabet Inc.)
Inventors
Tong, Simon; Shazeer, Noam; Harik, Georges R.; Bem, Jeremy; Levenberg, Joshua L.
Primary Examiner(s)
Robinson; Greta
Assistant Examiner(s)
Veillard; Jacques

Application Number

US10/706,991
Time in Patent Office

1,306 Days
Field of Search

707/1-7, 707/10, 707/100, 707/101, 707/104.1, 707/102, 709/218-219, 709/225, 709/230, 709/232, 706/12, 706/25, 706/47
US Class Current

1/1
CPC Class Codes

G06F 16/24575   using context

G06F 16/24578   using ranking

G06F 16/3346   using probabilistic model

G06F 16/355   Class or cluster creation o...

G06F 16/951   Indexing; Web crawling tech...

G06F 16/9535   Search customisation based ...

G06N 20/00   Machine learning

G06N 7/01   Probabilistic graphical mod...

Y10S 707/99933   Query processing, i.e. sear...

Y10S 707/99935   Query augmenting and refini...

Y10S 707/99943   Generating database or data...
