Ranking documents based on large data sets

US 10,055,461 B1
Filed: 07/31/2015
Issued: 08/21/2018
Est. Priority Date: 11/14/2003
Status: Expired due to Fees

Abstract

A system ranks documents based, at least in part, on a ranking model. The ranking model may be generated to predict the likelihood that a document will be selected. The system may receive a search query and identify documents relating to the search query. The system may then rank the documents based, at least in part, on the ranking model and form search results for the search query from the ranked documents.
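
The ranking flow summarized in the abstract can be pictured with a small sketch. It assumes, purely for illustration, that the ranking model is a list of (condition, weight) rules and that the selection likelihood is a logistic function of the summed weights of the matching rules; the record does not state this form. All function names and feature keys (`user_country`, `query_term`, `doc_topic`, `doc_id`) are hypothetical.

```python
import math


def matches(condition, features):
    """A condition matches when every feature value it names is present."""
    return all(features.get(name) == value for name, value in condition.items())


def selection_likelihood(rules, features):
    """Assumed logistic form: sigmoid of the summed weights of matching rules."""
    score = sum(weight for condition, weight in rules if matches(condition, features))
    return 1.0 / (1.0 + math.exp(-score))


def rank(rules, user_features, query_features, documents):
    """Order document ids by descending predicted selection likelihood."""
    scored = []
    for doc in documents:
        # Combine user, query, and document features into one feature map.
        features = {**user_features, **query_features, **doc}
        scored.append((selection_likelihood(rules, features), doc["doc_id"]))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc_id for _, doc_id in scored]
```

With one rule that rewards car pages for German users searching "jaguar" and a weaker rule for animal pages, `rank` would place the car page first for such a user. When no rule matches, this sketch returns a likelihood of 0.5, which is a consequence of the assumed logistic form rather than anything stated in the record.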

20 Claims

1. A computer-implemented method comprising:
- receiving, by a distributed search system, a collection of training data comprising a plurality of training instances that each identify a respective first document selected by a particular user when the first document was identified in search results provided by the search system to the particular user in response to a particular search query issued by the particular user;
  
  partitioning the collection of training data over a plurality of computing devices of the distributed search system;
  
  generating, by the distributed search system, a ranking model that produces a likelihood that a particular user will select a particular document when identified by one or more search results provided in response to a particular search query submitted by the particular user, including processing, by each computing device of the plurality of computing devices, training instances assigned to the computing device, including:
  
  selecting, by the computing device, a candidate condition, wherein the candidate condition specifies values for one or more user features, one or more query features, and one or more document features,
  
  sending, by the computing device, to each other computing device of the plurality of computing devices, a request to compute local statistics for the candidate condition,
  
  receiving, by the computing device from each other computing device of one or more other computing devices, respective computed statistics for the candidate condition computed by the other computing device using values of local training instances assigned to the other computing device,
  
  computing, by the computing device, a weight for the candidate condition according to the computed statistics received from the one or more other computing devices for the candidate condition;
  
  determining, by the computing device, that a new rule comprising the candidate condition and the computed weight should be added to the ranking model, and
  
  in response, adding the new rule to the ranking model and providing, by the computing device, to each other computing device of the plurality of computing devices, an indication that the new rule comprising the candidate condition and the computed weight should be added to the ranking model;
  
  receiving a search query submitted by a first user;
  
  obtaining a plurality of search results that satisfy the search query, wherein each search result identifies a respective document of a plurality of documents;
  
  determining one or more features of the first user and one or more features of the search query submitted by the first user;
  
  using the one or more features of the first user and the one or more features of the search query as input to the ranking model to compute, for each document identified by the search results, a respective likelihood that the first user will select the document when provided in response to the search query; and
  
  ranking the plurality of search results based on a respective computed likelihood for each document, the computed likelihood for each document being a likelihood that the first user will select the document when provided in response to the search query.
- Dependent claims (2, 3, 4, 5, 6, 7, 8):
  - 2. The method of claim 1, wherein the one or more features of the first user include a location of the first user, a language of the first user, one or more previous queries issued by the first user, or a number of times the first user has accessed a particular document.
  - 3. The method of claim 1, wherein the one or more features of the search query include a language of the query and one or more terms of the query.
  - 4. The method of claim 1, further comprising:
    - generating, by each computing device of the plurality of computing devices using local training instances assigned to the computing device, a feature-to-instance index that maps each value of a feature to one or more training instances having the value for the feature;
      
      receiving, by a first computing device of the plurality of computing devices, a request to compute local statistics for the candidate condition;
      
      obtaining, by the first computing device, training instances matching the candidate condition by using one or more features of the candidate condition as input to the feature-to-instance index;
      
      computing local statistics for the candidate condition using matching training instances obtained using the feature-to-instance index; and
      
      providing, by the first computing device, the computed local statistics in response to the request to compute local statistics for the candidate condition.
  - 5. The method of claim 4, wherein each training instance identifies one or more second documents that the particular user did not select when the one or more second documents were identified by the search results provided to the particular user in response to the particular search query.
  - 6. The method of claim 4, wherein each training instance includes data representing a position of the selected first document in an order of the search results provided to the particular user in response to the particular query.
  - 7. The method of claim 4, wherein each training instance includes data representing a previously computed score for the selected first document.
  - 8. The method of claim 4, wherein each training instance comprises data representing a number of documents ranked above the selected first document in the search results provided to the particular user in response to the particular search query.
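
The training steps recited in claim 1 (selecting a candidate condition, requesting local statistics, computing a weight, and announcing the new rule) can be sketched as a single-process simulation. The sketch makes several assumptions that are not in the claims: each training instance is reduced to a (features, selected) pair, the statistics are a match count and a selected count, the weight is a smoothed log-odds estimate, the proposing device's own shard is counted together with the others, and a rule is accepted when it has at least `min_support` matching instances. `Worker`, `propose_rule`, and `min_support` are hypothetical names.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Worker:
    """One computing device holding a shard of (features, selected) instances."""
    shard: list
    model: list = field(default_factory=list)

    def local_stats(self, condition):
        """Return (matching instances, matching instances that were selected)."""
        hits = [selected for features, selected in self.shard
                if all(features.get(k) == v for k, v in condition.items())]
        return len(hits), sum(hits)

    def add_rule(self, condition, weight):
        """Apply an announced rule to this device's copy of the model."""
        self.model.append((condition, weight))


def propose_rule(workers, condition, min_support=2):
    """Gather statistics from every device, weigh the condition, and broadcast.

    Returns the weight if the rule was added, or None if it was rejected.
    """
    stats = [w.local_stats(condition) for w in workers]
    matched = sum(n for n, _ in stats)
    selected = sum(k for _, k in stats)
    if matched < min_support:
        return None  # assumed acceptance test: too little evidence
    # Assumed weight: log-odds of selection with add-one smoothing.
    weight = math.log((selected + 1) / (matched - selected + 1))
    for w in workers:
        w.add_rule(condition, weight)
    return weight
```

In a real deployment the calls to `local_stats` and `add_rule` would be network requests between devices; here they are plain method calls so the control flow stays visible.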

9. A system comprising:
- one or more computers and one or more storage devices storing instructions that are operable, when executed by the one or more computers, to cause the one or more computers to perform operations comprising:
  
  receiving, by a distributed search system, a collection of training data comprising a plurality of training instances that each identify a respective first document selected by a particular user when the first document was identified in search results provided by the search system to the particular user in response to a particular search query issued by the particular user;
  
  partitioning the collection of training data over a plurality of computing devices of the distributed search system;
  
  generating, by the distributed search system, a ranking model that produces a likelihood that a particular user will select a particular document when identified by one or more search results provided in response to a particular search query submitted by the particular user, including processing, by each computing device of the plurality of computing devices, training instances assigned to the computing device, including:
  
  selecting, by the computing device, a candidate condition, wherein the candidate condition specifies values for one or more user features, one or more query features, and one or more document features,
  
  sending, by the computing device, to each other computing device of the plurality of computing devices, a request to compute local statistics for the candidate condition,
  
  receiving, by the computing device from each other computing device of one or more other computing devices, respective computed statistics for the candidate condition computed by the other computing device using values of local training instances assigned to the other computing device,
  
  computing, by the computing device, a weight for the candidate condition according to the computed statistics received from the one or more other computing devices for the candidate condition;
  
  determining, by the computing device, that a new rule comprising the candidate condition and the computed weight should be added to the ranking model, and
  
  in response, adding the new rule to the ranking model and providing, by the computing device, to each other computing device of the plurality of computing devices, an indication that the new rule comprising the candidate condition and the computed weight should be added to the ranking model;
  
  receiving a search query submitted by a first user;
  
  obtaining a plurality of search results that satisfy the search query, wherein each search result identifies a respective document of a plurality of documents;
  
  determining one or more features of the first user and one or more features of the search query submitted by the first user;
  
  using the one or more features of the first user and the one or more features of the search query as input to the ranking model to compute, for each document identified by the search results, a respective likelihood that the first user will select the document when provided in response to the search query; and
  
  ranking the plurality of search results based on a respective computed likelihood for each document, the computed likelihood for each document being a likelihood that the first user will select the document when provided in response to the search query.
- Dependent claims (10, 11, 12, 13, 14, 15, 16):
  - 10. The system of claim 9, wherein the one or more features of the first user include a location of the first user, a language of the first user, one or more previous queries issued by the first user, or a number of times the first user has accessed a particular document.
  - 11. The system of claim 9, wherein the one or more features of the search query include a language of the query and one or more terms of the query.
  - 12. The system of claim 9, wherein the operations further comprise:
    - generating, by each computing device of the plurality of computing devices using local training instances assigned to the computing device, a feature-to-instance index that maps each value of a feature to one or more training instances having the value for the feature;
      
      receiving, by a first computing device of the plurality of computing devices, a request to compute local statistics for the candidate condition;
      
      obtaining, by the first computing device, training instances matching the candidate condition by using one or more features of the candidate condition as input to the feature-to-instance index;
      
      computing local statistics for the candidate condition using matching training instances obtained using the feature-to-instance index; and
      
      providing, by the first computing device, the computed local statistics in response to the request to compute local statistics for the candidate condition.
  - 13. The system of claim 12, wherein each training instance identifies one or more second documents that the particular user did not select when the one or more second documents were identified by the search results provided to the particular user in response to the particular search query.
  - 14. The system of claim 12, wherein each training instance includes data representing a position of the selected first document in an order of the search results provided to the particular user in response to the particular query.
  - 15. The system of claim 12, wherein each training instance includes data representing a previously computed score for the selected first document.
  - 16. The system of claim 12, wherein each training instance comprises data representing a number of documents ranked above the selected first document in the search results provided to the particular user in response to the particular search query.
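
The feature-to-instance index of claims 4 and 12 can be illustrated with an inverted index over a device's local shard. This is a minimal sketch under stated assumptions: training instances are (features, selected) pairs, the index maps each (feature name, value) pair to a set of instance positions, and the local statistics are a match count and a selected count. The function names are hypothetical and the claims do not prescribe this data layout.

```python
from collections import defaultdict


def build_index(instances):
    """Map each (feature, value) pair to the positions of instances having it."""
    index = defaultdict(set)
    for position, (features, _selected) in enumerate(instances):
        for name, value in features.items():
            index[(name, value)].add(position)
    return index


def matching_ids(index, condition):
    """Intersect the posting sets of every feature value in the condition."""
    postings = [index.get(item, set()) for item in condition.items()]
    if not postings:
        return set()  # design choice in this sketch: an empty condition matches nothing
    return set.intersection(*postings)


def local_stats(instances, index, condition):
    """Return (matching instances, matching instances that were selected)."""
    ids = matching_ids(index, condition)
    return len(ids), sum(1 for i in ids if instances[i][1])
```

Looking up a condition through the index touches only the instances that share at least one of its feature values, instead of scanning the whole shard for every request.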

17. A computer program product, encoded on one or more non-transitory computer storage media, comprising instructions that when executed by one or more computers cause the one or more computers to perform operations comprising:
- receiving, by a distributed search system, a collection of training data comprising a plurality of training instances that each identify a respective first document selected by a particular user when the first document was identified in search results provided by the search system to the particular user in response to a particular search query issued by the particular user;
  
  partitioning the collection of training data over a plurality of computing devices of the distributed search system;
  
  generating, by the distributed search system, a ranking model that produces a likelihood that a particular user will select a particular document when identified by one or more search results provided in response to a particular search query submitted by the particular user, including processing, by each computing device of the plurality of computing devices, training instances assigned to the computing device, including:
  
  selecting, by the computing device, a candidate condition, wherein the candidate condition specifies values for one or more user features, one or more query features, and one or more document features,
  
  sending, by the computing device, to each other computing device of the plurality of computing devices, a request to compute local statistics for the candidate condition,
  
  receiving, by the computing device from each other computing device of one or more other computing devices, respective computed statistics for the candidate condition computed by the other computing device using values of local training instances assigned to the other computing device,
  
  computing, by the computing device, a weight for the candidate condition according to the computed statistics received from the one or more other computing devices for the candidate condition;
  
  determining, by the computing device, that a new rule comprising the candidate condition and the computed weight should be added to the ranking model, and
  
  in response, adding the new rule to the ranking model and providing, by the computing device, to each other computing device of the plurality of computing devices, an indication that the new rule comprising the candidate condition and the computed weight should be added to the ranking model;
  
  receiving a search query submitted by a first user;
  
  obtaining a plurality of search results that satisfy the search query, wherein each search result identifies a respective document of a plurality of documents;
  
  determining one or more features of the first user and one or more features of the search query submitted by the first user;
  
  using the one or more features of the first user and the one or more features of the search query as input to the ranking model to compute, for each document identified by the search results, a respective likelihood that the first user will select the document when provided in response to the search query; and
  
  ranking the plurality of search results based on a respective computed likelihood for each document, the computed likelihood for each document being a likelihood that the first user will select the document when provided in response to the search query.
- Dependent claims (18, 19, 20):
  - 18. The computer program product of claim 17, wherein the one or more features of the first user include a location of the first user, a language of the first user, one or more previous queries issued by the first user, or a number of times the first user has accessed a particular document.
  - 19. The computer program product of claim 17, wherein the one or more features of the search query include a language of the query and one or more terms of the query.
  - 20. The computer program product of claim 17, wherein the operations further comprise:
    - generating, by each computing device of the plurality of computing devices using local training instances assigned to the computing device, a feature-to-instance index that maps each value of a feature to one or more training instances having the value for the feature;
      
      receiving, by a first computing device of the plurality of computing devices, a request to compute local statistics for the candidate condition;
      
      obtaining, by the first computing device, training instances matching the candidate condition by using one or more features of the candidate condition as input to the feature-to-instance index;
      
      computing local statistics for the candidate condition using matching training instances obtained using the feature-to-instance index; and
      
      providing, by the first computing device, the computed local statistics in response to the request to compute local statistics for the candidate condition.

Current Assignee: Google LLC (Alphabet Inc.)
Original Assignee: Google LLC (Alphabet Inc.)
Inventors: Bem, Jeremy; Harik, Georges R.; Levenberg, Joshua L.; Shazeer, Noam M.; Tong, Simon
Primary Examiner(s): Kerzhner, Aleksandr
Assistant Examiner(s): Cheung, Eddy
Application Number: US14/815,736
Time in Patent Office: 1,117 Days
Field of Search: None
US Class Current
CPC Class Codes

G06F 16/24575   using context

G06F 16/24578   using ranking

G06F 16/3346   using probabilistic model

G06F 16/355   Class or cluster creation o...

G06F 16/951   Indexing; Web crawling tech...

G06F 16/9535   Search customisation based ...

G06N 20/00   Machine learning

G06N 7/01   Probabilistic graphical mod...

Y10S 707/99933   Query processing, i.e. sear...

Y10S 707/99935   Query augmenting and refini...

Y10S 707/99943   Generating database or data...
