Generating a related set of documents for an initial set of documents

US 8,977,612 B1
Filed: 09/14/2012
Issued: 03/10/2015
Est. Priority Date: 07/20/2009
Status: Active Grant

Abstract

Methods, systems, and apparatus, including computer programs encoded on computer storage media, for identifying one or more second documents related to one or more first documents. Strength of relationship scores between candidate documents in a group of candidate documents and each first document are determined by aggregating user selection data for users, the user selection data indicating, for each user, whether the user viewed the candidate document during a window of time after the first document is presented to the user on a search results web page in response to a query. An aggregate strength of relationship score is calculated for each candidate document from the strength of relationship scores for the candidate document. Second documents are selected from the candidate documents according to the aggregate strength of relationship scores for the candidate documents.
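
The flow described in the abstract can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the patented implementation: the function name `related_documents`, the `events` log format, the use of a plain sum as the aggregate score, and the exclusion of first documents from the candidates are all hypothetical choices made for the example. The per-pair score is estimated as the fraction of presentations of a first document that were followed by a view of the candidate.

```python
from collections import defaultdict


def related_documents(events, first_docs, top_k=2):
    """Rank candidate documents related to a set of first documents.

    events: iterable of (first_doc, viewed_docs) pairs, one per time a
        first document was presented on a results page; viewed_docs lists
        the documents the user viewed in the following time window.
        (Hypothetical log format, assumed for this sketch.)
    """
    presented = defaultdict(int)  # first_doc -> number of presentations
    viewed = defaultdict(int)     # (first_doc, candidate) -> presentations followed by a view

    # Aggregate user selection data across all users.
    for first_doc, viewed_docs in events:
        presented[first_doc] += 1
        for candidate in set(viewed_docs):
            viewed[(first_doc, candidate)] += 1

    first = set(first_docs)
    aggregate = defaultdict(float)
    for (first_doc, candidate), count in viewed.items():
        if first_doc in first and candidate not in first:
            # Strength of relationship: estimated probability of a view
            # given that first_doc was presented.
            strength = count / presented[first_doc]
            # Assumption: the aggregate score is a simple sum.
            aggregate[candidate] += strength

    # Select the highest-scoring candidates (ties broken by name).
    ranked = sorted(aggregate, key=lambda c: (-aggregate[c], c))
    return [(c, aggregate[c]) for c in ranked[:top_k]]


events = [("a", ["x", "y"]), ("a", ["x"]), ("b", ["x"]), ("b", [])]
print(related_documents(events, ["a", "b"]))  # [('x', 1.5), ('y', 0.5)]
```

Here document `x` was viewed after every presentation of `a` and after half of the presentations of `b`, so it outranks `y`, which was viewed after only half of the presentations of `a`.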

36 Claims

1. A computer-implemented method for identifying one or more second documents related to one or more documents of a set of first documents, the method comprising:
- for each candidate document in a plurality of candidate documents and each of the first documents, aggregating user selection data for multiple users, the first documents and the candidate documents being in a corpus of web documents, the user selection data indicating, for each of the multiple users, an amount of time the user viewed the candidate document during a window of time after the first document was presented to the user on a search results web page in response to a query;
  
  determining a respective strength of relationship score between each candidate document in the plurality of candidate documents and each of the first documents based on the aggregated user selection data, wherein the strength of relationship score is a probability that the candidate document will be viewed given that the first document was presented to a user as a search result in response to a query;
  
  calculating an aggregate strength of relationship score for each candidate document from the respective strength of relationship scores for the candidate document; and
  
  selecting the one or more second documents from the candidate documents according to the aggregate strength of relationship scores for the candidate documents.
  - 2. The method of claim 1 wherein aggregating the user selection data further comprises scaling the user selection data for a first user of the multiple users by a scoring factor when the first user views the candidate document during the window of time after the first document is selected by the first user from the search results web page.
  - 3. The method of claim 1 wherein determining a respective strength of relationship score between each candidate document and each of the first documents further comprises:
    - using a popularity of the candidate document to normalize the respective strength of relationship score.
  - 4. The method of claim 1 wherein determining a respective strength of relationship score between each candidate document and each of the first documents further comprises:
    - dividing a sum of the amounts of time users viewed the candidate document during the window of time after the first document was presented to the users by a count of times the first document was presented to the users.
  - 5. The method of claim 1 wherein the one or more second documents are associated with a natural language, and wherein determining a respective strength of relationship score between each candidate document and each of the first documents further comprises:
    - scaling the strength of relationship score by a percentage of the multiple users who viewed the candidate document and are associated with the natural language.
  - 6. The method of claim 1, further comprising:
    - identifying documents responsive to a query as the first documents; and
      
      generating an augmented set of documents responsive to the query by including one or more of the second documents in the first documents.
  - 7. The method of claim 6, further comprising:
    - receiving the query; and
      
      providing the augmented set of documents in response to the received query to a client device.
  - 8. The method of claim 1, further comprising:
    - selecting the first documents from documents a first user has viewed for a second period of time; and
      
      providing the one or more second documents as suggested documents to a client device.
  - 9. The method of claim 1, further comprising:
    - receiving input from a second user indicating that one or more of the first documents are preferred documents;
      
      calculating a respective document weight for each of the preferred documents; and
      
      wherein calculating the aggregate strength of relationship score for each candidate document comprises weighting the strength of relationship scores for the candidate document and each of the preferred documents by the respective document weight for the preferred document.
  - 10. The method of claim 1, further comprising:
    - receiving input from a second user indicating that one or more of the first documents are disliked documents;
      
      calculating a respective document weight for each of the disliked documents; and
      
      wherein calculating the aggregate strength of relationship score for each candidate document comprises weighting the strength of relationship scores for the candidate document and each of the disliked documents by the respective document weight for the disliked document.
  - 11. The method of claim 10, further comprising providing one or more of the second documents as suggested documents to a client device.
  - 12. The method of claim 1, further comprising:
    - selecting the first documents based on one or more first queries issued during a session, where each of the first documents is responsive to at least one of the one or more first queries;
      
      identifying one or more second queries corresponding to the one or more second documents from data associating queries and documents; and
      
      providing the one or more second queries as suggested queries to a client device.
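
The score adjustments in claims 2 to 4 can be illustrated with three small helpers. This is a hedged sketch only: the function names, the default `scoring_factor` of 2.0, and the choice to normalize by dividing by popularity are assumptions made for the example, since the claims do not fix those details.

```python
def scale_for_click(view_seconds, first_doc_clicked, scoring_factor=2.0):
    """Claim 2 style: scale one user's selection data when that user
    actually selected the first document from the results page.
    The factor value is an arbitrary assumption."""
    return view_seconds * scoring_factor if first_doc_clicked else view_seconds


def time_based_score(view_seconds, presentations):
    """Claim 4 style: sum of viewing times divided by the number of
    times the first document was presented."""
    if presentations <= 0:
        return 0.0
    return sum(view_seconds) / presentations


def normalize_by_popularity(score, popularity):
    """Claim 3 style: use candidate popularity to normalize the score.
    Plain division is an assumption; the claim does not say how."""
    return score / popularity if popularity > 0 else 0.0


# Two users viewed the candidate for 30 s and 90 s across 4 presentations.
raw = time_based_score([30.0, 90.0], 4)      # 30.0
print(normalize_by_popularity(raw, 60.0))    # 0.5
```

Normalizing by popularity keeps candidates that nearly everyone views regardless of context from dominating the ranking for every first document.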

13. A system comprising:
- one or more computers programmed to perform operations comprising:
  
  for each candidate document in a plurality of candidate documents and each of the first documents, aggregating user selection data for multiple users, the first documents and the candidate documents being in a corpus of web documents, the user selection data indicating, for each of the multiple users, an amount of time the user viewed the candidate document during a window of time after the first document was presented to the user on a search results web page in response to a query;
  
  determining a respective strength of relationship score between each candidate document in the plurality of candidate documents and each of the first documents based on the aggregated user selection data, wherein the strength of relationship score is a probability that the candidate document will be viewed given that the first document was presented to a user as a search result in response to a query;
  
  calculating an aggregate strength of relationship score for each candidate document from the respective strength of relationship scores for the candidate document; and
  
  selecting the one or more second documents from the candidate documents according to the aggregate strength of relationship scores for the candidate documents.
  - 14. The system of claim 13 wherein aggregating the user selection data further comprises scaling the user selection data for a first user of the multiple users by a scoring factor when the first user views the candidate document during the window of time after the first document is selected by the first user from the search results web page.
  - 15. The system of claim 13 wherein determining a respective strength of relationship score between each candidate document and each of the first documents further comprises:
    - using a popularity of the candidate document to normalize the respective strength of relationship score.
  - 16. The system of claim 13 wherein determining a respective strength of relationship score between each candidate document and each of the first documents further comprises:
    - dividing a sum of the amounts of time users viewed the candidate document during the window of time after the first document was presented to the users by a count of times the first document was presented to the users.
  - 17. The system of claim 13 wherein the one or more second documents are associated with a natural language, and wherein determining a respective strength of relationship score between each candidate document and each of the first documents further comprises:
    - scaling the strength of relationship score by a percentage of the multiple users who viewed the candidate document and are associated with the natural language.
  - 18. The system of claim 13, wherein the operations further comprise:
    - identifying documents responsive to a query as the first documents; and
      
      generating an augmented set of documents responsive to the query by including one or more of the second documents in the first documents.
  - 19. The system of claim 18, wherein the operations further comprise:
    - receiving the query; and
      
      providing the augmented set of documents in response to the received query to a client device.
  - 20. The system of claim 13, wherein the operations further comprise:
    - selecting the first documents from documents a first user has viewed for a second period of time; and
      
      providing the one or more second documents as suggested documents to a client device.
  - 21. The system of claim 13, wherein the operations further comprise:
    - receiving input from a second user indicating that one or more of the first documents are preferred documents;
      
      calculating a respective document weight for each of the preferred documents; and
      
      wherein calculating the aggregate strength of relationship score for each candidate document comprises weighting the strength of relationship scores for the candidate document and each of the preferred documents by the respective document weight for the preferred document.
  - 22. The system of claim 13, wherein the operations further comprise:
    - receiving input from a second user indicating that one or more of the first documents are disliked documents;
      
      calculating a respective document weight for each of the disliked documents; and
      
      wherein calculating the aggregate strength of relationship score for each candidate document comprises weighting the strength of relationship scores for the candidate document and each of the disliked documents by the respective document weight for the disliked document.
  - 23. The system of claim 22, wherein the operations further comprise providing one or more of the second documents as suggested documents to a client device.
  - 24. The system of claim 13, wherein the operations further comprise:
    - selecting the first documents based on one or more first queries issued during a session, where each of the first documents is responsive to at least one of the one or more first queries;
      
      identifying one or more second queries corresponding to the one or more second documents from data associating queries and documents; and
      
      providing the one or more second queries as suggested queries to a client device.
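
The weighting described in claims 21 and 22 (and in claims 9 and 10) can be sketched as a weighted sum over the first documents. This is an illustration under assumptions: the function name `weighted_aggregate`, the nested-dictionary layout, the default weight of 1.0, and the convention that preferred documents get positive weights while disliked documents get negative weights are all hypothetical.

```python
def weighted_aggregate(scores, weights):
    """Combine per-pair strength scores into one score per candidate.

    scores:  {candidate: {first_doc: strength_of_relationship}}
    weights: {first_doc: document_weight}; documents without an entry
             default to 1.0 (an assumption for this sketch).
    """
    return {
        candidate: sum(weights.get(first_doc, 1.0) * strength
                       for first_doc, strength in per_first.items())
        for candidate, per_first in scores.items()
    }


scores = {
    "x": {"a": 0.5, "b": 0.25},
    "y": {"a": 0.1, "b": 0.8},
}
# Assumed convention: "a" is a preferred document, "b" a disliked one.
weights = {"a": 2.0, "b": -1.0}
result = weighted_aggregate(scores, weights)
print(result["x"])  # 0.75
```

With these weights, candidate `y` ends up with a negative aggregate score because most of its relationship strength comes from the disliked document `b`.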

25. A non-transitory computer-readable storage medium having instructions stored thereon that, when executed by one or more computers, cause the one or more computers to perform operations comprising:
- for each candidate document in a plurality of candidate documents and each of the first documents, aggregating user selection data for multiple users, the first documents and the candidate documents being in a corpus of web documents, the user selection data indicating, for each of the multiple users, an amount of time the user viewed the candidate document during a window of time after the first document was presented to the user on a search results web page in response to a query;
  
  determining a respective strength of relationship score between each candidate document in the plurality of candidate documents and each of the first documents based on the aggregated user selection data, wherein the strength of relationship score is a probability that the candidate document will be viewed given that the first document was presented to a user as a search result in response to a query;
  
  calculating an aggregate strength of relationship score for each candidate document from the respective strength of relationship scores for the candidate document; and
  
  selecting the one or more second documents from the candidate documents according to the aggregate strength of relationship scores for the candidate documents.
  - 26. The non-transitory storage medium of claim 25 wherein aggregating the user selection data further comprises scaling the user selection data for a first user of the multiple users by a scoring factor when the first user views the candidate document during the window of time after the first document is selected by the first user from the search results web page.
  - 27. The non-transitory storage medium of claim 25 wherein determining a respective strength of relationship score between each candidate document and each of the first documents further comprises:
    - using a popularity of the candidate document to normalize the respective strength of relationship score.
  - 28. The non-transitory storage medium of claim 25 wherein determining a respective strength of relationship score between each candidate document and each of the first documents further comprises:
    - dividing a sum of the amounts of time users viewed the candidate document during the window of time after the first document was presented to the users by a count of times the first document was presented to the users.
  - 29. The non-transitory storage medium of claim 25 wherein the one or more second documents are associated with a natural language, and wherein determining a respective strength of relationship score between each candidate document and each of the first documents further comprises:
    - scaling the strength of relationship score by a percentage of the multiple users who viewed the candidate document and are associated with the natural language.
  - 30. The non-transitory storage medium of claim 25, wherein the operations further comprise:
    - identifying documents responsive to a query as the first documents; and
      
      generating an augmented set of documents responsive to the query by including one or more of the second documents in the first documents.
  - 31. The non-transitory storage medium of claim 30, wherein the operations further comprise:
    - receiving the query; and
      
      providing the augmented set of documents in response to the received query to a client device.
  - 32. The non-transitory storage medium of claim 25, wherein the operations further comprise:
    - selecting the first documents from documents a first user has viewed for a second period of time; and
      
      providing the one or more second documents as suggested documents to a client device.
  - 33. The non-transitory storage medium of claim 25, wherein the operations further comprise:
    - receiving input from a second user indicating that one or more of the first documents are preferred documents;
      
      calculating a respective document weight for each of the preferred documents; and
      
      wherein calculating the aggregate strength of relationship score for each candidate document comprises weighting the strength of relationship scores for the candidate document and each of the preferred documents by the respective document weight for the preferred document.
  - 34. The non-transitory storage medium of claim 25, wherein the operations further comprise:
    - receiving input from a second user indicating that one or more of the first documents are disliked documents;
      
      calculating a respective document weight for each of the disliked documents; and
      
      wherein calculating the aggregate strength of relationship score for each candidate document comprises weighting the strength of relationship scores for the candidate document and each of the disliked documents by the respective document weight for the disliked document.
  - 35. The non-transitory storage medium of claim 34, wherein the operations further comprise providing one or more of the second documents as suggested documents to a client device.
  - 36. The non-transitory storage medium of claim 25, wherein the operations further comprise:
    - selecting the first documents based on one or more first queries issued during a session, where each of the first documents is responsive to at least one of the one or more first queries;
      
      identifying one or more second queries corresponding to the one or more second documents from data associating queries and documents; and
      
      providing the one or more second queries as suggested queries to a client device.
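
The query-suggestion variant in claims 12, 24, and 36 maps the selected second documents back to queries. The sketch below is illustrative only: the function name `suggest_queries`, the `doc_to_queries` mapping, the `limit` parameter, and the decision to skip queries already issued in the session are assumptions, not details stated in the claims.

```python
def suggest_queries(second_docs, doc_to_queries, session_queries, limit=3):
    """Collect queries associated with the second documents.

    second_docs:     related documents, best first
    doc_to_queries:  {document: [queries associated with it]}
                     (hypothetical stand-in for the "data associating
                     queries and documents")
    session_queries: queries the user already issued in this session
    """
    seen = set(session_queries)   # assumption: do not re-suggest these
    suggestions = []
    for doc in second_docs:
        for query in doc_to_queries.get(doc, []):
            if query not in seen:
                seen.add(query)   # also removes duplicates across documents
                suggestions.append(query)
    return suggestions[:limit]


doc_to_queries = {
    "x": ["python sort", "sorting"],
    "y": ["sorting", "timsort"],
}
print(suggest_queries(["x", "y"], doc_to_queries, ["python sort"]))
# ['sorting', 'timsort']
```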

Current Assignee
Google LLC (Alphabet Inc.)
Original Assignee
Google Inc. (Alphabet Inc.)
Inventors
Tong, Simon; Lee, Benjamin N.; Altendorf, Eric E.
Primary Examiner(s)
Mofiz, Apu
Assistant Examiner(s)
Nguyen, Cindy

Application Number
US13/617,019
Time in Patent Office
907 Days
Field of Search
707/728, 707/748
US Class Current
707/728
CPC Class Codes
G06F 16/24578 using ranking
G06F 16/355 Class or cluster creation o...
