Generating a related set of documents for an initial set of documents

US 8,972,394 B1
Filed: 05/20/2013
Issued: 03/03/2015
Est. Priority Date: 07/20/2009
Status: Active Grant

Abstract

Methods, systems, and apparatus, including computer programs encoded on computer storage media, for identifying one or more second documents related to one or more first documents. Strength of relationship scores between candidate documents in a group of candidate documents and each first document are determined by aggregating user selection data for users, the user selection data indicating, for each user, whether the user viewed the candidate document during a window of time after the first document is presented to the user on a search results web page in response to a query. An aggregate strength of relationship score is calculated for each candidate document from the strength of relationship scores for the candidate document. Second documents are selected from the candidate documents according to the aggregate strength of relationship scores for the candidate documents.
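The four steps summarized in the abstract (aggregate, score, combine, select) can be sketched as follows. This is only an illustrative reading, not the patented implementation: the event tuple layout, the function names, the use of a plain sum to combine per-document scores, and the top-k selection are all assumptions. The per-pair score here is the fraction of users who viewed the candidate after the first document's result was presented.

```python
from collections import defaultdict

def strength_scores(events):
    """events: (user, first_doc, candidate_doc, viewed_in_window) tuples (hypothetical layout).

    Returns {(candidate, first): fraction of presented users who viewed the candidate}.
    """
    presented = defaultdict(set)  # users shown a result for first_doc
    viewed = defaultdict(set)     # of those, users who viewed the candidate in the window
    for user, first_doc, candidate, did_view in events:
        key = (candidate, first_doc)
        presented[key].add(user)
        if did_view:
            viewed[key].add(user)
    return {key: len(viewed[key]) / len(users) for key, users in presented.items()}

def aggregate_scores(scores):
    """Combine per-first-document scores into one score per candidate (assumed: a sum)."""
    totals = defaultdict(float)
    for (candidate, _first_doc), score in scores.items():
        totals[candidate] += score
    return dict(totals)

def select_related(totals, k):
    """Pick the k highest-scoring candidates; ties are broken by document id."""
    return sorted(totals, key=lambda c: (-totals[c], c))[:k]

# Hypothetical log: first documents "A" and "B", candidates "X" and "Y".
events = [
    ("u1", "A", "X", True),
    ("u2", "A", "X", True),
    ("u3", "A", "X", False),
    ("u1", "B", "X", True),
    ("u1", "A", "Y", False),
    ("u2", "A", "Y", True),
    ("u1", "B", "Y", False),
]
totals = aggregate_scores(strength_scores(events))
related = select_related(totals, k=1)
```

With this toy log, candidate "X" is viewed by most users after either first document is presented, so it outranks "Y".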

323 Citations

29 Claims

1. A computer-implemented method for identifying one or more second documents related to one or more documents of a set of first documents, the method comprising:
- aggregating user selection data for multiple users, the first documents and a group of candidate documents being in a corpus of web documents, the user selection data indicating, for each of the multiple users, whether the user viewed one of the candidate documents during a window of time after a search result corresponding to one of the first documents was presented to the user on a search results web page in response to a query;
  
  determining, using the aggregated user selection data, a respective strength of relationship score between each candidate document in the group of candidate documents and each first document in the set of first documents, each respective strength of relationship score being determined based on whether each user of the multiple users viewed the candidate document during the window of time after a search result corresponding to the first document was presented to the user on a search results web page in response to a query;
  
  calculating an aggregate strength of relationship score for each candidate document from the respective strength of relationship scores for the candidate document; and
  
  selecting the one or more second documents from the candidate documents according to the aggregate strength of relationship scores for the candidate documents.
  - 2. The method of claim 1 wherein the user selection data further indicates whether each of the multiple users viewed the candidate document for a threshold period of time.
  - 3. The method of claim 1 wherein aggregating user selection data further comprises scaling the user selection data for one of the multiple users by a scoring factor when the one of the multiple users views the candidate document during the window of time after the first document is selected by the one of the multiple users from the search results web page.
  - 4. The method of claim 1 wherein determining a respective strength of relationship score between each candidate document in the group of candidate documents and each first document of the first documents further comprises using a popularity of the candidate document to normalize the respective strength of relationship score.
  - 5. The method of claim 1 further comprising:
    - identifying documents responsive to the query as the first documents; and
      
      generating an augmented set of documents responsive to the query, the augmented set of documents including one or more of the second documents and the first documents.
  - 6. The method of claim 5, further comprising:
    - receiving the query; and
      
      presenting the augmented set of documents in response to the received query.
  - 7. The method of claim 1, further comprising:
    - selecting the first documents from documents a first user has viewed for a second period of time; and
      
      presenting the one or more second documents as suggested documents.
  - 8. The method of claim 1 wherein the strength of relationship score is a probability that the candidate document will be viewed given that the first document has been presented to a user on a search results web page in response to a query.
  - 9. The method of claim 1 wherein the respective strength of relationship score is a count of the users who viewed the candidate document during the window of time after the first document was presented divided by a count of the users who viewed the first document.
  - 10. The method of claim 1, further comprising:
    - receiving input from a second user indicating that one or more of the first documents are preferred documents;
      
      calculating a respective document weight for each of the preferred documents; and
      
      wherein calculating the aggregate strength of relationship score for each candidate document includes weighting the strength of relationship scores for the candidate document and each of the preferred documents by the respective document weight for the preferred document.
  - 11. The method of claim 1 wherein the aggregated user selection data for the multiple users comprises a sum of weights wherein each weight corresponds to a presentation of the first document to a user or selection of the first document by the user, and wherein the respective strength of relationship score is the sum of weights divided by the total number of times the source document was presented.
  - 12. The method of claim 1 wherein calculating the aggregate strength of relationship score for each candidate document from the respective strength of relationship scores for the candidate document further comprises scaling the aggregate strength of relationship score for the candidate document.
  - 13. The method of claim 1 wherein calculating an aggregate strength of relationship score for each candidate document from the respective strength of relationship scores for the candidate document further comprises normalizing the aggregate strength of relationship score by a popularity of the candidate document.
  - 14. The method of claim 1 wherein calculating an aggregate strength of relationship score for each candidate document from the respective strength of relationship scores for the candidate document further comprises normalizing the aggregate strength of relationship score by subtracting a logarithm of a popularity of the candidate document from a sum of logarithms of the respective strength of relationship scores for the candidate document.
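The log-domain normalization recited in claim 14 can be written as a short worked example. The claims do not define how popularity is measured, so treating it as a positive number, and returning negative infinity when any input is non-positive, are assumptions of this sketch; the function name is hypothetical.

```python
import math

def log_normalized_score(strengths, popularity):
    """Sum of log strength scores minus log popularity.

    Equivalent to log(product(strengths) / popularity), so a candidate that is
    popular everywhere is penalized relative to one that is specifically
    related to the first documents.
    """
    if popularity <= 0 or any(s <= 0 for s in strengths):
        return float("-inf")  # assumed handling; the claim does not address zeros
    return sum(math.log(s) for s in strengths) - math.log(popularity)

# Same per-document strengths, different candidate popularity.
niche = log_normalized_score([0.5, 0.5], popularity=0.1)
popular = log_normalized_score([0.5, 0.5], popularity=0.9)
```

Working in logarithms turns the product of per-document scores into a sum, which avoids numeric underflow when many small probabilities are combined.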

15. A system for identifying one or more second documents related to one or more documents of a set of first documents, the system comprising:
- one or more computers configured to perform operations comprising:
  
  aggregating user selection data for multiple users, the first documents and a group of candidate documents being in a corpus of web documents, the user selection data indicating, for each of the multiple users, whether the user viewed one of the candidate documents during a window of time after a search result corresponding to one of the first documents was presented to the user on a search results web page in response to a query;
  
  determining, using the aggregated user selection data, a respective strength of relationship score between each candidate document in the group of candidate documents and each first document in the set of first documents, each respective strength of relationship score being determined based on whether each user of the multiple users viewed the candidate document during the window of time after a search result corresponding to the first document was presented to the user on a search results web page in response to a query;
  
  calculating an aggregate strength of relationship score for each candidate document from the respective strength of relationship scores for the candidate document; and
  
  selecting the one or more second documents from the candidate documents according to the aggregate strength of relationship scores for the candidate documents.
  - 16. The system of claim 15 wherein the user selection data further indicates whether each of the multiple users viewed the candidate document for a threshold period of time.
  - 17. The system of claim 15 wherein aggregating user selection data further comprises scaling the user selection data for one of the multiple users by a scoring factor when the one of the multiple users views the candidate document during the window of time after the first document is selected by the one of the multiple users from the search results web page.
  - 18. The system of claim 15 wherein determining a respective strength of relationship score between each candidate document in the group of candidate documents and each first document of the first documents further comprises using a popularity of the candidate document to normalize the respective strength of relationship score.
  - 19. The system of claim 15 wherein the operations further comprise:
    - identifying documents responsive to the query as the first documents; and
      
      generating an augmented set of documents responsive to the query, the augmented set of documents including one or more of the second documents and the first documents.
  - 20. The system of claim 19 wherein the operations further comprise:
    - receiving the query; and
      
      presenting the augmented set of documents in response to the received query.
  - 21. The system of claim 15 wherein the operations further comprise:
    - selecting the first documents from documents a first user has viewed for a second period of time; and
      
      presenting the one or more second documents as suggested documents.
  - 22. The system of claim 15 wherein the strength of relationship score is a probability that the candidate document will be viewed given that the first document has been presented to a user on a search results web page in response to a query.
  - 23. The system of claim 15 wherein the respective strength of relationship score is a count of the users who viewed the candidate document during the window of time after the first document was presented divided by a count of the users who viewed the first document.
  - 24. The system of claim 15 wherein the operations further comprise:
    - receiving input from a second user indicating that one or more of the first documents are preferred documents;
      
      calculating a respective document weight for each of the preferred documents; and
      
      wherein calculating the aggregate strength of relationship score for each candidate document includes weighting the strength of relationship scores for the candidate document and each of the preferred documents by the respective document weight for the preferred document.
  - 25. The system of claim 15 wherein the aggregated user selection data for the multiple users comprises a sum of weights wherein each weight corresponds to a presentation of the first document to a user or selection of the first document by the user, and wherein the respective strength of relationship score is the sum of weights divided by the total number of times the source document was presented.
  - 26. The system of claim 15 wherein calculating the aggregate strength of relationship score for each candidate document from the respective strength of relationship scores for the candidate document further comprises scaling the aggregate strength of relationship score for the candidate document.
  - 27. The system of claim 15 wherein calculating an aggregate strength of relationship score for each candidate document from the respective strength of relationship scores for the candidate document further comprises normalizing the aggregate strength of relationship score by a popularity of the candidate document.
  - 28. The system of claim 15 wherein calculating an aggregate strength of relationship score for each candidate document from the respective strength of relationship scores for the candidate document further comprises normalizing the aggregate strength of relationship score by subtracting a logarithm of a popularity of the candidate document from a sum of logarithms of the respective strength of relationship scores for the candidate document.
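Claims 17 and 25 describe a weighted variant: each qualifying presentation contributes a weight, scaled by a scoring factor when the user actually selected the first document, and the sum is divided by the number of presentations. One possible reading is sketched below. The `Impression` record, the default weight of 1.0, and the factor of 2.0 are hypothetical values chosen for illustration; the claims do not specify them.

```python
from dataclasses import dataclass

@dataclass
class Impression:
    """One presentation of the first document's result to a user (hypothetical record)."""
    first_selected: bool    # the user selected (clicked) the first document's result
    candidate_viewed: bool  # the user viewed the candidate within the time window

def weighted_strength(impressions, presented_weight=1.0, selected_factor=2.0):
    """Sum of weights for candidate views divided by the number of presentations."""
    if not impressions:
        return 0.0
    total = 0.0
    for imp in impressions:
        if not imp.candidate_viewed:
            continue  # no view in the window, no contribution
        weight = presented_weight
        if imp.first_selected:
            weight *= selected_factor  # a view after a selection counts more
        total += weight
    return total / len(impressions)

impressions = [
    Impression(first_selected=True, candidate_viewed=True),    # contributes 2.0
    Impression(first_selected=False, candidate_viewed=True),   # contributes 1.0
    Impression(first_selected=False, candidate_viewed=False),  # contributes 0.0
    Impression(first_selected=True, candidate_viewed=False),   # contributes 0.0
]
score = weighted_strength(impressions)  # (2.0 + 1.0) / 4 presentations
```

Because of the scoring factor, a score computed this way can exceed 1 and should be read as a relative weight rather than a probability.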

29. A non-transitory computer storage medium having instructions stored thereon that, when executed by data processing apparatus, cause the data processing apparatus to perform operations comprising:
- aggregating user selection data for multiple users, the first documents and a group of candidate documents being in a corpus of web documents, the user selection data indicating, for each of the multiple users, whether the user viewed one of the candidate documents during a window of time after a search result corresponding to one of the first documents was presented to the user on a search results web page in response to a query;
  
  determining, using the aggregated user selection data, a respective strength of relationship score between each candidate document in the group of candidate documents and each first document in the set of first documents, each respective strength of relationship score being determined based on whether each user of the multiple users viewed the candidate document during the window of time after a search result corresponding to the first document was presented to the user on a search results web page in response to a query;
  
  calculating an aggregate strength of relationship score for each candidate document from the respective strength of relationship scores for the candidate document; and
  
  selecting the one or more second documents from the candidate documents according to the aggregate strength of relationship scores for the candidate documents.

Current Assignee: Google LLC (Alphabet Inc.)
Original Assignee: Google Inc. (Alphabet Inc.)
Inventors: Tong, Simon; Lee, Benjamin N.; Altendorf, Eric E.
Primary Examiner(s): Mofiz, Apu
Assistant Examiner(s): Nguyen, Cindy
Application Number: US13/898,363
Time in Patent Office: 652 Days
Field of Search: 707/728, 707/748
US Class Current: 707/728
CPC Class Codes: G06F 16/24578 using ranking; G06F 16/355 Class or cluster creation o...
