Generating a related set of documents for an initial set of documents

US 8,447,760 B1
Filed: 07/20/2009
Issued: 05/21/2013
Est. Priority Date: 07/20/2009
Status: Active Grant

Abstract

Methods, systems, and apparatus, including computer programs encoded on computer storage media, for identifying one or more second documents related to one or more first documents. Strength of relationship scores between candidate documents in a group of candidate documents and each first document are determined by aggregating user selection data for users, the user selection data indicating, for each user, whether the user viewed the candidate document during a window of time after the first document is presented to the user on a search results web page in response to a query. An aggregate strength of relationship score is calculated for each candidate document from the strength of relationship scores for the candidate document. Second documents are selected from the candidate documents according to the aggregate strength of relationship scores for the candidate documents.
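
As a rough illustration of the flow the abstract describes, the Python sketch below estimates each pairwise score as the fraction of presentations of a first document that were followed, within a time window, by a view of the candidate. It then aggregates the scores per candidate and keeps the top-ranked ones. The log record layouts, the function names, the 30-minute window, counting per presentation, and summation as the aggregation rule are all assumptions made for illustration; the abstract does not fix any of them.

```python
from collections import defaultdict

WINDOW_SECONDS = 30 * 60  # assumed window length; not specified in the source


def pairwise_scores(presentations, views, window=WINDOW_SECONDS):
    """Estimate P(candidate viewed | first document presented).

    presentations: list of (user, first_doc, time) search-result impressions.
    views: list of (user, doc, time) document views.
    Both record layouts are hypothetical.
    """
    views_by_user = defaultdict(list)
    for user, doc, t in views:
        views_by_user[user].append((doc, t))

    presented = defaultdict(int)  # first_doc -> number of presentations
    followed = defaultdict(int)   # (first_doc, candidate) -> presentations followed by a view
    for user, first_doc, t0 in presentations:
        presented[first_doc] += 1
        # Distinct candidates this user viewed inside the window after t0.
        seen = {doc for doc, t in views_by_user[user]
                if doc != first_doc and t0 < t <= t0 + window}
        for candidate in seen:
            followed[(first_doc, candidate)] += 1

    return {pair: n / presented[pair[0]] for pair, n in followed.items()}


def select_related(scores, first_docs, k=2):
    """Aggregate per-candidate scores over the first documents and keep the top k."""
    aggregate = defaultdict(float)
    for (first_doc, candidate), s in scores.items():
        if first_doc in first_docs and candidate not in first_docs:
            aggregate[candidate] += s  # summation is an assumed aggregation rule
    ranked = sorted(aggregate.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:k]
```

Calling `select_related(pairwise_scores(presentations, views), {"A", "B"})` returns a list of `(candidate, aggregate_score)` pairs, best first.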

60 Claims

1. A computer-implemented method for identifying one or more second documents related to one or more documents of a set of first documents, the method comprising:
- determining a respective strength of relationship score between each candidate document in a group of candidate documents and each of the first documents by aggregating user selection data for multiple users, the first documents and the candidate documents being in a corpus of web documents, the user selection data indicating, for each of the multiple users, whether the user viewed the candidate document during a window of time after the first document is presented to the user on a search results web page in response to a query, wherein the strength of relationship score is a probability that the candidate document will be viewed given that the first document has been presented to a user on a search results web page in response to a query;
  
  calculating an aggregate strength of relationship score for each candidate document from the respective strength of relationship scores for the candidate document; and
  
  selecting the one or more second documents from the candidate documents according to the aggregate strength of relationship scores for the candidate documents.
- Dependent claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 47, 48, 49, 50, 51, 52, 53)
  - 2. The method of claim 1, wherein the user selection data further indicates whether each of the multiple users viewed the candidate document for a threshold period of time.
  - 3. The method of claim 1, wherein aggregating user selection data further comprises scaling the user selection data for one of the multiple users by a scoring factor when the one of the multiple users views the candidate document during the window of time after the first document is selected by the one of the multiple users from the search results web page.
  - 4. The method of claim 1, wherein determining a respective strength of relationship score between each candidate document and each of the first documents further comprises using a popularity of the candidate document to normalize the respective strength of relationship score.
  - 5. The method of claim 1, wherein:
    - the one or more second documents are associated with a natural language; and
      
      determining a respective strength of relationship score between each candidate document and each of the first documents further includes scaling the strength of relationship score by a percentage of the multiple users who viewed the candidate document and are associated with the natural language.
  - 6. The method of claim 1, further comprising:
    - identifying documents responsive to a query as the first documents; and
      
      generating an augmented set of documents responsive to the query by including one or more of the second documents in the first documents.
  - 7. The method of claim 6, further comprising:
    - receiving the query; and
      
      presenting the augmented set of documents in response to the received query.
  - 8. The method of claim 1, further comprising:
    - selecting the first documents from documents a first user has viewed for a second period of time; and
      
      presenting the one or more second documents as suggested documents.
  - 9. The method of claim 8, wherein presenting the one or more second documents as suggested documents includes presenting the one or more second documents in a toolbar.
  - 10. The method of claim 1, further comprising:
    - receiving input from a second user indicating that one or more of the first documents are preferred documents;
      
      calculating a respective document weight for each of the preferred documents; and
      
      wherein calculating the aggregate strength of relationship score for each candidate document includes weighting the strength of relationship scores for the candidate document and each of the preferred documents by the respective document weight for the preferred document.
  - 11. The method of claim 10, further comprising:
    - receiving input from the second user indicating that one or more of the first documents are disliked documents;
      
      calculating a respective document weight for each of the disliked documents; and
      
      wherein calculating the aggregate strength of relationship score for each candidate document includes weighting the strength of relationship scores for the candidate document and each of the disliked documents by the respective document weight for the disliked document.
  - 12. The method of claim 10, further comprising presenting one or more of the second documents as suggested documents.
  - 13. The method of claim 1, further comprising:
    - selecting the first documents based on one or more first queries issued during a session, where each of the first documents is responsive to at least one of the one or more first queries;
      
      identifying one or more second queries corresponding to the one or more second documents from data associating queries and documents; and
      
      presenting the one or more second queries as suggested queries.
  - 47. The method of claim 1 wherein the first document or a document below the first document in the search results web page was selected by the user before the user viewed the candidate document.
  - 48. The method of claim 1 wherein the respective strength of relationship score is a count of the users who viewed the candidate document during the window of time after the first document was presented divided by a count of the users who viewed the first document.
  - 49. The method of claim 1 wherein the aggregated user selection data for the multiple users comprises a sum of weights wherein each weight corresponds to a presentation of the first document to a user or selection of the first document by the user, and wherein the respective strength of relationship score is the sum of weights divided by the total number of times the source document was presented.
  - 50. The method of claim 49 wherein the weight corresponding to selection of the first document is greater than the weight corresponding to the presentation of the first document.
  - 51. The method of claim 1 wherein calculating the aggregate strength of relationship score for each candidate document from the respective strength of relationship scores for the candidate document further comprises scaling the aggregate strength of relationship score for the candidate document.
  - 52. The method of claim 1 wherein calculating an aggregate strength of relationship score for each candidate document from the respective strength of relationship scores for the candidate document further comprises normalizing the aggregate strength of relationship score by a popularity of the candidate document.
  - 53. The method of claim 1 wherein calculating an aggregate strength of relationship score for each candidate document from the respective strength of relationship scores for the candidate document further comprises normalizing the aggregate strength of relationship score by subtracting a logarithm of a popularity of the candidate document from a sum of logarithms of the respective strength of relationship scores for the candidate document.
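
The scoring variants in claims 49, 50, and 53 can be sketched in Python as follows. This is only an illustration under stated assumptions: the event encoding, the function names, the two weight values, and the handling of zero values are hypothetical. Claim 50 requires only that the selection weight exceed the presentation weight, and claim 53 specifies only the subtraction of a logarithm of popularity from a sum of logarithms of scores.

```python
import math

PRESENTATION_WEIGHT = 0.5  # assumed value
SELECTION_WEIGHT = 1.0     # assumed value; claim 50 only requires it to be larger


def weighted_score(events, total_presentations):
    """Claims 49-50: sum of weights divided by the number of presentations.

    events: one entry per time the candidate was viewed within the window,
    either "presented" (the first document was only shown) or "selected"
    (the first document was clicked). The encoding is hypothetical.
    """
    if total_presentations <= 0:
        return 0.0
    weights = {"presented": PRESENTATION_WEIGHT, "selected": SELECTION_WEIGHT}
    return sum(weights[e] for e in events) / total_presentations


def log_normalized_aggregate(scores, popularity):
    """Claim 53: sum of log scores minus log popularity of the candidate."""
    if popularity <= 0 or any(s <= 0 for s in scores):
        return float("-inf")  # assumed handling of zero values
    return sum(math.log(s) for s in scores) - math.log(popularity)
```

Exponentiating the result of `log_normalized_aggregate` gives the product of the pairwise scores divided by the candidate's popularity, so a candidate that is viewed often regardless of context is ranked lower than one viewed specifically after the first documents.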

14. A system for identifying one or more second documents related to one or more documents of a set of first documents, the system comprising:
- one or more computers configured to perform operations comprising:
  
  determining a respective strength of relationship score between each candidate document in a group of candidate documents and each of the first documents by aggregating user selection data for multiple users, the first documents and the candidate documents being in a corpus of web documents, the user selection data indicating, for each of the multiple users, whether the user viewed the candidate document during a window of time after the first document is presented to the user on a search results web page in response to a query, wherein the strength of relationship score is a probability that the candidate document will be viewed given that the first document has been presented to a user on a search results web page in response to a query;
  
  calculating an aggregate strength of relationship score for each candidate document from the respective strength of relationship scores for the candidate document; and
  
  selecting the one or more second documents from the candidate documents according to the aggregate strength of relationship scores for the candidate documents.
- Dependent claims (15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 54, 55, 56, 57, 58, 59, 60)
  - 15. The system of claim 14, wherein the user selection data further indicates whether each of the multiple users viewed the candidate document for a threshold period of time.
  - 16. The system of claim 14, wherein aggregating user selection data further comprises scaling the user selection data for one of the multiple users by a scoring factor when the one of the multiple users views the candidate document during the window of time after the first document is selected by the one of the multiple users from the search results web page.
  - 17. The system of claim 14, wherein determining a respective strength of relationship score between each candidate document and each of the first documents further comprises using a popularity of the candidate document to normalize the respective strength of relationship score.
  - 18. The system of claim 14, wherein:
    - the one or more second documents are associated with a natural language; and
      
      determining a respective strength of relationship score between each candidate document and each of the first documents further includes scaling the strength of relationship score by a percentage of the multiple users who viewed the candidate document and are associated with the natural language.
  - 19. The system of claim 14, further configured to perform operations comprising:
    - identifying documents responsive to a query as the first documents; and
      
      generating an augmented set of documents responsive to the query by including one or more of the second documents in the first documents.
  - 20. The system of claim 19, further configured to perform operations comprising:
    - receiving the query; and
      
      presenting the augmented set of documents in response to the received query.
  - 21. The system of claim 14, further operable to perform operations comprising:
    - selecting the first documents from documents a first user has viewed for a second period of time; and
      
      presenting the one or more second documents as suggested documents.
  - 22. The system of claim 21, wherein presenting the one or more second documents as suggested documents includes presenting the one or more second documents in a toolbar.
  - 23. The system of claim 14, further operable to perform operations comprising:
    - receiving input from a second user indicating that one or more of the first documents are preferred documents;
      
      calculating a respective document weight for each of the preferred documents; and
      
      wherein calculating the aggregate strength of relationship score for each candidate document includes weighting the strength of relationship scores for the candidate document and each of the preferred documents by the respective document weight for the preferred document.
  - 24. The system of claim 23, further operable to perform operations comprising:
    - receiving input from the second user indicating that one or more of the first documents are disliked documents;
      
      calculating a respective document weight for each of the disliked documents; and
      
      wherein calculating the aggregate strength of relationship score for each candidate document includes weighting the strength of relationship scores for the candidate document and each of the disliked documents by the respective document weight for the disliked document.
  - 25. The system of claim 23, further operable to perform operations comprising presenting one or more of the second documents as suggested documents.
  - 26. The system of claim 14, further operable to perform operations comprising:
    - selecting the first documents based on one or more first queries issued during a session, where each of the first documents is responsive to at least one of the one or more first queries;
      
      identifying one or more second queries corresponding to the one or more second documents from data associating queries and documents; and
      
      presenting the one or more second queries as suggested queries.
  - 54. The system of claim 14 wherein the first document or a document below the first document in the search results web page was selected by the user before the user viewed the candidate document.
  - 55. The system of claim 14 wherein the respective strength of relationship score is a count of the users who viewed the candidate document during the window of time after the first document was presented divided by a count of the users who viewed the first document.
  - 56. The system of claim 14 wherein the aggregated user selection data for the multiple users comprises a sum of weights wherein each weight corresponds to a presentation of the first document to a user or selection of the first document by the user, and wherein the respective strength of relationship score is the sum of weights divided by the total number of times the source document was presented.
  - 57. The system of claim 56 wherein the weight corresponding to selection of the first document is greater than the weight corresponding to the presentation of the first document.
  - 58. The system of claim 14 wherein calculating the aggregate strength of relationship score for each candidate document from the respective strength of relationship scores for the candidate document further comprises scaling the aggregate strength of relationship score for the candidate document.
  - 59. The system of claim 14 wherein calculating an aggregate strength of relationship score for each candidate document from the respective strength of relationship scores for the candidate document further comprises normalizing the aggregate strength of relationship score by a popularity of the candidate document.
  - 60. The system of claim 14 wherein calculating an aggregate strength of relationship score for each candidate document from the respective strength of relationship scores for the candidate document further comprises normalizing the aggregate strength of relationship score by subtracting a logarithm of a popularity of the candidate document from a sum of logarithms of the respective strength of relationship scores for the candidate document.
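
The preference weighting in claims 23 and 24 might look like the following Python sketch. The claims say only that a document weight is calculated for each preferred or disliked document and applied to the corresponding scores. The positive and negative weight values, the zero default for unrated documents, and all names below are assumptions.

```python
def feedback_weights(preferred, disliked, like_weight=1.0, dislike_weight=-1.0):
    """Map each rated first document to a weight (values are assumed)."""
    weights = {doc: like_weight for doc in preferred}
    weights.update({doc: dislike_weight for doc in disliked})
    return weights


def weighted_aggregate(scores, doc_weights, default_weight=0.0):
    """Weight each pairwise score by the weight of its first document.

    scores: {(first_doc, candidate): strength of relationship score}
    doc_weights: {first_doc: weight}; unrated documents get default_weight.
    """
    totals = {}
    for (first_doc, candidate), s in scores.items():
        w = doc_weights.get(first_doc, default_weight)
        totals[candidate] = totals.get(candidate, 0.0) + w * s
    return totals
```

With these assumed values, a candidate strongly related to a disliked document is pushed down, while one related only to preferred documents keeps its full score.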

27. A non-transitory computer storage medium having instructions stored thereon that, when executed by data processing apparatus, cause the data processing apparatus to perform operations comprising:
- determining a respective strength of relationship score between each candidate document in a group of candidate documents and each of the first documents by aggregating user selection data for multiple users, the first documents and the candidate documents being in a corpus of web documents, the user selection data indicating, for each of the multiple users, whether the user viewed the candidate document during a window of time after the first document is presented to the user on a search results web page in response to a query, wherein the strength of relationship score is a probability that the candidate document will be viewed given that the first document has been presented to a user on a search results web page in response to a query;
  
  calculating an aggregate strength of relationship score for each candidate document from the respective strength of relationship scores for the candidate document; and
  
  selecting the one or more second documents from the candidate documents according to the aggregate strength of relationship scores for the candidate documents.
- Dependent claims (28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46)
  - 28. The non-transitory computer storage medium of claim 27, wherein the user selection data further indicates whether each of the multiple users viewed the candidate document for a threshold period of time.
  - 29. The non-transitory computer storage medium of claim 27, wherein aggregating user selection data further comprises scaling the user selection data for one of the multiple users by a scoring factor when the one of the multiple users views the candidate document during the window of time after the first document is selected by the one of the multiple users from the search results web page.
  - 30. The non-transitory computer storage medium of claim 27, wherein determining a respective strength of relationship score between each candidate document and each of the first documents further comprises using a popularity of the candidate document to normalize the respective strength of relationship score.
  - 31. The non-transitory computer storage medium of claim 27, wherein:
    - the one or more second documents are associated with a natural language; and
      
      determining a respective strength of relationship score between each candidate document and each of the first documents further includes scaling the strength of relationship score by a percentage of the multiple users who viewed the candidate document and are associated with the natural language.
  - 32. The non-transitory computer storage medium of claim 27, wherein the operations further comprise:
    - identifying documents responsive to a query as the first documents; and
      
      generating an augmented set of documents responsive to the query by including one or more of the second documents in the first documents.
  - 33. The non-transitory computer storage medium of claim 32, wherein the operations further comprise:
    - receiving the query; and
      
      presenting the augmented set of documents in response to the received query.
  - 34. The non-transitory computer storage medium of claim 27, wherein the operations further comprise:
    - selecting the first documents from documents a first user has viewed for a second period of time; and
      
      presenting the one or more second documents as suggested documents.
  - 35. The non-transitory computer storage medium of claim 34, wherein presenting the one or more second documents as suggested documents includes presenting the one or more second documents in a toolbar.
  - 36. The non-transitory computer storage medium of claim 27, wherein the operations further comprise:
    - receiving input from a second user indicating that one or more of the first documents are preferred documents;
      
      calculating a respective document weight for each of the preferred documents; and
      
      wherein calculating the aggregate strength of relationship score for each candidate document includes weighting the strength of relationship scores for the candidate document and each of the preferred documents by the respective document weight for the preferred document.
  - 37. The non-transitory computer storage medium of claim 36, wherein the operations further comprise:
    - receiving input from the second user indicating that one or more of the first documents are disliked documents;
      
      calculating a respective document weight for each of the disliked documents; and
      
      wherein calculating the aggregate strength of relationship score for each candidate document includes weighting the strength of relationship scores for the candidate document and each of the disliked documents by the respective document weight for the disliked document.
  - 38. The non-transitory computer storage medium of claim 36, wherein the operations further comprise presenting one or more of the second documents as suggested documents.
  - 39. The non-transitory computer storage medium of claim 27, wherein the operations further comprise:
    - selecting the first documents based on one or more first queries issued during a session, where each of the first documents is responsive to at least one of the one or more first queries;
      
      identifying one or more second queries corresponding to the one or more second documents from data associating queries and documents; and
      
      presenting the one or more second queries as suggested queries.
  - 40. The non-transitory computer storage medium of claim 27 wherein the first document or a document below the first document in the search results web page was selected by the user before the user viewed the candidate document.
  - 41. The non-transitory computer storage medium of claim 27 wherein the respective strength of relationship score is a count of the users who viewed the candidate document during the window of time after the first document was presented divided by a count of the users who viewed the first document.
  - 42. The non-transitory computer storage medium of claim 27 wherein the aggregated user selection data for the multiple users comprises a sum of weights wherein each weight corresponds to a presentation of the first document to a user or selection of the first document by the user, and wherein the respective strength of relationship score is the sum of weights divided by the total number of times the source document was presented.
  - 43. The non-transitory computer storage medium of claim 42 wherein the weight corresponding to selection of the first document is greater than the weight corresponding to the presentation of the first document.
  - 44. The non-transitory computer storage medium of claim 27 wherein calculating the aggregate strength of relationship score for each candidate document from the respective strength of relationship scores for the candidate document further comprises scaling the aggregate strength of relationship score for the candidate document.
  - 45. The non-transitory computer storage medium of claim 27 wherein calculating an aggregate strength of relationship score for each candidate document from the respective strength of relationship scores for the candidate document further comprises normalizing the aggregate strength of relationship score by a popularity of the candidate document.
  - 46. The non-transitory computer storage medium of claim 27 wherein calculating an aggregate strength of relationship score for each candidate document from the respective strength of relationship scores for the candidate document further comprises normalizing the aggregate strength of relationship score by subtracting a logarithm of a popularity of the candidate document from a sum of logarithms of the respective strength of relationship scores for the candidate document.
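
The query-suggestion step of claim 39 could be sketched in Python as below. The mapping from documents to queries stands in for the "data associating queries and documents" named in the claim. Its shape, the function name, the result limit, and the choice to skip queries already issued in the session are assumptions.

```python
def suggest_queries(second_docs, doc_to_queries, session_queries, limit=3):
    """Collect queries associated with the selected second documents.

    second_docs: related documents, assumed to be ordered best first.
    doc_to_queries: hypothetical mapping {doc: [query, ...]}.
    session_queries: the first queries already issued during the session.
    """
    issued = {q.lower() for q in session_queries}
    suggestions = []
    for doc in second_docs:
        for query in doc_to_queries.get(doc, []):
            # Skip queries the user already issued and avoid duplicates.
            if query.lower() not in issued and query not in suggestions:
                suggestions.append(query)
    return suggestions[:limit]
```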

Current Assignee
Google LLC (Alphabet Inc.)
Original Assignee
Google Inc. (Alphabet Inc.)
Inventors
Tong, Simon; Lee, Benjamin N.; Altendorf, Eric E.
Primary Examiner(s)
Nguyen, Cindy

Application Number
US12/506,203
Time in Patent Office
1,401 Days
Field of Search
707/714, 707/725, 707/728, 707/737
US Class Current
707/728
CPC Class Codes
G06F 16/24578 using ranking
G06F 16/355 Class or cluster creation o...
