Clustering documents based on common document selections

US 8,650,196 B1
Filed: 09/30/2011
Issued: 02/11/2014
Est. Priority Date: 09/30/2011
Status: Active Grant

2 Assignments

0 Petitions

Abstract

One or more server devices may receive first navigation information identifying a first set of documents that are selected after a first document is presented and second navigation information identifying a second set of documents that are selected after a second document is presented; compare the first set of documents to the second set of documents; generate a similarity score based on the comparing; determine, based on the similarity score, that the first document is similar to the second document; and generate, based on determining that the first document is similar to the second document, a cluster that includes identification information identifying the first document and the second document.
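
The abstract's pipeline can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patentee's implementation: navigation information is assumed to be a mapping from each document ID to the set of document IDs selected after it was provided; the similarity score is the overlap proportion that claim 9 recites, made symmetric by requiring both directions to clear the bar (an assumption); the `threshold` value is hypothetical; and clusters are formed as connected components with a union-find structure, a grouping technique the patent does not specify.

```python
from itertools import combinations


def overlap_proportion(first: set[str], second: set[str]) -> float:
    """Share of the documents in `first` that also appear in `second`."""
    if not first:
        return 0.0
    return len(first & second) / len(first)


def cluster_documents(navigation: dict[str, set[str]], threshold: float = 0.5) -> list[set[str]]:
    """Group documents whose sets of subsequently selected documents overlap.

    `navigation` maps a document ID to the set of document IDs selected after
    that document was provided. `threshold` is a hypothetical cut-off.
    """
    parent = {doc: doc for doc in navigation}

    def find(doc: str) -> str:
        # Follow parent links to the cluster representative (with path halving).
        while parent[doc] != doc:
            parent[doc] = parent[parent[doc]]
            doc = parent[doc]
        return doc

    for a, b in combinations(navigation, 2):
        # Symmetric score: both directions of the proportion must clear the bar.
        score = min(overlap_proportion(navigation[a], navigation[b]),
                    overlap_proportion(navigation[b], navigation[a]))
        if score >= threshold:
            parent[find(a)] = find(b)  # merge the two clusters

    clusters: dict[str, set[str]] = {}
    for doc in navigation:
        clusters.setdefault(find(doc), set()).add(doc)
    return list(clusters.values())
```

With `navigation = {"doc_a": {"x", "y", "z"}, "doc_b": {"x", "y", "w"}, "doc_c": {"p", "q"}}` and `threshold=0.6`, `doc_a` and `doc_b` share two of three selections (a score of about 0.67) and land in one cluster, while `doc_c` stays alone. Because connected components chain transitively, two documents can end up in the same cluster through an intermediate document even if their own score is below the threshold.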

18 Claims

1. A method, performed by one or more server devices, the method comprising:
- receiving, by at least one of one or more server devices, first navigation information identifying a first set of documents that are selected after a first document is provided, the first navigation information identifying a first plurality of documents, of the first set of documents, that are selected, each of the first plurality of documents being selected after the first document is provided, and each of the first plurality of documents being selected based on information associated with the first document, and the first navigation information including information identifying a quantity of selections of the first plurality of documents after the first document is provided;
  
  receiving, by at least one of the one or more server devices, second navigation information identifying a second set of documents that are selected after a second document is provided, the second navigation information identifying a second plurality of documents, of the second set of documents, that are selected, each of the second plurality of documents being selected after the second document is provided, and each of the second plurality of documents being selected based on information associated with the second document;
  
  generating, by at least one of the one or more server devices, a first data structure that includes information associating the first document with the first navigation information;
  
  generating, by at least one of the one or more server devices, a second data structure that includes information associating the second document with the second navigation information;
  
  comparing, by at least one of the one or more server devices and using the first data structure and the second data structure, the first set of documents to the second set of documents;
  
  generating, by at least one of the one or more server devices, a similarity score based on the comparing and based on the information identifying the quantity of selections of the first plurality of documents after the first document is provided;
  
  determining, by at least one of the one or more server devices, based on the similarity score, that the first document is similar to the second document; and
  
  generating, by at least one of the one or more server devices and based on determining that the first document is similar to the second document, a cluster that includes identification information identifying the first document and the second document.
- Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9)
- - 2. The method of claim 1, where the first navigation information includes:
    - information identifying a set of selections of documents in the first set of documents, each selection being made in one of a plurality of sessions during which the first document was provided.
  - 3. The method of claim 2, where a first selection, of the set of selections, is made in a first session during which the first document was provided, and where a second selection, of the set of selections, is made in a different second session during which the first document was provided.
  - 4. The method of claim 2, where a particular session, of the plurality of sessions, begins when the first document is provided to a client device from which the selection is received.
  - 5. The method of claim 4, where the particular session ends upon:
    - receiving a particular quantity of selections of documents from the client device, an expiration of a particular duration of time after the first document is provided to the client device, or when a browser, via which the first document is displayed, is closed.
  - 6. The method of claim 2, where a particular session, of the plurality of sessions, begins when:
    - the first document is provided to a client device as a search result based on a search query received from the client device, or the search query is received from the client device, and where the particular session ends upon receiving another search query from the client device.
  - 7. The method of claim 1, where the first data structure is a first vector, the second data structure is a second vector, and comparing the first set of documents to the second set of documents includes:
    - determining a similarity of the first vector and the second vector.
  - 8. The method of claim 7, where determining the similarity of the first vector and the second vector includes:
    - determining a cosine similarity of the first vector and the second vector.
  - 9. The method of claim 7, where determining the similarity of the first vector and the second vector includes:
    - determining a proportion of documents, of the first set of documents, that appear in the second set of documents.
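
Claims 2 to 6 describe navigation information gathered over sessions. The sketch below is a minimal illustration, assuming a hypothetical client log of `Event` records whose `kind` is `"provide"`, `"select"`, `"query"`, or `"close"`. A session opens when the anchor document is provided to a client and ends at the first of the claim 5 conditions (a selection limit, a time limit, the browser closing) or, following claim 6, a new search query. The `max_selections` and `max_duration` defaults are illustrative values, not figures from the patent, and ignoring clicks on the anchor document itself is an added assumption.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional


@dataclass
class Event:
    """Hypothetical log entry; kind is 'provide', 'select', 'query', or 'close'."""
    client: str
    kind: str
    doc: Optional[str]
    time: float  # seconds


def navigation_counts(events, anchor, max_selections=10, max_duration=1800.0):
    """Count documents selected after `anchor` is provided, pooled over sessions."""
    counts = Counter()
    open_sessions = {}  # client -> [session start time, selections so far]
    for ev in sorted(events, key=lambda e: e.time):
        session = open_sessions.get(ev.client)
        if session is not None and ev.time - session[0] > max_duration:
            del open_sessions[ev.client]  # duration expired (claim 5)
            session = None
        if ev.kind == "provide" and ev.doc == anchor:
            open_sessions[ev.client] = [ev.time, 0]  # a new session begins (claim 4)
        elif ev.kind in ("query", "close"):
            # Another search query (claim 6) or the browser closed (claim 5).
            open_sessions.pop(ev.client, None)
        elif ev.kind == "select" and session is not None and ev.doc != anchor:
            counts[ev.doc] += 1
            session[1] += 1
            if session[1] >= max_selections:
                del open_sessions[ev.client]  # selection limit reached (claim 5)
    return counts
```

Because each client can open several sessions over time, and several clients can receive the same anchor document, the returned `Counter` pools selections made in different sessions, which is the situation claim 3 describes.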

10. A non-transitory computer-readable medium storing instructions, the instructions comprising:
- a set of instructions, which, when executed by one or more processors, cause the one or more processors to:
  
  receive first navigation information identifying a first set of documents that are selected after a first document is provided, the first navigation information identifying a first plurality of documents, of the first set of documents, that are selected, each of the first plurality of documents being selected after the first document is provided, and each of the first plurality of documents being selected based on information associated with the first document, and the first navigation information including information identifying a quantity of selections of the first plurality of documents after the first document is provided;
  
  receive second navigation information identifying a second set of documents that are selected after a second document is provided, the second navigation information identifying a second plurality of documents, of the second set of documents, that are selected, each of the second plurality of documents being selected after the second document is provided, and each of the second plurality of documents being selected based on information associated with the second document;
  
  generate a first data structure that includes information associating the first document with the first navigation information;
  
  generate a second data structure that includes information associating the second document with the second navigation information;
  
  compare, using the first data structure and the second data structure, the first set of documents to the second set of documents;
  
  generate a similarity score based on the comparing and based on the information identifying the quantity of selections of the first plurality of documents after the first document is provided;
  
  determine, based on the similarity score, that the first document is similar to the second document; and
  
  assign, based on determining that the first document is similar to the second document, the first document and the second document to a cluster, the cluster including identification information identifying the first document and the second document.
- Dependent Claims (11, 12, 13)
- - 11. The non-transitory computer-readable medium of claim 10, where the first navigation information includes:
    - information identifying a set of selections of documents in the first set of documents, each selection, of the set of selections, being made in one of a plurality of sessions during which the first document was provided.
  - 12. The non-transitory computer-readable medium of claim 10, where the first data structure is a first vector, the second data structure is a second vector, and the instructions to compare the first set of documents to the second set of documents include instructions that cause the one or more processors to:
    - determine a similarity of the first vector and the second vector.
  - 13. The non-transitory computer-readable medium of claim 12, where the instructions to determine the similarity of the first vector and the second vector include:
    - instructions to determine a cosine similarity of the first vector and the second vector.
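
Claims 7, 8, 12, and 13 treat the data structures as vectors compared by cosine similarity. A minimal sketch, assuming each vector is stored sparsely as a `collections.Counter` that maps a document ID to the number of times that document was selected after the anchor document was provided (the patent does not fix this encoding):

```python
import math
from collections import Counter


def cosine_similarity(first: Counter, second: Counter) -> float:
    """Cosine of the angle between two sparse selection-count vectors."""
    # Only documents present in `first` can contribute to the dot product;
    # indexing a Counter with a missing key returns 0 without inserting it.
    dot = sum(count * second[doc] for doc, count in first.items())
    norm_first = math.sqrt(sum(c * c for c in first.values()))
    norm_second = math.sqrt(sum(c * c for c in second.values()))
    if norm_first == 0 or norm_second == 0:
        return 0.0  # an anchor with no recorded selections matches nothing
    return dot / (norm_first * norm_second)
```

Cosine similarity ignores vector length, so an anchor page with ten times the traffic of another can still score 1.0 against it if users spread their follow-up selections in the same proportions.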

14. A system comprising:
- one or more memory devices storing instructions; and
  
  one or more processors to execute the instructions to:
  
  receive first navigation information identifying a first set of documents that are selected after a first document is provided, the first navigation information further identifying a quantity of times that each document, in the first set of documents, was selected after the first document was provided, the first navigation information further identifying a first plurality of documents, of the first set of documents, that are selected, each of the first plurality of documents being selected after the first document is provided, and each of the first plurality of documents being selected based on information associated with the first document;
  
  receive second navigation information identifying a second set of documents that are selected after a second document is provided, the second navigation information further identifying a quantity of times that each document, in the second set of documents, was selected after the second document was provided, the second navigation information further identifying a second plurality of documents, of the second set of documents, that are selected, each of the second plurality of documents being selected after the second document is provided, and each of the second plurality of documents being selected based on information associated with the second document;
  
  generate a first data structure that includes information associating the first document with the first navigation information;
  
  generate a second data structure that includes information associating the second document with the second navigation information;
  
  compare, using the first data structure and the second data structure, the first set of documents to the second set of documents, when comparing the first set of documents to the second set of documents, the one or more processors are to:
  
  generate a similarity score based on the comparing, the similarity score being based on at least one of:
  
  the quantity of times each document, in the first set of documents, was selected after the first document was provided, or the quantity of times each document, in the second set of documents, was selected after the second document was provided;
  
  determine, based on the similarity score, that the first document is similar to the second document; and
  
  generate, based on determining that the first document is similar to the second document, a cluster that includes identification information identifying the first document and the second document.
- Dependent Claims (15, 16, 17, 18)
- - 15. The system of claim 14, where the first navigation includes:
    - information identifying a set of selections of documents in the first set of documents, each selection being made in one of a plurality of sessions during which the first document was provided.
  - 16. The system of claim 14, where the first data structure is a first vector, the second data structure is a second vector, and when comparing the first set of documents to the second set of documents, the one or more processors are to:
    - determine a similarity of the first vector and the second vector.
  - 17. The system of claim 16, where, when determining the similarity of the first vector and the second vector, the one or more processors are to:
    - determine a cosine similarity of the first vector and the second vector.
  - 18. The system of claim 14, where, when generating the similarity score, the one or more processors are to:
    - weight selections of one or more documents, that were selected after a respective one of the first document or the second document, based on a quantity of times that the one or more documents were selected after the respective one of the first document or the second document.
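
Claim 18 weights selections by how often each document was selected, but the patent record gives no formula. The sketch below is one assumed choice for illustration: each count is dampened with `log(1 + count)`, and the two weighted maps are compared with the weighted Jaccard similarity (sum of per-document minima over sum of maxima), a measure swapped in here rather than taken from the patent.

```python
import math
from collections import Counter


def weighted_overlap(first: Counter, second: Counter) -> float:
    """Weighted Jaccard similarity of two selection-count maps.

    Each document's weight is log(1 + count), so repeated selections raise its
    influence, but with diminishing returns (an assumed dampening).
    """
    docs = set(first) | set(second)
    numerator = sum(min(math.log1p(first[d]), math.log1p(second[d])) for d in docs)
    denominator = sum(max(math.log1p(first[d]), math.log1p(second[d])) for d in docs)
    return numerator / denominator if denominator else 0.0
```

Under this weighting, a document selected three times after one page and once after another contributes log 2 / log 4 = 0.5 to the overlap, rather than the 1/3 that a raw-count minimum-over-maximum comparison would give.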

Current Assignee
Google LLC (Alphabet Inc.)
Original Assignee
Google Inc. (Alphabet Inc.)
Inventors
Zhou, Yun; Majkowska, Anna Dagna
Primary Examiner(s)
Ly, Cheyne D

Application Number
US13/251,056
Time in Patent Office
865 Days
Field of Search
707/737, 707/749
US Class Current
707/737
CPC Class Codes
G06F 16/35 Clustering; Classification
