Clustering of search results

US 9,443,008 B2
Filed: 07/14/2010
Issued: 09/13/2016
Est. Priority Date: 07/14/2010
Status: Active Grant

Abstract

One particular embodiment clusters a plurality of documents using one or more clustering algorithms to obtain one or more first sets of clusters, wherein: each first set of clusters results from clustering the documents using one of the clustering algorithms; and with respect to each first set of clusters, each of the documents belongs to one of the clusters from the first set of clusters; accesses a search query; identifies a search result in response to the search query, wherein the search result comprises two or more of the documents; and clusters the search result to obtain a second set of clusters, wherein each document of the search result belongs to one of the clusters from the second set of clusters.
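As an illustration of the offline stage described in the abstract, the following is a minimal sketch, not the patented implementation. It assumes each clustering algorithm is a callable that returns a map from document ID to cluster label, so that every document belongs to exactly one cluster per first set. The function names and the two toy "algorithms" are hypothetical and were chosen only for this example.

```python
def build_first_sets(doc_ids, algorithms):
    """Run each clustering algorithm once and collect one {doc_id: label} map per algorithm."""
    first_sets = []
    for algorithm in algorithms:
        labels = algorithm(doc_ids)
        # Each document must belong to one cluster of this first set.
        missing = set(doc_ids) - set(labels)
        if missing:
            raise ValueError(f"unassigned documents: {sorted(missing)}")
        first_sets.append(labels)
    return first_sets


def restrict_to_result(first_sets, result_ids):
    """Keep only the offline labels of the documents returned for a query."""
    return [{d: labels[d] for d in result_ids} for labels in first_sets]


# Hypothetical stand-ins for real clustering algorithms.
def by_first_letter(doc_ids):
    return {d: d[0] for d in doc_ids}


def by_length(doc_ids):
    return {d: len(d) for d in doc_ids}


docs = ["apple", "avocado", "banana"]
first_sets = build_first_sets(docs, [by_first_letter, by_length])
result_labels = restrict_to_result(first_sets, ["apple", "banana"])
```

The per-document label maps make the later pairwise comparison cheap: for any two result documents, checking whether an offline clustering grouped them together is a dictionary lookup.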

18 Claims

1. A method, comprising:
- clustering a plurality of documents to obtain one or more first sets of clusters, wherein a first cluster of the one or more first sets of clusters comprises at least two first individual documents of the plurality of documents;
  
  accessing a search query after the clustering the plurality of documents;
  
  identifying a search result in response to the search query, wherein the search result comprises the at least two first individual documents of the plurality of documents; and
  
  clustering the search result to obtain a second set of clusters, wherein second individual documents of the search result belong to one second cluster of the second set of clusters, the clustering the search result comprising:
  
  for a unique pair of the second individual documents, computing a similarity measure for the second individual documents with respect to the search query based, at least in part, on the one or more first sets of clusters, wherein the similarity measure for the second individual documents is computed based, at least in part, on a weighted sum of a clustering similarity between the second individual documents with respect to the one or more first sets of clusters and a query-based similarity between the second individual documents with respect to the search query; and
  
  clustering the second individual documents based, at least in part, on the similarity measure;
  
  wherein the query-based similarity between the second individual documents is based, at least in part, on a fraction of a sum of:
  
  a textual match between the search query and the second individual documents to the textual match between the query, and the intersection of the documents; and
  
  wherein the clustering similarity between the second individual documents with respect to the one or more first sets of clusters is based, at least in part, on a weighted combination of agreements between the one or more first sets of clusters and the second individual documents.
- Dependent claims (2, 3, 4, 5, 6)
  - 2. The method recited in claim 1, wherein for the unique pair of result documents, the computing of the similarity measure for the unique pair of result documents as a weighted sum is further based, at least in part, on a cosine similarity between the two documents.
  - 3. The method recited in claim 1, further comprising:
    - accessing a new document; and
      
      determining whether the new document belongs to a cluster from the first set of clusters;
      
      in response to determining that the new document belongs to the cluster from the first set of clusters, adding the new document to the cluster from first set of clusters; and
      
      in response to determining that the new document does not belong to any cluster from the first set of clusters, creating a new cluster, adding the new document to the new cluster, and adding the new cluster to the first set of clusters.
  - 4. The method recited in claim 1, further comprising grouping clusters from the first set of clusters into a plurality of topic models, wherein individual clusters from the first set of clusters belong to one of the topic models.
  - 5. The method recited in claim 4, further comprising:
    - accessing a new document; and
      
      determining one of the topic models corresponding to a first clustering associated with the new document;
      
      determining whether the new document belongs to a cluster of the one of the topic models;
      
      in response to determining that the new document belongs to the cluster of the one of the topic models, adding the new document to the cluster of the one of the topic models; and
      
      in response to determining that the new document does not belong to any clusters of the one of the topic models, creating a new cluster, adding the new document to the new cluster, adding the new cluster to a first set of clusters, and assigning the new cluster to the one of the topic models.
  - 6. The method recited in claim 1, further comprising presenting the second individual documents of the search result according to the second set of clusters.
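The pairwise similarity recited in claim 1 can be sketched as follows. This is a hedged illustration under stated assumptions, not the claimed implementation. The clustering similarity is taken as the weighted share of first sets of clusters that place both documents in the same cluster, which is one reading of "weighted combination of agreements". The claim does not fix an exact query-based formula, so `toy_query_similarity` is a hypothetical stand-in based on shared query terms. The mixing weight `alpha`, the `threshold`, and the threshold-linking step used to form the second set of clusters are likewise assumptions.

```python
from itertools import combinations


def clustering_similarity(a, b, first_sets, set_weights):
    """Weighted share of offline clusterings that put a and b in the same cluster."""
    agree = sum(w for labels, w in zip(first_sets, set_weights)
                if labels[a] == labels[b])
    return agree / sum(set_weights)


def toy_query_similarity(a, b, query_terms, doc_terms):
    """Stand-in only: query terms found in both documents over query terms found in either."""
    qa = query_terms & doc_terms[a]
    qb = query_terms & doc_terms[b]
    either = qa | qb
    return len(qa & qb) / len(either) if either else 0.0


def combined_similarity(a, b, first_sets, set_weights, query_terms, doc_terms, alpha=0.5):
    """Weighted sum of the clustering similarity and the query-based similarity."""
    return (alpha * clustering_similarity(a, b, first_sets, set_weights)
            + (1 - alpha) * toy_query_similarity(a, b, query_terms, doc_terms))


def cluster_results(result_ids, sim, threshold=0.5):
    """Link every unique pair whose similarity meets the threshold (union-find)."""
    parent = {d: d for d in result_ids}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in combinations(result_ids, 2):
        if sim(a, b) >= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for d in result_ids:
        groups.setdefault(find(d), []).append(d)
    return list(groups.values())


first_sets = [{"d1": 0, "d2": 0, "d3": 1}, {"d1": "x", "d2": "y", "d3": "y"}]
set_weights = [2.0, 1.0]
doc_terms = {
    "d1": {"jaguar", "car", "speed"},
    "d2": {"jaguar", "car", "dealer"},
    "d3": {"jaguar", "cat", "habitat"},
}
query_terms = {"jaguar", "car"}

second_set = cluster_results(
    ["d1", "d2", "d3"],
    lambda a, b: combined_similarity(a, b, first_sets, set_weights, query_terms, doc_terms),
)
```

In this toy data, d1 and d2 agree in the more heavily weighted offline clustering and both match the full query, so they end up in one result cluster while d3 stays on its own. The same offline labels would yield a different grouping for a different query, which is the point of mixing the two terms.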

7. A system, comprising:
- a memory comprising instructions executable by one or more processors; and
  
  one or more processors coupled to the memory, the one or more processors to execute the instructions to:
  
  cluster a plurality of documents to obtain one or more first sets of clusters, wherein a first cluster of the one or more first sets of clusters is to comprise at least two first individual documents of the plurality of documents;
  
  access a search query after the cluster of the plurality of documents;
  
  identify a search result in response to the search query, the search result to comprise the at least two first individual documents of the plurality of documents;
  
  cluster the search result to obtain a second set of clusters, second individual documents of the search result to belong to one second cluster of the second set of clusters, the cluster of the search result to comprise:
  
  for a unique pair of the second individual documents a similarity measure for the result documents with respect to the search query to be computed to be based, at least in part, on the one or more first sets of clusters, wherein the similarity measure for the second individual documents is to be computed to be based, at least in part, on a weighted sum of a clustering similarity between the second individual documents with respect to the one or more first sets of clusters and a query-based similarity between the second individual documents with respect to the search query; and
  
  the second individual documents to be clustered to be based, at least in part, on the similarity measure;
  
  wherein the query-based similarity between the second individual documents is to be based, at least in part, on a fraction of a sum of:
  
  a textual match between the search query and the second individual documents to the textual match between the query, and the intersection of the documents; and
  
  wherein the clustering similarity between the second individual documents with respect to the one or more first sets of clusters is to be based, at least in part, on a weighted combination of agreements between the one or more first sets of clusters and the second individual documents.
- Dependent claims (8, 9, 10, 11, 12)
  - 8. The system recited in claim 7, wherein for the unique pair of result documents, to compute the similarity measure as a weighted sum is to be further based, at least in part, on a cosine similarity between the two documents.
  - 9. The system recited in claim 7, wherein the instructions are further executable by the one or more processors to:
    - access a new document; and
      
      determine whether the new document is to belong to a cluster from the first set of clusters;
      
      in response to a determination that the new document is to belong to the cluster from the first set of clusters, to add the new document to the one of the clusters from the first set of clusters; and
      
      in response to a determination that the new document does not belong to any cluster from the first set of clusters, to create a new cluster, to add the new document to the new cluster, and to add the new cluster to the first set of clusters.
  - 10. The system recited in claim 7, wherein the instructions are further executable by the one or more processors to group clusters from the first set of clusters into a plurality of topic models, individual clusters from the first set of clusters to belong to one of the topic models.
  - 11. The system recited in claim 10, wherein the instructions are further executable by the one or more processors to:
    - access a new document; and
      
      determine one of the topic models corresponding to a first clustering to be associated with the new document;
      
      determine whether the new document is to belong to a cluster of the one of the topic models;
      
      in response to a determination that the new document is to belong to the cluster of the one of the topic models, add the new document to the cluster of the one of the topic models; and
      
      in response to a determination that the new document is to not belong to any of the clusters of the one of the topic models, to create a new cluster, to add the new document to the new cluster, to add the new cluster to the first set of clusters, and to assign the new cluster to the one of the topic models.
  - 12. The system recited in claim 7, wherein the instructions are further executable by the one or more processors to present the second individual documents of the search result to be according to the second set of clusters.
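The incremental update recited in claim 9 (and mirrored in claims 3 and 15) can be sketched as below. The claims do not state how membership is determined, so this example assumes a cosine similarity between a term-count vector of the new document and a running cluster centroid, compared against a hypothetical `threshold`. All names here are illustrative.

```python
import math
from collections import Counter


def cosine(u, v):
    """Cosine similarity between two term-count vectors."""
    dot = sum(u[t] * v[t] for t in u.keys() & v.keys())
    norm = (math.sqrt(sum(x * x for x in u.values()))
            * math.sqrt(sum(x * x for x in v.values())))
    return dot / norm if norm else 0.0


def assign_new_document(doc_id, text, clusters, threshold=0.3):
    """Add the document to the best-matching cluster, or create a new cluster.

    clusters is a list of {"docs": [ids], "centroid": Counter}; it is updated
    in place. Returns the index of the cluster that received the document.
    """
    vec = Counter(text.lower().split())
    best_idx, best_sim = None, 0.0
    for i, cluster in enumerate(clusters):
        s = cosine(vec, cluster["centroid"])
        if s > best_sim:
            best_idx, best_sim = i, s

    if best_idx is not None and best_sim >= threshold:
        # The document belongs to an existing cluster: add it there.
        clusters[best_idx]["docs"].append(doc_id)
        clusters[best_idx]["centroid"].update(vec)
        return best_idx

    # No cluster fits: create a new one and add it to the first set.
    clusters.append({"docs": [doc_id], "centroid": Counter(vec)})
    return len(clusters) - 1


clusters = [{"docs": ["a"], "centroid": Counter("stock market rally".split())}]
assign_new_document("b", "stock market falls", clusters)
assign_new_document("c", "penguin colony antarctica", clusters)
```

Because the first set is updated in place, new documents become available to the query-time similarity without re-running the offline clustering over the whole collection.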

13. One or more computer-readable tangible storage media comprising:
- instructions executable by one or more computer systems to:
  
  cluster a plurality of documents to obtain one or more first sets of clusters, wherein a first cluster of the one or more first sets of clusters is to comprise at least two first individual documents of the plurality of documents;
  
  access a search query after the cluster of the plurality of documents;
  
  identify a search result in response to the search query, the search result to comprise the at least two first individual documents of the plurality of documents; and
  
  cluster the search result to obtain a second set of clusters, second individual result documents of the search result to belong to one second cluster of the second set of clusters, the cluster of the search result to comprise:
  
  for a unique pair of the second individual documents, a similarity measure for the second individual documents with respect to the search query to be computed to be based, at least in part, on the one or more first sets of clusters, wherein the similarity measure for the second individual documents is to be computed to be based, at least in part, on a weighted sum of a clustering similarity between the second individual documents with respect to the one or more first sets of clusters and a query-based similarity between the second individual documents with respect to the search query; and
  
  the second individual documents to be clustered to be based, at least in part, on the similarity measure;
  
  wherein the query-based similarity between the second individual documents is to be based, at least in part, on a fraction of a sum of:
  
  a textual match between the search query and the second individual documents to the textual match between the query, and the intersection of the documents; and
  
  wherein the clustering similarity between the second individual documents with respect to the one or more first sets of clusters is to be based, at least in part, on a weighted combination of agreements between the one or more first sets of clusters and the second individual documents.
- Dependent claims (14, 15, 16, 17, 18)
  - 14. The media recited in claim 13, wherein for the unique pair of result documents, to compute the similarity measure as a weighted sum is to be further based, at least in part, on a cosine similarity between the two documents.
  - 15. The media recited in claim 13, wherein the instructions are further executable by the one or more computer systems to:
    - access a new document; and
      
      determine whether the new document is to belong to a cluster from the first set of clusters;
      
      in response to a determination that the new document is to belong to the cluster from the first set of clusters, add the new document to the cluster from the first set of clusters; and
      
      in response to a determination that the new document does not belong to any cluster from the first set of clusters, create a new cluster, add the new document to the new cluster, and add the new cluster to the first set of clusters.
  - 16. The media recited in claim 13, wherein the instructions are further executable by the one or more computer systems to group clusters from the first set of clusters into a plurality of topic models, individual cluster from the first set of clusters to belong to one of the topic models.
  - 17. The media recited in claim 16, wherein the instructions are further executable by the one or more computer systems to:
    - access a new document; and
      
      determine one of the topic models to correspond to a first clustering to be associated with the new document;
      
      determine whether the new document is to belong to a cluster of the one of the topic models;
      
      in response to a determination that the new document is to belong to the cluster of the one of the topic models, add the new document to the cluster of the one of the topic models; and
      
      in response to a determination that the new document is to not belong to any of the clusters of the one of the topic models, to create a new cluster, to add the new document to the new cluster, to add the new cluster to the first set of clusters, and to assign the new cluster to the one of the topic models.
  - 18. The media recited in claim 13, wherein the instructions are further executable by the one or more computer systems to present the second individual documents of the search result to be according to the second set of clusters.
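The topic-model routing recited in claims 5, 11, and 17 can be sketched as a two-step lookup: first pick a topic model for the new document, then look for a matching cluster inside that topic model only. This is an illustrative sketch under assumptions. The claims do not say how the topic model or the cluster is chosen, so Jaccard overlap on term sets and the `threshold` value are hypothetical. Clusters are stored under their topic model here, so appending a new cluster both adds it to the set and assigns it to the topic model.

```python
def jaccard(a, b):
    """Jaccard overlap between two term sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def route_new_document(doc_id, terms, topic_models, threshold=0.25):
    """Place a new document inside the best-matching topic model.

    topic_models maps a topic name to
    {"vocab": set, "clusters": [{"docs": [ids], "terms": set}]}.
    Returns (topic_name, created_new_cluster).
    """
    # Step 1: determine the topic model for the new document.
    topic = max(topic_models, key=lambda name: jaccard(terms, topic_models[name]["vocab"]))
    clusters = topic_models[topic]["clusters"]

    # Step 2: look for a matching cluster within that topic model only.
    best = max(clusters, key=lambda c: jaccard(terms, c["terms"]), default=None)
    if best is not None and jaccard(terms, best["terms"]) >= threshold:
        best["docs"].append(doc_id)
        best["terms"] |= terms
        created = False
    else:
        clusters.append({"docs": [doc_id], "terms": set(terms)})
        created = True

    topic_models[topic]["vocab"] |= terms
    return topic, created


topic_models = {
    "sports": {"vocab": {"game", "team", "score"},
               "clusters": [{"docs": ["s1"], "terms": {"team", "score", "goal"}}]},
    "finance": {"vocab": {"stock", "market", "bank"},
                "clusters": [{"docs": ["f1"], "terms": {"stock", "market", "rally"}}]},
}
route_new_document("n1", {"stock", "market", "crash"}, topic_models)
route_new_document("n2", {"bank", "loan", "mortgage"}, topic_models)
```

Restricting the cluster search to one topic model keeps the number of comparisons per new document proportional to the size of that topic rather than to the whole first set of clusters.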

Current Assignee: R2 Solutions LLC (Acacia Research Corporation)
Original Assignee: Yahoo! Inc. (Apollo Global Management, Inc.)
Inventors: Vadrevu, Srinivas; Chang, Yi; Zheng, Zhaohui; Long, Bo
Primary Examiner(s): LIN, SHEW FEN
Application Number: US12/835,954
Publication Number: US 20120016877A1
Time in Patent Office: 2,253 Days

Field of Search
US Class Current: 1/1
CPC Class Codes:
G06F 16/334   Query execution G06F16/335 ...
G06F 16/338   Presentation of query results
G06F 16/35   Clustering; Classification
G06F 16/93   Document management systems
