CLUSTERING AGGREGATOR FOR RSS FEEDS

US 20090327320A1
Filed: 06/26/2008
Published: 12/31/2009
Est. Priority Date: 06/26/2008
Status: Active Grant

Abstract

A method for merging really simple syndication (RSS) feeds. Stories containing one or more terms may be merged into one or more clusters based on one or more links between the stories. A cluster frequency with which the terms occur in each cluster may be determined. A diameter for each cluster may be determined. A cluster that is most similar to one of the clusters may be determined based on the cluster frequency. The most similar cluster may then be merged with the one of the clusters based on each diameter and each cluster frequency.
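
The first step in the abstract, grouping stories by the links between them, can be sketched as follows. This is a minimal illustration, not the patented implementation: it assumes each link is an undirected edge and that an initial cluster is a connected component of the link graph. The function name `cluster_by_links` and the story identifiers are hypothetical.

```python
from collections import defaultdict


def cluster_by_links(story_ids, links):
    """Group stories into clusters, treating each link as an undirected edge.

    Assumption: an initial cluster is a connected component of the link graph.
    """
    parent = {s: s for s in story_ids}

    def find(s):
        # Walk up to the root, shortening the path along the way.
        while parent[s] != s:
            parent[s] = parent[parent[s]]
            s = parent[s]
        return s

    for a, b in links:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a  # union the two components

    clusters = defaultdict(list)
    for s in story_ids:
        clusters[find(s)].append(s)
    return sorted(sorted(members) for members in clusters.values())


stories = ["s1", "s2", "s3", "s4", "s5"]
links = [("s1", "s2"), ("s2", "s3"), ("s4", "s5")]
print(cluster_by_links(stories, links))
# [['s1', 's2', 's3'], ['s4', 's5']]
```

A union-find structure is used here only because it keeps the grouping close to linear in the number of links; any graph traversal would give the same components.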

20 Claims

1. A method for merging really simple syndication (RSS) feeds, comprising:
- (a) merging stories containing one or more terms into one or more clusters based on one or more links between the stories;
  
  (b) determining a cluster frequency with which the terms occur in each cluster;
  
  (c) determining a diameter for each cluster;
  
  (d) determining a cluster that is most similar to one of the clusters based on the cluster frequency; and
  
  (e) merging the most similar cluster with the one of the clusters based on the diameter and the cluster frequency.
  - 2. The method of claim 1, further comprising:
    - determining a story frequency with which the terms occur in each story;
      
      determining a similarity between two linked stories based on the story frequency; and
      
      splitting the two linked stories into two clusters based on the similarity.
  - 3. The method of claim 2, wherein the story frequency comprises a term vector having a weight for each of the terms in the stories.
  - 4. The method of claim 3, wherein the similarity is a cosine similarity between each term vector of the two linked stories.
  - 5. The method of claim 3, wherein the weight for each of the terms is based on a term frequency and inverse document frequency algorithm.
  - 6. The method of claim 1, wherein the cluster frequency comprises a centroid vector having an average weight for each of the terms in all stories within each cluster.
  - 7. The method of claim 6, wherein determining the cluster that is most similar to the one of the clusters is further based on a cosine similarity between a centroid vector of the cluster that is most similar to the one of the clusters and a centroid vector of the one of the clusters.
  - 8. The method of claim 6, wherein the average weight for each of the terms in all stories within each cluster is based on a term frequency and inverse document frequency algorithm and an amount of stories within each cluster.
  - 9. The method of claim 1, further comprising:
    - (f) determining a cluster frequency of the merged clusters; and
      
      (g) determining a diameter of the merged clusters.
  - 10. The method of claim 9, further comprising recursively repeating steps d-g for multiple levels of a cluster hierarchy based on the merged clusters.
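
Claims 3 through 8 refer to term vectors weighted by term frequency and inverse document frequency, cosine similarity between vectors, and a centroid vector that averages the weights across a cluster. The sketch below shows one way these quantities could be computed. The exact weighting formula is not given in the claims, so the variant used here (relative term frequency times `log(N / df)`) is an assumption, and the function names `tfidf_vectors`, `cosine`, and `centroid` are hypothetical.

```python
import math
from collections import Counter, defaultdict


def tfidf_vectors(docs):
    """docs: list of token lists. Returns one {term: weight} dict per story.

    Assumption: weight = (count / story length) * log(N / document frequency).
    """
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append(
            {t: (c / len(doc)) * math.log(n / df[t]) for t, c in tf.items()}
        )
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse {term: weight} vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def centroid(vectors):
    """Average weight of every term over all stories in a cluster."""
    total = defaultdict(float)
    for vec in vectors:
        for t, w in vec.items():
            total[t] += w
    return {t: w / len(vectors) for t, w in total.items()}


docs = [
    ["rss", "feed", "merge"],
    ["rss", "feed", "cluster"],
    ["weather", "rain", "storm"],
]
vecs = tfidf_vectors(docs)
print(cosine(vecs[0], vecs[1]) > cosine(vecs[0], vecs[2]))  # True
print(cosine(vecs[0], vecs[2]))  # 0.0, the two stories share no terms
```

Sparse dictionaries are used so that a story only stores the terms it actually contains; dividing by the number of stories in `centroid` reflects the "amount of stories within each cluster" mentioned in claim 8.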

11. A computer-readable medium having stored thereon computer-executable instructions which, when executed by a computer, cause the computer to:
- (a) determine a story frequency with which one or more terms occur in one or more stories of one or more really simple syndication (RSS) feeds;
  
  (b) merge the stories into one or more clusters based on one or more links between the stories;
  
  (c) determine a similarity between two linked stories based on the story frequency;
  
  (d) split the two linked stories into two different clusters based on the similarity;
  
  (e) determine a cluster frequency with which the terms occur in each cluster;
  
  (f) determine a diameter for each cluster;
  
  (g) determine a cluster that is most similar to one of the clusters based on the cluster frequency; and
  
  (h) merge the most similar cluster with the one of the clusters based on the diameter and the cluster frequency.
  - 12. The computer-readable medium of claim 11, wherein the story frequency comprises a term vector having a weight for each of the terms in the stories, and the weight is based on a term frequency and inverse document frequency algorithm.
  - 13. The computer-readable medium of claim 12, wherein the similarity is a cosine similarity between each term vector of the two linked stories.
  - 14. The computer-readable medium of claim 11, wherein the cluster frequency comprises a centroid vector having an average weight for each of the terms in all stories within each cluster.
  - 15. The computer-readable medium of claim 14, wherein the cluster is determined to be most similar to the one of the clusters based on a cosine similarity between a centroid vector of the cluster that is most similar to the one of the clusters and a centroid vector of the one of the clusters.
  - 16. The computer-readable medium of claim 14, wherein the average weight for each of the terms in all stories within each cluster is based on a term frequency and inverse document frequency algorithm, and an amount of stories within each cluster.
  - 17. The computer-readable medium of claim 11, further comprising computer-executable instructions which, when executed by a computer, cause the computer to:
    - (i) determine a cluster frequency of the merged clusters; and
      
      (j) determine a diameter of the merged clusters.
  - 18. The computer-readable medium of claim 17, further comprising computer-executable instructions which, when executed by a computer, cause the computer to recursively repeat steps g-j for multiple levels of a cluster hierarchy based on the merged clusters.
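
Steps (c) and (d) of claim 11 split two linked stories into different clusters based on their similarity. One possible reading, sketched below, is to discard any link whose cosine similarity falls below a cutoff, so that the two stories no longer end up in the same link-based cluster. The cutoff value, the dense list representation of term vectors, and the name `split_weak_links` are all assumptions made for illustration; the claims do not specify them.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length dense vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def split_weak_links(links, vectors, threshold=0.5):
    """Partition links into those kept and those split apart.

    Assumption: a link is split when the similarity of its two stories
    is below `threshold` (a hypothetical tuning parameter).
    """
    kept, split = [], []
    for a, b in links:
        target = kept if cosine(vectors[a], vectors[b]) >= threshold else split
        target.append((a, b))
    return kept, split


vectors = {
    "s1": [1.0, 0.0, 0.0],
    "s2": [0.9, 0.1, 0.0],
    "s3": [0.0, 0.0, 1.0],
}
links = [("s1", "s2"), ("s2", "s3")]
kept, split = split_weak_links(links, vectors)
print(kept)   # [('s1', 's2')]
print(split)  # [('s2', 's3')]
```

Clusters would then be formed from the kept links only, so `s3` falls into its own cluster even though it was originally linked to `s2`.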

19. A computer system, comprising:
- a processor; and
  
  a memory comprising program instructions executable by the processor to:
  
  (a) determine a term vector for each of one or more stories of one or more really simple syndication (RSS) feeds, the term vector comprising a weight for each term in the stories, and the weight being based on a term frequency and inverse document frequency algorithm;
  
  (b) merge the stories into one or more clusters based on one or more links between the stories;
  
  (c) determine a story cosine similarity between two linked stories based on each term vector of the two linked stories;
  
  (d) split the two linked stories into two different clusters based on the story cosine similarity;
  
  (e) determine a centroid vector for each cluster that is an average of each term vector for all stories within each cluster;
  
  (f) determine a diameter for each cluster;
  
  (g) determine a cluster that is most similar to one of the clusters based on a cluster cosine similarity of a centroid vector of the cluster that is most similar to the one of the clusters and a centroid vector of the one of the clusters; and
  
  (h) merge the most similar cluster with the one of the clusters based on the diameter and the cluster cosine similarity.
  - 20. The computer system of claim 19, wherein the memory further comprises program instructions executable by the processor to:
    - (i) determine a cluster frequency of the merged clusters;
      
      (j) determine a diameter of the merged clusters; and
      
      (k) recursively repeat steps g-j for multiple levels of a cluster hierarchy based on the merged clusters.
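
The merge loop in steps (g) through (k) of claims 19 and 20 can be sketched as a simple agglomerative procedure. Several details below are assumptions, since the claims do not define them: the diameter is taken to be the largest pairwise cosine distance among a cluster's stories, the globally most similar pair of clusters is merged at each level, and the parameters `min_sim` and `max_diameter` are hypothetical thresholds. The loop form stands in for the recursion named in the claim.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity between two equal-length dense vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def centroid(vectors):
    """Component-wise average of the term vectors in a cluster."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def diameter(vectors):
    # Assumption: largest pairwise cosine distance among member stories.
    return max(
        (1.0 - cosine(u, v) for u, v in combinations(vectors, 2)), default=0.0
    )


def merge_clusters(clusters, min_sim=0.5, max_diameter=0.6):
    """clusters: list of clusters, each a list of story term vectors.

    Returns the list of clusters at every level of the hierarchy.
    """
    levels = [clusters]
    while len(clusters) > 1:
        # Centroids are recomputed at every level, covering merged clusters.
        cents = [centroid(c) for c in clusters]
        # Find the most similar pair by centroid cosine similarity.
        sim, i, j = max(
            (cosine(cents[i], cents[j]), i, j)
            for i, j in combinations(range(len(clusters)), 2)
        )
        merged = clusters[i] + clusters[j]
        # Merge only if the pair is similar enough and the result stays tight.
        if sim < min_sim or diameter(merged) > max_diameter:
            break
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
        levels.append(clusters)
    return levels


a = [[1.0, 0.0], [0.9, 0.1]]
b = [[0.8, 0.2]]
c = [[0.0, 1.0]]
levels = merge_clusters([a, b, c])
print(len(levels))                     # 2
print([len(x) for x in levels[-1]])    # [1, 3]
```

In this example clusters `a` and `b` merge at the first level because their centroids are nearly parallel, while `c` stays separate because its similarity to the merged centroid is below `min_sim`.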

Current Assignee: Microsoft Technology Licensing LLC (Microsoft Corporation)
Original Assignee: Microsoft Corporation
Inventors: Liu, Ning; Chen, Zheng; Yan, Jun; Wang, Jian; Ji, Lei

Granted Patent: US 7,958,125 B2
US Class Current: 1/1
CPC Class Codes: G06F 16/35 Clustering; Classification
