Cluster-based identification of news stories
First Claim
1. A method in a content recommendation system, the method comprising:
- identifying a news story about an event, the news story including multiple related content items that each give an account of the event and that each reference multiple entities or categories that are each electronically represented by the content recommendation system, comprising:
processing content items to determine semantic information that includes identified entities and relations between the identified entities;
storing the identified entities and relations in a repository of the content recommendation system;
generating a cluster that includes the multiple related content items, based at least in part on how many entities each of the multiple related content items has in common with one or more other of the multiple related content items, wherein generating the cluster includes:
finding a candidate cluster of a plurality of clusters that is nearest to one of the multiple related content items by computing a cosine distance between a term vector that represents the one content item and a term vector that represents a content item of the candidate cluster; and
determining whether the candidate cluster is a suitable cluster for the one content item, based on all of:
cosine distances between the one content item and content items of the candidate cluster, a quantity of common keyterms between the one content item and content items of the candidate cluster, and on whether a sufficiently high percentage of content items of the candidate cluster have a cosine distance to the content item that is below a predetermined threshold;
if the candidate cluster is determined to be a suitable cluster, adding the one content item to the candidate cluster; and
if the candidate cluster is not determined to be a suitable cluster, creating a new cluster that includes the one content item as a seed; and
storing an indication of the identified news story and the generated cluster.
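The cluster-generation steps recited above can be illustrated with a minimal Python sketch. It is not the patented implementation. It assumes a plain bag-of-words term vector, treats every term as a keyterm, and uses hypothetical threshold values (`MAX_DISTANCE`, `MIN_CLOSE_FRACTION`, `MIN_COMMON_KEYTERMS`), since the claim only calls the threshold "predetermined". All function names are illustrative.

```python
import math
from collections import Counter

# Hypothetical values; the claim does not specify any of them.
MAX_DISTANCE = 0.6         # cosine-distance threshold
MIN_CLOSE_FRACTION = 0.5   # share of cluster items that must be under the threshold
MIN_COMMON_KEYTERMS = 2    # shared keyterms required with at least one cluster item


def term_vector(text):
    """Represent a content item as a term-count vector (assumed representation)."""
    return Counter(text.lower().split())


def cosine_distance(a, b):
    """Return 1 minus the cosine similarity of two sparse term vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return 1.0 if norm == 0 else 1.0 - dot / norm


def nearest_cluster(item, clusters):
    """Pick the cluster that contains the single item closest to `item`."""
    best, best_distance = None, float("inf")
    for cluster in clusters:
        distance = min(cosine_distance(item, member) for member in cluster)
        if distance < best_distance:
            best, best_distance = cluster, distance
    return best


def is_suitable(item, cluster):
    """Apply all three tests: distance, common keyterms, and close fraction."""
    distances = [cosine_distance(item, member) for member in cluster]
    close_fraction = sum(d < MAX_DISTANCE for d in distances) / len(distances)
    max_common = max(len(item.keys() & member.keys()) for member in cluster)
    return (
        min(distances) < MAX_DISTANCE
        and max_common >= MIN_COMMON_KEYTERMS
        and close_fraction >= MIN_CLOSE_FRACTION
    )


def assign(item, clusters):
    """Add `item` to the nearest suitable cluster, or seed a new cluster."""
    candidate = nearest_cluster(item, clusters) if clusters else None
    if candidate is not None and is_suitable(item, candidate):
        candidate.append(item)
    else:
        clusters.append([item])  # the item becomes the seed of a new cluster
    return clusters


clusters = []
for text in [
    "quake hits city center",
    "quake hits city suburbs",
    "team wins cup final",
]:
    assign(term_vector(text), clusters)
```

In this toy run the two earthquake items share three of four terms, so the second joins the cluster seeded by the first, while the sports item shares no terms with either and seeds its own cluster.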
Abstract
Methods, systems, and techniques for cluster-based content recommendation are described. Some embodiments provide a content recommendation system (“CRS”) configured to recommend news stories about events or occurrences. In some embodiments, a news story about an event includes multiple related content items that each include an account of the event and that each reference one or more entities or categories that are represented by the CRS. In one embodiment, the CRS identifies news stories by generating clusters of related content items. Then, in response to a received query that indicates a keyterm, entity, or category, the CRS determines and provides indications of one or more news stories that are relevant to the received query. In some embodiments, at least some of these techniques are employed to implement a news story recommendation facility in an online news service.
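As a rough illustration of the query step described in the abstract, the following Python sketch maps a keyterm, entity, or category to the news stories that reference it. The `StoryIndex` class, its method names, and the story identifiers are all hypothetical. The abstract does not say how relevance is determined, so this sketch assumes a simple exact-match lookup with no ranking.

```python
from collections import defaultdict


class StoryIndex:
    """Hypothetical inverted index from keyterm/entity/category to story ids."""

    def __init__(self):
        self._by_key = defaultdict(set)

    def add_story(self, story_id, keys):
        """Register a story under each keyterm, entity, or category it references."""
        for key in keys:
            self._by_key[key.lower()].add(story_id)

    def query(self, key):
        """Return the ids of stories that reference `key`, in sorted order."""
        return sorted(self._by_key.get(key.lower(), set()))


index = StoryIndex()
index.add_story("story-1", ["earthquake", "Chile", "Natural Disasters"])
index.add_story("story-2", ["Chile", "elections"])
matches = index.query("chile")
```

Lookups are case-insensitive in this sketch, which is a design choice made here for convenience rather than something the source specifies.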
21 Claims
19. A computing system configured to recommend content, comprising:
- a memory;
- a module stored on the memory that is configured, when executed, to identify a news story about an event, the news story including multiple related content items that each give an account of the event and that each reference multiple entities or categories that are each electronically represented by the content recommendation system, by:
processing content items to determine semantic information that includes identified entities and relations between the identified entities;
storing the identified entities and relations in a repository of the content recommendation system;
generating a cluster that includes the multiple related content items, based at least in part on how many entities each of the multiple related content items has in common with one or more other of the multiple related content items, wherein generating the cluster includes:
finding a candidate cluster of a plurality of clusters that is nearest to one of the multiple related content items by computing a cosine distance between a term vector that represents the one content item and a term vector that represents a content item of the candidate cluster;
determining whether the candidate cluster is a suitable cluster for the one content item, based on all of:
cosine distances between the one content item and content items of the candidate cluster, a quantity of common keyterms between the one content item and content items of the candidate cluster, and on whether a sufficiently high percentage of content items of the candidate cluster have a cosine distance to the content item that is below a predetermined threshold;
if the candidate cluster is determined to be a suitable cluster, adding the one content item to the candidate cluster; and
if the candidate cluster is not determined to be a suitable cluster, creating a new cluster that includes the one content item as a seed; and
storing an indication of the identified news story and the generated cluster.
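The storing step and the "entities in common" criterion can be sketched in Python as follows. This is only an illustration under stated assumptions: entity and relation extraction is taken as already done, relations are modeled as (subject, relation, object) triples, and the `Repository` class and its methods are hypothetical names that do not come from the source.

```python
from dataclasses import dataclass, field
from itertools import combinations


@dataclass
class Repository:
    """Hypothetical in-memory store for identified entities and relations."""

    entities: dict = field(default_factory=dict)  # item id -> set of entity names
    relations: set = field(default_factory=set)   # (subject, relation, object) triples

    def store(self, item_id, entities, relations):
        """Record the entities and relations identified in one content item."""
        self.entities[item_id] = set(entities)
        self.relations.update(relations)

    def shared_entity_counts(self):
        """Map each pair of item ids to the number of entities they have in common."""
        return {
            (a, b): len(self.entities[a] & self.entities[b])
            for a, b in combinations(sorted(self.entities), 2)
        }


repo = Repository()
repo.store("a", {"Chile", "USGS"}, {("USGS", "reported", "earthquake")})
repo.store("b", {"Chile", "USGS", "Santiago"}, {("earthquake", "struck", "Santiago")})
repo.store("c", {"FIFA"}, set())
counts = repo.shared_entity_counts()
```

A clustering step could then use these pairwise counts as one input when deciding which content items belong to the same news story.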
Specification