Cluster-based identification of news stories

  • US 9,116,995 B2
  • Filed: 03/29/2012
  • Issued: 08/25/2015
  • Est. Priority Date: 03/30/2011
  • Status: Active Grant
First Claim

1. A method in a content recommendation system, the method comprising:

  • identifying a news story about an event, the news story including multiple related content items that each give an account of the event and that each reference multiple entities or categories that are each electronically represented by the content recommendation system, comprising:

    processing content items to determine semantic information that includes identified entities and relations between the identified entities;

    storing the identified entities and relations in a repository of the content recommendation system;

    generating a cluster that includes the multiple related content items, based at least in part on how many entities each of the multiple related content items has in common with one or more other of the multiple related content items, wherein generating the cluster includes:

    finding a candidate cluster of a plurality of clusters that is nearest to one of the multiple related content items by computing a cosine distance between a term vector that represents the one content item and a term vector that represents a content item of the candidate cluster; and

    determining whether the candidate cluster is a suitable cluster for the one content item, based on all of:

    cosine distances between the one content item and content items of the candidate cluster, a quantity of common keyterms between the one content item and content items of the candidate cluster, and on whether a sufficiently high percentage of content items of the candidate cluster have a cosine distance to the content item that is below a predetermined threshold;

    if the candidate cluster is determined to be a suitable cluster, adding the one content item to the candidate cluster; and

    if the candidate cluster is not determined to be a suitable cluster, creating a new cluster that includes the one content item as a seed; and

    storing an indication of the identified news story and the generated cluster.
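
The cluster-generation steps recited above can be illustrated with a minimal Python sketch. It is not the patented implementation: the function names (`term_vector`, `cosine_distance`, `is_suitable`, `assign`), the whitespace tokenizer, and the threshold values `max_dist`, `min_common`, and `min_fraction` are all assumptions chosen for illustration. Shared distinct terms stand in for "common keyterms", the mean distance stands in for the claim's unspecified use of "cosine distances", and the entity extraction and repository storage steps are omitted.

```python
import math
from collections import Counter


def term_vector(text):
    """Bag-of-words term vector (assumed representation; the claim does not fix one)."""
    return Counter(text.lower().split())


def cosine_distance(a, b):
    """1 - cosine similarity of two term vectors; 1.0 if either vector is empty."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)


def is_suitable(item, cluster, max_dist=0.6, min_common=2, min_fraction=0.5):
    """Apply all three tests named in the claim (thresholds are hypothetical)."""
    dists = [cosine_distance(item, member) for member in cluster]
    # Test 1: cosine distances to the cluster's items (here: their mean).
    mean_ok = sum(dists) / len(dists) <= max_dist
    # Test 2: quantity of common keyterms (here: shared distinct terms
    # with at least one member of the cluster).
    common_ok = any(len(item.keys() & m.keys()) >= min_common for m in cluster)
    # Test 3: a sufficiently high share of members lies below the threshold.
    fraction_ok = sum(d < max_dist for d in dists) / len(dists) >= min_fraction
    return mean_ok and common_ok and fraction_ok


def assign(item, clusters, **thresholds):
    """Add item to the nearest suitable cluster, or seed a new cluster with it."""
    if clusters:
        # Nearest candidate: the cluster holding the single closest member.
        candidate = min(
            clusters,
            key=lambda c: min(cosine_distance(item, m) for m in c),
        )
        if is_suitable(item, candidate, **thresholds):
            candidate.append(item)
            return candidate
    seeded = [item]
    clusters.append(seeded)
    return seeded


# Usage: two accounts of the same event and one unrelated item.
clusters = []
for text in [
    "earthquake hits tokyo japan",
    "tokyo earthquake damages japan buildings",
    "football final won by madrid",
]:
    assign(term_vector(text), clusters)

print([len(c) for c in clusters])  # [2, 1]
```

In this sketch the two earthquake items share three terms and sit at a cosine distance of about 0.33, so the second joins the first item's cluster. The football item has a distance of 1.0 to both, fails the suitability tests, and seeds a new cluster.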
