Methods and apparatus for clustering news content
First Claim
1. A method of scoring a source, comprising:
identifying the source;
detecting a plurality of documents published by the source;
calculating measures of freshness of the plurality of documents based on determining a difference between times that events occur and times that the identified source published the plurality of documents that includes content relating to the events;
calculating measures of quality for the plurality of documents based on at least one of:
numbers of views of the plurality of documents during a time period, or numbers of links pointing to the plurality of documents; and
determining a source score for the source based on the measures of freshness and the measures of quality.
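The steps of claim 1 can be illustrated with a minimal Python sketch. The claim does not specify any formula, so everything numeric here is an assumption made for illustration: the `Document` fields, the reciprocal decay used for freshness, the sum of views and links used for quality (the claim requires only at least one of the two), the scale constant `100.0`, and the weighted average that combines the two measures.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import mean


@dataclass
class Document:
    """Hypothetical record for one document published by a source."""
    event_time: datetime      # when the reported event occurred
    published_time: datetime  # when the source published the document
    views: int = 0            # views during the chosen time period
    inbound_links: int = 0    # links pointing to the document


def freshness(doc: Document) -> float:
    """Map publication lag to (0, 1]; a shorter lag gives a higher value."""
    lag_seconds = (doc.published_time - doc.event_time).total_seconds()
    lag_hours = max(lag_seconds, 0.0) / 3600.0
    return 1.0 / (1.0 + lag_hours)  # assumed decay shape, not from the claim


def quality(doc: Document) -> float:
    """Assumed quality measure: views plus inbound links, squashed to [0, 1)."""
    signal = doc.views + doc.inbound_links
    return signal / (signal + 100.0)  # 100.0 is an arbitrary scale constant


def source_score(docs: list[Document], freshness_weight: float = 0.5) -> float:
    """Combine mean freshness and mean quality into one score (assumed weighting)."""
    if not docs:
        return 0.0
    mean_freshness = mean(freshness(d) for d in docs)
    mean_quality = mean(quality(d) for d in docs)
    return freshness_weight * mean_freshness + (1.0 - freshness_weight) * mean_quality


if __name__ == "__main__":
    event = datetime(2024, 1, 1, 12, 0)
    docs = [
        Document(event, event + timedelta(hours=1), views=100),
        Document(event, event + timedelta(hours=9), inbound_links=25),
    ]
    print(source_score(docs))
```

Under these assumptions, a source that publishes sooner after events and whose documents attract more views or links receives a higher score; any monotonic alternatives for the two measures would fit the claim language equally well.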
Abstract
Methods and apparatus are described for scoring documents in response, in part, to parameters related to the document, source, and/or cluster score. Methods and apparatus are also described for scoring a cluster in response, in part, to parameters related to documents within the cluster and/or sources corresponding to the documents within the cluster. In one embodiment, the invention may identify the source; detect a plurality of documents published by the source; analyze the plurality of documents with respect to at least one parameter; and determine a source score for the source in response, in part, to the parameter. In another embodiment, the invention may identify a topic; identify a plurality of clusters in response to the topic; analyze at least one parameter corresponding to each of the plurality of clusters; and calculate a cluster score for each of the plurality of clusters in response, in part, to the parameter.
28 Claims
1. A method of scoring a source, comprising:
identifying the source;
detecting a plurality of documents published by the source;
calculating measures of freshness of the plurality of documents based on determining a difference between times that events occur and times that the identified source published the plurality of documents that includes content relating to the events;
calculating measures of quality for the plurality of documents based on at least one of:
numbers of views of the plurality of documents during a time period, or numbers of links pointing to the plurality of documents; and
determining a source score for the source based on the measures of freshness and the measures of quality.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11
12. A computer-readable medium having computer executable instructions for performing a method comprising:
identifying a news source;
detecting a plurality of documents published by the news source;
calculating a measure of freshness of a first document of the plurality of documents based on determining a difference between a time that an event occurred and a time that the identified news source published the first document that includes content relating to the event;
calculating a measure of quality for a second document of the plurality of documents based on at least one of:
numbers of views of the second document during a time period, or numbers of links pointing to the second document; and
determining a source score for the news source based on the measure of freshness and the measure of quality.
Dependent claims: 13, 14, 15, 16, 17, 18, 19, 20, 21
22. A system for scoring a source, comprising:
means for identifying the source;
means for detecting a plurality of documents published by the source;
means for calculating a measure of freshness of a first document of the plurality of documents based on determining a difference between a time that an event occurred and a time that the source published the first document that includes content relating to the event;
means for calculating a measure of quality for a second document of the plurality of documents based on at least one of:
numbers of views of the second document during a time period, or numbers of links pointing to the second document;
means for determining a source score for the source based on the measure of freshness and the measure of quality; and
a memory device to store the source score.
Dependent claims: 24
23. The system of claim 22, further comprising:
means for removing duplicate documents in the plurality of documents.
25. A computer-implemented method comprising:
identifying a news source;
detecting a plurality of news articles published by the news source;
removing news articles in the plurality of news articles that are determined to be duplicates of other news articles in the plurality of news articles to form a second plurality of news articles;
calculating measures of freshness of the second plurality of news articles based on determining a difference between times that events occur and times that the news source published the second plurality of news articles that includes content relating to the events;
calculating measures of quality for the second plurality of news articles based on at least one of:
numbers of views of the second plurality of news articles during a time period, or numbers of links pointing to the second plurality of news articles;
determining a score for the news source based on the measures of freshness and the measures of quality; and
storing the score for the news source.
Dependent claims: 26, 27, 28
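The duplicate-removal step of claim 25 can be sketched in Python as follows. The claim does not say how duplicates are determined, so this sketch assumes the simplest possible rule: two articles are duplicates when their text matches exactly after lowercasing and collapsing whitespace. The `Article` type and the function names are hypothetical, and a near-duplicate detector could be substituted without changing the overall flow.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class Article:
    """Hypothetical record for one news article from a single source."""
    url: str
    text: str


def fingerprint(text: str) -> str:
    """Hash the text after lowercasing and collapsing runs of whitespace."""
    normalized = " ".join(text.lower().split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def remove_duplicates(articles: list[Article]) -> list[Article]:
    """Return the 'second plurality': the first copy of each distinct article, in order."""
    seen: set[str] = set()
    unique: list[Article] = []
    for article in articles:
        key = fingerprint(article.text)
        if key not in seen:  # keep only the first occurrence of each fingerprint
            seen.add(key)
            unique.append(article)
    return unique


if __name__ == "__main__":
    articles = [
        Article("https://example.com/a", "Storm hits the coast."),
        Article("https://example.com/b", "storm   hits the coast."),
        Article("https://example.com/c", "Markets close higher."),
    ]
    for article in remove_duplicates(articles):
        print(article.url)
```

The freshness and quality measures would then be calculated over the list returned by `remove_duplicates`, so that a source is not credited repeatedly for republishing the same article.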
Specification