Methods and apparatus for ranking documents
First Claim
1. A method comprising:
- identifying, by one or more server devices, a first document;
- analyzing, by the one or more server devices, content of the first document to determine a first topic;
- identifying a set of sources by determining, for each source of a plurality of sources, a source score for the source and including the source in the set when the source score determined for the source satisfies a threshold, wherein determining the source score for each source of the plurality of sources comprises:
  - detecting, by the one or more server devices, original articles published by the source, wherein each original article is a document that was first published by the source;
  - analyzing, by the one or more server devices, content of the original articles to determine a given category for the content of the original articles from the source;
  - assigning, by the one or more server devices, a source score for the source for the given category;
  - determining, by the one or more server devices, that the source score for the source satisfies the threshold for the given category; and
  - selecting the source to include in the set of sources;
- determining, by the one or more server devices and from the set of sources satisfying the threshold, a number of documents from the set of sources, each of the documents having content related to a topic that is similar to the first topic;
- forming, by the one or more server devices, a first subject cluster for the first topic including the number of documents from the set of sources satisfying the threshold and the first document;
- receiving, by the one or more server devices, a second document;
- analyzing, by the one or more server devices, content of the second document to determine a second topic; and
- responsive to the second topic being similar to the first topic of the first subject cluster, placing the second document in the first subject cluster.
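The source-selection steps of the claim above (detect original articles, determine a category, assign a per-category source score, keep sources that satisfy a threshold) can be illustrated with a minimal Python sketch. The claim does not say how the score is computed, so the sketch assumes, purely for illustration, that the score is the count of original articles a source has published in a category. The names `Article`, `source_scores`, and `select_sources` are hypothetical, and the pre-assigned `category` label stands in for the content analysis step.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Article:
    source: str              # source that published this copy
    category: str            # assumed output of content analysis
    first_published_by: str  # source that first published the text


def source_scores(articles):
    """Return {(source, category): score}.

    Assumption: the score is the number of original articles, i.e.
    articles whose publishing source is also the first publisher.
    """
    scores = Counter()
    for article in articles:
        if article.first_published_by == article.source:
            scores[(article.source, article.category)] += 1
    return dict(scores)


def select_sources(articles, category, threshold):
    """Return the set of sources whose score for `category` is >= threshold."""
    return {
        source
        for (source, cat), score in source_scores(articles).items()
        if cat == category and score >= threshold
    }
```

Under these assumptions, a source that mostly republishes other sources' articles accumulates a low score for a category and falls out of the set from which cluster documents are drawn.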
Abstract
Methods and apparatus are described for scoring documents based, in part, on parameters related to the document, source, and/or cluster score. Methods and apparatus are also described for scoring a cluster based, in part, on parameters related to documents within the cluster and/or sources corresponding to the documents within the cluster. In one embodiment, the invention may detect at least one document within the cluster; analyze a parameter corresponding to the document; and compute a cluster score based, in part, on the parameter, wherein the cluster score corresponds with at least one document within the cluster.
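The cluster-scoring embodiment described in the abstract can be sketched as follows. The abstract does not name the parameter or the formula, so this sketch assumes a single hypothetical per-document parameter (`source_score`), averages it to obtain the cluster score, and blends it back into a document score with an illustrative `cluster_weight`. None of these names or weights come from the patent.

```python
from dataclasses import dataclass
from statistics import fmean


@dataclass
class Doc:
    title: str
    source_score: float  # hypothetical parameter analyzed per document


def cluster_score(docs):
    """Average the analyzed parameter over the documents in the cluster."""
    if not docs:
        raise ValueError("a cluster needs at least one document")
    return fmean(doc.source_score for doc in docs)


def document_score(doc, docs, cluster_weight=0.5):
    """Blend a document's own parameter with its cluster's score.

    cluster_weight is an assumed tuning knob in [0, 1].
    """
    own = (1 - cluster_weight) * doc.source_score
    return own + cluster_weight * cluster_score(docs)
```

Averaging is only one possible aggregate; a sum, a maximum, or a recency-weighted mean would fit the abstract's wording equally well.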
21 Claims
8. A computer program product comprising a non-transitory computer readable medium including a computer readable program, wherein the computer readable program when executed on a computer causes the computer to perform operations comprising:
- identifying a first document;
- analyzing content of the first document to determine a first topic;
- identifying a set of sources by determining, for each source of a plurality of sources, a source score for the source and including the source in the set when the source score determined for the source satisfies a threshold, wherein determining the source score for each source of the plurality of sources comprises:
  - detecting, by the one or more server devices, original articles published by the source, wherein each original article is a document that was first published by the source;
  - analyzing, by the one or more server devices, content of the original articles to determine a given category for the content of the original articles from the source;
  - assigning, by the one or more server devices, a source score for the source for the given category;
  - determining, by the one or more server devices, that the source score for the source satisfies the threshold for the given category; and
  - selecting the source to include in the set of sources;
- determining from the set of sources satisfying the threshold a number of documents from the set of sources, each of the documents having content related to a topic that is similar to the first topic;
- forming a first subject cluster for the first topic including the number of documents from the set of sources satisfying the threshold and the first document;
- receiving a second document;
- analyzing content of the second document to determine a second topic; and
- responsive to the second topic being similar to the first topic of the first subject cluster, placing the second document in the first subject cluster.
Dependent claims: 9, 10, 11, 12, 13, 14
15. A system comprising:
- a processor; and
- a memory storing instructions that, when executed, cause the system to perform operations comprising:
  - identifying a first document;
  - analyzing content of the first document to determine a first topic;
  - identifying a set of sources by determining, for each source of a plurality of sources, a source score for the source and including the source in the set when the source score determined for the source satisfies a threshold, wherein determining the source score for each source of the plurality of sources comprises:
    - detecting, by the one or more server devices, original articles published by the source, wherein each original article is a document that was first published by the source;
    - analyzing, by the one or more server devices, content of the original articles to determine a given category for the content of the original articles from the source;
    - assigning, by the one or more server devices, a source score for the source for the given category;
    - determining, by the one or more server devices, that the source score for the source satisfies the threshold for the given category; and
    - selecting the source to include in the set of sources;
  - determining from the set of sources satisfying the threshold a number of documents from the set of sources, each of the documents having content related to a topic that is similar to the first topic;
  - forming a first subject cluster for the first topic including the number of documents from the set of sources satisfying the threshold and the first document;
  - receiving a second document;
  - analyzing content of the second document to determine a second topic; and
  - responsive to the second topic being similar to the first topic of the first subject cluster, placing the second document in the first subject cluster.
Dependent claims: 16, 17, 18, 19, 20, 21
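The closing steps shared by the independent claims (determine a topic for a newly received document and place it in the existing subject cluster when the topics are similar) might look like the sketch below. The claims do not define how a topic is represented or how similarity is measured, so this example assumes a topic is a set of non-stopword terms and uses Jaccard similarity with an arbitrary cutoff of 0.3. `topic_terms`, `jaccard`, and `SubjectCluster` are illustrative names only.

```python
import re

# Small illustrative stopword list; a real system would use a fuller one.
STOPWORDS = frozenset({"the", "a", "an", "of", "in", "to", "and", "on", "for"})


def topic_terms(text):
    """Crude stand-in for topic analysis: lowercase non-stopword tokens."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS}


def jaccard(a, b):
    """Jaccard similarity of two term sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


class SubjectCluster:
    def __init__(self, first_document):
        # The cluster's topic is taken from the first document.
        self.topic = topic_terms(first_document)
        self.documents = [first_document]

    def try_add(self, document, min_similarity=0.3):
        """Place the document in the cluster if its topic is similar enough."""
        if jaccard(self.topic, topic_terms(document)) >= min_similarity:
            self.documents.append(document)
            return True
        return False
```

For example, a cluster seeded with "Senate passes budget bill" would accept "Budget bill passes Senate vote" under these assumptions and reject "Local team wins championship". Seeding the cluster with documents from the selected sources, as the claims require, is omitted here.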
Specification