Enhancing a document with supplemental information from another document
Abstract
The present technology is related to identifying, from within a corpus of documents, a subject (e.g., person, location, date, etc.) that is relevant to a topic and that is usable to enhance a topic-describing document. Documents within the corpus of documents share a link structure, such that some documents include hyperlinks that enable navigation to the topic-describing document, and the topic-describing document includes hyperlinks that enable navigation to other documents. Text of documents within the corpus is parsed to identify the subject, and a context of the subject suggests a degree of relevance of the subject to the topic. An enhancement type of the subject is determined, and a version of the topic-describing document is enhanced to include a presentation of the subject.
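The link structure described above can be illustrated with a minimal sketch. It assumes the corpus is a small in-memory dictionary mapping document identifiers to HTML, and that a hyperlink's `href` is simply the identifier of the target document. The names `LinkCollector`, `out_links`, `link_neighbors`, and the sample `CORPUS` are hypothetical and not taken from the patent.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects the href targets of <a> tags in one document."""

    def __init__(self):
        super().__init__()
        self.targets = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.targets.append(value)


def out_links(html):
    """Return every hyperlink target found in the given HTML."""
    collector = LinkCollector()
    collector.feed(html)
    return collector.targets


def link_neighbors(corpus, topic_id):
    """Return (in_docs, out_docs) for the topic document.

    in_docs:  documents containing a hyperlink to the topic document.
    out_docs: documents the topic document links to.
    """
    out_docs = [t for t in out_links(corpus[topic_id])
                if t in corpus and t != topic_id]
    in_docs = [doc_id for doc_id, html in corpus.items()
               if doc_id != topic_id and topic_id in out_links(html)]
    return in_docs, out_docs


# Illustrative corpus only; identifiers double as link targets (assumption).
CORPUS = {
    "apollo_11": '<p>Crewed by <a href="neil_armstrong">Neil Armstrong</a>.</p>',
    "neil_armstrong": '<p>Commander of <a href="apollo_11">Apollo 11</a>.</p>',
    "moon": '<p>First visited during <a href="apollo_11">Apollo 11</a>.</p>',
    "mars": "<p>No links here.</p>",
}

in_docs, out_docs = link_neighbors(CORPUS, "apollo_11")
print(in_docs, out_docs)
```

Scanning every document for in-links is linear in corpus size; a real datastore would more likely keep a precomputed reverse-link index, which this sketch leaves out.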
13 Claims
1. Computer-storage media storing computer-executable instructions that, when executed by a computing device, cause the computing device to perform a method of identifying, from within a corpus of documents, a subject that is relevant to a topic and that is usable to enhance a topic-describing document, the method comprising:
retrieving the topic-describing document that describes the topic and a linked document that is hyperlinked to the topic-describing document and that describes a subject;
parsing text of the linked document to identify the subject and to identify within the text of the linked document a data-page hyperlink that is associated with a date relevant to the subject and that navigates to a date-describing reference document;
storing an association between the subject and the linked document, wherein the association enables retrieval of the linked document;
determining that the subject in the linked document is relevant to the topic described by the topic-describing document by applying a set of one or more rules to at least content of the linked document and content of the topic-describing document, wherein a grammatical context of the subject in the linked document suggests a degree of relevance of the subject to the topic;
utilizing an association between the subject and the linked document to retrieve the linked document when the subject represented by the linked document is deemed to be relevant to the topic described by the topic-describing document;
analyzing metadata of the linked document that was retrieved to identify temporally significant information and location information of the subject;
using the temporally significant information and the location information of the subject that was obtained from the metadata of the linked document to generate a timeline and a map depicting a geographic location of the subject; and
transforming a first version of the topic-describing document into an enhanced version of the topic-describing document by inserting the timeline and the map into the first version.
Dependent claims: 2, 3, 4, 5, 6, 7, 8
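The final steps of claim 1 (metadata analysis, timeline and map generation, and insertion into the first version) might be sketched as follows. This is only an illustration under stated assumptions: metadata is modeled as a plain dictionary with optional `date`, `lat`, and `lon` keys, and the timeline and map are rendered as plain text. The claim does not specify a metadata schema or a rendering format, and the function names and sample records are hypothetical.

```python
from datetime import date


def build_timeline(records):
    """Return chronologically sorted 'YYYY-MM-DD: subject' lines."""
    dated = [(r["date"], r["subject"]) for r in records if r.get("date")]
    return [f"{d.isoformat()}: {name}" for d, name in sorted(dated)]


def build_map(records):
    """Return one 'subject @ (lat, lon)' marker per located subject."""
    return [f"{r['subject']} @ ({r['lat']:.2f}, {r['lon']:.2f})"
            for r in records if "lat" in r and "lon" in r]


def enhance(first_version, records):
    """Append timeline and map sections to the first version of a document."""
    parts = [first_version]
    timeline = build_timeline(records)
    markers = build_map(records)
    if timeline:
        parts.append("Timeline\n" + "\n".join(timeline))
    if markers:
        parts.append("Map\n" + "\n".join(markers))
    return "\n\n".join(parts)


# Illustrative metadata records (assumed schema, placeholder values).
RECORDS = [
    {"subject": "Landing", "date": date(1969, 7, 20)},
    {"subject": "Launch", "date": date(1969, 7, 16), "lat": 28.57, "lon": -80.65},
]

print(enhance("Apollo 11 was a crewed lunar mission.", RECORDS))
```

The first version is left untouched and the enhanced version is returned as a new string, which mirrors the claim's distinction between the two versions.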
9. A system, which includes a processor and computer-storage media, that identifies, from within a corpus of documents, a subject that is relevant to a topic and that is usable to enhance a topic-describing document, the system comprising:
a document retriever that leverages a processor coupled with a computer-memory device to retrieve a first set of documents from a datastore, each document of the first set of documents including a respective in-hyperlink that, when input, navigates to the topic-describing document, and a second set of documents, each of which is referenced by a respective out-hyperlink that is embedded in text of the topic-describing document;
a parser that leverages the processor to parse text or HTML of the topic-describing document, text or HTML of the first set of documents, text or HTML of the second set of documents, or a combination thereof, to generate a set of potentially relevant subjects, wherein a respective link is stored that links each potentially relevant subject with a respective document;
a subject evaluator that applies a rule to a respective context of each potentially relevant subject, wherein application of the rule identifies a relevant subject that is relevant to the topic, and wherein a link stored in association with the relevant subject is used to retrieve a document describing the relevant subject;
an enhancement-type identifier that parses the document to identify within text of the document a data-page hyperlink that is associated with a date relevant to the subject and that navigates to a date-describing reference document wherein the date relevant to the subject is used to generate a timeline depicting a temporal significance relevant to the subject; and
a document enhancer that transforms a first version of the topic-describing document into an enhanced version of the topic-describing document by adding the timeline to the topic-describing document.
Dependent claims: 10, 11, 12
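The subject evaluator of claim 9 applies a rule to the context of each candidate subject and then follows the stored link for subjects found relevant. The claim does not say what the rules are, so the sketch below invents two simple ones purely for illustration: co-occurrence with the topic in a sentence, and appearing at the start of a sentence as a crude stand-in for grammatical role. The scores, the threshold, and all names are assumptions.

```python
import re


def split_sentences(text):
    """Naive sentence splitter: break after '.', '!' or '?' plus whitespace."""
    return re.split(r"(?<=[.!?])\s+", text)


def same_sentence_rule(subject, topic, text):
    """Score 1.0 when subject and topic co-occur in at least one sentence."""
    return 1.0 if any(subject in s and topic in s
                      for s in split_sentences(text)) else 0.0


def leading_position_rule(subject, topic, text):
    """Score 0.5 when some sentence opens with the subject (hypothetical rule)."""
    return 0.5 if any(s.startswith(subject)
                      for s in split_sentences(text)) else 0.0


RULES = [same_sentence_rule, leading_position_rule]


def evaluate(candidates, topic, text, threshold=1.0):
    """Return {subject: linked_doc_id} for subjects whose total score passes.

    candidates maps each potentially relevant subject to the identifier of
    the document stored in association with it.
    """
    return {subject: doc_id for subject, doc_id in candidates.items()
            if sum(rule(subject, topic, text) for rule in RULES) >= threshold}


TEXT = "Neil Armstrong commanded Apollo 11. Buzz Aldrin flew too. Cheese is tasty."
CANDIDATES = {
    "Neil Armstrong": "doc_armstrong",
    "Buzz Aldrin": "doc_aldrin",
    "Cheese": "doc_cheese",
}

print(evaluate(CANDIDATES, "Apollo 11", TEXT))
```

Keeping each rule as an independent function makes the rule set easy to extend; a fuller implementation would presumably use a real grammatical parser rather than string position.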
13. A computer-implemented method of identifying, from within a corpus of documents, a subject that is relevant to a topic and that is usable to enhance a topic-describing document, the method comprising:
retrieving the topic-describing document and a linked document, wherein a hyperlink is embedded within text of the linked document that, when input, enables navigation to the topic-describing document;
grammatically parsing text of the linked document to identify the subject;
determining that the subject in the linked document is relevant to the topic described by the topic-describing document by applying a set of one or more rules to at least content of the linked document and content of the topic-describing document, wherein a grammatical context of the subject suggests a degree of relevance of the subject to the topic;
storing an association between the subject and the linked document to represent the subject using the linked document, wherein the association enables retrieval of the linked document;
utilizing the association to retrieve the linked document when the subject represented by the linked document is deemed to be relevant to the topic described by the topic-describing document;
identifying within text of the linked document a date-page hyperlink that forms at least part of the grammatical context of the subject and that navigates to a date-describing reference document, which suggests that the subject is of temporal significance;
using the temporal significance of the subject that was obtained from the date-describing reference document to generate a timeline indicating the temporal significance of the subject; and
transforming a first version of the topic-describing document into an enhanced version by inserting the timeline into the topic-describing document.
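The date-page hyperlink step of claim 13 can be approximated with a short sketch. It assumes, hypothetically, that date-describing reference documents are identified by link targets shaped like a four-digit year (`1969`) or a month and day (`July_20`), and it treats "same sentence as the subject" as a rough proxy for grammatical context. Neither convention comes from the patent, and the regular expressions only handle simple `<a href="...">` anchors.

```python
import re

MONTHS = ("January|February|March|April|May|June|July|"
          "August|September|October|November|December")

# Assumed naming convention for date pages: "1969" or "July_20".
DATE_PAGE = re.compile(rf"^(?:\d{{4}}|(?:{MONTHS})_\d{{1,2}})$")

# Matches simple anchors only; not a general HTML parser.
ANCHOR = re.compile(r'<a\s+href="([^"]+)">([^<]*)</a>')


def date_page_links(sentence_html):
    """Return href targets in one sentence that look like date pages."""
    return [href for href, _text in ANCHOR.findall(sentence_html)
            if DATE_PAGE.match(href)]


def temporal_links(html, subject):
    """Return date-page targets found in sentences that mention the subject."""
    hits = []
    for sentence in re.split(r"(?<=[.!?])\s+", html):
        if subject in sentence:
            hits.extend(date_page_links(sentence))
    return hits


SAMPLE = ('The <a href="apollo_11">Apollo 11</a> crew landed on '
          '<a href="July_20">July 20</a>, <a href="1969">1969</a>. '
          'The <a href="moon">Moon</a> is far away.')

print(temporal_links(SAMPLE, "Apollo 11"))
```

A non-empty result suggests the subject has temporal significance; the returned targets could then be followed to the date-describing reference documents to populate a timeline.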
Specification