Utilizing classification and text analytics for annotating documents to allow quick scanning
First Claim
1. A system for annotating a document stored on a non-transitory computer readable storage medium of a computer comprising:
(a) a classifier executed by the computer, wherein the classifier:
(i) determines a type of the document, wherein the type of document is a format within which the document is written;
(ii) determines a subject domain of the document, wherein the subject domain is a field, topic or genre of content set forth in the document; and
(iii) segments the document into one or more paragraphs and sections based on the document's structure;
(b) an annotation model, executed by the computer, with information to determine and drive an annotation strategy based on the type of the document, wherein the annotation strategy specifies one or more annotations to utilize and a location in the document's original text and on a margin of the document for the one or more annotations;
(c) a text analytics system, executed by the computer, wherein the subject domain determines which domain model to load into the text analytics system and the domain model identifies terms, phrases, entities, and concepts of the subject domain to be annotated in the document, and wherein the text analytics system:
(i) provides the one or more annotations in the document's original text and on the margin of the document for the paragraphs of the document based on the domain model and the annotation model, wherein the one or more annotations comprise domain-specific keywords and concepts; and
(ii) aggregates the one or more annotations in the margin of the document for the paragraphs of the document into one or more section-level aggregated annotations for the sections of the document based on the annotation model; and
(d) a custom viewer/renderer application, executed by the computer, that annotates the document with the one or more annotations in the document's original text and on the margin of the document for the paragraphs, and with the one or more section-level aggregated annotations for the sections of the document, and renders the document including the one or more annotations and the one or more section-level aggregated annotations.
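The aggregation recited in element (c)(ii) can be illustrated with a minimal sketch. The claim does not say how paragraph annotations are combined, so this example assumes one possible rule: the most frequent paragraph-level annotations in a section become its section-level annotations. The names `Paragraph`, `aggregate_by_section`, and `top_n` are hypothetical and do not come from the patent.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Paragraph:
    section: str            # heading of the section this paragraph belongs to
    annotations: list[str]  # margin annotations (keywords/concepts) for the paragraph


def aggregate_by_section(paragraphs, top_n=3):
    """Roll paragraph-level margin annotations up to section level.

    Assumption (not from the claim): the aggregation rule keeps the
    top_n most frequent annotations across a section's paragraphs.
    """
    counts = {}
    for p in paragraphs:
        # One Counter per section, updated with each paragraph's annotations.
        counts.setdefault(p.section, Counter()).update(p.annotations)
    return {
        section: [term for term, _ in counter.most_common(top_n)]
        for section, counter in counts.items()
    }


if __name__ == "__main__":
    paragraphs = [
        Paragraph("Background", ["diabetes", "insulin"]),
        Paragraph("Background", ["insulin", "glucose"]),
        Paragraph("Methods", ["trial"]),
    ]
    print(aggregate_by_section(paragraphs, top_n=1))
```

Other aggregation rules (for example, weighting by concept importance in the domain model) would fit the claim language equally well; frequency is used here only because it is simple to show.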
Abstract
Classification, text analytics, and natural language processing are used to evaluate passages, extract text, identify concepts, and provide visual cues and notations to assist readers in scanning and evaluating large amounts of information in a document.
15 Claims
1. A system for annotating a document stored on a non-transitory computer readable storage medium of a computer, as set out in full under First Claim above.
Dependent claims: 2, 3, 4, 5, 6, 7, 14
8. A computer program product for annotating a document, the computer program product comprising:
a non-transitory computer readable storage medium having computer readable program code embodied therewith, the computer readable program code comprising:
computer readable program code configured to obtain the document;
computer readable program code configured to determine a type of the document, wherein the type of document is a format within which the document is written;
computer readable program code configured to determine a subject domain of the document, wherein the subject domain is a field, topic or genre of content set forth in the document;
computer readable program code configured to segment the document into paragraphs and sections based on the document's structure;
computer readable program code configured to determine and drive an annotation strategy based on the type of document using information from an annotation model, wherein the annotation strategy specifies one or more annotations to utilize and a location in the document's original text and on a margin of the document for the one or more annotations;
computer readable program code configured to determine a domain model to load based on the subject domain, wherein the domain model identifies terms, phrases, entities, and concepts of the subject domain to be annotated in the document;
computer readable program code configured to provide the one or more annotations in the document's original text and on the margin of the document for the paragraphs of the document based on the domain model and annotation model, wherein the one or more annotations comprise domain-specific keywords and concepts;
computer readable program code configured to aggregate the one or more annotations for the paragraphs of the document into one or more section-level aggregated annotations for the sections of the document based on the annotation model;
computer readable program code configured to annotate the document with the one or more annotations in the document's original text and on the margin of the document for the paragraphs, and with the one or more section-level aggregated annotations for the sections of the document; and
computer readable program code configured to render the document including the one or more annotations and the one or more section-level aggregated annotations.
Dependent claims: 9, 10, 11, 12, 13, 15
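The ordering of the steps in claim 8 (determine type, determine subject domain, segment, pick an annotation strategy from the type, load a domain model from the subject domain, annotate paragraphs) can be sketched as follows. This is only an illustration under stated assumptions: the lookup tables `ANNOTATION_MODEL` and `DOMAIN_MODELS`, the toy heuristics, and all function names are hypothetical and are not taken from the patent. Section segmentation, aggregation, and rendering are omitted.

```python
import re

# Hypothetical annotation model: document type -> annotation strategy.
ANNOTATION_MODEL = {
    "email": {"inline": "bold", "margin": True},
    "report": {"inline": "highlight", "margin": True},
}

# Hypothetical domain models: subject domain -> terms to annotate.
DOMAIN_MODELS = {
    "medical": {"insulin", "glucose", "dosage"},
    "legal": {"plaintiff", "statute", "liability"},
}


def determine_type(text):
    # Toy heuristic (assumption): header-like first line means email.
    head = text.lstrip().lower()
    return "email" if head.startswith(("from:", "to:", "subject:")) else "report"


def determine_domain(text):
    # Toy heuristic (assumption): domain whose terms overlap the text most.
    words = set(re.findall(r"[a-z]+", text.lower()))
    return max(DOMAIN_MODELS, key=lambda d: len(DOMAIN_MODELS[d] & words))


def segment(text):
    # Paragraphs are assumed to be separated by blank lines.
    return [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]


def annotate(text):
    strategy = ANNOTATION_MODEL[determine_type(text)]  # type drives strategy
    terms = DOMAIN_MODELS[determine_domain(text)]      # domain drives model
    annotated = []
    for para in segment(text):
        found = sorted(
            t for t in terms
            if re.search(rf"\b{re.escape(t)}\b", para, re.IGNORECASE)
        )
        annotated.append({
            "text": para,
            "inline_style": strategy["inline"],
            "margin": found if strategy["margin"] else [],
        })
    return annotated


if __name__ == "__main__":
    doc = ("Subject: dosage review\n\n"
           "The insulin dosage was adjusted.\n\n"
           "Glucose levels were stable.")
    for item in annotate(doc):
        print(item["margin"], "|", item["text"])
```

The point of the sketch is the separation the claim draws: the document type selects how and where annotations appear, while the subject domain selects what gets annotated.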
Specification