Utilizing classification and text analytics for annotating documents to allow quick scanning
First Claim
1. A computer-implemented method for annotating a document comprising:
determining a type of the document using a classifier, wherein the type of document is a format within which the document is written;
determining a subject domain of the document using the classifier, wherein the subject domain is a field, topic or genre of content set forth in the document;
segmenting the document into one or more paragraphs and sections based on the document's structure using the classifier;
determining and driving an annotation strategy based on the type of document using information from an annotation model, wherein the annotation strategy specifies one or more annotations to utilize and a location in the document's original text and on a margin of the document for the one or more annotations;
loading a domain model based on the subject domain into a text analytics system, wherein the subject domain determines which domain model to load into the text analytics system and the domain model identifies terms, phrases, entities, and concepts of the subject domain to be annotated in the document;
providing the one or more annotations in the document's original text and on the margin of the document for the paragraphs of the document based on the domain model and the annotation model using the text analytics system, wherein the one or more annotations comprise domain-specific keywords and concepts;
aggregating the one or more annotations in the margin of the document for the paragraphs of the document into one or more section-level aggregated annotations for the sections of the document based on the annotation model using the text analytics system;
annotating the document with the one or more annotations in the document's original text and on the margin of the document for the paragraphs, and with the one or more section-level aggregated annotations for the sections of the document, using a custom viewer/renderer application; and
rendering the document including the one or more annotations and the one or more section-level aggregated annotations using the custom viewer/renderer application.
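The steps recited in claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation: the keyword-count "classifier", the `DOMAIN_MODELS` and `ANNOTATION_MODEL` dictionaries, the `# ` heading convention, the `[[...]]` inline markers, and every function name are hypothetical stand-ins chosen only to show how the steps feed one another.

```python
import re
from collections import Counter
from dataclasses import dataclass, field

# Hypothetical domain models: subject domain -> terms to annotate.
DOMAIN_MODELS = {
    "medicine": {"hypertension", "dosage", "clinical trial"},
    "law": {"plaintiff", "statute", "liability"},
}

# Hypothetical annotation model: document type -> annotation strategy.
ANNOTATION_MODEL = {
    "report": {"marker": ("[[", "]]"), "section_top_k": 2},
    "email": {"marker": ("*", "*"), "section_top_k": 1},
}

@dataclass
class Paragraph:
    text: str
    inline: str = ""                             # text with in-place annotations
    margin: list = field(default_factory=list)   # margin annotations

@dataclass
class Section:
    title: str
    paragraphs: list
    aggregated: list = field(default_factory=list)  # section-level annotations

def classify(text):
    """Toy stand-in for the classifier: returns (doc_type, subject_domain)."""
    doc_type = "email" if text.lstrip().lower().startswith("subject:") else "report"
    lowered = text.lower()
    scores = {d: sum(lowered.count(t) for t in terms)
              for d, terms in DOMAIN_MODELS.items()}
    return doc_type, max(scores, key=scores.get)

def segment(text):
    """Assumes blank-line-separated blocks; a block starting with '# ' opens a section."""
    sections = []
    for block in re.split(r"\n\s*\n", text.strip()):
        block = block.strip()
        if block.startswith("# "):
            sections.append(Section(title=block[2:].strip(), paragraphs=[]))
        elif block:
            if not sections:
                sections.append(Section(title="", paragraphs=[]))
            sections[-1].paragraphs.append(Paragraph(text=block))
    return sections

def annotate(text):
    doc_type, domain = classify(text)
    strategy = ANNOTATION_MODEL[doc_type]   # strategy follows the document type
    terms = DOMAIN_MODELS[domain]           # domain model follows the subject domain
    # Longest terms first so multi-word phrases win over shorter overlaps.
    pattern = re.compile(
        "|".join(re.escape(t) for t in sorted(terms, key=len, reverse=True)),
        re.IGNORECASE,
    )
    left, right = strategy["marker"]
    sections = segment(text)
    for section in sections:
        counts = Counter()
        for para in section.paragraphs:
            found = [m.group(0).lower() for m in pattern.finditer(para.text)]
            counts.update(found)
            para.margin = sorted(set(found))
            para.inline = pattern.sub(lambda m: f"{left}{m.group(0)}{right}", para.text)
        # Aggregate paragraph annotations into the most frequent terms per section.
        section.aggregated = [t for t, _ in counts.most_common(strategy["section_top_k"])]
    return doc_type, domain, sections

def render(sections):
    """Plain-text stand-in for the viewer/renderer application."""
    lines = []
    for s in sections:
        lines.append(f"== {s.title} == [{', '.join(s.aggregated)}]")
        for p in s.paragraphs:
            lines.append(f"{p.inline}  | {', '.join(p.margin)}")
    return "\n".join(lines)

SAMPLE = """# Treatment

The dosage was raised after hypertension persisted.

A second clinical trial confirmed the dosage.

# Legal

The plaintiff cited a statute."""

doc_type, domain, sections = annotate(SAMPLE)
print(render(sections))
```

In this sketch the document type selects only the marker style and the number of section-level terms, while the subject domain selects the vocabulary. That mirrors the separation the claim draws between the annotation model and the domain model, though a real system would presumably use trained classifiers and a far richer text analytics stage.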
Abstract
Classification, text analytics, and natural language processing are used to evaluate passages, extract text, identify concepts, and provide visual cues and notations to assist readers in scanning and evaluating large amounts of information in a document.
48 Citations
8 Claims
1. A computer-implemented method for annotating a document, as recited in full under First Claim above. Dependent claims: 2, 3, 4, 5, 6, 7, 8.
Specification