Information extraction and annotation systems and methods for documents
2 Assignments
0 Petitions
Abstract
Information extraction and annotation systems and methods for use in annotating and determining annotation instances are provided herein. Exemplary methods include receiving annotated documents, the annotated documents comprising annotated fields, analyzing the annotated documents to determine contextual information for each of the annotated fields, determining discriminative sequences using the contextual information, generating a proposed rule or a feature set using the discriminative sequences and annotated fields, and providing the proposed rule or the feature set to a document annotator.
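The abstract does not say how an annotated document is represented or what counts as "contextual information" for a field. The following is a minimal sketch, assuming a document is plain text, each annotated field is a character-offset span, and the context is a small window of whitespace-separated tokens on either side of the value. The names `Annotation`, `field_contexts`, and `window` are hypothetical illustrations, not taken from the patent.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    field: str  # hypothetical field label, e.g. "invoice_number"
    start: int  # character offset where the annotated value begins
    end: int    # character offset just past the annotated value


def field_contexts(text: str, annotations: list[Annotation],
                   window: int = 3) -> dict[str, list[tuple[list[str], list[str]]]]:
    """Collect up to `window` whitespace-separated tokens on each side of every annotated value."""
    contexts: dict[str, list[tuple[list[str], list[str]]]] = {}
    for ann in annotations:
        # Guard window == 0: a slice like [-0:] would return the whole token list.
        left = text[:ann.start].split()[-window:] if window > 0 else []
        right = text[ann.end:].split()[:window] if window > 0 else []
        contexts.setdefault(ann.field, []).append((left, right))
    return contexts


doc = "Invoice No: 4711 Date: 2020-01-05"
start = doc.index("4711")
print(field_contexts(doc, [Annotation("invoice_number", start, start + 4)]))
# {'invoice_number': [(['Invoice', 'No:'], ['Date:', '2020-01-05'])]}
```

Grouping contexts by field label means that later steps can compare the contexts of the same field across many annotated documents.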
37 Citations
21 Claims
1. A method, comprising:
receiving, by a context analysis module, annotated documents, the annotated documents comprising annotated fields;
analyzing, by the context analysis module, the annotated documents to determine contextual information for each of the annotated fields;
determining discriminative sequences using the contextual information by:
  determining, by a contiguity heuristics module, contiguous common subsequences between aligned pairs of strings of the annotated documents;
  determining, by the contiguity heuristics module, a frequency of occurrence of similar contiguous common subsequences; and
  wherein the contiguity heuristics module generates a proposed rule from contiguous common subsequences having a desired frequency of occurrence;
providing, by the context analysis module, the proposed rule to a document annotator; and
applying, by the document annotator, the proposed rule to a target document to annotate the target document.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13.
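Claim 1 leaves open how the contiguous common subsequences are found, what makes two of them "similar", and what form a proposed rule takes. A minimal sketch, assuming the "aligned pairs of strings" are pairs of tokenized left contexts for the same field, that "similar" means identical, and that a rule is a regular expression capturing the token after the shared run. This swaps in Python's `difflib.SequenceMatcher` matching blocks as the alignment; `common_runs`, `frequent_runs`, `propose_rule`, and `min_freq` are hypothetical names.

```python
from __future__ import annotations

import re
from collections import Counter
from difflib import SequenceMatcher
from itertools import combinations


def common_runs(a: list[str], b: list[str]) -> list[tuple[str, ...]]:
    """Contiguous token runs shared by a and b, taken from difflib's matching blocks."""
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    # The last matching block is a zero-length sentinel, so keep only size > 0.
    return [tuple(a[m.a:m.a + m.size]) for m in matcher.get_matching_blocks() if m.size > 0]


def frequent_runs(contexts: list[list[str]], min_freq: int) -> list[tuple[str, ...]]:
    """Count each shared run once per pair of contexts; keep runs found in >= min_freq pairs."""
    counts = Counter()
    for a, b in combinations(contexts, 2):
        counts.update(set(common_runs(a, b)))
    return [run for run, n in counts.most_common() if n >= min_freq]


def propose_rule(run: tuple[str, ...]) -> str:
    """Hypothetical rule format: the literal run, whitespace, then capture the next token."""
    return r"\s+".join(re.escape(tok) for tok in run) + r"\s+(\S+)"


left_contexts = [["Invoice", "No:"], ["Invoice", "No:"], ["Our", "Invoice", "No:"], ["Total", "due:"]]
rules = [propose_rule(run) for run in frequent_runs(left_contexts, min_freq=2)]
print(rules)
```

Here `("Invoice", "No:")` is shared by three of the six context pairs and so passes the threshold, while `["Total", "due:"]` shares nothing and contributes no run. Raising `min_freq` trades recall for rules that are better supported by the training annotations.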
14. A system, comprising:
a processor; and
logic encoded in one or more tangible media for execution by the processor, the logic when executed by the processor causing the system to perform operations comprising:
  receiving annotated documents comprising annotated fields;
  analyzing the annotated documents to determine contextual information for each of the annotated fields;
  determining discriminative sequences using the contextual information by:
    determining, by a contiguity heuristics module, longest contiguous common subsequences between aligned pairs of strings of the annotated documents;
    determining, by the contiguity heuristics module, a frequency of occurrence of similar longest contiguous common subsequences; and
    wherein the contiguity heuristics module generates a proposed rule from longest contiguous common subsequences having a desired frequency of occurrence;
  providing the proposed rule to a document annotator; and
  applying, by the document annotator, the proposed rule to a target document to automatically annotate the target document.
Dependent claims: 15, 16, 17, 18, 19, 20.
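Claim 14 narrows the heuristic to the longest contiguous common subsequence of each aligned pair, which is the classic longest-common-substring problem. The sketch below uses the standard O(n·m) dynamic program over token sequences, assuming (as a simplification) that "similar" runs are identical ones. The names `longest_common_run` and `frequent_longest_runs` are hypothetical, and this is not presented as the patented implementation.

```python
from __future__ import annotations

from collections import Counter
from collections.abc import Sequence
from itertools import combinations


def longest_common_run(a: Sequence[str], b: Sequence[str]) -> tuple[str, ...]:
    """Longest contiguous common subsequence (longest common substring) of a and b.

    curr[j] holds the length of the common run ending at a[i - 1] and b[j - 1].
    """
    best_len, best_end = 0, 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                curr[j] = prev[j - 1] + 1
                if curr[j] > best_len:  # on ties, keeps the run that ends earliest in a
                    best_len, best_end = curr[j], i
        prev = curr
    return tuple(a[best_end - best_len:best_end])


def frequent_longest_runs(strings: list[Sequence[str]], min_freq: int) -> list[tuple[str, ...]]:
    """Take the longest run of every pair; keep runs that occur in >= min_freq pairs."""
    counts = Counter()
    for a, b in combinations(strings, 2):
        run = longest_common_run(a, b)
        if run:  # pairs with nothing in common contribute no run
            counts[run] += 1
    return [run for run, n in counts.most_common() if n >= min_freq]


contexts = [["Invoice", "No:"], ["Our", "Invoice", "No:"], ["Invoice", "Number"], ["Ref", "No:"]]
print(longest_common_run(contexts[0], contexts[1]))  # ('Invoice', 'No:')
print(frequent_longest_runs(contexts, min_freq=2))
```

Keeping only the longest run per pair yields fewer candidates than collecting every matching block, but, as this example shows, a shorter run can still be the most frequent one across pairs.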
21. A method, comprising:
receiving, by a context analysis module, annotated documents, the annotated documents comprising annotated fields;
analyzing, by the context analysis module, the annotated documents to determine contextual information for each of the annotated fields;
determining discriminative sequences using the contextual information by:
  determining, by a contiguity heuristics module, longest contiguous common subsequences between aligned pairs of strings of the annotated documents;
  determining, by the contiguity heuristics module, a frequency of occurrence of similar longest contiguous common subsequences; and
  wherein the contiguity heuristics module generates a proposed rule from longest contiguous common subsequences having a desired frequency of occurrence;
providing, by the context analysis module, the proposed rule to a document annotator; and
applying, by the document annotator, the proposed rule to a target document to automatically annotate the target document.
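The final step of claims 1, 14, and 21 applies the proposed rule to a target document. A minimal sketch, assuming a rule is a regular expression whose first capture group marks the field value (the claims do not specify a rule format). `apply_rule` and this `Annotation` record are hypothetical names.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    field: str  # hypothetical field label
    start: int  # character offset of the extracted value
    end: int    # character offset just past the value
    value: str


def apply_rule(field: str, pattern: str, text: str) -> list[Annotation]:
    """Annotate every match of `pattern` in `text`; capture group 1 is the field value."""
    compiled = re.compile(pattern)
    if compiled.groups < 1:
        raise ValueError("rule must contain a capture group for the field value")
    annotations = []
    for match in compiled.finditer(text):
        start, end = match.span(1)
        if start == -1:  # group 1 did not participate in this match
            continue
        annotations.append(Annotation(field, start, end, match.group(1)))
    return annotations


target = "Invoice No: 4711\nInvoice No: 4712"
for ann in apply_rule("invoice_number", r"Invoice\s+No:\s+(\S+)", target):
    print(ann)
```

Recording character offsets as well as the value lets the resulting annotations be reviewed or fed back in as annotated documents for the next round of rule proposals.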
Specification