Extraction of facts from text
Abstract
A fact extraction tool set (“FEX”) finds and extracts targeted pieces of information from text using linguistic and pattern matching technologies, and in particular, text annotation and fact extraction. Text annotation tools break a text, such as a document, into its base tokens and annotate those tokens or patterns of tokens with orthographic, syntactic, semantic, pragmatic and other attributes. A user-defined “Annotation Configuration” controls which annotation tools are used in a given application. XML is used as the basis for representing the annotated text. A tag uncrossing tool resolves conflicting (crossed) annotation boundaries in an annotated text to produce well-formed XML from the results of the individual annotators. The fact extraction tool is a pattern matching language which is used to write scripts that find and match patterns of attributes that correspond to targeted pieces of information in the text, and extract that information.
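The tag uncrossing step described above can be illustrated with a minimal sketch. It assumes annotations arrive as (tag, start, end) spans over token indices and resolves crossed boundaries by closing and reopening the inner element at the crossing point. This splitting strategy is one common way to obtain well-formed XML, not necessarily the one FEX uses; the function name `uncross`, the `<doc>` wrapper and the sample tags are hypothetical.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def uncross(tokens, spans):
    """Serialize possibly crossing spans as well-formed XML.

    tokens: list of token strings.
    spans:  list of (tag, start, end) over token indices, end exclusive.
            Tags are assumed to be valid XML element names.
    """
    for tag, start, end in spans:
        if not (0 <= start < end <= len(tokens)):
            raise ValueError(f"invalid span: {(tag, start, end)}")

    out = []
    stack = []  # currently open spans, outermost first
    for i in range(len(tokens) + 1):
        # Close every span that ends at this boundary. Spans opened above it
        # are closed too, then reopened if they still have tokens to cover.
        ending = [k for k, s in enumerate(stack) if s[2] == i]
        if ending:
            lowest = ending[0]
            survivors = [s for s in stack[lowest:] if s[2] != i]
            for s in reversed(stack[lowest:]):
                out.append(f"</{s[0]}>")
            del stack[lowest:]
            for s in survivors:
                out.append(f"<{s[0]}>")
                stack.append(s)
        # Open spans that start here, longest first so shorter ones nest inside.
        for s in sorted((s for s in spans if s[1] == i), key=lambda s: -s[2]):
            out.append(f"<{s[0]}>")
            stack.append(s)
        if i < len(tokens):
            out.append(escape(tokens[i]))
    return "<doc>" + " ".join(out) + "</doc>"


# "phrase" crosses both "person" and "company" in this made-up input.
demo_tokens = ["John", "Smith", "of", "Acme", "Corp"]
demo_spans = [("person", 0, 2), ("phrase", 1, 4), ("company", 3, 5)]
demo_xml = uncross(demo_tokens, demo_spans)
ET.fromstring(demo_xml)  # raises ParseError if the result were not well-formed
```

In this sketch a crossed annotation ends up as two or more sibling fragments with the same tag. A fuller implementation would probably link the fragments (for example with a shared identifier attribute) so the original annotation can be recovered.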
55 Claims
1. A fact extraction tool set for extracting information from a document, comprising:
means for annotating a text; and
means for extracting facts from the annotated text.
Dependent claims: 2-15.
16. A rule-based information extraction language for use in identifying and extracting potentially interesting pieces of information in aligned annotations in a text, comprising at least one text pattern recognition rule that queries for at least one of literal text, attributes, and relationships found in the aligned annotations to define the facts to be extracted.
20. A text annotation tool comprising:
means for assigning syntactic and semantic attributes to a text passage by at least one of parsing the text passage and applying text annotation processes other than parsing the text passage, including means for breaking the text passage into its base tokens and annotating the base tokens and patterns of base tokens with a number of orthographic, syntactic, semantic, pragmatic and dictionary-based attributes; and
means for associating all annotations assigned to a particular piece of text with the base tokens for that text to generate aligned annotations.
Dependent claims: 21-25.
26. A computer program product for extracting information from a document, the computer program product comprising a computer usable storage medium having computer readable program code means embodied in the medium, the computer readable program code means comprising:
computer readable program code means for annotating a text; and
computer readable program code means for extracting facts from the annotated text.
Dependent claims: 27-40.
41. A method of extracting information from a document, comprising the steps of:
annotating a text; and
extracting facts from the annotated text.
Dependent claims: 42-55.
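The two steps the independent claims recite, annotating a text and then extracting facts from the annotated text, can be sketched as a toy pipeline. This is only an illustration under stated assumptions, not the patented implementation: each annotator writes attributes onto shared base tokens (a loose stand-in for aligned annotations), a list of annotators plays the role of the annotation configuration, and a rule is a sequence of attribute constraints with an optional one-or-more repeat. Every name here (`Token`, `ANNOTATION_CONFIG`, `COMPANY_RULE`, `run_pipeline`, the lexicon entries) is hypothetical.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Token:
    text: str
    attrs: dict = field(default_factory=dict)  # all annotations for this token


def tokenize(text):
    # Base tokens: runs of word characters, or single punctuation marks.
    return [Token(t) for t in re.findall(r"\w+|[^\w\s]", text)]


def annotate_orthography(tokens):
    for tok in tokens:
        if tok.text.isdigit():
            tok.attrs["orth"] = "number"
        elif tok.text[0].isupper():
            tok.attrs["orth"] = "capitalized"
        elif tok.text.isalpha():
            tok.attrs["orth"] = "lowercase"
        else:
            tok.attrs["orth"] = "other"


# Hypothetical dictionary-based annotator with a made-up lexicon.
LEXICON = {"corp": "company_suffix", "inc": "company_suffix", "ltd": "company_suffix"}


def annotate_lexicon(tokens):
    for tok in tokens:
        sem = LEXICON.get(tok.text.lower())
        if sem:
            tok.attrs["sem"] = sem


# Stand-in for a user-defined annotation configuration: which annotators run.
ANNOTATION_CONFIG = [annotate_orthography, annotate_lexicon]

# Rule: one or more capitalized tokens followed by one company suffix.
COMPANY_RULE = [({"orth": "capitalized"}, "+"), ({"sem": "company_suffix"}, "1")]


def satisfies(tok, constraints):
    return all(tok.attrs.get(k) == v for k, v in constraints.items())


def match_at(tokens, pos, pattern):
    """Return the end index of a match starting at pos, or None."""
    if not pattern:
        return pos
    (constraints, repeat), rest = pattern[0], pattern[1:]
    if pos >= len(tokens) or not satisfies(tokens[pos], constraints):
        return None
    if repeat == "+":
        # Greedy with backtracking: first try to extend this element.
        end = match_at(tokens, pos + 1, pattern)
        if end is not None:
            return end
    return match_at(tokens, pos + 1, rest)


def extract(tokens, pattern):
    facts, pos = [], 0
    while pos < len(tokens):
        end = match_at(tokens, pos, pattern)
        if end is None:
            pos += 1
        else:
            facts.append(" ".join(t.text for t in tokens[pos:end]))
            pos = max(end, pos + 1)  # always advance, even on an empty match
    return facts


def run_pipeline(text, annotators=ANNOTATION_CONFIG, pattern=COMPANY_RULE):
    tokens = tokenize(text)
    for annotate in annotators:
        annotate(tokens)
    return extract(tokens, pattern)
```

Calling `run_pipeline("John Smith joined Acme Widget Corp in 1999.")` returns `["Acme Widget Corp"]`: "John Smith" is capitalized but is not followed by a suffix from the lexicon, so the rule does not fire there. Keeping every attribute on the base token is what lets a single rule mix orthographic and dictionary-based constraints.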
Specification