Building and maintaining information extraction rules
Abstract
Methods and arrangements for managing development of information extraction rules. One or more documents are opened for extraction. An interface is provided to create a label and thereupon label a portion of the document. The created label is stored, and an extractor is developed based on the labeling. A test interface is provided for the extractor, and results of a test conducted through the test interface are displayed. The extractor is exported. In accordance with at least one embodiment, developers are presented with automated guidance that eases the writing of extractors and thereby reduces the overall manual effort involved in extractor development. Generally, a focused, tutorial-type environment serves as a guide based on previously developed best practices.
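The flow summarized in the abstract (label, store, develop, test, export) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the patented implementation: a regular expression stands in for the extractor, a plain list stands in for the label store, and every name here (`Label`, `label_span`, `run_extractor`, `check_extractor`, `export_extractor`) is hypothetical.

```python
import json
import re
from dataclasses import dataclass, asdict


@dataclass
class Label:
    """A user-created label attached to a span of document text (hypothetical)."""
    name: str
    doc_id: str
    start: int
    end: int


def label_span(doc_id, text, name, snippet):
    """Create a label for the first occurrence of `snippet` in `text`."""
    start = text.index(snippet)
    return Label(name, doc_id, start, start + len(snippet))


def run_extractor(pattern, text):
    """Return the (start, end) spans matched by a regex stand-in extractor."""
    return [m.span() for m in re.finditer(pattern, text)]


def check_extractor(pattern, docs, labels):
    """Test step: report whether each stored label is found by the extractor."""
    results = []
    for label in labels:
        spans = run_extractor(pattern, docs[label.doc_id])
        results.append({
            "label": asdict(label),
            "found": (label.start, label.end) in spans,
        })
    return results


def export_extractor(name, pattern):
    """Export step: serialize the extractor for use outside the tool."""
    return json.dumps({"name": name, "pattern": pattern}, sort_keys=True)


# Open a document, label a portion of it, and store the label.
docs = {"d1": "Call 555-0100 or 555-0199 for support."}
store = [label_span("d1", docs["d1"], "Phone", "555-0100")]

# Develop an extractor from the labeling, test it, and export it.
phone_pattern = r"\d{3}-\d{4}"
report = check_extractor(phone_pattern, docs, store)
exported = export_extractor("Phone", phone_pattern)
```

Keeping the stored labels separate from the extractor means the same labels can be replayed against each revision of the rule, which is the role the test interface plays in the described method.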
19 Claims
1. A method comprising:
opening one or more documents for extraction;
providing an interface, comprising a query language (AQL) editor, to create a label and thereupon label a portion of the one or more documents using at least one labeled example provided by a user and at least one labeled clue provided by a user, wherein the at least one labeled example label provided by the user comprises text within the one or more documents and wherein the at least one labeled clue comprises a clue of interest indicating why the labeled example is desirable for extraction;
storing the created label;
developing an extractor based on the labeling, wherein the developing an extractor comprises inserting a template statement into the AQL editor, receiving input from the user to complete the inserted template statement, and developing AQL for capturing label examples for extraction in an extraction plan, wherein the developing AQL comprises capturing intent and semantics for the extractor based upon the at least one labeled example;
providing a test interface for the extractor;
displaying results of a test conducted through the test interface; and
exporting the extractor.
(Dependent claims: 2-16)
17. An apparatus comprising:
at least one processor; and
a computer readable storage medium having computer readable program code embodied therewith and executable by the at least one processor, the computer readable program code comprising:
computer readable program code configured to open one or more documents for extraction;
computer readable program code configured to provide an interface, comprising a query language (AQL) editor, to create a label and thereupon label a portion of the one or more documents using at least one labeled example provided by a user and at least one labeled clue provided by a user, wherein the at least one labeled example provided by the user comprises text within the one or more documents and wherein the at least one labeled clue comprises a clue of interest indicating why the labeled example is desirable for extraction;
computer readable program code configured to store the created label;
computer readable program code configured to develop an extractor based on the labeling, wherein the developing an extractor comprises inserting a template statement into the AQL editor, receiving input from the user to complete the inserted template statement, and developing AQL for capturing label examples for extraction in an extraction plan, wherein the developing AQL comprises capturing intent and semantics for the extractor based upon the at least one labeled example;
computer readable program code configured to provide a test interface for the extractor;
computer readable program code configured to display results of a test conducted through the test interface; and
computer readable program code configured to export the extractor.
18. A computer program product comprising:
a computer readable storage medium having computer readable program code embodied therewith, the computer readable program code comprising:
computer readable program code configured to open one or more documents for extraction;
computer readable program code configured to provide an interface, comprising a query language (AQL) editor, to create a label and thereupon label a portion of the one or more documents using at least one labeled example provided by a user and at least one labeled clue provided by a user, wherein the at least one labeled example provided by the user comprises text within the one or more documents and wherein the at least one labeled clue comprises a clue of interest indicating why the labeled example is desirable for extraction;
computer readable program code configured to store the created label;
computer readable program code configured to develop an extractor based on the labeling, wherein the developing an extractor comprises inserting a template statement into the AQL editor, receiving input from the user to complete the inserted template statement, and developing AQL for capturing label examples for extraction in an extraction plan, wherein the developing AQL comprises capturing intent and semantics for the extractor based upon the at least one labeled example;
computer readable program code configured to provide a test interface for the extractor;
computer readable program code configured to display results of a test conducted through the test interface; and
computer readable program code configured to export the extractor.
(Dependent claims: 19)
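As a rough illustration of two elements recited in the independent claims, the sketch below pairs a labeled example with a labeled clue and then completes an inserted template statement from user input. Everything in it is an assumption made for illustration: the class and function names are hypothetical, Python's `string.Template` stands in for whatever templating the editor uses, and the AQL-style statement is shown only as text and has not been checked against an AQL compiler.

```python
from dataclasses import dataclass
from string import Template


@dataclass(frozen=True)
class Span:
    """Character offsets of a piece of text inside a document."""
    start: int
    end: int

    def text_of(self, doc: str) -> str:
        return doc[self.start:self.end]


@dataclass(frozen=True)
class LabeledExample:
    """A labeled example plus the clue indicating why it is of interest."""
    label: str
    example: Span  # text the user wants extracted
    clue: Span     # nearby text explaining why the example matters


def span_of(doc: str, snippet: str) -> Span:
    """Locate the first occurrence of `snippet` in `doc`."""
    start = doc.index(snippet)
    return Span(start, start + len(snippet))


# Hypothetical template statement as it might be inserted into the editor.
# The $placeholders mark the parts the user is asked to fill in.
STATEMENT_TEMPLATE = Template(
    "create view $view as\n"
    "  extract regex /$regex/ on D.text as $column\n"
    "  from Document D;"
)


def complete_template(template: Template, user_input: dict) -> str:
    """Fill the placeholders; raises KeyError if the user left one empty."""
    return template.substitute(user_input)


doc = "Q3 revenue was 4.2 billion dollars."
labeled = LabeledExample(
    label="RevenueAmount",
    example=span_of(doc, "4.2 billion"),
    clue=span_of(doc, "revenue"),
)

statement = complete_template(
    STATEMENT_TEMPLATE,
    {"view": labeled.label, "regex": r"\d+\.\d+ billion", "column": "amount"},
)
print(statement)
```

Storing the clue next to the example keeps the user's stated reason for the extraction available when the rule is later revised, which is one plausible reading of "capturing intent and semantics" in the claims.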
Specification