Automated generation of text analysis systems
First Claim
1. A method for generating a text analysis program for recognizing patterns appearing in text and extracting information from said patterns, the method comprising the steps of:
(a) providing a sample hierarchy, said sample hierarchy comprising samples of text wherein the samples are associated with offset values, said offset values identifying locations in a parse tree data structure, said parse tree containing concepts stored at locations identified by said offsets;
(b) extracting at least one rule from said sample hierarchy, said rule describing how to process a portion of text;
(c) generating a pass from said rule, said pass containing instructions to operate a text analyzer; and
(d) constructing a text analyzer containing said pass.
1 Assignment
0 Petitions
Abstract
A system, method, and computer program for automatically generating text analysis systems are disclosed. Individual passes of a multi-pass text analyzer are created by generating rules from samples supplied by users. Successive passes are created in a cascading fashion by performing partial text analyses employing existing passes. A complete text analyzer interleaves the generated passes with a framework of existing passes. The complete text analysis system can then process texts to identify patterns similar to samples added by users. Generation of rules from samples encompasses a wide range of constructs and granularities that occur in text, from individual words to intrasentential patterns, to sentential, paragraph, section, and other formats that occur in text documents.
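The cascading generation described in the abstract can be illustrated with a minimal sketch. It is not the patented implementation: the token representation, the merge-style rule, and every name (`tokenize`, `merge_pass`, `generate_cascade`, `drop_plain_words`) are assumptions made for illustration. Each new pass is derived by first running the already generated passes over the user's sample (a partial analysis), and the finished analyzer places the generated passes between two fixed framework passes.

```python
def tokenize(text):
    """Framework pass (assumed): split raw text into (label, text) tokens."""
    return [("num" if w.isdigit() else "word", w) for w in text.split()]

def drop_plain_words(tokens):
    """Framework pass (assumed): keep only tokens that a generated pass recognized."""
    return [t for t in tokens if t[0] not in ("word", "num")]

def run_passes(passes, tokens):
    """Apply each pass in order to the token list."""
    for p in passes:
        tokens = p(tokens)
    return tokens

def merge_pass(label, pattern):
    """Hypothetical generated pass: merge adjacent tokens whose labels equal `pattern`."""
    n = len(pattern)
    def run(tokens):
        out, i = [], 0
        while i < len(tokens):
            window = tokens[i:i + n]
            if n and tuple(t[0] for t in window) == pattern:
                out.append((label, " ".join(t[1] for t in window)))
                i += n
            else:
                out.append(tokens[i])
                i += 1
        return out
    return run

def generate_cascade(samples):
    """Create one pass per (label, sample_text), in order.

    Before a rule is derived, the sample is partially analyzed with the
    passes generated so far, so later rules can refer to earlier labels.
    """
    generated = []
    for label, sample_text in samples:
        partial = run_passes(generated, tokenize(sample_text))
        pattern = tuple(t[0] for t in partial)
        generated.append(merge_pass(label, pattern))
    return generated

# User-supplied samples (illustrative data only).
samples = [("phone", "555 1234"), ("contact", "call 555 1234")]
generated = generate_cascade(samples)

def analyze(text):
    """Complete analyzer: framework pass, generated passes, framework pass."""
    return drop_plain_words(run_passes(generated, tokenize(text)))

result = analyze("please ring 800 4321 today")
print(result)
```

In this sketch the second rule is expressed in terms of the `phone` label produced by the first pass, which is the sense in which the passes cascade.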
40 Citations
4 Claims
1. A method for generating a text analysis program for recognizing patterns appearing in text and extracting information from said patterns, the method comprising the steps of:
(a) providing a sample hierarchy, said sample hierarchy comprising samples of text wherein the samples are associated with offset values, said offset values identifying locations in a parse tree data structure, said parse tree containing concepts stored at locations identified by said offsets;
(b) extracting at least one rule from said sample hierarchy, said rule describing how to process a portion of text;
(c) generating a pass from said rule, said pass containing instructions to operate a text analyzer; and
(d) constructing a text analyzer containing said pass.
Dependent claims: 2, 3
4. A computer readable medium containing instructions which, when executed by a computer, generate a text analysis program for recognizing patterns appearing in text and extracting information from said patterns, by:
(a) providing a sample hierarchy, said sample hierarchy comprising samples of text, wherein the samples are associated with offset values, said offset values identifying locations in a parse tree data structure, said parse tree containing concepts stored at locations identified by said offsets;
(b) extracting at least one rule from said sample hierarchy, said rule describing how to process a portion of text;
(c) generating a pass from said rule, said pass containing instructions to operate a text analyzer; and
(d) constructing a text analyzer containing said pass.
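Steps (a) through (d) can be sketched as follows. This is a simplified illustration under stated assumptions, not the claimed implementation: the parse tree is a small tree of nodes carrying character offsets, the sample hierarchy is flattened to a dictionary from concept name to samples, and all names (`Node`, `Sample`, `extract_rules`, `generate_pass`, `build_analyzer`) are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    concept: str                 # concept stored at this parse-tree location
    start: int                   # offset of the first character covered
    end: int                     # offset one past the last character covered
    children: list = field(default_factory=list)

@dataclass
class Sample:
    start: int                   # offsets of the highlighted sample text
    end: int

def leaves(node):
    """Yield the leaf nodes of the parse tree in left-to-right order."""
    if not node.children:
        yield node
    for child in node.children:
        yield from leaves(child)

def extract_rules(hierarchy, root):
    """(b) For each sample, read the concepts found at its offsets."""
    rules = []
    for concept, samples in hierarchy.items():
        for s in samples:
            pattern = tuple(n.concept for n in leaves(root)
                            if s.start <= n.start and n.end <= s.end)
            rules.append((concept, pattern))
    return rules

def generate_pass(rules):
    """(c) Turn rules into a pass: a function over (concept, text) tokens."""
    def run(tokens):
        out, i = [], 0
        while i < len(tokens):
            for concept, pattern in rules:
                window = tokens[i:i + len(pattern)]
                if pattern and tuple(t[0] for t in window) == pattern:
                    out.append((concept, " ".join(t[1] for t in window)))
                    i += len(pattern)
                    break
            else:                # no rule matched at position i
                out.append(tokens[i])
                i += 1
        return out
    return run

def build_analyzer(passes):
    """(d) An analyzer applies its passes in sequence."""
    def analyze(tokens):
        for p in passes:
            tokens = p(tokens)
        return tokens
    return analyze

# (a) Sample hierarchy over the text "call 555 1234 now" (illustrative data).
root = Node("text", 0, 17, [Node("word", 0, 4), Node("num", 5, 8),
                            Node("num", 9, 13), Node("word", 14, 17)])
hierarchy = {"phone": [Sample(5, 13)]}

rules = extract_rules(hierarchy, root)
analyzer = build_analyzer([generate_pass(rules)])
output = analyzer([("word", "dial"), ("num", "800"), ("num", "4321")])
print(rules, output)
```

The sample itself stores only offsets; the concepts that make up the rule are read from the parse tree at those offsets, which mirrors the relationship between samples, offsets, and concepts recited in step (a).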
Specification