System, Method, and Apparatus for Information Extraction of Textual Documents
First Claim
1. A method for identifying and retrieving text from a repository of text documents, the method comprising the steps of:
a) providing the repository, which stores a plurality of variables that represent document segments and associated rhetorical relations;
b) interacting with a user to generate query input that specifies at least one rhetorical relation of interest;
c) in response to receipt of said query input, querying the variables stored in the repository to identify zero or more document segments that are associated with a rhetorical relation that matches the at least one rhetorical relation of interest specified by said query input for output to the user.
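The three claimed steps can be sketched as a small in-memory program. This is a minimal illustration under stated assumptions, not the claimed implementation: the `SegmentRecord`, `Repository`, and `retrieve` names are hypothetical, the storage is a plain Python list, relation matching is assumed to be case-insensitive, and the user interaction of step b) is reduced to a ready-made list of relation names. "Condition" and "Elaboration" are standard RST relation names used only as examples.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class SegmentRecord:
    """Hypothetical stored 'variable': a document segment plus its rhetorical relation."""
    doc_id: str
    segment: str
    relation: str  # illustrative RST relation name, e.g. "Condition"


class Repository:
    """Step a): an in-memory repository (the storage technology is an assumption)."""

    def __init__(self, records: Iterable[SegmentRecord]):
        self._records = list(records)

    def query(self, relations_of_interest: Iterable[str]) -> List[SegmentRecord]:
        """Step c): return zero or more records whose relation matches the query input."""
        wanted = {name.lower() for name in relations_of_interest}  # assumed case-insensitive
        return [rec for rec in self._records if rec.relation.lower() in wanted]


def retrieve(repo: Repository, query_input: Iterable[str]) -> List[SegmentRecord]:
    """Step b) is modeled as a ready-made list of relation names; the claim
    does not say how the user interaction produces it."""
    return repo.query(query_input)


demo_repo = Repository([
    SegmentRecord("d1", "If the valve fails,", "Condition"),
    SegmentRecord("d1", "the pump stops.", "Condition"),
    SegmentRecord("d2", "The pump is rated at 5 kW.", "Elaboration"),
])

for rec in retrieve(demo_repo, ["condition"]):
    print(rec.doc_id, rec.segment)
```

Querying for "condition" prints the two segments of document d1; querying for a relation that nothing in the repository carries returns an empty list, which corresponds to the "zero or more" wording of step c).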
Abstract
A method and system for extraction of text from a set of text documents. A data repository stores a plurality of variables that represent document segments and associated rhetorical relations. A user interacts with a computer to define query input that specifies at least one rhetorical relation of interest. The query input specified by the user is processed to query the variables stored in the data repository to identify zero or more document segments that are associated with a rhetorical relation that matches the at least one rhetorical relation of interest specified by the query input. Information corresponding to the zero or more matching document segments is returned to the user. In the preferred embodiment, the rhetorical relations represented by the user-supplied query input as well as the variables stored in the data repository include a set of RST relations whose meaning is dictated by the nuclearity of the associated text. Such RST relations can include a plurality of mononuclear RST relations, each having a nucleus and a satellite, and a plurality of multinuclear RST relations, each having a plurality of nuclei. The rhetorical relations represented by the user-supplied query input as well as the variables stored in the data repository can also include a set of Speech Act relations whose meaning extends beyond the situational semantics of the associated text.
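The nuclearity distinction in the abstract (mononuclear relations with one nucleus and one satellite, multinuclear relations with several nuclei) can be made concrete as a small data model. This is a hypothetical sketch, not the preferred embodiment: the relation inventories below use standard RST relation names, but which relations the embodiment actually stores is not stated here, and the `Relation` class and `segments_for` helper are invented for illustration. Speech Act relations are not modeled in this sketch.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Illustrative relation inventories (assumptions, not taken from the patent).
MONONUCLEAR = {"Elaboration", "Condition", "Cause", "Evidence"}
MULTINUCLEAR = {"Contrast", "Sequence", "Joint"}


@dataclass(frozen=True)
class Relation:
    name: str
    nuclei: Tuple[str, ...]          # one nucleus if mononuclear, two or more if multinuclear
    satellite: Optional[str] = None  # only mononuclear RST relations carry a satellite

    def __post_init__(self):
        # Enforce the nuclearity shape described in the abstract; other
        # relations (e.g. Speech Act relations) are not checked in this sketch.
        if self.name in MONONUCLEAR:
            if len(self.nuclei) != 1 or self.satellite is None:
                raise ValueError(f"{self.name} needs one nucleus and one satellite")
        elif self.name in MULTINUCLEAR:
            if len(self.nuclei) < 2 or self.satellite is not None:
                raise ValueError(f"{self.name} needs two or more nuclei and no satellite")


def segments_for(relations: List[Relation], name: str, role: str = "nucleus") -> List[str]:
    """Hypothetical helper: return the nucleus (or satellite) segments of every
    stored relation called `name`."""
    out: List[str] = []
    for rel in relations:
        if rel.name != name:
            continue
        if role == "nucleus":
            out.extend(rel.nuclei)
        elif role == "satellite" and rel.satellite is not None:
            out.append(rel.satellite)
    return out


store = [
    Relation("Condition", nuclei=("the pump stops.",), satellite="If the valve fails,"),
    Relation("Contrast", nuclei=("Model A is quiet.", "Model B is loud.")),
]
print(segments_for(store, "Condition", role="satellite"))
print(segments_for(store, "Contrast"))
```

Validating the shape at construction time keeps malformed relations (for example, a Contrast with a satellite) out of the store, so a role-based query never has to guess which segment is the nucleus.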
62 Citations
27 Claims
1. A method for identifying and retrieving text from a repository of text documents, the method comprising the steps of:
a) providing the repository, which stores a plurality of variables that represent document segments and associated rhetorical relations;
b) interacting with a user to generate query input that specifies at least one rhetorical relation of interest;
c) in response to receipt of said query input, querying the variables stored in the repository to identify zero or more document segments that are associated with a rhetorical relation that matches the at least one rhetorical relation of interest specified by said query input for output to the user.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13.
14. A system for extraction of text from a set of text documents comprising:
a repository which stores a plurality of variables that represent document segments and associated rhetorical relations;
user input query means for receiving query input from a user that specifies at least one rhetorical relation of interest; and
query processing logic, operably coupled to the user input query means and the repository, that utilizes said query input to query the variables stored in the repository to identify zero or more document segments that are associated with a rhetorical relation that matches the at least one rhetorical relation of interest specified by said query input for output to the user.
Dependent claims: 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27.
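The three elements of the system claim (repository, user input query means, and query processing logic coupled to both) can be wired together as separate components. The sketch below is an assumption-laden illustration: the claim does not specify indexing, so the inverted index from relation name to segment ids is a design choice made here, and `IndexedRepository`, `parse_query`, and `QueryProcessor`, along with the comma-separated query syntax, are all hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


class IndexedRepository:
    """Repository keyed by rhetorical relation (assumed design: an inverted
    index from relation name to segment ids, so lookups avoid a full scan)."""

    def __init__(self, entries: Iterable[Tuple[str, str, str]]):
        # entries: (segment_id, segment_text, relation_name)
        self.text: Dict[str, str] = {}
        self.by_relation: Dict[str, Set[str]] = defaultdict(set)
        for seg_id, seg_text, relation in entries:
            self.text[seg_id] = seg_text
            self.by_relation[relation.lower()].add(seg_id)


def parse_query(raw: str) -> List[str]:
    """Hypothetical 'user input query means': split comma-separated relation names."""
    return [part.strip() for part in raw.split(",") if part.strip()]


class QueryProcessor:
    """Query processing logic coupled to the input parser and the repository."""

    def __init__(self, repo: IndexedRepository):
        self.repo = repo

    def run(self, raw_query: str) -> List[str]:
        hits: Set[str] = set()
        for relation in parse_query(raw_query):
            # .get does not insert missing keys into the defaultdict
            hits |= self.repo.by_relation.get(relation.lower(), set())
        # Sorted ids give deterministic output; an empty list is a valid result.
        return [self.repo.text[i] for i in sorted(hits)]


repo = IndexedRepository([
    ("s1", "If the valve fails,", "Condition"),
    ("s2", "Model A is quiet.", "Contrast"),
    ("s3", "Model B is loud.", "Contrast"),
])
processor = QueryProcessor(repo)
print(processor.run("contrast, Purpose"))
```

Here the query names two relations, one of which ("Purpose") matches nothing, and the processor returns only the two Contrast segments. Keeping parsing, storage, and query processing in separate components mirrors the claim's separation of the three elements.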
Specification