Systems and methods for structural indexing of natural language text
Abstract
A structural natural language index is created by segmenting documents within a repository into text portions and extracting named entity, co-reference, lexical entries, structural-semantic relationships, speaker attribution and meronymic derived features. A constituent structure is determined that contains the constituent elements and ordering information sufficient to reconstruct the text portion. A functional structure of the text portions is determined. A set of characterizing predicative triples is formed from the functional structure by applying linearization transfer rules. The constituent structure, the characterizing predicative triples and the derived features are combined to form a canonical form of the text portion. Each canonical form is added to the structural natural language index. A retrieved question is classified to determine question type and a corresponding canonical form for the question is generated. The entries in the structural natural language index are searched for entries matching the canonical form of the question and relevant to the question type. The characterizing predicative triples are used in conjunction with a generation grammar to create an answer. If the generation fails, some or all of the constituent structure of the matching entry is returned as the answer.
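The answer step described in the abstract can be illustrated with a minimal sketch. It is not the patented implementation: the `CanonicalForm` class, the `matches` and `answer` functions, the triple layout `(predicate, role, value)` and the use of `None` as the questioned slot are all assumptions made for illustration. The generation grammar is stood in for by a caller-supplied function that returns `None` on failure, and question-type filtering is left out.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet, List, Optional, Tuple

# Hypothetical triple layout: (predicate, role, value).
# A value of None in a question marks the slot being asked about.
Triple = Tuple[str, str, Optional[str]]


@dataclass(frozen=True)
class CanonicalForm:
    text: str                                           # stand-in for the constituent structure
    triples: FrozenSet[Triple]                          # characterizing predicative triples
    features: FrozenSet[Tuple[str, str]] = frozenset()  # derived features


def matches(question: FrozenSet[Triple], entry: FrozenSet[Triple]) -> bool:
    """True if every question triple is found in the entry (None matches any value)."""
    return all(
        any(p == ep and r == er and (v is None or v == ev) for ep, er, ev in entry)
        for p, r, v in question
    )


def answer(
    question: FrozenSet[Triple],
    index: List[CanonicalForm],
    generate: Callable[[FrozenSet[Triple]], Optional[str]],
) -> Optional[str]:
    """Return a generated answer, or the stored text if generation fails."""
    for entry in index:
        if matches(question, entry.triples):
            generated = generate(entry.triples)
            # Fallback: return the stored wording when generation fails.
            return generated if generated is not None else entry.text
    return None


index = [
    CanonicalForm(
        text="The cake was eaten by Mary.",
        triples=frozenset({("eat", "subj", "Mary"), ("eat", "obj", "cake")}),
        features=frozenset({("named_entity", "Mary")}),
    )
]
# Canonical form of "Who ate the cake?" under the assumptions above.
question = frozenset({("eat", "subj", None), ("eat", "obj", "cake")})
```

Calling `answer(question, index, generate)` with a generator that always fails returns the stored text of the matching entry, which corresponds to the fallback in the last sentence of the abstract.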
20 Claims
1. A system for indexing natural language text comprising:
an input/output circuit that retrieves a text;
a linearization rule storage structure that stores linearization rules;
a processor that segments the retrieved text into text portions;
a constituent structure circuit that determines the constituent structure of the text portions;
a functional structure circuit for determining the functional structure of the text portions;
a characterizing predicative triples circuit that applies linearization transfer rules from the linearization transfer rule storage structure to the functional structure to determine characterizing predicative triples;
a derived feature extraction circuit for extracting at least one of: named entity, co-reference, lexical entry, semantic-structural relationship, attribution and meronymic information from the text portions;
an index circuit that creates canonized representations of the text portions based on the constituent structures, the characterizing predicative triples and the derived features and stores them in the structural natural language index storage structure.
Dependent claims: 2, 3, 5, 6
4. The system of claim 3, in which the linearization transfer rules perform at least one of: canonize passivization, canonize ditransitive constructions, and discard redundant information, from the functional structure.
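The three rule types named in this claim can be sketched on a toy functional structure. This is an illustration under stated assumptions, not the rules used in the patent: the functional structure is modeled as a flat dictionary, and the attribute names (`passive`, `obl_agent`, `obj2`, `obl_to`, `goal`) and the function names are hypothetical. "Redundant information" is taken here to mean any attribute other than the predicate and its core roles.

```python
def canonize_passive(fs: dict) -> dict:
    """Rewrite a passive structure as its active counterpart.

    Limitation of this sketch: passives of ditransitives are not handled.
    """
    if not fs.get("passive"):
        return fs
    out = {k: v for k, v in fs.items() if k not in ("passive", "subj", "obl_agent")}
    out["obj"] = fs["subj"]              # the passive subject is the logical object
    if "obl_agent" in fs:
        out["subj"] = fs["obl_agent"]    # the "by" phrase is the logical subject
    return out


def canonize_ditransitive(fs: dict) -> dict:
    """Map double-object and "to"-phrase variants onto one role layout."""
    out = dict(fs)
    if "obj2" in out:                    # "gave John a book": obj is the recipient
        out["goal"] = out.pop("obj")
        out["obj"] = out.pop("obj2")
    elif "obl_to" in out:                # "gave a book to John"
        out["goal"] = out.pop("obl_to")
    return out


CORE_ROLES = ("subj", "obj", "goal")     # everything else is discarded (assumption)


def to_triples(fs: dict) -> set:
    """Apply the rules in order and emit (predicate, role, value) triples."""
    for rule in (canonize_passive, canonize_ditransitive):
        fs = rule(fs)
    return {(fs["pred"], role, fs[role]) for role in CORE_ROLES if role in fs}


active = {"pred": "eat", "subj": "Mary", "obj": "cake", "tense": "past"}
passive = {"pred": "eat", "passive": True, "subj": "cake",
           "obl_agent": "Mary", "tense": "past"}
double_object = {"pred": "give", "subj": "Mary", "obj": "John", "obj2": "book"}
to_variant = {"pred": "give", "subj": "Mary", "obj": "book", "obl_to": "John"}
```

Under these assumptions the active and passive inputs yield the same set of triples, as do the two ditransitive variants, so paraphrases of one statement collapse to a single index entry.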
7. A system for creating a question template for searching a structural natural language index, comprising:
an input/output circuit that retrieves a question;
a question classification circuit that classifies the question into a question type;
a linearization rule storage structure that stores linearization rules;
a constituent structure circuit that determines the constituent structure of the question;
a functional structure circuit for determining the functional structure of the question;
a characterizing predicative triples circuit that applies linearization transfer rules from the linearization transfer rule storage structure to the functional structure to determine characterizing predicative triples;
a derived feature extraction circuit for extracting at least one of: named entity, co-reference, lexical entry, semantic-structural relationship, attribution and meronymic information from the question;
an index circuit that creates a canonical representation of the question based on the constituent structures, the characterizing predicative triples and the derived features; and
wherein the processor matches the canonical representation of the question against entries in a retrieved structural natural language index storage structure;
a generation circuit that generates an answer based on a generation grammar and at least one of: the characterizing predicative triples and the constituent structure of the matching entry from the structural natural language index storage structure and displays the answer.
Dependent claims: 8
9. The system of claim 8, in which the linearization transfer rules perform at least one of: canonize passivization, canonize ditransitive constructions, and discard redundant information, from the functional structure.
10. A method for indexing natural language text comprising the steps of:
segmenting a text into text portions;
determining a constituent structure for each text portion;
determining a functional structure for each text portion;
determining linearization transfer rules;
determining characterizing predicative triples of each functional structure based on the linearization transfer rules;
extracting derived features including at least one of: named entity, co-reference, lexical entry, semantic-structural relationship, attribution and meronymic information from each text portion;
determining canonized representations for each text portion based on the constituent structures, the characterizing predicative triples and the derived features; and
determining a structural index based on the canonized representation of the text portion.
Dependent claims: 11, 12, 14, 15
13. The method of claim 12, in which the linearization transfer rules perform at least one of: canonize passivization, canonize ditransitive constructions, and discard redundant information, from the functional structure.
16. A method of creating a question template for searching a structural natural language index, comprising the steps of:
determining a constituent structure for the question;
determining a functional structure for the question;
determining linearization transfer rules;
determining characterizing predicative triples of each functional structure based on the linearization transfer rules;
extracting derived features including at least one of: named entity, co-reference, lexical entry, semantic-structural relationship, attribution and meronymic information from the question;
determining a canonized representation of the question based on the constituent structures, the determined predicative triples and the derived features; and
searching the structural index of canonized forms for canonized forms based on the canonized representation of the question and the question type;
generating an answer based on a generation grammar and at least one of the characterizing predicative triples and the constituent structure of any matching entries.
Dependent claims: 17
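The search step, which uses both the canonized question and the question type, might look like the following sketch. The claim does not say how a question type constrains the search. The mapping from a leading question word to a named-entity type, the `classify` and `search` functions, and the index layout (a set of triples paired with a dictionary of entity types) are all assumptions made for illustration.

```python
from typing import Dict, List, Optional, Set, Tuple

Triple = Tuple[str, str, Optional[str]]

# Hypothetical mapping from a leading question word to the entity type of the answer.
QUESTION_TYPES = {"who": "PERSON", "where": "LOCATION", "when": "DATE"}


def classify(question: str) -> Optional[str]:
    """Classify a question by its first word; None if the type is unknown."""
    words = question.lower().split()
    return QUESTION_TYPES.get(words[0]) if words else None


def search(
    index: List[Tuple[Set[Triple], Dict[str, str]]],
    q_triples: List[Triple],
    q_type: Optional[str],
) -> List[Set[Triple]]:
    """Return entries that match the question triples and fit the question type.

    A question triple with value None is the asked-about slot: its filler in the
    entry must carry the entity type required by the question type, if any.
    """
    hits = []
    for triples, entity_types in index:
        for pred, role, value in q_triples:
            if value is not None:
                if (pred, role, value) not in triples:
                    break
            else:
                fillers = [v for p, r, v in triples if p == pred and r == role]
                if not any(q_type is None or entity_types.get(v) == q_type
                           for v in fillers):
                    break
        else:
            hits.append(triples)         # no question triple failed to match
    return hits


index = [
    ({("eat", "subj", "Mary"), ("eat", "obj", "cake")}, {"Mary": "PERSON"}),
    ({("eat", "subj", "rust"), ("eat", "obj", "cake")}, {}),
]
q_type = classify("Who ate the cake?")
q_triples = [("eat", "subj", None), ("eat", "obj", "cake")]
hits = search(index, q_triples, q_type)
```

In this example both entries match the question's triples, but only the first has a subject tagged as a person, so only it is returned for a "who" question.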
18. The method of claim 17, in which the linearization transfer rules perform at least one of: canonize passivization, canonize ditransitive constructions, and discard redundant information, from the functional structure.
19. Computer readable storage medium comprising:
computer readable program code embodied on the computer readable medium, the computer readable program code usable to program a computer for structural indexing of natural language text comprising the steps of:
segmenting a text into text portions;
determining a constituent structure for each text portion;
determining a functional structure for each text portion;
determining linearization transfer rules;
determining characterizing predicative triples of each functional structure based on the linearization transfer rules;
extracting derived features including at least one of: named entity, co-reference, lexical entry, semantic-structural relationship, attribution and meronymic information from each text portion;
determining canonized representations for each text portion based on the constituent structures, the characterizing predicative triples and the derived features; and
determining a structural index based on the canonized representation of the text portion.
20. Computer readable storage medium comprising:
computer readable program code embodied on the computer readable medium, the computer readable program code usable to program a computer for searching a structural indexing of natural language text comprising the steps of:
determining a constituent structure for the question;
determining a functional structure for the question;
determining linearization transfer rules;
determining characterizing predicative triples of each functional structure based on the linearization transfer rules;
extracting derived features including at least one of: named entity, co-reference, lexical entry, semantic-structural relationship, attribution and meronymic information from the question;
determining a canonized representation of the question based on the constituent structures, the determined predicative triples and the derived features; and
searching the structural index of canonized forms for canonized forms based on the canonized representation of the question and the question type;
generating an answer based on a generation grammar and at least one of the characterizing predicative triples and the constituent structure of any matching entries.
Specification