Syntactic classification of natural language sentences with respect to a targeted element
First Claim
1. A method of syntactically classifying a natural language sentence comprising:
receiving the natural language sentence in computer-readable form, by executing first instructions in a computer system;
parsing the natural language sentence to derive a parse tree having a plurality of nodes, by executing second instructions in the computer system;
identifying a particular one of the nodes that corresponds to an element of interest in the natural language sentence, by executing third instructions in the computer system;
extracting syntactic information from the parse tree relative to the particular node corresponding to the element of interest, by executing fourth instructions in the computer system;
recording the syntactic information as a classification for the natural language sentence, by executing fifth instructions in the computer system;
determining that the classification for the natural language sentence is different from classifications of other natural language sentences in a test set according to at least one predetermined similarity criterion related to the syntactic information, by executing sixth instructions in the computer system, wherein the predetermined similarity criterion allows two given sentences to be deemed similar even when the two given sentences have different classifications; and
responsive to said determining, adding the natural language sentence to the test set, by executing seventh instructions in the computer system.
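The claimed steps can be illustrated with a minimal Python sketch. It makes several assumptions that do not come from the source. The parse tree is supplied in Penn-Treebank-style bracket notation (a real system would run a syntactic parser). The element of interest is the leaf whose word matches it. The classification is the tuple of constituent labels on the path from that leaf's part-of-speech node up to the root. Two classifications are deemed similar when their first `k` labels agree. The names `Node`, `read_bracketed`, `find_leaf`, `classify`, `similar`, and `maybe_add`, and the prefix-based criterion itself, are hypothetical illustrations, not the patent's own implementation.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass(eq=False)
class Node:
    """A parse-tree node: a constituent label, or a word at a leaf."""
    label: str
    children: List["Node"] = field(default_factory=list)
    parent: Optional["Node"] = field(default=None, repr=False)


def read_bracketed(text: str) -> Node:
    """Build a tree from bracket notation, e.g. "(S (NP (NN dog)))".

    Stands in for the parsing step (an assumption); it does not parse raw text.
    """
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def parse() -> Node:
        nonlocal pos
        pos += 1  # skip "("
        node = Node(tokens[pos])
        pos += 1
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                child = parse()
            else:
                child = Node(tokens[pos])  # a word leaf
                pos += 1
            child.parent = node
            node.children.append(child)
        pos += 1  # consume ")"
        return node

    return parse()


def find_leaf(node: Node, word: str) -> Optional[Node]:
    """Depth-first search for the leaf holding the element of interest."""
    if not node.children and node.label == word:
        return node
    for child in node.children:
        hit = find_leaf(child, word)
        if hit is not None:
            return hit
    return None


def classify(root: Node, element: str) -> Tuple[str, ...]:
    """Labels on the path from the element's POS node up to the root."""
    leaf = find_leaf(root, element)
    if leaf is None:
        raise ValueError(f"{element!r} not found in parse tree")
    labels: List[str] = []
    node = leaf.parent  # skip the word itself; start at its part-of-speech tag
    while node is not None:
        labels.append(node.label)
        node = node.parent
    return tuple(labels)


def similar(a: Tuple[str, ...], b: Tuple[str, ...], k: int = 3) -> bool:
    """Hypothetical criterion: similar if the first k labels agree,
    so two different classifications can still be deemed similar."""
    return a[:k] == b[:k]


def maybe_add(sentence: str, bracketed: str, element: str,
              test_set: List[Tuple[str, Tuple[str, ...]]], k: int = 3) -> bool:
    """Add the sentence only if it is not similar to any test-set member."""
    cls = classify(read_bracketed(bracketed), element)
    if any(similar(cls, existing, k) for _, existing in test_set):
        return False
    test_set.append((sentence, cls))
    return True


if __name__ == "__main__":
    test_set: List[Tuple[str, Tuple[str, ...]]] = []
    examples = [
        ("John saw the dog.", "(S (NP (NNP John)) (VP (VBD saw) (NP (DT the) (NN dog))))"),
        ("The dog barked.", "(S (NP (DT The) (NN dog)) (VP (VBD barked)))"),
        ("Mary said she saw a dog.",
         "(S (NP (NNP Mary)) (VP (VBD said) (SBAR (S (NP (PRP she)) (VP (VBD saw) (NP (DT a) (NN dog)))))))"),
    ]
    for sentence, tree in examples:
        print(sentence, maybe_add(sentence, tree, "dog", test_set))
```

With `k=3`, the first two sentences are admitted. The third is rejected: its classification for "dog" differs from the first sentence's, but both begin with `NN`, `NP`, `VP`, so under this assumed criterion they are deemed similar. This is the situation the "wherein" clause describes.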
Abstract
A grammatically diverse test set of natural language sentences for a deep question answering system is provided by analyzing a given sentence to characterize its syntactical classification, and adding the sentence to the test set if its classification is sufficiently different from other sentences already in the test set. A particular sentence may be selected for inclusion according to a desired syntactic distribution. Multiple sentences having the exact same classification may be allowed subject to a maximum number of such sentences. The test set is adapted to an element of interest by characterizing each syntactical classification relative to the element of interest. The analysis derives a parse tree, identifies a particular node of the tree corresponding to the element of interest, and extracts syntactic information by traversing the tree starting at the particular node and ending at the root node of the tree according to different traversal schemes.
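The abstract describes two further mechanisms. The first is traversing from the node of interest up to the root under different schemes. The second is admitting up to a maximum number of sentences that share exactly the same classification. The sketch below illustrates both under stated assumptions. The two traversal schemes (plain labels, and labels tagged with each node's index among its siblings), the `CappedTestSet` class, and the `max_same` parameter are all hypothetical names and choices, not taken from the source.

```python
from collections import Counter
from typing import Callable, List, Optional, Tuple

Classification = Tuple[str, ...]


class Node:
    """Minimal parse-tree node; children get a parent link for upward traversal."""

    def __init__(self, label: str, *children: "Node") -> None:
        self.label = label
        self.children = list(children)
        self.parent: Optional["Node"] = None
        for child in self.children:
            child.parent = self


def labels_scheme(start: Node) -> Classification:
    """Hypothetical scheme A: constituent labels from start up to the root."""
    out: List[str] = []
    node: Optional[Node] = start
    while node is not None:
        out.append(node.label)
        node = node.parent
    return tuple(out)


def position_scheme(start: Node) -> Classification:
    """Hypothetical scheme B: each label tagged with its index among its siblings."""
    out: List[str] = []
    node: Optional[Node] = start
    while node is not None:
        index = node.parent.children.index(node) if node.parent else 0
        out.append(f"{node.label}@{index}")
        node = node.parent
    return tuple(out)


class CappedTestSet:
    """Admits a sentence unless `max_same` members already share its exact classification."""

    def __init__(self, scheme: Callable[[Node], Classification], max_same: int = 2) -> None:
        self.scheme = scheme
        self.max_same = max_same
        self.counts: Counter = Counter()
        self.sentences: List[str] = []

    def offer(self, sentence: str, node: Node) -> bool:
        cls = self.scheme(node)
        if self.counts[cls] >= self.max_same:
            return False  # cap reached for this exact classification
        self.counts[cls] += 1
        self.sentences.append(sentence)
        return True


if __name__ == "__main__":
    # "The dog barked.": start traversal at the NN node for "dog".
    dog = Node("NN", Node("dog"))
    s1 = Node("S", Node("NP", Node("DT", Node("The")), dog), Node("VP", Node("VBD", Node("barked"))))
    print(labels_scheme(dog))    # ('NN', 'NP', 'S')
    print(position_scheme(dog))  # ('NN@1', 'NP@0', 'S@0')
```

Consider "Rice grew." with the NP holding only `NN`. Its labels-only classification matches "The dog barked." exactly, so with `max_same=2` it would be rejected once two such sentences are present. Under the position scheme it gets `NN@0` instead of `NN@1` and is admitted. The choice of traversal scheme therefore directly controls how fine-grained, and so how diverse, the resulting test set is.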
59 Citations
20 Claims
8. A computer system comprising:
one or more processors which process program instructions;
a memory device connected to said one or more processors; and
program instructions residing in said memory device for syntactically classifying a natural language sentence by receiving the natural language sentence, parsing the natural language sentence to derive a parse tree having a plurality of nodes, identifying a particular one of the nodes that corresponds to an element of interest in the natural language sentence, extracting syntactic information from the parse tree relative to the particular node corresponding to the element of interest, recording the syntactic information as a classification for the natural language sentence, determining that the classification for the natural language sentence is different from classifications of other natural language sentences in a test set according to at least one predetermined similarity criterion related to the syntactic information, wherein the predetermined similarity criterion allows two given sentences to be deemed similar even when the two given sentences have different classifications, and responsively adding the natural language sentence to the test set.
15. A computer program product comprising:
a computer readable storage medium; and
program instructions residing in said storage medium for syntactically classifying a natural language sentence by receiving the natural language sentence, parsing the natural language sentence to derive a parse tree having a plurality of nodes, identifying a particular one of the nodes that corresponds to an element of interest in the natural language sentence, extracting syntactic information from the parse tree relative to the particular node corresponding to the element of interest, recording the syntactic information as a classification for the natural language sentence, determining that the classification for the natural language sentence is different from classifications of other natural language sentences in a test set according to at least one predetermined similarity criterion related to the syntactic information, wherein the predetermined similarity criterion allows two given sentences to be deemed similar even when the two given sentences have different classifications, and responsively adding the natural language sentence to the test set.
Specification