Generation of a grammatically diverse test set for deep question answering systems
First Claim
1. A method of providing a test set of natural language sentences for a deep question answering system comprising:
receiving a plurality of natural language sentences in computer-readable form wherein each natural language sentence has a composition of certain words, by executing first instructions in a computer system;
analyzing the natural language sentences to characterize different syntactical classifications of the sentences, by executing second instructions in the computer system;
selecting a subset of the natural language sentences according to a desired syntactic distribution based on the different syntactical classifications wherein each natural language sentence in the subset is unchanged from its composition, by executing third instructions in the computer system;
storing the selected subset as the test set, by executing fourth instructions in the computer system;
applying the test set to the deep question answering system to train the deep question answering system, by executing fifth instructions in the computer system;
receiving a query from a user, by executing sixth instructions in the computer system;
using the deep question answering system to generate an answer to the query, by executing seventh instructions in the computer system; and
presenting the answer to the user, by executing eighth instructions in the computer system.
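The selecting step of the claim can be illustrated with a minimal sketch. It assumes each sentence has already been mapped to a single class label and that the desired syntactic distribution is given as a fraction per label. The names `select_by_distribution`, `toy_classify`, `desired`, and `size` are hypothetical and do not come from the patent, and the toy classifier is only a stand-in for a real syntactic analysis.

```python
from collections import defaultdict
from typing import Callable, Dict, List


def select_by_distribution(
    sentences: List[str],
    classify: Callable[[str], str],
    desired: Dict[str, float],
    size: int,
) -> List[str]:
    """Pick up to `size` sentences so that class shares approximate `desired`."""
    # Group the input sentences by syntactic class, preserving input order.
    by_class: Dict[str, List[str]] = defaultdict(list)
    for sentence in sentences:
        by_class[classify(sentence)].append(sentence)

    selected: List[str] = []
    for label, share in desired.items():
        # Assumption: each class gets a quota of share * size, rounded.
        quota = round(share * size)
        # Sentences are taken verbatim; their wording is never altered.
        selected.extend(by_class.get(label, [])[:quota])
    return selected


def toy_classify(sentence: str) -> str:
    """Hypothetical stand-in for a real syntactic classifier."""
    if sentence.endswith("?"):
        return "question"
    if " by " in sentence:
        return "passive"
    return "declarative"
```

If a class has fewer sentences than its quota, this sketch simply returns a smaller subset; a fuller version might redistribute the shortfall across other classes.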
Abstract
A grammatically diverse test set of natural language sentences for a deep question answering system is provided by analyzing a given sentence to characterize its syntactical classification, and adding the sentence to the test set if its classification is sufficiently different from other sentences already in the test set. A particular sentence may be selected for inclusion according to a desired syntactic distribution. Multiple sentences having the exact same classification may be allowed subject to a maximum number of such sentences. The test set is adapted to an element of interest by characterizing each syntactical classification relative to the element of interest. The analysis derives a parse tree, identifies a particular node of the tree corresponding to the element of interest, and extracts syntactic information by traversing the tree starting at the particular node and ending at the root node of the tree according to different traversal schemes.
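The tree traversal and the cap on identical classifications described in the abstract can be sketched as follows. This is an illustration under stated assumptions, not the patented implementation: the `Node` class, `path_to_root`, `build_test_set`, and `max_per_class` are hypothetical names, the parse trees are built by hand rather than by a parser, and the only traversal scheme shown is the plain sequence of node labels from the element of interest up to the root.

```python
from collections import Counter
from typing import Iterable, List, Optional, Tuple


class Node:
    """Minimal parse-tree node; only the upward link is needed here."""

    def __init__(self, label: str, parent: Optional["Node"] = None) -> None:
        self.label = label
        self.parent = parent


def path_to_root(node: Node) -> Tuple[str, ...]:
    """Collect labels from the element of interest up to the root."""
    labels: List[str] = []
    current: Optional[Node] = node
    while current is not None:
        labels.append(current.label)
        current = current.parent
    return tuple(labels)


def build_test_set(
    candidates: Iterable[Tuple[str, Node]],
    max_per_class: int = 1,
) -> List[str]:
    """Keep a sentence only while its classification is under the cap."""
    counts: Counter = Counter()
    test_set: List[str] = []
    for sentence, focus in candidates:
        # Assumption: the label path itself serves as the classification.
        signature = path_to_root(focus)
        if counts[signature] < max_per_class:
            counts[signature] += 1
            test_set.append(sentence)
    return test_set
```

With `max_per_class=1`, a sentence is added only if no sentence with the same label path is already in the set; raising the cap allows a bounded number of sentences with the same classification.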
18 Claims
1. A method of providing a test set of natural language sentences for a deep question answering system comprising:
receiving a plurality of natural language sentences in computer-readable form wherein each natural language sentence has a composition of certain words, by executing first instructions in a computer system;
analyzing the natural language sentences to characterize different syntactical classifications of the sentences, by executing second instructions in the computer system;
selecting a subset of the natural language sentences according to a desired syntactic distribution based on the different syntactical classifications wherein each natural language sentence in the subset is unchanged from its composition, by executing third instructions in the computer system;
storing the selected subset as the test set, by executing fourth instructions in the computer system;
applying the test set to the deep question answering system to train the deep question answering system, by executing fifth instructions in the computer system;
receiving a query from a user, by executing sixth instructions in the computer system;
using the deep question answering system to generate an answer to the query, by executing seventh instructions in the computer system; and
presenting the answer to the user, by executing eighth instructions in the computer system.
Dependent claims: 2, 3, 4, 5, 6
7. A computer system comprising:
one or more processors which process program instructions;
a memory device connected to said one or more processors; and
program instructions residing in said memory device for providing a test set of natural language sentences for a deep question answering system by receiving a plurality of natural language sentences wherein each natural language sentence has a composition of certain words, analyzing the natural language sentences to characterize different syntactical classifications of the sentences, selecting a subset of the natural language sentences according to a desired syntactic distribution based on the different syntactical classifications wherein each natural language sentence in the subset is unchanged from its composition, storing the selected subset as the test set, applying the test set to the deep question answering system to train the deep question answering system, receiving a query from a user, using the deep question answering system to generate an answer to the query, and presenting the answer to the user.
Dependent claims: 8, 9, 10, 11, 12
13. A computer program product comprising:
a computer readable storage medium; and
program instructions residing in said storage medium for providing a test set of natural language sentences for a deep question answering system by receiving a plurality of natural language sentences wherein each natural language sentence has a composition of certain words, analyzing the natural language sentences to characterize different syntactical classifications of the sentences, selecting a subset of the natural language sentences according to a desired syntactic distribution based on the different syntactical classifications wherein each natural language sentence in the subset is unchanged from its composition, storing the selected subset as the test set, applying the test set to the deep question answering system to train the deep question answering system, receiving a query from a user, using the deep question answering system to generate an answer to the query, and presenting the answer to the user.
Dependent claims: 14, 15, 16, 17, 18
Specification