System and method for domain adaptation in question answering
First Claim
1. A method for providing adaptation to a question answering system, wherein the question answering system has associated therewith a first corpus of data and a question-answer set, the question-answer set being a collection of questions and correct answers to these questions, such that each question has one or more correct answers associated with it, the method comprising the steps of:
submitting a set of questions to the question answering system;
receiving back from the question answering system a set of answers generated in response to the set of questions, the set of answers that are received back being based upon at least one document in the first corpus of data;
comparing the set of answers received back from the question answering system to answers in the question-answer set;
identifying, based on the comparison of the set of answers received back from the question answering system to answers in the question-answer set, a plurality of answers from the set of answers received back from the question answering system that are not correct;
generating a plurality of groups by performing automated grouping on at least one of:
(a) a plurality of questions from the question-answer set that correspond to the identified answers that are not correct; and
(b) a plurality of answers from the question-answer set that correspond to the identified answers that are not correct;
creating a collection of related terms associated with the groups;
obtaining, from a second corpus of data, textual information about each of the related terms, wherein the second corpus of data is external relative to the first corpus of data;
creating a plurality of textual resources from the obtained information, each of the plurality of textual resources being associated with one of the related terms;
scoring each of the plurality of textual resources based on whether each textual resource is informative with respect to the at least one document in the first corpus of data; and
adding at least one of the created textual resources to the first corpus of data, wherein the at least one of the created textual resources that is added to the first corpus of data comprises a subset of all of the created plurality of textual resources and wherein the at least one of the created textual resources that is added to the first corpus of data had been scored as more informative with respect to the at least one document in the first corpus of data than at least one of the other created textual resources that is not added to the first corpus of data.
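The steps recited in claim 1 can be illustrated with a minimal sketch. It is not the patented implementation: the keyword-based grouping, the vocabulary-overlap score, the `top_k` cutoff, and every function and variable name below are assumptions chosen only to make the flow concrete. The second corpus is stood in for by a plain dictionary named `external`.

```python
from collections import defaultdict

def find_incorrect(qa_set, system_answers):
    """Return questions whose system answer is not among the known correct answers."""
    return [q for q, gold in qa_set.items()
            if system_answers.get(q, "").strip().lower() not in {g.lower() for g in gold}]

def group_by_keyword(questions, keywords):
    """Toy automated grouping (assumption): bucket each question under the first keyword it mentions."""
    groups = defaultdict(list)
    for q in questions:
        for kw in keywords:
            if kw in q.lower():
                groups[kw].append(q)
                break
    return dict(groups)

def informativeness(resource_text, corpus_docs):
    """Toy score (assumption): share of the resource's distinct words that occur in the first corpus."""
    corpus_vocab = {w for doc in corpus_docs for w in doc.lower().split()}
    words = set(resource_text.lower().split())
    return len(words & corpus_vocab) / len(words) if words else 0.0

def adapt(qa_set, system_answers, corpus_docs, external, related_terms, keywords, top_k=1):
    """Run the claim-1 flow once and return the expanded corpus plus the terms that were added."""
    wrong = find_incorrect(qa_set, system_answers)                # compare and identify
    groups = group_by_keyword(wrong, keywords)                    # automated grouping
    terms = [t for g in groups for t in related_terms.get(g, [])] # related terms per group
    resources = {t: external[t] for t in terms if t in external}  # one textual resource per term
    ranked = sorted(resources,
                    key=lambda t: informativeness(resources[t], corpus_docs),
                    reverse=True)
    added = ranked[:top_k]                                        # keep only a higher-scoring subset
    return corpus_docs + [resources[t] for t in added], added

# Hypothetical toy data
qa_set = {
    "which enzyme breaks down lactose?": ["lactase"],
    "what is the capital of france?": ["Paris"],
}
system_answers = {
    "which enzyme breaks down lactose?": "amylase",
    "what is the capital of france?": "paris",
}
corpus_docs = ["enzymes are proteins that speed up reactions"]
external = {
    "lactase": "lactase is an enzyme that breaks down lactose",
    "protease": "protease cleaves peptide bonds",
}
related_terms = {"enzyme": ["lactase", "protease"]}

new_corpus, added = adapt(qa_set, system_answers, corpus_docs,
                          external, related_terms, keywords=["enzyme"])
print(added)  # prints ['lactase']
```

In this toy run only the lactose question is answered incorrectly, so only the "enzyme" group is formed, and the higher-scoring of the two candidate resources is added while the other is left out, mirroring the subset requirement in the final step.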
Abstract
The present disclosure relates generally to question answering systems and methods and, particularly, to systems and methods for domain adaptation in question answering.
16 Claims
9. A method for providing adaptation to a question answering system, wherein the question answering system has associated therewith a first corpus of data and a question-answer set, the question-answer set being a collection of questions and correct answers to these questions, such that each question has one or more correct answers associated with it, the method comprising the steps of:
submitting a set of questions to the question answering system;
receiving back from the question answering system a set of answers generated in response to the set of questions, the set of answers that are received back being based upon at least one document in the first corpus of data;
comparing the set of answers received back from the question answering system to answers in the question-answer set;
identifying, based on the comparison of the set of answers received back from the question answering system to answers in the question-answer set, a plurality of answers from the set of answers received back from the question answering system that are not correct;
obtaining, for each question with an answer that is not correct, a corresponding correct answer and computing for each corresponding correct answer an associated semantic data type;
finding, for each semantic data type associated with a corresponding correct answer, a collection of words or expressions that are related to the semantic data type;
obtaining from a second corpus of data, for each of the words or expressions that are in the collection that is found, additional related information and creating a plurality of textual resources with the additional information, each of the plurality of textual resources being associated with one of the words or expressions that are in the collection that is found, wherein the second corpus of data is external relative to the first corpus of data;
scoring each of the plurality of textual resources based on whether each textual resource is informative with respect to the at least one document in the first corpus of data; and
adding at least one of the created textual resources to the first corpus of data, wherein the at least one of the created textual resources that is added to the first corpus of data comprises a subset of all of the created plurality of textual resources and wherein the at least one of the created textual resources that is added to the first corpus of data had been scored as more informative with respect to the at least one document in the first corpus of data than at least one of the other created textual resources that is not added to the first corpus of data.
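Claim 9 replaces the grouping step with a semantic data type computed for each missed correct answer. The sketch below illustrates only that distinguishing part, under stated assumptions: the two lookup tables `ANSWER_TYPES` and `TYPE_INSTANCES` are hypothetical stand-ins for whatever type system and lexical resource an actual system would use, and the function names are invented for illustration.

```python
# Hypothetical lookup tables (assumptions, not part of the claim)
ANSWER_TYPES = {"lactase": "enzyme", "paris": "city"}
TYPE_INSTANCES = {
    "enzyme": ["lactase", "amylase", "protease"],
    "city": ["paris", "lyon"],
}

def semantic_types(incorrect_questions, qa_set, answer_types):
    """Collect the semantic data type of each correct answer to an incorrectly answered question."""
    types = set()
    for question in incorrect_questions:
        for gold in qa_set[question]:
            answer_type = answer_types.get(gold.lower())
            if answer_type is not None:
                types.add(answer_type)
    return types

def expansion_terms(types, type_instances):
    """Find the words or expressions related to each semantic data type, without duplicates."""
    return sorted({term for t in types for term in type_instances.get(t, [])})

# Hypothetical toy data
qa_set = {"which enzyme breaks down lactose?": ["lactase"]}
missed = ["which enzyme breaks down lactose?"]

types = semantic_types(missed, qa_set, ANSWER_TYPES)
terms = expansion_terms(types, TYPE_INSTANCES)
print(terms)  # prints ['amylase', 'lactase', 'protease']
```

Each resulting term would then be looked up in the second corpus, turned into a textual resource, scored, and selectively added to the first corpus, as in the remaining steps of the claim.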
Specification