Generating secondary questions in an introspective question answering system
Abstract
A method of generating secondary questions in a question-answer system. Missing information is identified from a corpus of data using a computerized device. The missing information comprises any information that improves confidence scores for candidate answers to a question. The computerized device automatically generates a plurality of hypotheses concerning the missing information. The computerized device automatically generates at least one secondary question based on each of the plurality of hypotheses. The hypotheses are ranked based on relative utility to determine an order in which the computerized device outputs the at least one secondary question to external sources to obtain responses.
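The ranking step in the abstract can be illustrated with a minimal sketch. It assumes each hypothesis carries a numeric utility estimate and a single secondary question. The `Hypothesis` class, the `utility` field, and the sample questions are hypothetical and do not come from the patent text, which does not specify how relative utility is computed.

```python
from dataclasses import dataclass


@dataclass
class Hypothesis:
    """Hypothetical record for one guess about the missing information."""
    description: str
    secondary_question: str
    utility: float  # assumed: estimated confidence gain if the question is answered


def order_secondary_questions(hypotheses):
    """Return the secondary questions, highest-utility hypothesis first."""
    ranked = sorted(hypotheses, key=lambda h: h.utility, reverse=True)
    return [h.secondary_question for h in ranked]


# Invented example data, for illustration only.
hypotheses = [
    Hypothesis("two terms may be synonyms",
               "Does 'physician' mean the same as 'doctor'?", 0.4),
    Hypothesis("a passage may imply the answer",
               "Does 'was born in Paris' imply 'is from France'?", 0.9),
    Hypothesis("a type relation may hold",
               "Is a sonnet a kind of poem?", 0.6),
]
questions = order_secondary_questions(hypotheses)
```

Under these assumptions, `questions` lists the entailment question first, since its hypothesis has the highest assumed utility.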
21 Claims
1. A computerized device, comprising:
a question-answer system comprising a processor running software for performing a plurality of question answering processes and a corpus of data;
a receiver receiving a first question into the question-answer system; and
a network interface connected to external sources comprising a community of human respondents;
said processor:
comparing the first question to the corpus of data;
generating candidate answers for the first question posed to the question-answer system, the candidate answers for the first question being generated from the corpus of data;
determining a confidence score for each of the candidate answers based on evidence from the corpus of data used to generate the candidate answers, the determining a confidence score further comprises assigning an evidence score to the evidence based on how well the evidence matches the first question, wherein the evidence comprises good evidence, marginal evidence, and bad evidence;
identifying information to supplement the marginal evidence, the information improves the confidence scores for the candidate answers to the first question;
generating a plurality of hypotheses concerning the information that supplements the marginal evidence and improves the confidence scores for the candidate answers to the first question;
generating the at least one secondary question based on each hypothesis of the plurality of hypotheses concerning the information that supplements the marginal evidence and improves the confidence scores for the candidate answers to the first question, an answer to the at least one secondary question improving the ability of the question-answer system to understand and evaluate evidence associated with the candidate answers to the first question;
ranking the hypotheses based on relative utility to determine an order in which to output the at least one secondary question to the external sources;
outputting the at least one secondary question to the external sources using the network interface;
receiving responses to the at least one secondary question from the external sources using the network interface;
validating the responses to the at least one secondary question to extract a piece of data, fact, syntactical relationship, grammatical relationship, logical rule, or taxonomy rule that improves the confidence scores for the candidate answers to the first question, the validating comprises validating that the responses are supported by a threshold number of external sources; and
adding the piece of data, fact, syntactical relationship, grammatical relationship, logical rule, or taxonomy rule extracted from responses to the at least one secondary question to the corpus of data.
Dependent claims: 2, 3, 4, 5, 6
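The evidence-scoring element of claim 1 might be sketched as follows. The claim only says that evidence is scored by how well it matches the question and falls into good, marginal, and bad classes. The numeric cutoffs, the averaging rule for confidence, and all function names below are assumptions made for illustration.

```python
GOOD_THRESHOLD = 0.75      # assumed cutoff, not taken from the claim
MARGINAL_THRESHOLD = 0.40  # assumed cutoff, not taken from the claim


def classify_evidence(score):
    """Map an evidence score in [0, 1] to 'good', 'marginal', or 'bad'."""
    if score >= GOOD_THRESHOLD:
        return "good"
    if score >= MARGINAL_THRESHOLD:
        return "marginal"
    return "bad"


def confidence(evidence_scores):
    """Assumed aggregation: mean evidence score, 0.0 when there is no evidence."""
    if not evidence_scores:
        return 0.0
    return sum(evidence_scores) / len(evidence_scores)


def marginal_passages(evidence):
    """Return the passages whose score falls in the marginal band.

    `evidence` maps a passage (str) to its evidence score (float).
    """
    return [p for p, s in evidence.items() if classify_evidence(s) == "marginal"]
```

In this sketch, the passages returned by `marginal_passages` are the ones the system would try to supplement with secondary questions.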
7. A computer system, comprising:
an automated question answering (QA) system comprising:
a corpus of data;
a processor operatively connected to the corpus of data, the processor having software for performing a plurality of question answering processes;
a receiver operatively connected to the processor; and
a network interface operatively connected to the processor and to external expert community sources;
the receiver receiving a question into the automated QA system,
the processor comparing the question to the corpus of data and generating a plurality of candidate answers to the question from the corpus of data,
the processor determining a confidence score for each candidate answer of the plurality of candidate answers based on evidence used to generate the each candidate answer of the plurality of candidate answers, wherein the evidence comprises good evidence, marginal evidence, and bad evidence,
the processor identifying information to supplement the marginal evidence, the information improves the confidence scores for at least one candidate answer in the plurality of candidate answers,
the processor generating a plurality of hypotheses concerning the information that supplements the marginal evidence and improves the confidence scores for the at least one candidate answer,
the processor generating the at least one follow-on inquiry based on each hypothesis of the plurality of hypotheses,
the processor ranking the hypotheses based on relative utility, the ranking determining an order for the automated QA system to output the at least one follow-on inquiry to the external expert community sources,
the processor outputting the at least one follow-on inquiry to the external expert community sources using the network interface,
the processor receiving responses to the at least one follow-on inquiry from the external expert community sources using the network interface,
the processor validating the responses to the at least one follow-on inquiry and extracting a piece of data, fact, syntactical relationship, grammatical relationship, logical rule, or taxonomy rule that improves the confidence scores for the at least one candidate answer, the validating comprises validating that the responses are supported by a threshold number of external expert community sources, and
the processor adding the piece of data, fact, syntactical relationship, grammatical relationship, logical rule, or taxonomy rule extracted from the responses to the at least one follow-on inquiry to the corpus of data.
Dependent claims: 8, 9, 10, 11, 12
13. A question answering (QA) system comprising:
a processor;
an evidence analysis module, the evidence analysis module being operatively connected to the processor;
a first interface operatively connected to the processor;
a second interface operatively connected to the processor and to one or more external sources separate from the QA system, the one or more external sources comprising a community of human respondents; and
a corpus of data operatively connected to the evidence analysis module,
the first interface receiving a first question to be answered by the QA system,
the processor comparing the first question to the corpus of data and creating a collection of candidate answers to the first question from the corpus of data, each candidate answer in the collection of candidate answers to the first question having supporting evidence and a confidence score generated by the processor based on the evidence from corpus of data used to generate the candidate answer, wherein the evidence comprises good evidence, marginal evidence, and bad evidence;
the evidence analysis module identifying information to supplement the marginal evidence, the information that improves the confidence scores for the candidate answers to the first question,
the processor generating a plurality of hypotheses concerning the information to supplement the marginal evidence, the information that improves the confidence scores for the candidate answers to the first question,
the evidence analysis module producing the secondary question based on each hypothesis of the plurality of hypotheses, an answer to the secondary question improving the ability of the QA system to understand and evaluate evidence associated with candidate answers to the first question,
the processor ranking the hypotheses based on relative utility, the ranking determining an order in which the QA system outputs a secondary question to the one or more external sources,
the processor presenting the secondary question through the second interface to the one or more external sources separate from the QA system to obtain responses to the secondary question,
the processor receiving at least one response to the secondary question from the one or more external sources through the second interface,
the evidence analysis module validating the at least one response to the secondary question and extracting a piece of data, fact, syntactical relationship, grammatical relationship, logical rule, or taxonomy rule that improves the confidence scores for the candidate answer to the first question, the validating comprises validating that the responses are supported by a threshold number of external sources,
the processor adding the piece of data, fact, syntactical relationship, grammatical relationship, logical rule, or taxonomy rule extracted from the at least one response to the corpus of data.
Dependent claims: 14, 15, 16, 17
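The validation element, which accepts a response only when a threshold number of external sources support it, could look roughly like this. It is a sketch under stated assumptions: responses are simple `(source_id, answer)` pairs, answers are compared after trimming and lowercasing, and the corpus is a plain Python set. None of these representations is specified by the claim.

```python
from collections import defaultdict


def validate_responses(responses, min_sources=3):
    """Return the answers supported by at least `min_sources` distinct sources.

    `responses` is an iterable of (source_id, answer) pairs. Answers are
    normalized by stripping whitespace and lowercasing (an assumption).
    """
    supporters = defaultdict(set)
    for source_id, answer in responses:
        supporters[answer.strip().lower()].add(source_id)
    return {a for a, sources in supporters.items() if len(sources) >= min_sources}


def add_validated(corpus, responses, min_sources=3):
    """Add validated answers to `corpus` (a set) and return what was added."""
    accepted = validate_responses(responses, min_sources)
    corpus.update(accepted)
    return accepted
```

Counting distinct source identifiers, rather than raw responses, keeps a single respondent from meeting the threshold by answering repeatedly.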
18. A non-transitory computer readable storage medium readable by a computerized device, the computerized device comprising a question-answer system, the non-transitory computer readable storage medium storing instructions executable by the computerized device to perform a method comprising:
receiving a first question into the question-answer system;
comparing the question to the corpus of data;
generating candidate answers for the first question from the corpus of data;
determining a confidence score for each of the candidate answers based on evidence from the corpus of data used to generate the candidate answers, wherein the evidence comprises good evidence, marginal evidence, and bad evidence;
identifying information from the corpus of data to supplement the marginal evidence, the information that improves confidence scores for candidate answers to a first question posed to the question-answer system;
automatically generating a plurality of hypotheses concerning the information that supplements the marginal evidence and improves the confidence scores for the candidate answers to the first question posed to the question-answer system, the automatically generating the plurality of hypotheses further comprising:
analyzing the first question,
for each candidate answer of the candidate answers, forming a hypothesis based on considering each the candidate answer in context of the first question,
spawning an independent thread for each hypothesis that attempts to prove the candidate answer,
extracting evidence related to each hypothesis from the corpus of data, and
for each evidence-hypothesis pair, analyzing elements of the first question and the evidence along dimensions selected from the group consisting of: type classification, time, geography, popularity, passage support, source reliability, and semantic relatedness;
automatically generating at least one secondary question based on each of the plurality of hypotheses, an answer to the at least one secondary question improving the ability of the question-answer system to understand and evaluate evidence associated with the candidate answers to the first question;
ranking the hypotheses based on relative utility, the ranking determining an order in which to output the at least one secondary question to external sources comprising a community of human respondents, wherein the community of human respondents are capable of answering the at least one secondary question;
outputting the at least one secondary question to the external sources using a network interface;
receiving responses to the at least one secondary question from the external sources using the network interface;
validating the responses to the at least one secondary question to extract a piece of data, fact, syntactical relationship, grammatical relationship, logical rule, or taxonomy rule that improves confidence scores for the candidate answers to the first question, the validating comprises validating that the responses are supported by a threshold number of external sources; and
adding the piece of data, fact, syntactical relationship, grammatical relationship, logical rule, or taxonomy rule extracted from responses to the at least one secondary question to the corpus of data.
Dependent claims: 19, 20, 21
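The hypothesis-generation sub-steps of claim 18 (one hypothesis per candidate answer, an independent thread per hypothesis, and scoring each evidence-hypothesis pair along several dimensions) can be sketched as below. This is an illustration only. The word-overlap scorer is a placeholder that returns the same value for every dimension, evidence retrieval is a naive substring match, and every function name is hypothetical. Only the dimension names are taken from the claim.

```python
from concurrent.futures import ThreadPoolExecutor

# Dimension names as listed in the claim.
DIMENSIONS = (
    "type classification", "time", "geography", "popularity",
    "passage support", "source reliability", "semantic relatedness",
)


def score_pair(question, passage, dimension):
    """Placeholder scorer (assumption): fraction of question words in the passage.

    A real system would use a different analysis per dimension; this one
    ignores `dimension` and exists only to make the sketch runnable.
    """
    q_words = set(question.lower().split())
    p_words = set(passage.lower().split())
    return len(q_words & p_words) / len(q_words) if q_words else 0.0


def evaluate_hypothesis(question, candidate, corpus):
    """Treat `candidate` as a hypothesis, gather its evidence, score each pair."""
    evidence = [p for p in corpus if candidate.lower() in p.lower()]
    scores = [{dim: score_pair(question, p, dim) for dim in DIMENSIONS}
              for p in evidence]
    return {"candidate": candidate, "scores": scores}


def evaluate_all(question, candidates, corpus):
    """Run one worker thread per hypothesis and collect results in input order."""
    with ThreadPoolExecutor(max_workers=max(1, len(candidates))) as pool:
        futures = [pool.submit(evaluate_hypothesis, question, c, corpus)
                   for c in candidates]
        return [f.result() for f in futures]
```

Sizing the pool to the number of candidates gives each hypothesis its own thread, which mirrors the wording of the claim. A production system would more likely cap the pool size.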
Specification