Generating secondary questions in an introspective question answering system
First Claim
1. A method of generating secondary questions in a question-answer system comprising a processor running software for performing a plurality of question answering processes and a corpus of data, the method comprising:
the processor comparing the first question to the corpus of data;
obtaining, from the question-answer system, candidate answers to the first question posed to the question-answer system, the candidate answers for the first question being generated from the corpus of data, each candidate answer being associated with evidence from the corpus of data;
analyzing the evidence from the corpus of data associated with each candidate answer, the evidence supporting or refuting each candidate answer, wherein the analyzing further comprises assigning an evidence score to the evidence based on how well the evidence matches the first question, wherein the evidence comprises good evidence, marginal evidence, and bad evidence;
calculating confidence scores for the candidate answers to the first question based on the evidence from the corpus of data;
identifying information to supplement the marginal evidence, the information improving the confidence scores for the candidate answers to the first question, the information not being in the corpus of data;
automatically generating a plurality of hypotheses concerning the information that supplements the marginal evidence and improves the confidence scores for the candidate answers to the first question, each hypothesis of the plurality of hypotheses being related to the information to improve the ability of the question-answer system to understand and evaluate the evidence associated with the candidate answers to the first question;
automatically generating at least one secondary question based on each hypothesis of the plurality of hypotheses concerning the information that supplements the marginal evidence and improves the confidence scores for the candidate answers to the first question, each of the at least one secondary question being formulated as a natural language inquiry in a human understandable format;
ranking the hypotheses based on relative utility to determine an order in which to output the at least one secondary question to external sources to obtain responses;
outputting, in rank order of the hypotheses, the at least one secondary question to the external sources in natural language format, the external sources comprising a community of human respondents;
receiving responses to the at least one secondary question from the external sources;
validating the responses to the at least one secondary question to extract a piece of data, fact, syntactical relationship, grammatical relationship, logical rule, or taxonomy rule that improves said confidence scores for the candidate answers to the first question, wherein the validating comprises validating that the responses are supported by a threshold number of external sources; and
adding the piece of data, fact, syntactical relationship, grammatical relationship, logical rule, or taxonomy rule extracted from the responses to the at least one secondary question to the corpus of data.
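Purely as an editorial illustration of the flow recited in claim 1, the evidence-tiering, confidence-scoring, and secondary-question steps might be sketched as follows. None of the names, thresholds, or formulas below appear in the patent; they are hypothetical choices for a minimal sketch.

```python
# Hypothetical sketch of claim 1: score evidence, tier it into good /
# marginal / bad, and raise secondary questions only where the supporting
# evidence is marginal. Thresholds and question wording are illustrative.

GOOD, MARGINAL, BAD = "good", "marginal", "bad"

def tier_evidence(score, good_cut=0.7, bad_cut=0.3):
    """Assign an evidence tier from a 0..1 match score."""
    if score >= good_cut:
        return GOOD
    if score >= bad_cut:
        return MARGINAL
    return BAD

def confidence(evidence_scores):
    """Confidence for a candidate answer: mean of its evidence scores."""
    return sum(evidence_scores) / len(evidence_scores) if evidence_scores else 0.0

def secondary_questions(candidates):
    """For each candidate answer whose evidence is merely marginal, emit a
    natural-language secondary question seeking the missing information."""
    questions = []
    for answer, scores in candidates.items():
        tiers = [tier_evidence(s) for s in scores]
        if MARGINAL in tiers and GOOD not in tiers:
            questions.append(
                (confidence(scores), f"Is '{answer}' correct, and what evidence supports it?")
            )
    # Ask about the weakest (lowest-confidence) candidates first.
    questions.sort(key=lambda pair: pair[0])
    return [q for _, q in questions]
```

A candidate backed by good evidence never triggers a secondary question; one backed only by marginal evidence does, which mirrors the claim's "identifying information to supplement the marginal evidence".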
Abstract
A method of generating secondary questions in a question-answer system. Missing information is identified from a corpus of data using a computerized device. The missing information comprises any information that improves confidence scores for candidate answers to a question. The computerized device automatically generates a plurality of hypotheses concerning the missing information. The computerized device automatically generates at least one secondary question based on each of the plurality of hypotheses. The hypotheses are ranked based on relative utility to determine an order in which the computerized device outputs the at least one secondary question to external sources to obtain responses.
193 Citations
23 Claims
1. (Set forth in full under "First Claim" above.) - View Dependent Claims (2, 3, 4, 5, 6)
7. A method of generating follow-on inquiries in a question answering system, the method comprising the steps of:
receiving a question into an automated question answering system;
attempting to answer the question by the automated question answering system, the automated question answering system comprising a processor using software for performing a plurality of question answering processes and a corpus of data;
the processor comparing the question to the corpus of data and generating a plurality of candidate answers to the question;
determining a confidence score for each candidate answer of the plurality of candidate answers based on evidence from the corpus of data used to generate each candidate answer of the plurality of candidate answers, wherein the evidence comprises good evidence, marginal evidence, and bad evidence;
identifying information to supplement the marginal evidence, the information improving the confidence scores for at least one of the candidate answers, the information not being in the corpus of data;
automatically generating a plurality of hypotheses concerning the information that supplements the marginal evidence and improves the confidence scores for at least one of the candidate answers, each hypothesis of the plurality of hypotheses being related to the information to improve the ability of the question answering system to understand and evaluate the evidence used to generate the candidate answers;
automatically generating at least one follow-on inquiry based on each hypothesis of the plurality of hypotheses concerning the information that supplements the marginal evidence and improves a confidence score for at least one of said candidate answers, each of the at least one follow-on inquiry being formulated as a natural language inquiry in a human understandable format;
ranking the hypotheses based on relative utility to determine an order for the automated question answering system to output the at least one follow-on inquiry to external sources to obtain responses, the external sources comprising a community of human respondents;
outputting, in rank order of the hypotheses, said at least one follow-on inquiry to the external sources in natural language format;
receiving responses to the at least one follow-on inquiry from the external sources;
validating the responses to the at least one follow-on inquiry and extracting a piece of data, fact, syntactical relationship, grammatical relationship, logical rule, or taxonomy rule that improves the confidence scores for the at least one candidate answer; and
adding the piece of data, fact, syntactical relationship, grammatical relationship, logical rule, or taxonomy rule extracted from the responses to the at least one follow-on inquiry to the corpus of data. - View Dependent Claims (8, 9, 10, 11, 12, 13)
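The validating step that recurs across the claims, accepting a response only when it is supported by a threshold number of external sources before folding it back into the corpus, can be sketched as follows. The function names and the default threshold are hypothetical, not from the patent.

```python
# Hypothetical sketch of response validation: a fact extracted from
# respondent answers is accepted only if at least `threshold` independent
# external sources asserted it, then merged into the corpus of data.
from collections import Counter

def validate_responses(responses, threshold=3):
    """Keep only facts asserted by at least `threshold` respondents
    (the claims' 'threshold number of external sources')."""
    counts = Counter(responses)
    return [fact for fact, n in counts.items() if n >= threshold]

def add_to_corpus(corpus, validated):
    """Fold validated facts back into the corpus, skipping duplicates."""
    corpus.extend(f for f in validated if f not in corpus)
    return corpus
```

For example, a fact repeated by three respondents survives validation, while a fact asserted only once is discarded rather than added to the corpus.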
14. A method comprising:
providing a first question to a Question Answering (QA) system, the QA system comprising a processor using software for performing a plurality of question answering processes and a corpus of data;
the processor comparing the first question to the corpus of data and creating a collection of candidate answers to the first question, each candidate answer in the collection of candidate answers being created from the corpus of data;
generating evidence from the corpus of data that supports or refutes each candidate answer, wherein generating the evidence further comprises assigning an evidence score to the evidence based on how well the evidence matches the first question, wherein the evidence comprises good evidence, marginal evidence, and bad evidence;
calculating confidence scores for each of the candidate answers to the first question based on the evidence from the corpus of data;
identifying information to supplement the marginal evidence, the information improving a confidence score for at least one of the candidate answers to the first question, the information not being in the corpus of data;
generating a plurality of hypotheses concerning the information that supplements the marginal evidence and improves the confidence score for the at least one of the candidate answers to the first question, each hypothesis of the plurality of hypotheses being related to the information to improve the ability of the QA system to understand and evaluate the evidence associated with the candidate answers to the first question;
producing a secondary question based on each hypothesis of the plurality of hypotheses, each secondary question being formulated as a natural language inquiry in a human understandable format;
ranking the hypotheses based on relative utility to determine an order in which the QA system outputs the secondary question to external sources, the external sources comprising an expert community of human respondents;
outputting, in rank order of the hypotheses, the secondary question to the external sources in natural language format;
receiving responses to the secondary question from the external sources;
validating the responses to the secondary question and extracting a piece of data, fact, syntactical relationship, grammatical relationship, logical rule, or taxonomy rule that improves the confidence scores for the at least one candidate answer, wherein the validating comprises validating that the responses are supported by a threshold number of external sources; and
adding the piece of data, fact, syntactical relationship, grammatical relationship, logical rule, or taxonomy rule extracted from the responses to the secondary question to the corpus of data. - View Dependent Claims (15, 16, 17, 18)
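The claims rank hypotheses "based on relative utility" to fix the order in which secondary questions go out, but do not define the utility measure. One illustrative choice, not taken from the patent, is to weight the expected confidence gain of a hypothesis by how likely human respondents are to be able to answer it:

```python
# Hypothetical utility ranking for hypotheses: expected confidence gain
# if the hypothesis holds, discounted by how answerable the resulting
# secondary question is for a human community. Both fields are assumed.

def utility(hypothesis):
    """Relative utility of one hypothesis (illustrative definition)."""
    return hypothesis["expected_gain"] * hypothesis["answerability"]

def rank_hypotheses(hypotheses):
    """Order hypotheses, and hence their secondary questions, by
    descending relative utility."""
    return sorted(hypotheses, key=utility, reverse=True)
```

Under this definition, a hypothesis promising a large confidence gain but whose question almost no respondent could answer is deferred behind a modest but easily answerable one.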
19. A method comprising:
receiving a question into a Question Answering (QA) system;
the QA system comparing the question to a corpus of data;
the QA system creating a collection of candidate answers to the question from the corpus of data;
the QA system analyzing evidence from the corpus of data that supports or refutes each candidate answer of the collection of candidate answers, wherein the analyzing further comprises assigning an evidence score to the evidence based on how well the evidence matches the question, wherein the evidence comprises good evidence, marginal evidence, and bad evidence;
the QA system calculating confidence scores for each candidate answer of the collection of candidate answers to the question based on the evidence from the corpus of data;
the QA system identifying information to supplement the marginal evidence, the information improving the confidence score for at least one candidate answer in the collection of candidate answers, the information not being in the corpus of data;
the QA system generating a plurality of hypotheses concerning the question and the collection of candidate answers, each hypothesis of the plurality of hypotheses being related to the information to improve the ability of the QA system to understand and evaluate the evidence associated with the candidate answers to the question, the generating the plurality of hypotheses further comprising: analyzing the question; for each candidate answer of the candidate answers, forming a hypothesis based on considering each candidate answer in context of the question; spawning an independent thread for each hypothesis that attempts to prove the candidate answer; extracting evidence related to each hypothesis from the corpus of data; and, for each evidence-hypothesis pair, analyzing elements of the question and the evidence along dimensions selected from the group consisting of: type classification, time, geography, popularity, passage support, source reliability, and semantic relatedness;
the QA system generating at least one follow-on inquiry based on each hypothesis of the plurality of hypotheses, each of the at least one follow-on inquiry being formulated as a natural language inquiry in a human understandable format, an answer to the at least one follow-on inquiry improving the ability of the QA system to understand and evaluate evidence associated with the candidate answers to the question;
the QA system ranking the hypotheses based on relative utility to determine an order in which to output the at least one follow-on inquiry to external sources, the external sources comprising an expert community of human respondents;
the QA system outputting, in rank order of the hypotheses, the at least one follow-on inquiry to the external sources in natural language format;
the QA system receiving responses to the follow-on inquiry from the external sources via a network interface;
the QA system validating the responses to the follow-on inquiry and extracting a piece of data, fact, syntactical relationship, grammatical relationship, logical rule, or taxonomy rule that improves the confidence scores for the at least one candidate answer, wherein the validating comprises validating that the responses are supported by a threshold number of external sources; and
the QA system adding the piece of data, fact, syntactical relationship, grammatical relationship, logical rule, or taxonomy rule extracted from the responses to the follow-on inquiry to the corpus of data. - View Dependent Claims (20, 21, 22, 23)
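Claim 19 scores each evidence-hypothesis pair along enumerated dimensions (type classification, time, geography, popularity, passage support, source reliability, semantic relatedness). A minimal sketch of aggregating such per-dimension scores into a single evidence score follows; the uniform weighting and function name are assumptions, not from the patent.

```python
# Hypothetical aggregation of the claim-19 analysis dimensions for one
# evidence-hypothesis pair. Missing dimensions contribute zero; weights
# default to uniform. Only the dimension names come from the claim.

DIMENSIONS = ("type_classification", "time", "geography", "popularity",
              "passage_support", "source_reliability", "semantic_relatedness")

def score_pair(dimension_scores, weights=None):
    """Weighted average of per-dimension scores (each in 0..1) for one
    evidence-hypothesis pair, yielding a single evidence score."""
    weights = weights or {d: 1.0 for d in DIMENSIONS}
    total = sum(weights[d] * dimension_scores.get(d, 0.0) for d in DIMENSIONS)
    return total / sum(weights.values())
```

Non-uniform weights would let the system emphasize, say, source reliability over popularity when scoring a pair; the claim itself leaves the combination method open.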
Specification