Generating secondary questions in an introspective question answering system
First Claim
1. A method of generating secondary questions in a question-answer system comprising a processor running software for performing a plurality of question answering processes and a corpus of data, the method comprising:
the processor comparing the first question to the corpus of data;
obtaining, from the question-answer system, candidate answers to the first question posed to the question-answer system, the candidate answers for the first question being generated from the corpus of data, each candidate answer being associated with evidence from the corpus of data;
analyzing the evidence from the corpus of data associated with each candidate answer, the evidence supporting or refuting each candidate answer, wherein the analyzing further comprises assigning an evidence score to the evidence based on how well the evidence matches the first question, wherein the evidence comprises good evidence, marginal evidence, and bad evidence;
calculating confidence scores for the candidate answers to the first question based on the evidence from the corpus of data;
identifying information to supplement the marginal evidence, the information improving the confidence scores for the candidate answers to the first question, the information not being in the corpus of data;
automatically generating a plurality of hypotheses concerning the information that supplements the marginal evidence and improves the confidence scores for the candidate answers to the first question, each hypothesis of the plurality of hypotheses being related to the information to improve the ability of the question-answer system to understand and evaluate the evidence associated with the candidate answers to the first question;
automatically generating at least one secondary question based on each hypothesis of the plurality of hypotheses concerning the information that supplements the marginal evidence and improves the confidence scores for the candidate answers to the first question, each of the at least one secondary question being formulated as a natural language inquiry in a human understandable format;
ranking the hypotheses based on relative utility to determine an order in which to output the at least one secondary question to external sources to obtain responses;
outputting, in rank order of the hypotheses, the at least one secondary question to the external sources in natural language format, the external sources comprising a community of human respondents;
receiving responses to the at least one secondary question from the external sources;
validating the responses to the at least one secondary question to extract a piece of data, fact, syntactical relationship, grammatical relationship, logical rule, or taxonomy rule that improves said confidence scores for the candidate answers to the first question, wherein the validating comprises validating that the responses are supported by a threshold number of external sources; and
adding the piece of data, fact, syntactical relationship, grammatical relationship, logical rule, or taxonomy rule extracted from the responses to the at least one secondary question to the corpus of data.
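Purely as an editorial illustration of the flow recited in claim 1, the evidence-tiering, confidence-scoring, and secondary-question steps might be sketched as follows. None of the names, thresholds, or formulas below appear in the patent; they are hypothetical choices for a minimal sketch.

```python
# Hypothetical sketch of claim 1: score evidence, tier it into good /
# marginal / bad, and raise secondary questions only where the supporting
# evidence is marginal. Thresholds and question wording are illustrative.

GOOD, MARGINAL, BAD = "good", "marginal", "bad"

def tier_evidence(score, good_cut=0.7, bad_cut=0.3):
    """Assign an evidence tier from a 0..1 match score."""
    if score >= good_cut:
        return GOOD
    if score >= bad_cut:
        return MARGINAL
    return BAD

def confidence(evidence_scores):
    """Confidence for a candidate answer: mean of its evidence scores."""
    return sum(evidence_scores) / len(evidence_scores) if evidence_scores else 0.0

def secondary_questions(candidates):
    """For each candidate answer whose evidence is merely marginal, emit a
    natural-language secondary question seeking the missing information."""
    questions = []
    for answer, scores in candidates.items():
        tiers = [tier_evidence(s) for s in scores]
        if MARGINAL in tiers and GOOD not in tiers:
            questions.append(
                (confidence(scores), f"Is '{answer}' correct, and what evidence supports it?")
            )
    # Ask about the weakest (lowest-confidence) candidates first.
    questions.sort(key=lambda pair: pair[0])
    return [q for _, q in questions]
```

A candidate backed by good evidence never triggers a secondary question; one backed only by marginal evidence does, which mirrors the claim's "identifying information to supplement the marginal evidence".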
Abstract
A method of generating secondary questions in a question-answer system. Missing information is identified from a corpus of data using a computerized device. The missing information comprises any information that improves confidence scores for candidate answers to a question. The computerized device automatically generates a plurality of hypotheses concerning the missing information. The computerized device automatically generates at least one secondary question based on each of the plurality of hypotheses. The hypotheses are ranked based on relative utility to determine an order in which the computerized device outputs the at least one secondary question to external sources to obtain responses.
193 Citations
23 Claims
1. (Set forth in full under "First Claim" above.) - View Dependent Claims (2, 3, 4, 5, 6)
7. A method of generating follow-on inquiries in a question answering system, the method comprising the steps of:
receiving a question into an automated question answering system;
attempting to answer the question by the automated question answering system, the automated question answering system comprising a processor using software for performing a plurality of question answering processes and a corpus of data;
the processor comparing the question to the corpus of data and generating a plurality of candidate answers to the question;
determining a confidence score for each candidate answer of the plurality of candidate answers based on evidence from the corpus of data used to generate each candidate answer of the plurality of candidate answers, wherein the evidence comprises good evidence, marginal evidence, and bad evidence;
identifying information to supplement the marginal evidence, the information improving the confidence scores for at least one of the candidate answers, the information not being in the corpus of data;
automatically generating a plurality of hypotheses concerning the information that supplements the marginal evidence and improves the confidence scores for at least one of the candidate answers, each hypothesis of the plurality of hypotheses being related to the information to improve the ability of the question answering system to understand and evaluate the evidence used to generate the candidate answers;
automatically generating at least one follow-on inquiry based on each hypothesis of the plurality of hypotheses concerning the information that supplements the marginal evidence and improves a confidence score for at least one of said candidate answers, each of the at least one follow-on inquiry being formulated as a natural language inquiry in a human understandable format;
ranking the hypotheses based on relative utility to determine an order for the automated question answering system to output the at least one follow-on inquiry to external sources to obtain responses, the external sources comprising a community of human respondents;
outputting, in rank order of the hypotheses, said at least one follow-on inquiry to the external sources in natural language format;
receiving responses to the at least one follow-on inquiry from the external sources;
validating the responses to the at least one follow-on inquiry and extracting a piece of data, fact, syntactical relationship, grammatical relationship, logical rule, or taxonomy rule that improves the confidence scores for the at least one candidate answer; and
adding the piece of data, fact, syntactical relationship, grammatical relationship, logical rule, or taxonomy rule extracted from the responses to the at least one follow-on inquiry to the corpus of data. - View Dependent Claims (8, 9, 10, 11, 12, 13)
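The validating step that recurs across the claims, accepting a response only when it is supported by a threshold number of external sources before folding it back into the corpus, can be sketched as follows. The function names and the default threshold are hypothetical, not from the patent.

```python
# Hypothetical sketch of response validation: a fact extracted from
# respondent answers is accepted only if at least `threshold` independent
# external sources asserted it, then merged into the corpus of data.
from collections import Counter

def validate_responses(responses, threshold=3):
    """Keep only facts asserted by at least `threshold` respondents
    (the claims' 'threshold number of external sources')."""
    counts = Counter(responses)
    return [fact for fact, n in counts.items() if n >= threshold]

def add_to_corpus(corpus, validated):
    """Fold validated facts back into the corpus, skipping duplicates."""
    corpus.extend(f for f in validated if f not in corpus)
    return corpus
```

For example, a fact repeated by three respondents survives validation, while a fact asserted only once is discarded rather than added to the corpus.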
14. A method comprising:
providing a first question to a Question Answering (QA) system, the QA system comprising a processor using software for performing a plurality of question answering processes and a corpus of data;
the processor comparing the first question to the corpus of data and creating a collection of candidate answers to the first question, each candidate answer in the collection of candidate answers being created from the corpus of data;
generating evidence from the corpus of data that supports or refutes each candidate answer, wherein generating the evidence further comprises assigning an evidence score to the evidence based on how well the evidence matches the first question, wherein the evidence comprises good evidence, marginal evidence, and bad evidence;
calculating confidence scores for each of the candidate answers to the first question based on the evidence from the corpus of data;
identifying information to supplement the marginal evidence, the information improving a confidence score for at least one of the candidate answers to the first question, the information not being in the corpus of data;
generating a plurality of hypotheses concerning the information that supplements the marginal evidence and improves the confidence score for the at least one of the candidate answers to the first question, each hypothesis of the plurality of hypotheses being related to the information to improve the ability of the QA system to understand and evaluate the evidence associated with the candidate answers to the first question;
producing a secondary question based on each hypothesis of the plurality of hypotheses, each secondary question being formulated as a natural language inquiry in a human understandable format;
ranking the hypotheses based on relative utility to determine an order in which the QA system outputs the secondary question to external sources, the external sources comprising an expert community of human respondents;
outputting, in rank order of the hypotheses, the secondary question to the external sources in natural language format;
receiving responses to the secondary question from the external sources;
validating the responses to the secondary question and extracting a piece of data, fact, syntactical relationship, grammatical relationship, logical rule, or taxonomy rule that improves the confidence scores for the at least one candidate answer, wherein the validating comprises validating that the responses are supported by a threshold number of external sources; and
adding the piece of data, fact, syntactical relationship, grammatical relationship, logical rule, or taxonomy rule extracted from the responses to the secondary question to the corpus of data. - View Dependent Claims (15, 16, 17, 18)
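The claims rank hypotheses "based on relative utility" to fix the order in which secondary questions go out, but do not define the utility measure. One illustrative choice, not taken from the patent, is to weight the expected confidence gain of a hypothesis by how likely human respondents are to be able to answer it:

```python
# Hypothetical utility ranking for hypotheses: expected confidence gain
# if the hypothesis holds, discounted by how answerable the resulting
# secondary question is for a human community. Both fields are assumed.

def utility(hypothesis):
    """Relative utility of one hypothesis (illustrative definition)."""
    return hypothesis["expected_gain"] * hypothesis["answerability"]

def rank_hypotheses(hypotheses):
    """Order hypotheses, and hence their secondary questions, by
    descending relative utility."""
    return sorted(hypotheses, key=utility, reverse=True)
```

Under this definition, a hypothesis promising a large confidence gain but whose question almost no respondent could answer is deferred behind a modest but easily answerable one.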
19. A method comprising:
receiving a question into a Question Answering (QA) system;
the QA system comparing the question to a corpus of data;
the QA system creating a collection of candidate answers to the question from the corpus of data;
the QA system analyzing evidence from the corpus of data that supports or refutes each candidate answer of the collection of candidate answers, wherein the analyzing further comprises assigning an evidence score to the evidence based on how well the evidence matches the question, wherein the evidence comprises good evidence, marginal evidence, and bad evidence;
the QA system calculating confidence scores for each candidate answer of the collection of candidate answers to the question based on the evidence from the corpus of data;
the QA system identifying information to supplement the marginal evidence, the information improving the confidence score for at least one candidate answer in the collection of candidate answers, the information not being in the corpus of data;
the QA system generating a plurality of hypotheses concerning the question and the collection of candidate answers, each hypothesis of the plurality of hypotheses being related to the information to improve the ability of the QA system to understand and evaluate the evidence associated with the candidate answers to the question, the generating the plurality of hypotheses further comprising: analyzing the question; for each candidate answer of the candidate answers, forming a hypothesis based on considering each candidate answer in context of the question; spawning an independent thread for each hypothesis that attempts to prove the candidate answer; extracting evidence related to each hypothesis from the corpus of data; and, for each evidence-hypothesis pair, analyzing elements of the question and the evidence along dimensions selected from the group consisting of: type classification, time, geography, popularity, passage support, source reliability, and semantic relatedness;
the QA system generating at least one follow-on inquiry based on each hypothesis of the plurality of hypotheses, each of the at least one follow-on inquiry being formulated as a natural language inquiry in a human understandable format, an answer to the at least one follow-on inquiry improving the ability of the QA system to understand and evaluate evidence associated with the candidate answers to the question;
the QA system ranking the hypotheses based on relative utility to determine an order in which to output the at least one follow-on inquiry to external sources, the external sources comprising an expert community of human respondents;
the QA system outputting, in rank order of the hypotheses, the at least one follow-on inquiry to the external sources in natural language format;
the QA system receiving responses to the follow-on inquiry from the external sources via a network interface;
the QA system validating the responses to the follow-on inquiry and extracting a piece of data, fact, syntactical relationship, grammatical relationship, logical rule, or taxonomy rule that improves the confidence scores for the at least one candidate answer, wherein the validating comprises validating that the responses are supported by a threshold number of external sources; and
the QA system adding the piece of data, fact, syntactical relationship, grammatical relationship, logical rule, or taxonomy rule extracted from the responses to the follow-on inquiry to the corpus of data. - View Dependent Claims (20, 21, 22, 23)
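Claim 19 scores each evidence-hypothesis pair along enumerated dimensions (type classification, time, geography, popularity, passage support, source reliability, semantic relatedness). A minimal sketch of aggregating such per-dimension scores into a single evidence score follows; the uniform weighting and function name are assumptions, not from the patent.

```python
# Hypothetical aggregation of the claim-19 analysis dimensions for one
# evidence-hypothesis pair. Missing dimensions contribute zero; weights
# default to uniform. Only the dimension names come from the claim.

DIMENSIONS = ("type_classification", "time", "geography", "popularity",
              "passage_support", "source_reliability", "semantic_relatedness")

def score_pair(dimension_scores, weights=None):
    """Weighted average of per-dimension scores (each in 0..1) for one
    evidence-hypothesis pair, yielding a single evidence score."""
    weights = weights or {d: 1.0 for d in DIMENSIONS}
    total = sum(weights[d] * dimension_scores.get(d, 0.0) for d in DIMENSIONS)
    return total / sum(weights.values())
```

Non-uniform weights would let the system emphasize, say, source reliability over popularity when scoring a pair; the claim itself leaves the combination method open.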
Specification