System and method for domain adaptation in question answering

US 9,240,128 B2
Filed: 09/24/2011
Issued: 01/19/2016
Est. Priority Date: 05/14/2008
Status: Expired due to Fees

Abstract

The present disclosure relates generally to question answering systems and methods and, particularly, to systems and methods for domain adaptation in question answering.

16 Claims

1. A method for providing adaptation to a question answering system, wherein the question answering system has associated therewith a first corpus of data and a question-answer set, the question-answer set being a collection of questions and correct answers to these questions, such that each question has one or more correct answers associated with it, the method comprising the steps of:
- submitting a set of questions to the question answering system;
  
  receiving back from the question answering system a set of answers generated in response to the set of questions, the set of answers that are received back being based upon at least one document in the first corpus of data;
  
  comparing the set of answers received back from the question answering system to answers in the question-answer set;
  
  identifying, based on the comparison of the set of answers received back from the question answering system to answers in the question-answer set, a plurality of answers from the set of answers received back from the question answering system that are not correct;
  
  generating a plurality of groups by performing automated grouping on at least one of:
  
  (a) a plurality of questions from the question-answer set that correspond to the identified answers that are not correct; and
  
  (b) a plurality of answers from the question-answer set that correspond to the identified answers that are not correct;
  
  creating a collection of related terms associated with the groups;
  
  obtaining, from a second corpus of data, textual information about each of the related terms, wherein the second corpus of data is external relative to the first corpus of data;
  
  creating a plurality of textual resources from the obtained information, each of the plurality of textual resources being associated with one of the related terms;
  
  scoring each of the plurality of textual resources based on whether each textual resource is informative with respect to the at least one document in the first corpus of data; and
  
  adding at least one of the created textual resources to the first corpus of data, wherein the at least one of the created textual resources that is added to the first corpus of data comprises a subset of all of the created plurality of textual resources and wherein the at least one of the created textual resources that is added to the first corpus of data had been scored as more informative with respect to the at least one document in the first corpus of data than at least one of the other created textual resources that is not added to the first corpus of data.
- Dependent claims (2, 3, 4, 5, 6, 7, 8)
- - 2. The method of claim 1, wherein each of the answers from the set of answers received back from the question answering system that are not correct comprises one of:
    - (a) an answer that is incorrect; and
      
      (b) an answer that is non-existent.
  - 3. The method of claim 1, wherein the generating a plurality of groups comprises using at least one ontology.
  - 4. The method of claim 1, wherein the generating a plurality of groups comprises at least one of:
    - (a) clustering; and
      
      (b) classifying.
  - 5. The method of claim 1, wherein the generating a plurality of groups comprises performing automated grouping on each of:
    - (a) a plurality of questions from the question-answer set that correspond to the identified answers that are not correct; and
      
      (b) a plurality of answers from the question-answer set that correspond to identified answers that are not correct.
  - 6. The method of claim 1, wherein the obtaining comprises at least one of:
    - (a) obtaining from the world wide web; and
      
      (b) obtaining from an e-commerce source.
  - 7. The method of claim 1, wherein the creating the plurality of textual resources from the obtained information comprises creating at least one of:
    - (a) at least one n-gram collection;
      
      (b) at least one lexicalized relation resource; and
      
      (c) at least one new text document using information obtained from at least one of:
      
      (i) the world wide web; and
      
      (ii) an e-commerce source.
  - 8. The method of claim 1, wherein the steps are carried out in the order recited.
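The steps recited in claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation: the function names, the keyword-based grouping, and the novelty-based informativeness score are all assumptions chosen for brevity, and the candidate textual resources are assumed to have already been obtained from the second corpus.

```python
from collections import defaultdict

def find_failures(qa_set, system_answers):
    """Return questions whose system answer is missing or not a correct answer."""
    failures = []
    for question, correct in qa_set.items():
        answer = system_answers.get(question)  # None models a non-existent answer
        if answer is None or answer not in correct:
            failures.append(question)
    return failures

def group_by_keyword(questions, keywords):
    """Toy automated grouping (assumption): first matching keyword is the group label."""
    groups = defaultdict(list)
    for q in questions:
        label = next((k for k in keywords if k in q.lower()), "other")
        groups[label].append(q)
    return dict(groups)

def informativeness(resource_text, corpus_docs):
    """Assumed score: share of the resource's distinct tokens absent from the first corpus."""
    corpus_vocab = {tok for doc in corpus_docs for tok in doc.lower().split()}
    tokens = set(resource_text.lower().split())
    return len(tokens - corpus_vocab) / len(tokens) if tokens else 0.0

def adapt(corpus_docs, resources, keep):
    """Add only the `keep` highest-scoring resources to the first corpus."""
    ranked = sorted(resources, key=lambda r: informativeness(r, corpus_docs), reverse=True)
    return corpus_docs + ranked[:keep]

# Hypothetical data for illustration only.
qa_set = {
    "what drug treats malaria": {"quinine"},
    "which city hosts the louvre": {"paris"},
    "who wrote hamlet": {"shakespeare"},
}
system_answers = {
    "what drug treats malaria": "aspirin",   # incorrect answer
    "who wrote hamlet": "shakespeare",       # correct answer
}                                            # the louvre question has no answer

first_corpus = ["aspirin is a drug that treats pain"]
candidates = ["aspirin is a drug", "quinine is an antimalarial drug"]

failures = find_failures(qa_set, system_answers)
groups = group_by_keyword(failures, ["drug", "city"])
adapted_corpus = adapt(first_corpus, candidates, keep=1)
```

In this sketch only a subset of the candidate resources is added, and the added resource scores higher than the one left out, which mirrors the final limitation of the claim. A real system would likely replace the keyword grouping with clustering or classification (claim 4) and use a more careful scoring function.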

9. A method for providing adaptation to a question answering system, wherein the question answering system has associated therewith a first corpus of data and a question-answer set, the question-answer set being a collection of questions and correct answers to these questions, such that each question has one or more correct answers associated with it, the method comprising the steps of:
- submitting a set of questions to the question answering system;
  
  receiving back from the question answering system a set of answers generated in response to the set of questions, the set of answers that are received back being based upon at least one document in the first corpus of data;
  
  comparing the set of answers received back from the question answering system to answers in the question-answer set;
  
  identifying, based on the comparison of the set of answers received back from the question answering system to answers in the question-answer set, a plurality of answers from the set of answers received back from the question answering system that are not correct;
  
  obtaining, for each question with an answer that is not correct, a corresponding correct answer and computing for each corresponding correct answer an associated semantic data type;
  
  finding, for each semantic data type associated with a corresponding correct answer, a collection of words or expressions that are related to the semantic data type;
  
  obtaining from a second corpus of data, for each of the words or expressions that are in the collection that is found, additional related information and creating a plurality of textual resources with the additional information, each of the plurality of textual resources being associated with one of the words or expressions that are in the collection that is found, wherein the second corpus of data is external relative to the first corpus of data;
  
  scoring each of the plurality of textual resources based on whether each textual resource is informative with respect to the at least one document in the first corpus of data; and
  
  adding at least one of the created textual resources to the first corpus of data, wherein the at least one of the created textual resources that is added to the first corpus of data comprises a subset of all of the created plurality of textual resources and wherein the at least one of the created textual resources that is added to the first corpus of data had been scored as more informative with respect to the at least one document in the first corpus of data than at least one of the other created textual resources that is not added to the first corpus of data.
- Dependent claims (10, 11, 12, 13, 14, 15, 16)
- - 10. The method of claim 9, wherein each of the answers from the set of answers received back from the question answering system that are not correct comprises one of:
    - (a) an answer that is incorrect; and
      
      (b) an answer that is non-existent.
  - 11. The method of claim 9, wherein each corresponding correct answer is obtained from the question-answer set.
  - 12. The method of claim 9, wherein the computing comprises a look up in an ontology.
  - 13. The method of claim 9, wherein the computing comprises finding in the second corpus of data one or more expressions and treating a portion of each expression as a semantic type.
  - 14. The method of claim 9, wherein the finding a collection of words or expressions that are related to the semantic data type comprises searching the second corpus of data.
  - 15. The method of claim 9, wherein the finding a collection of words or expressions that are related to the semantic data type comprises using an ontology.
  - 16. The method of claim 9, wherein the steps are carried out in the order recited.
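Claim 9 differs from claim 1 mainly in how the related terms are found: a semantic data type is computed for each correct answer, and words related to that type are then collected. The following Python sketch illustrates one way this could look, assuming a toy dictionary ontology (in the spirit of claims 12 and 15) and a list of sentences standing in for the second corpus. All names and data here are hypothetical, and the scoring and selection steps are omitted.

```python
# Hypothetical toy ontology mapping a term to its semantic data type.
ONTOLOGY = {
    "quinine": "drug",
    "aspirin": "drug",
    "ibuprofen": "drug",
    "paris": "city",
    "lyon": "city",
}

def semantic_type(answer, ontology):
    """Compute the semantic data type of a correct answer by ontology lookup."""
    return ontology.get(answer.lower())

def related_terms(sem_type, ontology):
    """Find the words in the ontology that share the given semantic data type."""
    return sorted(term for term, ty in ontology.items() if ty == sem_type)

def build_resources(terms, external_corpus):
    """Create one textual resource per term from external sentences that mention it."""
    resources = {}
    for term in terms:
        hits = [s for s in external_corpus if term in s.lower().split()]
        if hits:
            resources[term] = " ".join(hits)
    return resources

# Hypothetical second corpus, external to the question answering system's own corpus.
external_corpus = [
    "Ibuprofen reduces inflammation",
    "Quinine treats malaria",
    "Lyon is in France",
]

answer_type = semantic_type("Quinine", ONTOLOGY)
terms = related_terms(answer_type, ONTOLOGY)
resources = build_resources(terms, external_corpus)
```

Each resulting resource is associated with exactly one related term, as the claim requires. Terms with no matching external text (here, "aspirin") simply yield no resource.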

Current Assignee: International Business Machines Corporation
Original Assignee: International Business Machines Corporation
Inventors: Bagchi, Sugato; Ferrucci, David A.; Gondek, David C.; Levas, Anthony T.; Zadrozny, Wlodek W.
Primary Examiner(s): Chbouki, Tarek
Application Number: US13/244,431
Publication Number: US 20120077178A1
Time in Patent Office: 1,578 Days
US Class Current: 1/1
CPC Class Codes: G09B 7/00 Electrically-operated teach...
