Providing question and answers with deferred type evaluation using text with limited structure
First Claim
1. A computer-implemented method for automatically generating answers to questions comprising the steps of:
- analyzing a corpus of documents to identify a document containing a list, wherein said list contains item-delimiting markup;
- parsing said list to identify type information and entities in said list indicated by said markup, wherein identifying said type information and entities in said list comprises:
  - extracting a type from a title indicating said list;
  - determining a presence of item-delimiter mark-up associated with said list, each mark-up delimiter including or not including one or more associated hyperlinks, and for each determined item-delimiter mark-up item:
    - if a hyperlink is included, obtaining an instance of a hyperlink in closest proximity to the item-delimiter mark-up, and extracting an entity from a target of said hyperlink instance; and
    - if a hyperlink is not included, using an annotator to identify phrases included in text associated with the item-delimiter mark-up, and extracting a most salient phrase as said entity;
- creating entity-type pairs, wherein said entity-type pairs comprise said extracted entities and the identified type from said list;
- receiving a lexical answer type associated with an input query;
- receiving a candidate answer to said query;
- determining whether said candidate answer is associated with an entity in said created entity-type pairs;
- for any associated entity-type pairs, comparing said extracted type in said associated entity-type pair with said lexical answer type;
- generating a type-matching score, wherein said type-matching score is indicative of a quality of said obtained candidate answer based on matching types; and
- using said type-matching score to evaluate said candidate answer as an answer to said query;
- wherein a hardware processor automatically performs one or more of said steps.
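The list-parsing steps recited above can be sketched in code. This is an illustrative reading only: the wiki-style `*` item delimiter, the `[[...]]` hyperlink syntax, and the "List of ..." title convention are assumptions standing in for whatever item-delimiting markup and annotator the claim covers.

```python
import re

def extract_entity_type_pairs(title: str, list_markup: str) -> list[tuple[str, str]]:
    """Sketch of the claimed list-parsing step: the list's title supplies
    the type, and each item-delimited line yields one entity."""
    # Assumed title convention: "List of cheeses" yields the type "cheeses".
    m = re.match(r"List of (.+)", title, re.IGNORECASE)
    list_type = m.group(1).strip() if m else title.strip()

    pairs = []
    for line in list_markup.splitlines():
        if not line.startswith("*"):              # "*" is the item-delimiting markup
            continue
        link = re.search(r"\[\[([^\]|]+)", line)  # wiki-style hyperlink target
        if link:
            # Hyperlink present: extract the entity from the link target.
            entity = link.group(1).strip()
        else:
            # No hyperlink: a real system would run an annotator and keep the
            # most salient phrase; the bare item text stands in for that here.
            entity = line.lstrip("* ").strip()
        if entity:
            pairs.append((entity, list_type))
    return pairs
```

Each returned pair couples one extracted entity with the single type derived from the list's title, mirroring the "creating entity-type pairs" step.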
Abstract
A system, method and computer program product for conducting questions and answers with deferred type evaluation based on any corpus of data. The method includes processing a query while deferring evaluation until a “Type” (i.e., a descriptor) is determined AND a candidate answer is provided. Then, a search is conducted for evidence that the candidate answer has the required Lexical Answer Type (e.g., as determined by a matching function that can leverage a parser, a semantic interpreter and/or a simple pattern matcher). Prior to or during candidate answer evaluation, a process is provided for extracting and storing collections of entity-type pairs from semi-structured text documents. During QA processing and candidate answer scoring, a process is implemented to match the query LAT against the lexical type of each provided candidate answer and generate a score judging the degree of match.
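The deferred type evaluation described in the abstract, matching the query LAT against the types stored for a candidate answer and scoring the degree of match, might look like the following minimal sketch. The scoring scheme (exact match 1.0, shared head word 0.5) is an assumption standing in for the parser, semantic interpreter, or pattern matcher mentioned above.

```python
def type_matching_score(candidate: str, lat: str,
                        entity_type_pairs: list[tuple[str, str]]) -> float:
    """Look the candidate answer up among stored entity-type pairs and
    score how well each associated type matches the query's LAT."""
    types = [t for e, t in entity_type_pairs if e.lower() == candidate.lower()]
    if not types:
        return 0.0            # candidate not found: no typing evidence either way
    best = 0.0
    for t in types:
        if t.lower() == lat.lower():
            best = max(best, 1.0)                 # exact type match
        elif t.lower().split()[-1] == lat.lower().split()[-1]:
            best = max(best, 0.5)                 # same head word, partial match
    return best
```

Evaluation is deferred in the sense that nothing is scored until both the LAT and a concrete candidate answer are in hand; the score then feeds candidate-answer ranking.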
21 Claims
Claim 1 is reproduced above under First Claim; dependent claims 2 through 21 depend from it.
Specification