Using paraphrase metrics for answering questions
First Claim
1. A method, in a data processing system, for using paraphrase metrics for answering questions, the method comprising:
receiving an input question;
generating a candidate answer from a corpus of information, wherein the candidate answer has a supporting passage from the corpus of information;
dividing the input question into a first sequence of tokens;
dividing the supporting passage into a second sequence of tokens;
matching question tokens from the first set of tokens to passage tokens from the second set of tokens, wherein matching the question tokens to the passage tokens comprises identifying a focus of the input question and matching the focus of the input question to an occurrence of the candidate answer in the supporting passage, wherein matching the focus of the input question to an occurrence of the candidate answer in the supporting passage comprises:
treating the focus of the question as a match for the candidate answer;
modifying the input question to replace text of the focus of the question with text of the candidate answer;
modifying the passage to replace text of the candidate answer with text of the focus of the question;
or modifying the input question and the passage to replace text of the focus of the question and text of the candidate answer with a common reserved constant string;
identifying a plurality of subsequences of tokens within the second sequence of tokens;
applying a paraphrase metric to compare the first sequence of tokens to each of the plurality of subsequences of tokens to generate a plurality of paraphrase metric scores; and
determining a confidence score for the candidate answer based on a highest paraphrase metric score within the plurality of paraphrase metric scores.
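The rewriting alternatives recited for matching the focus to the candidate answer can be illustrated with a minimal sketch. It is not the patented implementation: the function names, the lowercase whitespace tokenizer, the mode labels, and the placeholder string `__FOCUS__` are all assumptions introduced here for illustration. The first alternative (treating the focus as a match without rewriting either text) would instead be handled inside the token comparison and is not shown.

```python
from typing import List, Tuple

RESERVED = "__FOCUS__"  # hypothetical reserved constant string (assumption)


def tokenize(text: str) -> List[str]:
    """Assumed tokenizer: lowercase, then split on whitespace."""
    return text.lower().split()


def replace_span(tokens: List[str], span: List[str], new: List[str]) -> List[str]:
    """Return a copy of `tokens` with every occurrence of `span` replaced by `new`."""
    if not span:
        return list(tokens)
    out: List[str] = []
    i = 0
    while i < len(tokens):
        if tokens[i:i + len(span)] == span:
            out.extend(new)
            i += len(span)
        else:
            out.append(tokens[i])
            i += 1
    return out


def align_focus(question: str, passage: str, focus: str, answer: str,
                mode: str = "constant") -> Tuple[List[str], List[str]]:
    """Rewrite question and/or passage tokens so focus and answer line up."""
    q, p = tokenize(question), tokenize(passage)
    f, a = tokenize(focus), tokenize(answer)
    if mode == "answer_into_question":
        # Replace the focus text in the question with the candidate answer text.
        return replace_span(q, f, a), p
    if mode == "focus_into_passage":
        # Replace the candidate answer text in the passage with the focus text.
        return q, replace_span(p, a, f)
    if mode == "constant":
        # Replace both with a common reserved constant token.
        return replace_span(q, f, [RESERVED]), replace_span(p, a, [RESERVED])
    raise ValueError(f"unknown mode: {mode}")
```

With any of the three modes, the focus in the question and the answer occurrence in the passage end up as identical tokens, so a token-level paraphrase metric can count them as matching.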
Abstract
A mechanism is provided in a data processing system for using paraphrase metrics for answering questions. The mechanism receives an input question and generates a candidate answer from a corpus of information. The candidate answer has a supporting passage from the corpus of information. The mechanism divides the input question into a first sequence of tokens and divides the supporting passage into a second sequence of tokens. The mechanism identifies a plurality of subsequences of tokens within the second sequence of tokens and applies a paraphrase metric to compare the first sequence of tokens to each of the plurality of subsequences of tokens to generate a plurality of paraphrase metric scores. The mechanism then determines a confidence score for the candidate answer based on a highest paraphrase metric score within the plurality of paraphrase metric scores.
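The scoring flow described in the abstract can be sketched as below. This is a hedged illustration only: the unigram-overlap F1 used here is a stand-in for whatever paraphrase metric the mechanism actually applies, the choice of contiguous windows as the "subsequences" is an assumption, and all function and parameter names are hypothetical. The sketch also uses the highest score directly as the confidence, whereas the source only says the confidence is based on it.

```python
from collections import Counter
from typing import Iterator, List, Optional, Sequence


def overlap_f1(reference: Sequence[str], candidate: Sequence[str]) -> float:
    """Stand-in paraphrase metric (assumption): unigram-overlap F1."""
    if not reference or not candidate:
        return 0.0
    # Multiset intersection counts each shared token at most as often as it
    # appears in both sequences.
    common = sum((Counter(reference) & Counter(candidate)).values())
    if common == 0:
        return 0.0
    precision = common / len(candidate)
    recall = common / len(reference)
    return 2 * precision * recall / (precision + recall)


def contiguous_subsequences(tokens: Sequence[str], min_len: int,
                            max_len: int) -> Iterator[List[str]]:
    """Yield every contiguous window whose length is in [min_len, max_len]."""
    for start in range(len(tokens)):
        last_end = min(start + max_len, len(tokens))
        for end in range(start + min_len, last_end + 1):
            yield list(tokens[start:end])


def confidence(question_tokens: Sequence[str], passage_tokens: Sequence[str],
               min_len: int = 1, max_len: Optional[int] = None) -> float:
    """Score each passage window against the question and keep the best score."""
    if max_len is None:
        max_len = len(passage_tokens)
    scores = [overlap_f1(question_tokens, window)
              for window in contiguous_subsequences(passage_tokens, min_len, max_len)]
    return max(scores, default=0.0)
```

Taking the maximum over windows means a long passage is not penalized for the text surrounding the one span that actually paraphrases the question.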
11 Claims
7. A computer program product comprising a computer readable storage medium having a computer readable program stored therein, wherein the computer readable program, when executed on a computing device, causes the computing device to:
receive an input question;
generate a candidate answer from a corpus of information, wherein the candidate answer has a supporting passage from the corpus of information;
divide the input question into a first sequence of tokens;
divide the supporting passage into a second sequence of tokens;
match question tokens from the first set of tokens to passage tokens from the second set of tokens, wherein matching the question tokens to the passage tokens comprises identifying a focus of the input question and matching the focus of the input question to an occurrence of the candidate answer in the supporting passage, wherein matching the focus of the input question to an occurrence of the candidate answer in the supporting passage comprises:
treating the focus of the question as a match for the candidate answer;
modifying the input question to replace text of the focus of the question with text of the candidate answer;
modifying the passage to replace text of the candidate answer with text of the focus of the question;
or modifying the input question and the passage to replace text of the focus of the question and text of the candidate answer with a common reserved constant string;
identify a plurality of subsequences of tokens within the second sequence of tokens;
apply a paraphrase metric to compare the first sequence of tokens to each of the plurality of subsequences of tokens to generate a plurality of paraphrase metric scores; and
determine a confidence score for the candidate answer based on a highest paraphrase metric score within the plurality of paraphrase metric scores.
10. An apparatus comprising:
a processor; and
a memory coupled to the processor, wherein the memory comprises instructions which, when executed by the processor, cause the processor to:
receive an input question;
generate a candidate answer from a corpus of information, wherein the candidate answer has a supporting passage from the corpus of information;
divide the input question into a first sequence of tokens;
divide the supporting passage into a second sequence of tokens;
match question tokens from the first set of tokens to passage tokens from the second set of tokens, wherein matching the question tokens to the passage tokens comprises identifying a focus of the input question and matching the focus of the input question to an occurrence of the candidate answer in the supporting passage, wherein matching the focus of the input question to an occurrence of the candidate answer in the supporting passage comprises:
treating the focus of the question as a match for the candidate answer;
modifying the input question to replace text of the focus of the question with text of the candidate answer;
modifying the passage to replace text of the candidate answer with text of the focus of the question;
or modifying the input question and the passage to replace text of the focus of the question and text of the candidate answer with a common reserved constant string;
identify a plurality of subsequences of tokens within the second sequence of tokens;
apply a paraphrase metric to compare the first sequence of tokens to each of the plurality of subsequences of tokens to generate a plurality of paraphrase metric scores; and
determine a confidence score for the candidate answer based on a highest paraphrase metric score within the plurality of paraphrase metric scores.
Specification