Scoring candidates using structural information in semi-structured documents for question answering systems
First Claim
1. A computer program product for automatically scoring candidate answers to questions in a question and answer system, the computer program product comprising a storage medium readable by a processing circuit and storing instructions run by the processing circuit for performing a method, the method comprising:
receiving plural candidate answers associated with a query string, said plural candidate answers obtained from at least one document in a data corpus using query terms;
identifying one or more entity structures embedded in said at least one document; and
for each at least one document;
extracting said one or more entity structures embedded in said at least one document, said embedded entity structures comprising user embedded tags or embedded links to other documents;
determining a number of said entity structures having terms in said embedded tags or embedded links to other documents that match query terms in the received input query string;
computing a score for each of said plural candidate answers in said document as a function of a count of said number of entity structures having terms in said embedded tags or said embedded links to other documents that match query terms in the query string;
said score computing comprising;
assigning an associated weight to a count of said matching query terms associated with each said score for each said plural candidate answer; and
computing a final score by combining each weighted match count associated with each of the candidate answers.
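The weighting and combining steps recited above can be sketched as follows. This is a minimal illustration, not the patented implementation: the `EntityStructure` type, the `"tag"` and `"link"` labels, the weight values, and the choice of a weighted sum as the combining function are all assumptions made for the example. The claim only requires that each match count receive a weight and that the weighted counts be combined.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EntityStructure:
    """One structure extracted from a document (hypothetical type)."""
    kind: str         # "tag" or "link"; illustrative labels only
    terms: frozenset  # lowercased terms found in the tag or link


def match_count(structures, query_terms, kind):
    """Count structures of one kind that share a term with the query."""
    query = {t.lower() for t in query_terms}
    return sum(1 for s in structures if s.kind == kind and s.terms & query)


def final_score(structures, query_terms, weights):
    """Combine per-kind match counts as a weighted sum (assumed combiner)."""
    return sum(
        weight * match_count(structures, query_terms, kind)
        for kind, weight in weights.items()
    )


def score_candidates(candidates, structures, query_terms, weights):
    """Give every candidate drawn from the document the document's score."""
    score = final_score(structures, query_terms, weights)
    return {candidate: score for candidate in candidates}


if __name__ == "__main__":
    doc_structures = [
        EntityStructure("tag", frozenset({"astronauts"})),
        EntityStructure("link", frozenset({"neil", "armstrong"})),
        EntityStructure("link", frozenset({"moon"})),
    ]
    weights = {"tag": 0.5, "link": 1.0}  # assumed values
    print(score_candidates(["Neil Armstrong"], doc_structures,
                           ["Armstrong", "Moon", "first"], weights))
```

In practice the weights would presumably be tuned or learned rather than fixed by hand; the source does not say how they are chosen.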
Abstract
A system, program product, and methodology automatically scores candidate answers to questions in a question and answer system. In the candidate answer scoring method, a processor device performs one or more of receiving one or more candidate answers associated with a query string, the candidates obtained from a data source having semi-structured content; identifying one or more documents with semi-structured content from the data source having a candidate answer; and for each identified document: extracting one or more entity structures embedded in the identified document; determining a number of the entity structures in the identified document that appear in the received input query; and, computing a score for a candidate answer in the document as a function of the number. Overall system efficiency is improved by giving the correct candidate answers higher scores through leveraging context-dependent structural information such as links to other documents and embedded tags.
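The extraction and counting steps in the abstract can be illustrated with a short sketch. It assumes, purely for the example, that the semi-structured documents use wiki-style markup in which `[[Target]]` or `[[Target|label]]` is an embedded link and `[[Category:Name]]` is a user-embedded tag; the source does not name a markup format. It also assumes that an entity structure "appears in" the query when it shares at least one term with it, and it uses the raw count as the score.

```python
import re

# Matches [[Target]] and [[Target|label]]; group 1 is the target.
WIKI_LINK = re.compile(r"\[\[([^\[\]|]+)(?:\|[^\[\]]*)?\]\]")


def extract_entity_structures(text):
    """Return (kind, label) pairs for embedded tags and links."""
    structures = []
    for match in WIKI_LINK.finditer(text):
        target = match.group(1).strip()
        if target.lower().startswith("category:"):
            # Treat category markers as user-embedded tags (assumption).
            structures.append(("tag", target[len("category:"):].strip()))
        else:
            structures.append(("link", target))
    return structures


def tokenize(text):
    """Lowercase alphanumeric terms; no stemming or stop-word removal."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def count_structures_in_query(text, query):
    """Number of entity structures sharing at least one term with the query."""
    query_terms = tokenize(query)
    return sum(
        1
        for _, label in extract_entity_structures(text)
        if tokenize(label) & query_terms
    )


if __name__ == "__main__":
    document = (
        "[[Neil Armstrong]] walked on the [[Moon]] during "
        "[[Apollo 11|the mission]]. [[Category:Astronauts]]"
    )
    query = "Who was the first astronaut on the Moon?"
    print(extract_entity_structures(document))
    print(count_structures_in_query(document, query))
```

Because the tokenizer does no stemming, "astronaut" in the query does not match the "Astronauts" tag here; a real system would likely normalize terms before comparing.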
16 Claims
1. A computer program product for automatically scoring candidate answers to questions in a question and answer system, the computer program product comprising a storage medium readable by a processing circuit and storing instructions run by the processing circuit for performing a method, the method comprising:
receiving plural candidate answers associated with a query string, said plural candidate answers obtained from at least one document in a data corpus using query terms;
identifying one or more entity structures embedded in said at least one document; and
for each at least one document;
extracting said one or more entity structures embedded in said at least one document, said embedded entity structures comprising user embedded tags or embedded links to other documents;
determining a number of said entity structures having terms in said embedded tags or embedded links to other documents that match query terms in the received input query string;
computing a score for each of said plural candidate answers in said document as a function of a count of said number of entity structures having terms in said embedded tags or said embedded links to other documents that match query terms in the query string;
said score computing comprising;
assigning an associated weight to a count of said matching query terms associated with each said score for each said plural candidate answer; and
computing a final score by combining each weighted match count associated with each of the candidate answers.
View Dependent Claims (2, 3, 4, 9, 11, 13, 15, 16)
5. A system for automatically scoring candidate answers to questions in a question and answer system comprising:
a memory storage device;
a processor device in communication with the memory device that performs a method comprising;
receiving plural candidate answers associated with a query string, said plural candidate answers obtained from at least one document in a data corpus using query terms;
identifying one or more entity structures embedded in said at least one document; and
for each at least one document;
extracting said one or more entity structures embedded in said at least one document, said embedded entity structures comprising user embedded tags or embedded links to other documents;
determining a number of said entity structures having terms in said embedded tags or said embedded links to other documents that match query terms in the received input query string;
computing a score for each of said plural candidate answers in said document as a function of a count of said number of entity structures having terms in said embedded tags or said embedded links to other documents that match query terms in the query string;
said score computing comprising;
assigning an associated weight to a count of said matching query terms associated with each said score for each said plural candidate answer; and
computing a final score by combining each weighted match count associated with each of the candidate answers.
View Dependent Claims (6, 7, 8, 10, 12, 14)
Specification