Answer category data classifying using dynamic thresholds
First Claim
1. A computer-implemented method for managing answer confidence data by a question-answering system, the method comprising:
receiving, via the question-answering system, a query about a subject matter;
parsing, using a natural language processing technique configured to analyze syntactic and semantic content, the query;
searching, based on the parsed query, a corpus having information about the subject matter;
generating, based on the search, a plurality of answers to the query, wherein each answer is associated with a confidence score representing a likelihood that the answer is a correct answer to the query;
sorting, based on similarities and differences in types among the plurality of answers and without regard to their associated confidence scores, the plurality of answers into a plurality of answer categories, including sorting a first group of the plurality of answers into a first answer category and a second group of the plurality of answers into a second answer category, wherein each answer category includes answers that are similar in type to each other;
classifying, based on their associated confidence scores, each answer of a sub-group of the first group of answers into one of a plurality of confidence buckets using a first plurality of static, predetermined confidence thresholds associated with the first answer category;
generating, based on the confidence scores associated with the answers of the first group of answers, a first plurality of dynamic thresholds associated with the first answer category;
classifying, based on their associated confidence scores, each unclassified answer of the first group of answers into one of the plurality of confidence buckets using the first plurality of dynamic thresholds;
classifying, based on their associated confidence scores, each answer of a sub-group of the second group of answers into one of the plurality of confidence buckets using a second plurality of static, predetermined confidence thresholds associated with the second answer category;
generating, based on the confidence scores associated with the answers of the second group of answers, a second plurality of dynamic thresholds associated with the second answer category;
classifying, based on their associated confidence scores, each unclassified answer of the second group of answers into one of the plurality of confidence buckets using the second plurality of dynamic thresholds; and
presenting, via the question-answering system and as a response to the query, the plurality of answers sorted based on the plurality of confidence buckets.
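The sorting and classifying steps recited above can be sketched in Python. This is only an illustrative sketch under stated assumptions, not the claimed implementation. The claim does not say how the dynamic thresholds are derived, so the sketch assumes they are the category's mean score plus and minus half a standard deviation. The bucket names, the `Answer` fields, the rule that static thresholds classify only clearly high or clearly low scores, and all function names are likewise hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean, pstdev

# Hypothetical bucket names, ordered from most to least confident.
BUCKETS = ("preferred", "for consideration", "not recommended")


@dataclass
class Answer:
    text: str          # assumed unique per answer in this sketch
    answer_type: str   # hypothetical type label used as the answer category
    confidence: float  # likelihood in [0, 1] that the answer is correct


def sort_into_categories(answers):
    """Group answers by type only; confidence scores are not consulted."""
    categories = defaultdict(list)
    for a in answers:
        categories[a.answer_type].append(a)
    return dict(categories)


def classify_static(group, high, low):
    """Bucket the sub-group with clear-cut scores; return the rest unclassified."""
    classified, unclassified = {}, []
    for a in group:
        if a.confidence >= high:
            classified[a.text] = BUCKETS[0]
        elif a.confidence < low:
            classified[a.text] = BUCKETS[2]
        else:
            unclassified.append(a)
    return classified, unclassified


def dynamic_thresholds(group):
    """Assumed rule: mean +/- half a population standard deviation of the group."""
    scores = [a.confidence for a in group]
    mu, sigma = mean(scores), pstdev(scores)
    return mu + sigma / 2, mu - sigma / 2


def classify_dynamic(unclassified, upper, lower):
    """Bucket the remaining answers using the per-category dynamic thresholds."""
    result = {}
    for a in unclassified:
        if a.confidence >= upper:
            result[a.text] = BUCKETS[0]
        elif a.confidence >= lower:
            result[a.text] = BUCKETS[1]
        else:
            result[a.text] = BUCKETS[2]
    return result


def bucket_answers(answers, static_thresholds):
    """Run the static pass, then the dynamic pass, separately for each category."""
    buckets = {}
    for category, group in sort_into_categories(answers).items():
        high, low = static_thresholds[category]
        classified, unclassified = classify_static(group, high, low)
        upper, lower = dynamic_thresholds(group)  # computed over the whole group
        classified.update(classify_dynamic(unclassified, upper, lower))
        buckets.update(classified)
    return buckets


def present(answers, buckets):
    """Order answers by bucket; the tie-break on confidence is an assumption."""
    order = {name: i for i, name in enumerate(BUCKETS)}
    return sorted(answers, key=lambda a: (order[buckets[a.text]], -a.confidence))
```

As a usage example, calling `bucket_answers(answers, {"drug": (0.9, 0.1)})` on a list of `Answer` objects of type `"drug"` returns a mapping from answer text to bucket name, which `present` then uses to order the response. Each category carries its own static and dynamic thresholds, so a score that is unremarkable in one category may still rank highly in another.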
Abstract
Managing confidence data in a question-answering environment is disclosed. Managing confidence data can include sorting, based on a set of answer categories for a subject matter, a first set of a plurality of answers into a first answer category and a second set of the plurality of answers into a second answer category. The first set can correspond to at least one of a third set of a plurality of confidence scores and the second set can correspond to at least one of a fourth set of the plurality of confidence scores. Managing confidence data can include classifying confidence scores of the third set into one of a plurality of confidence buckets using a first threshold and determining a fifth set of a plurality of thresholds using the plurality of confidence scores. Managing confidence data can include classifying unclassified confidence scores of the third set into one of the plurality of confidence buckets using the fifth set of the plurality of thresholds.
4 Claims
Specification