Answer category data classifying using dynamic thresholds

US 9,946,747 B2
Filed: 05/11/2015
Issued: 04/17/2018
Est. Priority Date: 11/05/2014
Status: Expired due to Fees

Abstract

Managing confidence data in a question-answering environment is disclosed. Managing confidence data can include sorting, based on a set of answer categories for a subject matter, a first set of a plurality of answers into a first answer category. The first set can correspond to at least one of a third set of a plurality of confidence scores and the second set can correspond to at least one of a fourth set of the plurality of confidence scores. Managing confidence data can include classifying confidence scores of the third set into one of a plurality of confidence buckets using a first threshold and determining a fifth set of a plurality of thresholds using the plurality of confidence scores. Managing confidence data can include classifying unclassified confidence scores of the third set into one of the plurality of confidence buckets using the fifth set of the plurality of thresholds.
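The two-pass classification described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the bucket names, the `high`/`low` static cut-offs, and the rule used to derive dynamic thresholds (midpoints of the widest gaps between sorted scores) are all hypothetical choices made for the example.

```python
from typing import Dict, List, Optional

# Hypothetical bucket names, highest confidence first (not from the patent).
BUCKETS = ["preferred", "marginal", "low"]


def static_bucket(score: float, high: float, low: float) -> Optional[str]:
    """Static pass: bucket only clear-cut scores; return None otherwise."""
    if score >= high:
        return BUCKETS[0]
    if score < low:
        return BUCKETS[-1]
    return None  # left unclassified for the dynamic pass


def dynamic_thresholds(scores: List[float], count: int) -> List[float]:
    """Assumed stand-in rule: cut at the midpoints of the `count` widest
    gaps between consecutive sorted scores. Returned in descending order."""
    ordered = sorted(scores, reverse=True)
    gaps = [(a - b, (a + b) / 2) for a, b in zip(ordered, ordered[1:])]
    widest = sorted(gaps, reverse=True)[:count]
    return sorted((mid for _, mid in widest), reverse=True)


def dynamic_bucket(score: float, thresholds: List[float]) -> str:
    """Dynamic pass: the first threshold the score meets decides the bucket."""
    for bucket, cut in zip(BUCKETS, thresholds):
        if score >= cut:
            return bucket
    return BUCKETS[-1]


def classify_category(scores: Dict[str, float], high: float, low: float) -> Dict[str, str]:
    """Classify the answers of one answer category in two passes."""
    result: Dict[str, str] = {}
    # Pass 1: static, predetermined thresholds classify a sub-group.
    for answer, score in scores.items():
        bucket = static_bucket(score, high, low)
        if bucket is not None:
            result[answer] = bucket
    # Pass 2: thresholds derived from all scores in the category
    # classify whatever the static pass left unclassified.
    cuts = dynamic_thresholds(list(scores.values()), len(BUCKETS) - 1)
    for answer, score in scores.items():
        if answer not in result:
            result[answer] = dynamic_bucket(score, cuts)
    return result
```

Each answer category would call `classify_category` with its own static cut-offs, so thresholds for one category never influence another.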

4 Claims

1. A computer-implemented method for managing answer confidence data by a question-answering system, the method comprising:
- receiving, via the question-answering system, a query about a subject matter;
- parsing, using a natural language processing technique configured to analyze syntactic and semantic content, the query;
- searching, based on the parsed query, a corpus having information about the subject matter;
- generating, based on the search, a plurality of answers to the query, wherein each answer is associated with a confidence score representing a likelihood that the answer is a correct answer to the query;
- sorting, based on similarities and differences in types among the plurality of answers and without regard to their associated confidence scores, the plurality of answers into a plurality of answer categories, including sorting a first group of the plurality of answers into a first answer category and a second group of the plurality of answers into a second answer category, wherein each answer category includes answers that are similar in type to each other;
- classifying, based on their associated confidence scores, each answer of a sub-group of the first group of answers into one of a plurality of confidence buckets using a first plurality of static, predetermined confidence thresholds associated with the first answer category;
- generating, based on the confidence scores associated with the answers of the first group of answers, a first plurality of dynamic thresholds associated with the first answer category;
- classifying, based on their associated confidence scores, each unclassified answer of the first group of answers into one of the plurality of confidence buckets using the first plurality of dynamic thresholds;
- classifying, based on their associated confidence scores, each answer of a sub-group of the second group of answers into one of the plurality of confidence buckets using a second plurality of static, predetermined confidence thresholds associated with the second answer category;
- generating, based on the confidence scores associated with the answers of the second group of answers, a second plurality of dynamic thresholds associated with the second answer category;
- classifying, based on their associated confidence scores, each unclassified answer of the second group of answers into one of the plurality of confidence buckets using the second plurality of dynamic thresholds; and
- presenting, via the question-answering system and as a response to the query, the plurality of answers sorted based on the plurality of confidence buckets.

2. The method of claim 1, wherein the generating the first plurality of dynamic thresholds includes:
- detecting a plurality of gaps between consecutive confidence scores associated with the first group of answers;
- calculating a standard deviation associated with the plurality of gaps;
- identifying, based on the standard deviation, a portion of the plurality of gaps that meets or exceeds the standard deviation; and
- using, responsive to its identification, the portion of the plurality of gaps as a basis for the first plurality of dynamic thresholds.

3. The method of claim 1, wherein the generating the first plurality of dynamic thresholds includes:
- detecting a plurality of rate changes between consecutive confidence scores associated with the first group of answers;
- identifying, based on the detected plurality of rate changes, a portion of the plurality of rate changes as the largest of the plurality of rate changes; and
- using, responsive to its identification, the portion of the plurality of rate changes as a basis for the first plurality of dynamic thresholds.

4. The method of claim 1, further comprising:
- detecting that a first confidence bucket of the plurality of confidence buckets includes a number of classified answers that exceeds a quantity threshold;
- generating, based on the confidence scores associated with the answers classified into the first confidence bucket, a third plurality of dynamic thresholds associated with the first confidence bucket; and
- reclassifying, based on the third plurality of dynamic thresholds, a portion of the plurality of classified answers into other confidence buckets.
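
The gap-based threshold generation recited in claim 2 can be sketched roughly as below. The claim does not say which standard deviation is meant or where within a qualifying gap a threshold should fall, so this sketch assumes the population standard deviation (`statistics.pstdev`) and places each threshold at the midpoint of its gap; the function name `gap_thresholds` is hypothetical.

```python
from statistics import pstdev
from typing import List


def gap_thresholds(scores: List[float]) -> List[float]:
    """Derive dynamic thresholds from unusually wide gaps between scores.

    Assumptions (not stated in the claim): population standard deviation,
    and one threshold at the midpoint of each qualifying gap.
    """
    ordered = sorted(scores, reverse=True)
    # Consecutive (higher, lower) score pairs and the gap inside each pair.
    pairs = list(zip(ordered, ordered[1:]))
    gaps = [hi - lo for hi, lo in pairs]
    if not gaps:
        return []  # fewer than two scores: nothing to split
    cutoff = pstdev(gaps)
    # Keep only the gaps that meet or exceed the standard deviation.
    return [(hi + lo) / 2 for (hi, lo), gap in zip(pairs, gaps) if gap >= cutoff]


if __name__ == "__main__":
    # Three visible clusters of scores yield two thresholds between them.
    print(gap_thresholds([0.92, 0.90, 0.88, 0.55, 0.53, 0.20]))
```

Note that when every gap is identical the standard deviation is zero, so every gap qualifies under this reading; a real system would likely need an extra rule for that case.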

Current Assignee: International Business Machines Corporation
Original Assignee: International Business Machines Corporation
Inventors: Barker, Kevin S.; DeLima, Roberto; Eggebraaten, Thomas J.; Megerian, Mark G.; Setnes, Marie L.
Primary Examiner(s): Vo, Truong
Application Number: US14/708,536
Publication Number: US 20160125064A1
Time in Patent Office: 1,072 Days

CPC Class Codes

G06F 16/24522 : Translation of natural lang...
G06F 16/24575 : using context
G06F 16/24578 : using ranking
G06F 16/284 : Relational databases
G06F 16/285 : Clustering or classification
G06F 16/3329 : Natural language query form...
G06F 16/3332 : Query translation
G06F 16/3334 : Selection or weighting of t...
G06F 16/3344 : using natural language anal...
G06F 16/367 : Ontology
G06F 16/90324 : using system suggestions
G06F 16/951 : Indexing; Web crawling tech...
G06F 16/9535 : Search customisation based ...
G06F 16/9538 : Presentation of query results
G06F 16/955 : using information identifie...
G06F 40/205 : Parsing
G06F 40/211 : Syntactic parsing, e.g. bas...
G06F 40/284 : Lexical analysis, e.g. toke...
G06F 40/30 : Semantic analysis
G06F 40/40 : Processing or translation o...
G06N 20/00 : Machine learning
G06N 5/022 : Knowledge engineering; Know...
G06N 5/046 : Forward inferencing; Produc...
G06N 7/01 : Probabilistic graphical mod...
G06Q 30/0203 : Market surveys; Market polls
G09B 7/00 : Electrically-operated teach...
G09B 7/02 : of the type wherein the stu...
G16H 10/20 : for electronic clinical tri...
G16H 40/20 : for the management or admin...
G16H 50/70 : for mining of medical data,...
G16Z 99/00 : Subject matter not provided...
