System and method for determining confidence levels for the results of a categorization system
Abstract
After a categorization process has been run, the scores of the two top-ranking categories, along with the size or number of features of the object being categorized, are passed to a confidence assignment process. This process determines a value for the confidence in the top category based on the evidence afforded by the input parameters. The magnitude of this confidence value determines whether the system can accept the automatic categorization results or whether human involvement is required. The invention also describes the process of determining the optimal value of an internal scaling parameter in the confidence assignment process, and the construction of a threshold table based on this parameter. The threshold table matches confidence values against error levels. For a given error rate, the previously assigned confidence determines whether the categorization results can be accepted without human intervention. The invention maximizes the number of objects that can be automatically processed for a given error rate.
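The threshold table described in the abstract can be illustrated with a minimal sketch. It assumes a labeled validation run is available as (confidence, was_correct) pairs and that a document is accepted when its confidence is at or above the chosen threshold. The function names, the input format, and the row layout are hypothetical and are not taken from the patent; the scaling parameter mentioned in the abstract is not modeled here.

```python
from typing import List, Optional, Tuple

# One row of the table: (confidence threshold, error rate, accepted count).
Row = Tuple[float, float, int]


def build_threshold_table(results: List[Tuple[float, bool]]) -> List[Row]:
    """Match confidence values against the error rate seen at that cutoff.

    `results` holds (confidence, was_correct) pairs from a labeled
    validation run; this input format is an assumption of the sketch.
    """
    ordered = sorted(results, key=lambda r: r[0], reverse=True)
    table: List[Row] = []
    errors = 0
    for count, (confidence, was_correct) in enumerate(ordered, start=1):
        if not was_correct:
            errors += 1
        # Emit a row only at the end of a run of equal confidence values,
        # so each threshold covers every document at or above it.
        is_last = count == len(ordered)
        if is_last or ordered[count][0] != confidence:
            table.append((confidence, errors / count, count))
    return table


def pick_threshold(table: List[Row], max_error_rate: float) -> Optional[float]:
    """Return the lowest threshold whose error rate is within the target.

    Rows are in descending threshold order, so the last qualifying row
    accepts the largest number of documents. Returns None if no row fits.
    """
    best = None
    for threshold, error_rate, _accepted in table:
        if error_rate <= max_error_rate:
            best = threshold
    return best


validation = [(0.9, True), (0.8, True), (0.7, False), (0.6, True), (0.5, False)]
table = build_threshold_table(validation)
chosen = pick_threshold(table, max_error_rate=0.25)
```

Documents whose confidence falls below the chosen threshold would be referred for human review, which matches the accept-or-refer decision the abstract describes.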
20 Claims
1. A computer system for determining a confidence level of a category of one or more documents, comprising:
a memory and a central processing unit (CPU);
a categorization process executed by the CPU and generating a results table, one or more documents being categorized into one or more categories, each category corresponding to a score in said results table and each document having a size and one or more features; and
a confidence process executed by the CPU on said results table to generate a threshold table, said confidence process determining a confidence level for said each category by taking a product of a first credibility that a highest scoring category of said categories is correct with a second credibility that a next highest scoring category of said categories is incorrect and normalizing the product by a third credibility that at least one of the highest and second-highest scoring categories is incorrect.
15. A method of verifying categorization of one or more documents, the method comprising the steps of:
taking a product of the credibility that a highest scoring document category is correct with a second credibility that a next highest scoring document category is incorrect;
normalizing the product by the credibility that at least one of the highest and second-highest scoring document categories is incorrect to create a confidence score; and
assigning the document to the highest scoring document category if the confidence score is above a threshold.
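The steps of claim 15 can be sketched as follows. This is an illustration under stated assumptions, not the patented computation: the two credibilities are taken as given values in [0, 1], they are treated as independent so that the credibility of "at least one incorrect" is one minus their product, and how credibilities are derived from raw category scores is not covered in this section. The function names `confidence_score` and `assign_category` are hypothetical.

```python
from typing import Optional


def confidence_score(cred_top: float, cred_second: float) -> float:
    """Confidence in the top category from two credibilities in [0, 1].

    cred_top:    credibility that the highest scoring category is correct.
    cred_second: credibility that the next highest scoring category is correct.

    Assumption of this sketch: the two credibilities are independent, so
    the credibility that at least one of the two categories is incorrect
    is 1 - cred_top * cred_second.
    """
    # Product: top category correct AND next highest category incorrect.
    product = cred_top * (1.0 - cred_second)
    # Normalizer: at least one of the two categories is incorrect.
    normalizer = 1.0 - cred_top * cred_second
    if normalizer == 0.0:
        raise ValueError("both categories fully credible; score is undefined")
    return product / normalizer


def assign_category(
    top_category: str, cred_top: float, cred_second: float, threshold: float
) -> Optional[str]:
    """Return the top category if the confidence score is above the
    threshold; otherwise return None to signal that human review is needed."""
    if confidence_score(cred_top, cred_second) > threshold:
        return top_category
    return None


accepted = assign_category("sports", cred_top=0.8, cred_second=0.5, threshold=0.6)
referred = assign_category("sports", cred_top=0.8, cred_second=0.5, threshold=0.7)
```

With these inputs the score is 0.4 / 0.6, roughly 0.667, so the first call accepts the category and the second refers the document for review.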
18. A system for verifying categorization of one or more documents, the system comprising:
means for taking a product of the credibility that a highest scoring document category is correct with a second credibility that a next highest scoring document category is incorrect;
means for normalizing the product by the credibility that at least one of the highest and second-highest document scoring categories is incorrect to create a confidence score; and
means for assigning the document in the highest scoring document category if the confidence score is above a threshold.
Specification