Determining document classification probabilistically through classification rule analysis
Abstract
A classification application identifies patterns and evidences within representative documents. The application constructs a classification rule according to an entity and an affinity determined from the patterns and evidences. The application processes the representative documents with the classification rule to evaluate whether the rules meet acceptance requirements. Subsequent to a successful evaluation, the application identifies confidence levels for patterns and evidences within other documents.
19 Claims
1. A system for determining document classification probabilistically through classification rule analysis, the system comprising:
at least one server comprising:
a memory configured to store instructions; and
a processor configured to execute an application in conjunction with the instructions stored in the memory, wherein the application is configured to:
identify patterns and evidences within content of representative documents;
construct a classification rule based on an entity determined according to an analysis of the patterns and an affinity determined according to an analysis of the evidences; and
process the content with the classification rule to:
determine an entity count and an entity confidence level for the entity;
determine an affinity presence and an affinity confidence level for the affinity, wherein the affinity confidence level is determined from a probability of at least one of the evidences being within a proximity window of a presence of the affinity, and the proximity window includes a window of the content used to scan the content for the affinity;
aggregate the entity count, the entity confidence level, the affinity presence, and the affinity confidence level to returned results;
compare the returned results to expected results to evaluate the classification rule against acceptance requirements; and
in response to a determination that the classification rule meets the acceptance requirements, identify confidence levels for the patterns and the evidences;
else edit the classification rule.
Dependent claims: 2, 3, 4, 5, 6, 7
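The aggregation and evaluation steps of claim 1 can be illustrated with a minimal Python sketch. The claim does not say how returned results are compared to expected results or what the acceptance requirements are, so everything here is an assumption made for illustration: the `RuleResult` record, the `is_match` decision, the thresholds, and the use of per-document agreement as the acceptance requirement are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RuleResult:
    """Aggregated output of one classification rule on one document."""
    entity_count: int
    entity_confidence: float
    affinity_present: bool
    affinity_confidence: float


def is_match(result, min_count=1, min_confidence=0.75):
    """Hypothetical decision: treat the document as classified when the
    entity was found often enough and either confidence level is high."""
    affinity = result.affinity_confidence if result.affinity_present else 0.0
    confidence = max(result.entity_confidence, affinity)
    return result.entity_count >= min_count and confidence >= min_confidence


def meets_acceptance(returned, expected, min_accuracy=0.9):
    """Compare returned results with expected labels, document by document.

    Assumed acceptance requirement: the share of documents on which the
    rule agrees with the expected label must reach min_accuracy.
    """
    if len(returned) != len(expected):
        raise ValueError("returned and expected must have the same length")
    if not returned:
        return False
    agree = sum(is_match(r) == e for r, e in zip(returned, expected))
    return agree / len(returned) >= min_accuracy


returned = [
    RuleResult(2, 0.85, True, 0.9),
    RuleResult(0, 0.0, False, 0.0),
    RuleResult(1, 0.6, False, 0.0),
]
expected = [True, False, True]

if meets_acceptance(returned, expected):
    print("rule accepted")
else:
    print("rule needs editing")
```

In this sketch the rule agrees with the expected labels on two of three documents, which falls short of the assumed 0.9 requirement, so the rule would go back for editing, mirroring the "else edit the classification rule" branch of the claim.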
8. A computing device for determining document classification probabilistically through classification rule analysis, the computing device comprising:
a memory configured to store instructions; and
a processor coupled to the memory, the processor executing an application in conjunction with the instructions stored in the memory, wherein the application is configured to:
identify patterns and evidences within content of documents;
determine a confidence level of an affinity from a probability of at least one of the evidences being within a proximity window of a presence of the affinity, wherein the proximity window includes a window of the content used to scan the content for the affinity;
construct a classification rule based on an entity determined according to an analysis of the patterns and the affinity determined according to an analysis of the evidences;
define the entity through the patterns, wherein each pattern is a collection of text markers with a probability of finding the entity including one or more of: a regular expression match and a keyword match;
process the content with the classification rule to collect returned results; and
compare the returned results to expected results to evaluate the classification rule against acceptance requirements.
Dependent claims: 9, 10, 11, 12
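Claim 8 defines an entity through patterns, each a collection of text markers (regular expression or keyword matches) carrying a probability of finding the entity. The following Python sketch shows one way such patterns might yield an entity count and an entity confidence level. The `Pattern` record, the `find_entity` function, the sample markers, and the choice to report the highest probability among matching patterns are assumptions for illustration; the claim does not prescribe how the probabilities are combined.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Pattern:
    """A collection of text markers with a probability of finding the entity."""
    regexes: tuple      # regular expression markers
    keywords: tuple     # keyword markers, matched case-insensitively
    probability: float


def find_entity(content, patterns):
    """Return (entity_count, entity_confidence) for the given content.

    Assumption: the count is the total number of marker hits, and the
    confidence is the highest probability among patterns that matched.
    """
    lowered = content.lower()
    count = 0
    confidence = 0.0
    for pattern in patterns:
        hits = sum(1 for rx in pattern.regexes for _ in re.finditer(rx, content))
        hits += sum(lowered.count(kw.lower()) for kw in pattern.keywords)
        if hits:
            count += hits
            confidence = max(confidence, pattern.probability)
    return count, confidence


# Hypothetical entity: an internal project reference.
patterns = [
    Pattern(regexes=(r"\bPRJ-\d{4}\b",), keywords=(), probability=0.9),
    Pattern(regexes=(), keywords=("project falcon",), probability=0.6),
]
content = "Status of PRJ-1234 and PRJ-5678: Project Falcon is on track."
print(find_entity(content, patterns))
```

Taking the maximum keeps a weak keyword hit from diluting a strong regular expression hit; averaging or multiplying probabilities would be equally defensible readings of the claim.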
13. A method executed on a computing device for determining document classification probabilistically through classification rule analysis, the method comprising:
identifying patterns and evidences within content of documents, wherein each one of the evidences includes an aggregate of keyword matches that are in proximity;
determining a confidence level of an affinity from a probability of at least one of the evidences being within a proximity window of a presence of the affinity, wherein the proximity window includes a window of the content used to scan the content for the affinity;
constructing a classification rule based on an entity determined according to an analysis of the patterns and the affinity determined according to an analysis of the evidences;
processing the content with the classification rule to collect returned results; and
comparing the returned results to expected results to evaluate the classification rule against acceptance requirements.
Dependent claims: 14, 15, 16, 17, 18, 19
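The proximity window in claim 13 can be pictured with a short Python sketch. It assumes, purely for illustration, that the window is a fixed number of characters on either side of an anchor match, that an evidence is a set of keywords of which a minimum number must appear inside the window, and that the affinity confidence is the highest probability among evidences found. The function name `affinity_confidence`, its parameters, and the sample data are hypothetical and do not come from the patent.

```python
import re


def affinity_confidence(content, anchor_regex, evidences, window=50):
    """Return (affinity_present, affinity_confidence).

    evidences is a list of (keywords, min_matches, probability) tuples.
    An evidence counts as found when at least min_matches of its keywords
    occur within `window` characters of an anchor match.
    """
    best = 0.0
    for match in re.finditer(anchor_regex, content):
        start = max(0, match.start() - window)
        end = min(len(content), match.end() + window)
        span = content[start:end].lower()   # the proximity window
        for keywords, min_matches, probability in evidences:
            found = sum(1 for kw in keywords if kw.lower() in span)
            if found >= min_matches:
                best = max(best, probability)
    return best > 0.0, best


evidences = [(("ssn", "social security"), 1, 0.8)]
anchor = r"\d{3}-\d{2}-\d{4}"

near = "Employee SSN: 123-45-6789 on file."
far = "SSN" + " " * 100 + "123-45-6789"

print(affinity_confidence(near, anchor, evidences, window=20))
print(affinity_confidence(far, anchor, evidences, window=20))
```

In the first sample the keyword sits inside the 20-character window around the number, so the evidence supports the affinity; in the second it lies about 100 characters away and is ignored.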
Specification