Modular, folder based approach for semi-automated document classification
Abstract
The Modular, Folder Based Approach for Semi-Automated Document Classification is a systematic way of implementing a divide-and-conquer strategy. It leverages known automated document classification techniques and organizes standard software techniques into a system that is easy to configure and deploy.
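As an illustration of the divide-and-conquer idea, the following minimal Python sketch routes documents through two levels of folders, so that each stage only has to distinguish between the few categories in its own part of the ontology. The function name route_once, the keyword matching (a stand-in for a trained classifier), and all folder names are assumptions made for this example and do not come from the patent.

```python
from pathlib import Path
import shutil
import tempfile


def route_once(inbox: Path, outboxes: dict[str, Path], fallback: Path) -> list[str]:
    """Move each .txt file in `inbox` to the first outbox whose keyword it contains.

    Keyword matching is a hypothetical placeholder for an automated classifier.
    Files that match no keyword go to `fallback` for some other process to handle.
    """
    moved = []
    for path in sorted(inbox.glob("*.txt")):
        text = path.read_text(encoding="utf-8").lower()
        target = next(
            (folder for keyword, folder in outboxes.items() if keyword in text),
            fallback,
        )
        target.mkdir(parents=True, exist_ok=True)
        shutil.move(str(path), str(target / path.name))
        moved.append(f"{path.name} -> {target.name}")
    return moved


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        top = root / "inbox"
        top.mkdir()
        (top / "d1.txt").write_text("Invoice for the finance department", encoding="utf-8")
        finance = root / "finance"
        unclassified = root / "unclassified"
        # Stage 1 only separates broad domains.
        print(route_once(top, {"finance": finance, "contract": root / "legal"}, unclassified))
        # Stage 2 treats the finance folder as its own inbox and refines further.
        print(route_once(finance, {"invoice": finance / "invoices"}, unclassified))
```

Because the output folder of one stage is the input folder of the next, the stages share nothing but the file system, which is what keeps each one simple to configure on its own.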
33 Citations
12 Claims
1. A document classification system for classifying text documents into a particular category in a complex ontology comprising a set of entity means which:
(a) use a set of folders, and folder monitoring processes operating on documents to classify them within a subset of the ontology or domain of interest;
(b) use an automated text classification module to make a preliminary classification of documents into a category of interest associated with the entity whereby a classification module is able to use an example set of appropriately classified documents to train itself to classify new documents that match the categories in the entity's domain of interest with a measurable degree of accuracy;
(c) use an external final decision step to determine whether the initial automated classification is appropriate; and
(d) use an iterative process consisting of an automated re-classification step, in conjunction with an external decision step, to either locate the appropriate classification within the domain of interest for the entity, or to reject the document from the entity's domain of interest to be handled by some other process.
2. A document-classification system as claimed in 1 further comprising a training means such that the classification module uses an example set of appropriately classified documents to train itself to classify new documents that match the categories in the entity's domain of interest with a measurable degree of accuracy.
3. A document-classification system as claimed in 2 further comprising a training means such that documents which are initially classified incorrectly, but are subsequently categorized within the domain of interest covered by the entity, become candidates for subsequent training of the classification module.
4. A document-classification system as claimed in 1, 2, or 3 wherein the external final decision step may be executed by a human subject matter expert who either accepts or rejects the preliminary classification made by the automated text classification module.
5. A document-classification system as claimed in 1, 2, or 3 wherein the external final decision step may be executed by a non-human autonomous entity which either accepts or rejects the preliminary classification made by the automated text classification module.
6. A document-classification system as claimed in 5 wherein the autonomous entity is an external computer process.
7. A document-classification system as claimed in 1, 2, or 3 wherein the entity means may exist on the same computer system.
8. A document-classification system as claimed in 1, 2, or 3 wherein the entity means may exist on separate computer systems as implementation needs dictate.
9. A document classification system for classifying text documents into a particular category in a complex ontology comprising a set of interconnected entity means wherein each entity operates independent of all other entities.
-
12. A document classification system as claimed in 9, 10, or 11 wherein the set of interconnected, but independent, entities is mediated by the use of folder structure means and folder monitoring means.
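The flow recited for a single entity (preliminary automated classification, an external accept or reject decision, iterative re-classification, and rejection from the entity's domain) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the OverlapClassifier word-count scorer, the Entity class, the folder names inbox and rejected, the single polling pass, and the decide callback standing in for the external decision step are all hypothetical.

```python
from collections import Counter
from pathlib import Path
from typing import Callable, Optional
import shutil
import tempfile


class OverlapClassifier:
    """Toy stand-in (assumption) for the automated text classification module."""

    def __init__(self) -> None:
        self.word_counts: dict[str, Counter] = {}

    def train(self, category: str, text: str) -> None:
        # Learn from an example document already filed under `category`.
        self.word_counts.setdefault(category, Counter()).update(text.lower().split())

    def rank(self, text: str) -> list[str]:
        """Return the known categories, best word-overlap score first."""
        words = text.lower().split()
        scores = {c: sum(counts[w] for w in words) for c, counts in self.word_counts.items()}
        return sorted(scores, key=lambda c: (-scores[c], c))


class Entity:
    """One entity: an inbox folder, one folder per category, and a reject folder."""

    def __init__(self, root: Path, classifier: OverlapClassifier) -> None:
        self.root = root
        self.inbox = root / "inbox"
        self.rejected = root / "rejected"
        self.classifier = classifier
        self.retrain_candidates: list[tuple[str, str]] = []
        for folder in (self.inbox, self.rejected):
            folder.mkdir(parents=True, exist_ok=True)

    def process_inbox(self, decide: Callable[[str, str], bool]) -> dict[str, str]:
        """Run one polling pass over the inbox and file every document found."""
        results: dict[str, str] = {}
        for path in sorted(self.inbox.glob("*.txt")):
            text = path.read_text(encoding="utf-8")
            ranked = self.classifier.rank(text)
            # Propose categories in ranked order until the external decision
            # step accepts one; if none is accepted, reject the document.
            accepted: Optional[str] = next((c for c in ranked if decide(text, c)), None)
            if accepted is None:
                target = self.rejected
            else:
                target = self.root / accepted
                if accepted != ranked[0]:
                    # First guess was wrong: keep as a candidate for retraining.
                    self.retrain_candidates.append((accepted, text))
            target.mkdir(parents=True, exist_ok=True)
            shutil.move(str(path), str(target / path.name))
            results[path.name] = target.name
        return results


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        clf = OverlapClassifier()
        clf.train("invoices", "invoice payment amount due")
        clf.train("contracts", "agreement party term clause")
        entity = Entity(Path(tmp), clf)
        (entity.inbox / "a.txt").write_text("payment due on this invoice", encoding="utf-8")
        (entity.inbox / "b.txt").write_text("weather forecast for tuesday", encoding="utf-8")
        # Hypothetical reviewer: accepts a category if its singular name appears in the text.
        reviewer = lambda text, category: category[:-1] in text
        print(entity.process_inbox(reviewer))
        # prints {'a.txt': 'invoices', 'b.txt': 'rejected'}
```

The decide callback may wrap a prompt to a human subject matter expert or a call to another process; the entity does not need to know which. Pointing another entity's inbox at this entity's rejected folder is one way the hand-off "to some other process" could be arranged using folders alone.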
Specification