SYSTEMS AND METHODS FOR ENABLING MANUAL CLASSIFICATION OF UNRECOGNIZED DOCUMENTS TO COMPLETE WORKFLOW FOR ELECTRONIC JOBS AND TO ASSIST MACHINE LEARNING OF A RECOGNITION SYSTEM USING AUTOMATICALLY EXTRACTED FEATURES OF UNRECOGNIZED DOCUMENTS

  • US 20090116755A1
  • Filed: 11/06/2008
  • Published: 05/07/2009
  • Est. Priority Date: 11/06/2007
  • Status: Abandoned Application
First Claim
1. In a document analysis system that receives and processes jobs from a plurality of users to automatically organize each job according to the categories of documents each job contains, and in which at least some of the documents may be automatically recognized and classified, a method of enabling a human trainer to assist in classifying unrecognized electronic documents and of causing the unrecognized electronic documents to be used to train the automatic recognition and classification of subsequently-received documents, the method comprising:

  • automatically extracting image and text features from each received electronic document;

  • comparing the extracted features with feature sets associated with each category of document to determine whether the document is recognizable as belonging to a document category, in which each feature set includes a subset of image and text features and corresponding weights for each image and text feature in the subset so that the feature set distinguishes the respective category of document from the other categories of documents;

  • if an electronic document is recognized as belonging to one of the document categories, classifying the electronic document as belonging to that document category;

  • if an electronic document is unrecognized, submitting the unrecognized document to a learning phase, in which the unrecognized document is presented to a human trainer for manual classification of the unrecognized electronic document into a document category, and automatically modifying at least one of the features and the weights of the feature set of the document category corresponding to the manually-classified electronic document using the automatically extracted features of the manually-classified document so that subsequent automatic classification of documents by the document analysis system improves as more and more unrecognized documents train the feature sets.
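
The steps of the claim can be illustrated with a minimal Python sketch. It is not the applicant's implementation. It assumes that feature extraction has already happened and that each feature is represented as a plain string such as "text:invoice". The names FeatureSet, classify, process, and ask_trainer, the 0.6 recognition threshold, and the additive weight update with rate 0.5 are all hypothetical choices made for illustration; the claim does not specify a scoring formula or an update rule.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional, Set


@dataclass
class FeatureSet:
    """Weighted subset of features for one document category (hypothetical)."""
    weights: Dict[str, float] = field(default_factory=dict)

    def score(self, features: Set[str]) -> float:
        # Fraction of this category's total weight that the document matches.
        total = sum(self.weights.values())
        if total == 0:
            return 0.0
        matched = sum(w for f, w in self.weights.items() if f in features)
        return matched / total

    def train(self, features: Set[str], rate: float = 0.5) -> None:
        # Assumed update rule: decay weights of features absent from the
        # manually classified document, then add weight to those present.
        for f in list(self.weights):
            if f not in features:
                self.weights[f] *= (1 - rate)
        for f in features:
            self.weights[f] = self.weights.get(f, 0.0) + rate


def classify(features: Set[str],
             feature_sets: Dict[str, FeatureSet],
             threshold: float = 0.6) -> Optional[str]:
    # Compare the extracted features with every category's feature set and
    # return the best category, or None if the document is unrecognized.
    best_category, best_score = None, 0.0
    for category, fs in feature_sets.items():
        s = fs.score(features)
        if s > best_score:
            best_category, best_score = category, s
    return best_category if best_score >= threshold else None


def process(features: Set[str],
            feature_sets: Dict[str, FeatureSet],
            ask_trainer: Callable[[Set[str]], str],
            threshold: float = 0.6) -> str:
    category = classify(features, feature_sets, threshold)
    if category is None:
        # Learning phase: a human trainer names the category, and the
        # document's own extracted features update that category's set.
        category = ask_trainer(features)
        feature_sets.setdefault(category, FeatureSet()).train(features)
    return category
```

In this sketch, a document that scores below the threshold for every category is routed to ask_trainer, and the same document is recognized automatically the next time it is seen because its features now carry weight in the trainer-selected category.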

  • 3 Assignments