SYSTEMS AND METHODS TO AUTOMATICALLY CLASSIFY ELECTRONIC DOCUMENTS USING EXTRACTED IMAGE AND TEXT FEATURES AND USING A MACHINE LEARNING SUBSYSTEM

  • US 20090116736A1
  • Filed: 11/06/2008
  • Published: 05/07/2009
  • Est. Priority Date: 11/06/2007
  • Status: Abandoned Application
First Claim

1. A document analysis system that automatically classifies documents by recognizing in each document distinctive features that have been automatically learned by said system, so that said system may organize jobs according to the categories of documents the job contains, the document analysis system comprising:

  • a document acquisition system for receiving jobs from a plurality of users, each job containing at least one electronic document having at least one page that includes image aspects and text;

  • a document feature recognition system for automatically extracting image and text features from each received electronic document;

  • a document classification system for automatically classifying recognized electronic documents as belonging to a corresponding category of document by finding the best match between the extracted features of each said document and feature sets associated with each category of document, in which each feature set includes a set of image and text features and corresponding weights that distinguishes the respective category of document from the other categories of documents;

  • a document recognition training system for automatically training the feature set for each corresponding category of documents, said training system using extracted features of unrecognized electronic documents to automatically modify the feature set for a document category so that the ability of the document classification system to automatically classify documents improves as the training system is subjected to more and more unrecognized documents and the feature sets are modified accordingly; and

  • a job organization system for automatically organizing each job according to the categories of documents it contains by organizing electronic documents associated with each job based on at least one business rule that corresponds to the categories of documents.
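
The classification, training, and job organization elements recited above can be illustrated with a minimal Python sketch. It is not the applicant's implementation: the feature names, the weights, the additive scoring rule, the `threshold` and `rate` parameters, and the "sort by category priority" business rule are all assumptions chosen for illustration. The claim also does not say how an unrecognized document is assigned to a category for training, so the sketch assumes the category is supplied from outside (for example, by an operator).

```python
# Hypothetical feature sets: category -> {feature name: weight}.
# Feature names and weights are invented for illustration only.
FEATURE_SETS = {
    "invoice": {"text:invoice": 2.0, "text:total": 1.0, "image:logo_top_left": 0.5},
    "tax_form": {"text:wages": 2.0, "text:employer": 1.0, "image:box_grid": 1.5},
}


def score(features, feature_set):
    """Sum the weights of the category's features that appear in the document."""
    return sum(weight for name, weight in feature_set.items() if name in features)


def classify(features, feature_sets, threshold=1.0):
    """Return the best-matching category, or None if no score reaches the threshold."""
    best = max(feature_sets, key=lambda cat: score(features, feature_sets[cat]))
    if score(features, feature_sets[best]) >= threshold:
        return best
    return None  # treated as an unrecognized document


def train(feature_sets, features, category, rate=0.1):
    """Raise the weight of each observed feature in the given category's feature set.

    Assumption: the category for the unrecognized document is provided externally.
    """
    feature_set = feature_sets.setdefault(category, {})
    for name in features:
        feature_set[name] = feature_set.get(name, 0.0) + rate


def organize(job, feature_sets, category_order):
    """Assumed business rule: order a job's documents by category priority.

    `job` is a list of (document id, set of extracted features) pairs.
    Unrecognized documents are placed last; ties keep their original order.
    """
    labeled = [(doc_id, classify(features, feature_sets)) for doc_id, features in job]
    rank = {category: i for i, category in enumerate(category_order)}
    return sorted(labeled, key=lambda pair: rank.get(pair[1], len(rank)))
```

In this sketch, a document whose features match no category well enough is returned as `None`; passing its features to `train` with a category adds or strengthens those features, so a later call to `classify` on a similar document can succeed. That mirrors, in a very simplified form, the claim's requirement that classification improve as more unrecognized documents are processed.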

  • 3 Assignments