
Automatic document separation

  • US 9,910,829 B2
  • Filed: 02/14/2014
  • Issued: 03/06/2018
  • Est. Priority Date: 12/19/2003
  • Status: Active Grant
First Claim

1. A method for automatically separating documents represented within a plurality of images by delineating document boundaries and identifying document types in accordance with classification rules, the method comprising:

  • automatically generating classification rules that predict a document type or subdocument type for each of the plurality of images based on textual information and/or graphical information represented in each respective one of the plurality of images, wherein the classification rules are generated based on analyzing textual information and/or graphical information of a plurality of training images using one or more of:

    a probabilistic network;

    relational algebra; and

    machine learning techniques;

    automatically generating one or more identifiers for identifying which of a plurality of document images belongs to which of a plurality of categories;

    automatically categorizing a plurality of document images into a plurality of predetermined categories based on analyzing textual information and/or image characteristics of each of the plurality of document images using the classification rules, wherein the step of automatically categorizing comprises:

    producing an output score for each document image based on the analysis thereof using the classification rules, wherein each output score represents an estimated document type probability or a subdocument type probability; and

    using a graph search algorithm to determine an optimum categorization sequence from a plurality of possible categorization sequences for the plurality of document images based on the output scores; and

    separating documents within the plurality of document images from one another by either:

    electronically associating at least one computer-generated label with at least some of the plurality of document images, each label corresponding to a different one of the plurality of categories and comprising one of the one or more identifiers generated for identifying which of the plurality of document images belongs to which of the plurality of categories;

    or inserting one or more computer-generated separation pages between at least some of the plurality of document images to delineate images belonging to different ones of the plurality of categories, each separation page comprising one of the one or more identifiers generated for identifying which of the plurality of document images belongs to which of the plurality of categories;

    or both electronically associating the at least one computer-generated label with at least some of the plurality of document images and inserting the one or more computer-generated separation pages between at least some of the plurality of document images.
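
The categorizing and separating steps of the claim can be illustrated with a small sketch. It is not the patented implementation: it assumes the per-page output scores are already available as probabilities, and it substitutes a simple Viterbi-style shortest-path search over a page-by-category trellis as one possible graph search algorithm. The names `best_sequence`, `separate`, and `switch_penalty` are hypothetical and do not come from the claim, and the sketch cannot split two consecutive documents of the same type.

```python
import math
from typing import Dict, List, Tuple


def best_sequence(scores: List[Dict[str, float]], switch_penalty: float = 1.0) -> List[str]:
    """Pick the lowest-cost category sequence for a run of page images.

    scores[i][c] is an assumed output score: the estimated probability that
    page i has document type c. switch_penalty is a hypothetical cost charged
    whenever consecutive pages are assigned different categories.
    """
    labels = sorted(scores[0])

    def page_cost(page: Dict[str, float], label: str) -> float:
        # Negative log-probability; the floor avoids log(0).
        return -math.log(max(page[label], 1e-12))

    # cost[c] = cheapest total cost of any path ending in category c
    cost = {c: page_cost(scores[0], c) for c in labels}
    back: List[Dict[str, str]] = []
    for page in scores[1:]:
        new_cost: Dict[str, float] = {}
        pointers: Dict[str, str] = {}
        for c in labels:
            def step(p: str, c: str = c) -> float:
                return cost[p] + (switch_penalty if p != c else 0.0)
            prev = min(labels, key=step)
            new_cost[c] = step(prev) + page_cost(page, c)
            pointers[c] = prev
        cost = new_cost
        back.append(pointers)

    # Trace the cheapest path backwards from the best final category.
    path = [min(labels, key=lambda c: cost[c])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return path


def separate(path: List[str]) -> List[Tuple[str, List[int]]]:
    """Group page indices into labeled documents wherever the category changes."""
    docs: List[Tuple[str, List[int]]] = []
    for index, label in enumerate(path):
        if docs and docs[-1][0] == label:
            docs[-1][1].append(index)
        else:
            docs.append((label, [index]))
    return docs


# Made-up scores for four pages and two categories.
pages = [
    {"invoice": 0.9, "letter": 0.1},
    {"invoice": 0.45, "letter": 0.55},
    {"invoice": 0.8, "letter": 0.2},
    {"invoice": 0.1, "letter": 0.9},
]
sequence = best_sequence(pages)
documents = separate(sequence)
```

With these made-up scores, the second page is labeled `invoice` even though its own highest score is `letter`, because the search evaluates whole sequences rather than individual pages. Each `(label, pages)` pair returned by `separate` corresponds to the claim's labeling alternative; a separation page would be inserted at each boundary between consecutive pairs.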
