
Document identification by characteristics matching

  • US 5,159,667 A
  • Filed: 05/31/1989
  • Issued: 10/27/1992
  • Est. Priority Date: 05/31/1989
  • Status: Expired due to Term
First Claim

1. A computer-implemented process for classifying documents comprising the steps of:

  • preliminarily creating a knowledge base of documents each characterized by a hierarchy of objects that are defined by parameters indicating physical and relational characteristics, the hierarchy being organized from a lowest object level to one or more successively higher object levels and storing said knowledge base in a computer;

    scanning a document to form binary light and dark pixels and inputting into said computer data representing the pixels;

    performing, in said computer, the following steps:

    segmenting the document into primary areas of significance based on the pixels;

    calculating parameters that define the segmented primary areas;

    comparing the parameters of each segmented primary area with the parameters of the lowest level objects in the hierarchy of objects that characterize each document in the knowledge base;

    assigning to each segmented primary area weights of evidence relative to the lowest level objects based on the comparison;

    generating a weighted hypothesis of a label for each of the segmented areas based on the weights of evidence relative to the lowest level objects;

    grouping the segmented primary areas into areas of significance more relevant than the primary areas;

    calculating parameters that define the more relevant areas;

    comparing the parameters of each more relevant area with the parameters of the second lowest level objects in the hierarchy;

    assigning to each more relevant area weights of evidence relative to the second lowest level objects based on the comparison and reevaluating the weights of evidence assigned to the segmented primary areas;

    generating a weighted hypothesis of a label for each of the more relevant areas and revising the weighted hypothesis of the label for each of the segmented primary areas based on the weights of evidence of the second lowest level objects and the lowest level objects; and

    classifying the document based on the labels and the weights of evidence developed by the preceding step.
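
Claim 1 leaves the segmentation method and the exact area parameters open. Here is a minimal sketch of the segmenting and parameter-calculating steps. It assumes the scanned page arrives as a 2-D list of 0/1 pixels (1 = dark) and that a primary area is an 8-connected group of dark pixels. The `Area` fields and the `density` and `aspect_ratio` parameters are hypothetical choices; the patent does not name them.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Area:
    """A segmented primary area with hypothetical physical parameters."""
    x: int            # left column of the bounding box
    y: int            # top row of the bounding box
    width: int
    height: int
    dark_pixels: int

    @property
    def density(self) -> float:
        # Fraction of the bounding box covered by dark pixels.
        return self.dark_pixels / (self.width * self.height)

    @property
    def aspect_ratio(self) -> float:
        return self.width / self.height


def segment_primary_areas(pixels: list[list[int]]) -> list[Area]:
    """Label 8-connected groups of dark pixels (1s) as primary areas.

    Connected-component labeling is an assumed method; the claim only
    requires that areas be derived from the binary pixels.
    """
    rows = len(pixels)
    cols = len(pixels[0]) if pixels else 0
    seen = [[False] * cols for _ in range(rows)]
    areas = []
    for r in range(rows):
        for c in range(cols):
            if pixels[r][c] != 1 or seen[r][c]:
                continue
            # Breadth-first flood fill from this unvisited dark pixel.
            queue = deque([(r, c)])
            seen[r][c] = True
            min_r = max_r = r
            min_c = max_c = c
            count = 0
            while queue:
                py, px = queue.popleft()
                count += 1
                min_r, max_r = min(min_r, py), max(max_r, py)
                min_c, max_c = min(min_c, px), max(max_c, px)
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = py + dy, px + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and pixels[ny][nx] == 1 and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            areas.append(Area(min_c, min_r, max_c - min_c + 1,
                              max_r - min_r + 1, count))
    return areas
```

For the page `[[1, 1, 0, 0, 0], [1, 1, 0, 0, 1], [0, 0, 0, 0, 1]]`, this yields two areas: a 2×2 block at the top left and a 1×2 stroke on the right.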
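
The claim compares each primary area with the lowest-level objects and forms a weighted label hypothesis from the result. A small scoring model can illustrate this. The claim does not say how weights of evidence are computed, so the Gaussian similarity rule here is only a stand-in. Each hypothetical `ObjectModel` stores an expected value and a tolerance for each parameter.

```python
import math
from dataclasses import dataclass


@dataclass
class ObjectModel:
    """Hypothetical lowest-level knowledge-base object.

    Each parameter has an expected value and a tolerance (a spread > 0).
    """
    label: str
    expected: dict[str, float]
    tolerance: dict[str, float]


def evidence_weight(params: dict[str, float], model: ObjectModel) -> float:
    """Weight between 0 and 1: mean Gaussian similarity over the model's parameters.

    The Gaussian scoring rule is an assumption made for this sketch.
    """
    scores = []
    for name, mu in model.expected.items():
        z = (params[name] - mu) / model.tolerance[name]
        scores.append(math.exp(-0.5 * z * z))
    return sum(scores) / len(scores)


def weighted_hypothesis(params: dict[str, float],
                        models: list[ObjectModel]) -> tuple[str, dict[str, float]]:
    """Weigh one area against every model and propose the best-supported label."""
    weights = {m.label: evidence_weight(params, m) for m in models}
    best = max(weights, key=weights.get)
    return best, weights


models = [
    ObjectModel("character", {"height": 12, "aspect": 0.6}, {"height": 3, "aspect": 0.3}),
    ObjectModel("rule_line", {"height": 2, "aspect": 50}, {"height": 1, "aspect": 10}),
]
label, weights = weighted_hypothesis({"height": 11, "aspect": 0.7}, models)
print(label)  # character
```

Every candidate label keeps its weight, not just the winner. That lets a later pass revise the hypothesis instead of starting over.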
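
The grouping and top-down revision steps of claim 1 can be sketched in the same spirit, under three simplifying assumptions:

- The hypothetical grouping rule merges primary areas whose vertical extents overlap, roughly the characters on one text line.
- The second-level knowledge is reduced to which lowest-level labels each higher-level object may contain.
- Revision boosts each member's weights for labels consistent with the group's hypothesis, then renormalizes.

The claim also calls for physical parameters of the grouped areas; this simplification uses only member composition.

```python
from dataclasses import dataclass


@dataclass
class Primary:
    """A primary area with its vertical extent and per-label weights of evidence."""
    top: int
    bottom: int                  # inclusive last row
    weights: dict[str, float]    # label -> weight of evidence

    def best(self) -> str:
        return max(self.weights, key=self.weights.get)


def group_into_lines(areas: list[Primary]) -> list[list[Primary]]:
    """Merge areas whose vertical extents overlap (an assumed grouping rule)."""
    groups: list[list[Primary]] = []
    current_bottom = None
    for area in sorted(areas, key=lambda a: a.top):
        if groups and area.top <= current_bottom:
            groups[-1].append(area)
            current_bottom = max(current_bottom, area.bottom)
        else:
            groups.append([area])
            current_bottom = area.bottom
    return groups


def revise(group: list[Primary], second_level: dict[str, set[str]],
           boost: float = 0.5) -> tuple[str, dict[str, float]]:
    """Hypothesize a label for the group, then reweight its members top-down.

    second_level maps a hypothetical higher-level label to the lowest-level
    labels it may contain. The group's evidence for a label is the fraction
    of members whose current best label is an allowed child.
    """
    group_weights = {
        label: sum(a.best() in children for a in group) / len(group)
        for label, children in second_level.items()
    }
    group_label = max(group_weights, key=group_weights.get)
    strength = group_weights[group_label]
    for area in group:
        # Strengthen member labels that agree with the group hypothesis.
        for label in area.weights:
            if label in second_level[group_label]:
                area.weights[label] *= 1 + boost * strength
        total = sum(area.weights.values())
        area.weights = {k: v / total for k, v in area.weights.items()}
    return group_label, group_weights
```

The `boost` factor controls how much a confident higher-level hypothesis can pull member labels toward agreement. Setting it to 0 disables the top-down revision.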
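
For the final classifying step, one simple assumption is to score each document type in the knowledge base by how strongly the page supports its expected labels. In this sketch, `classify_document` and the label-set document models are hypothetical. Each expected label contributes the strongest weight observed for it (0 if absent), and the score is the mean over the expected labels.

```python
def classify_document(labeled_areas: list[tuple[str, float]],
                      document_models: dict[str, set[str]]) -> tuple[str, dict[str, float]]:
    """Pick the document type whose expected labels are best supported."""
    # Strongest evidence seen for each label anywhere on the page.
    best_weight: dict[str, float] = {}
    for label, weight in labeled_areas:
        best_weight[label] = max(weight, best_weight.get(label, 0.0))
    scores = {
        doc_type: sum(best_weight.get(label, 0.0) for label in expected) / len(expected)
        for doc_type, expected in document_models.items()
    }
    best = max(scores, key=scores.get)
    return best, scores


areas = [("letterhead", 0.9), ("date", 0.8), ("signature", 0.7), ("date", 0.4)]
models = {
    "letter": {"letterhead", "date", "signature"},
    "invoice": {"letterhead", "total", "line_item"},
}
print(classify_document(areas, models)[0])  # letter
```

Averaging over expected labels penalizes document types whose characteristic objects are missing from the page. Summing would instead favor types with many expected labels.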
