
Systems and methods for automatically identifying document information

  • US 10,127,444 B1
  • Filed: 03/09/2017
  • Issued: 11/13/2018
  • Est. Priority Date: 03/09/2017
  • Status: Active Grant
First Claim

1. A computer implemented method for processing an electronic document, the method comprising:

  • receiving the electronic document;

    generating an initial codification of the electronic document, the initial codification of the electronic document comprising a plurality of canonical feature codifications, each canonical feature codification comprising a plurality of attribute values describing attributes of the canonical feature as it appears in the electronic document;

    accessing a reference set of reference document codifications;

    for each reference document codification in the reference set, determining a similarity between the reference document codification and the initial codification of the electronic document;

    generating a comparison set of reference document codifications by including at most a threshold number of reference document codifications that are determined to be the most similar to the initial codification of the electronic document, each reference document codification in the comparison set comprising a plurality of canonical feature codifications, each canonical feature codification comprising a plurality of attribute values describing a position of the canonical feature in the reference document;

    processing each canonical feature codification in each reference document codification in the comparison set by:

    determining whether the electronic document has one or more text rectangles in a potential position of the canonical feature, the potential position of the canonical feature defined with reference to the attribute values of the canonical feature in the canonical feature codification; and

    in response to determining that the electronic document has one or more text rectangles in a potential position of the canonical feature, recording a preliminary association between the text rectangle and the canonical feature; and

    for each text rectangle preliminarily associated with one or more canonical features, determining a final canonical feature assignment for the text rectangle, the final canonical feature assignment being determined based on the one or more canonical features preliminarily associated with the text rectangle.
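
The steps recited in the claim can be illustrated with a minimal Python sketch. It is not the patented implementation, and the claim leaves several details open, so the following are assumptions made only for illustration: a codification is modelled as a mapping from a canonical feature name to a bounding box, the similarity measure is an inverse position distance averaged over shared features, a text rectangle counts as being in a "potential position" when it overlaps the reference feature's box, and the final assignment is a simple majority vote over the preliminary associations. The names `Box`, `similarity`, `comparison_set`, and `assign_features` are hypothetical.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass
from typing import Dict, List
import heapq


@dataclass(frozen=True)
class Box:
    """Axis-aligned rectangle: top-left corner (x, y), width w, height h."""
    x: float
    y: float
    w: float
    h: float

    def overlaps(self, other: "Box") -> bool:
        # True when the two rectangles share a region of non-zero area.
        return (self.x < other.x + other.w and other.x < self.x + self.w
                and self.y < other.y + other.h and other.y < self.y + self.h)


# Assumed model: a codification maps a canonical feature name to its position.
Codification = Dict[str, Box]


def similarity(a: Codification, b: Codification) -> float:
    """Assumed metric: shared features score higher the closer their positions."""
    shared = a.keys() & b.keys()
    if not shared:
        return 0.0
    total = 0.0
    for name in shared:
        dist = abs(a[name].x - b[name].x) + abs(a[name].y - b[name].y)
        total += 1.0 / (1.0 + dist)
    # Dividing by the larger feature count penalises unmatched features.
    return total / max(len(a), len(b))


def comparison_set(initial: Codification,
                   references: List[Codification],
                   threshold: int) -> List[Codification]:
    """Keep at most `threshold` references, most similar first."""
    return heapq.nlargest(threshold, references,
                          key=lambda ref: similarity(initial, ref))


def assign_features(text_rects: Dict[str, Box],
                    comparison: List[Codification]) -> Dict[str, str]:
    """Record preliminary associations, then pick one feature per rectangle."""
    votes = defaultdict(Counter)
    for ref in comparison:
        for feature, position in ref.items():
            for rect_id, rect in text_rects.items():
                if rect.overlaps(position):
                    votes[rect_id][feature] += 1  # preliminary association
    # Assumed resolution rule: the most frequently associated feature wins.
    return {rect_id: counts.most_common(1)[0][0]
            for rect_id, counts in votes.items()}
```

Text rectangles that fall in no potential position receive no preliminary association and are therefore absent from the result. A majority vote is only one way to resolve competing associations; the claim requires only that the final assignment be based on the preliminarily associated features.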

  • 3 Assignments