Systems and methods for automatically identifying document information

  • US 10,325,149 B1
  • Filed: 09/05/2018
  • Issued: 06/18/2019
  • Est. Priority Date: 03/09/2017
  • Status: Active Grant
First Claim

1. A computer-implemented method of processing an electronic document, comprising:

  • defining, by a processor, a set of canonical features for a document type and a plurality of attributes for a canonical feature;

    receiving, by the processor, an electronic document of the document type;

    identifying a set of text rectangles from the electronic document;

    obtaining a comparison set of reference document codifications, one of the comparison set of reference document codifications comprising a plurality of canonical feature codifications, one of the plurality of canonical feature codifications comprising one or more attribute values for one or more of the plurality of attributes of one of the set of canonical features as the one canonical feature appears in the one reference document;

    for each current canonical feature of the set of canonical features:

    selecting a set of canonical feature codifications from the comparison set of reference document codifications;

    determining a set of possible data types for the current canonical feature from the set of canonical feature codifications;

    calculating a frequency of occurrence for each of the set of possible data types;

    filtering out each of the set of canonical feature codifications for which the frequency of occurrence of the corresponding data type is below a threshold to obtain a filtered set of canonical feature codifications; and

    identifying a match between one of the set of text rectangles and one of the filtered set of canonical feature codifications;

    for each of the set of text rectangles, selecting one of the matching canonical feature codifications as a final canonical feature codification for the text rectangle.
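
The per-feature steps recited above (select codifications, tally data types, drop rare types, match text rectangles) can be illustrated with a minimal Python sketch. It is not the patented implementation. The names `FeatureCodification`, `infer_data_type`, `filter_by_frequency`, and `match_rectangles` are hypothetical, each codification is reduced to a single data-type attribute, a text rectangle is represented only by its text, and "match" is assumed to mean that the rectangle's inferred data type equals a surviving codification's data type. The claim does not say how the final codification is chosen, so that step is left out.

```python
import re
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class FeatureCodification:
    """Hypothetical record: one canonical feature as seen in one reference document."""
    feature: str     # canonical feature name, e.g. "invoice_date"
    data_type: str   # the only attribute modeled in this sketch


def infer_data_type(text: str) -> str:
    """Toy classifier (an assumption, not from the claim): date, number, or text."""
    if re.fullmatch(r"\d{2}/\d{2}/\d{4}", text):
        return "date"
    if re.fullmatch(r"[\d,]+(\.\d+)?", text):
        return "number"
    return "text"


def filter_by_frequency(codifications, threshold):
    """Keep codifications whose data type's relative frequency is at least the threshold."""
    counts = Counter(c.data_type for c in codifications)
    total = len(codifications)
    return [c for c in codifications if counts[c.data_type] / total >= threshold]


def match_rectangles(rect_texts, reference_codifications, features, threshold):
    """Map each rectangle's text to the canonical features it could represent."""
    matches = {text: [] for text in rect_texts}
    for feature in features:
        # Select this feature's codifications from the comparison set.
        selected = [c for c in reference_codifications if c.feature == feature]
        # Drop codifications whose data type is rare for this feature.
        kept_types = {c.data_type for c in filter_by_frequency(selected, threshold)}
        # A rectangle matches when its inferred type survives the filter.
        for text in rect_texts:
            if infer_data_type(text) in kept_types:
                matches[text].append(feature)
    return matches


# Illustrative data only: four reference sightings of a date field, two of a total.
references = [
    FeatureCodification("invoice_date", "date"),
    FeatureCodification("invoice_date", "date"),
    FeatureCodification("invoice_date", "date"),
    FeatureCodification("invoice_date", "text"),
    FeatureCodification("total", "number"),
    FeatureCodification("total", "number"),
]
rectangles = ["09/05/2018", "1,250.00", "ACME Corp"]
result = match_rectangles(rectangles, references, ["invoice_date", "total"], 0.5)
```

With a threshold of 0.5, the lone "text" sighting of `invoice_date` (frequency 0.25) is filtered out, so the free-text rectangle is left without candidates while the date and number rectangles each keep one.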
