Method and system for using OCR data for grouping and classifying documents

  • US 8,724,907 B1
  • Filed: 03/28/2012
  • Issued: 05/13/2014
  • Est. Priority Date: 03/28/2012
  • Status: Active Grant
First Claim

1. A system for classifying digitized documents, the system comprising:

  • a processor-based document management system executed on a computer system and configured to:

    create and store a plurality of templates associated with a plurality of document classes, each template comprising a plurality of keywords;

    receive a digitized document to be classified;

    compare each template with the digitized document to be classified, wherein the comparison comprises:

    comparing a first area value associated with a template with a second area value associated with the digitized document, the first area value associated with a keyword indicating an area occupied by the keyword in the template, and the second area value indicating an area occupied by a word in the digitized document to be classified;

    determine that a difference between the first and second area values is below a threshold value; and

    upon the determination that a difference is below a threshold value, identify the keyword as being a keyword for a word pair, and identify the word in the digitized document to be classified as being a corresponding word for the word pair.
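Claim 1 does not say how the "area occupied" by a word is measured, what the threshold value is, or how the resulting word pairs lead to a class decision. The Python sketch below is one possible reading under stated assumptions: the area is taken to be the width times height of a word's OCR bounding box, and the threshold is an absolute difference in the same units. The names `Box`, `Template`, `find_word_pairs`, and `classify` are hypothetical. The scoring in `classify` (choosing the template with the most word pairs) is an illustrative assumption and is not part of the claim.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Box:
    """A word from OCR output. Assumption: area = bounding-box width * height."""
    text: str
    width: float
    height: float

    @property
    def area(self) -> float:
        # Area occupied by the word, assumed to be its bounding-box area.
        return self.width * self.height


@dataclass
class Template:
    """Hypothetical template for one document class: a list of keyword boxes."""
    doc_class: str
    keywords: list[Box] = field(default_factory=list)


def find_word_pairs(template: Template, words: list[Box],
                    threshold: float) -> list[tuple[Box, Box]]:
    """Pair each template keyword with every document word whose area differs
    from the keyword's area by less than `threshold` (area only; the word text
    is not compared, mirroring the area test recited in claim 1)."""
    pairs = []
    for keyword in template.keywords:
        for word in words:
            if abs(keyword.area - word.area) < threshold:
                # (keyword for the word pair, corresponding word in the document)
                pairs.append((keyword, word))
    return pairs


def classify(templates: list[Template], words: list[Box], threshold: float) -> str:
    """Hypothetical scoring step, not from the claim: return the class of the
    template that yields the most word pairs (ties go to the first template)."""
    best = max(templates, key=lambda t: len(find_word_pairs(t, words, threshold)))
    return best.doc_class
```

For example, with an "invoice" template holding keyword boxes of 120 by 30 and 80 by 20 and a "receipt" template holding one box of 200 by 40, a document whose words measure 118 by 30 and 82 by 20 produces two word pairs against the invoice template and none against the receipt template at a threshold of 100, so `classify` returns `"invoice"`. Matching on area alone will admit coincidental pairs between unrelated words of similar size, which is why a practical system would likely combine this test with other checks.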
