Redigitization system and service

  • US 9,330,323 B2
  • Filed: 04/29/2012
  • Issued: 05/03/2016
  • Est. Priority Date: 04/29/2012
  • Status: Active Grant
First Claim

1. A method comprising:

    rasterizing an electronic document to obtain a raster image of the electronic document;

    determining the author of the electronic document;

    performing one or more optical character recognition (OCR) tasks on the raster image of the electronic document, performing the OCR tasks including identifying digitization errors in the electronic document based on a comparison to a personalized tf*idf error dictionary associated with the author to determine known OCR errors specific to the author, the personalized tf*idf error dictionary representing (i) whether a term occurred or not, (ii) how many times the term occurred, (iii) what percent of words are each term, (iv) using log instead of linear scales for the number of occurrences of the term, (v) or a combination thereof;

    correcting errors discovered by the OCR tasks; and

    creating a customized error corrected version of the electronic document.
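
The claim names four term-weighting variants (occurrence, count, percentage, log scale) for the personalized tf*idf error dictionary but does not say how the dictionary is built or applied. The following is a minimal illustrative sketch, not the patented implementation. It assumes that past OCR errors are available as (misread, intended) word pairs per author, and that idf is computed across authors with a smoothed logarithm. The names `tf`, `build_error_dictionary`, `correct`, and the `threshold` parameter are hypothetical.

```python
import math
from collections import Counter

def tf(counts, term, scheme):
    """Term weight under one of the four variants named in the claim."""
    c = counts.get(term, 0)
    if c == 0:
        return 0.0
    if scheme == "binary":    # (i) whether the term occurred or not
        return 1.0
    if scheme == "count":     # (ii) how many times the term occurred
        return float(c)
    if scheme == "percent":   # (iii) what percent of words are this term
        return c / sum(counts.values())
    if scheme == "log":       # (iv) log instead of linear scale
        return 1.0 + math.log(c)
    raise ValueError(f"unknown scheme: {scheme}")

def build_error_dictionary(author, errors_by_author, scheme="log"):
    """Map each misread token of `author` to (correction, tf*idf weight).

    errors_by_author: dict of author -> list of (misread, intended) pairs.
    This data layout is an assumption made for illustration.
    """
    pairs = errors_by_author[author]
    counts = Counter(misread for misread, _ in pairs)
    corrections = {}
    for misread, intended in pairs:
        corrections.setdefault(misread, intended)  # keep first correction seen
    n_authors = len(errors_by_author)
    result = {}
    for term in counts:
        # Document frequency: number of authors whose errors include this term.
        df = sum(
            1 for errs in errors_by_author.values()
            if any(m == term for m, _ in errs)
        )
        idf = math.log((1 + n_authors) / (1 + df)) + 1.0  # smoothed idf
        result[term] = (corrections[term], tf(counts, term, scheme) * idf)
    return result

def correct(tokens, error_dict, threshold=1.0):
    """Replace tokens whose author-specific error weight meets the threshold."""
    return [
        error_dict[t][0] if t in error_dict and error_dict[t][1] >= threshold else t
        for t in tokens
    ]

errors = {
    "alice": [("tbe", "the"), ("tbe", "the"), ("c1ear", "clear")],
    "bob": [("c1ear", "clear")],
}
alice_dict = build_error_dictionary("alice", errors, scheme="count")
fixed = correct(["tbe", "c1ear", "sky"], alice_dict, threshold=1.5)
```

In this sketch, an error that only one author produces ("tbe") receives a higher weight than one shared across authors ("c1ear"), so a threshold can separate author-specific errors from generic ones. Whether the patent uses idf in this way is not stated in the claim.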
