Optical character recognition employing deep learning with machine generated training data

  • US 10,489,682 B1
  • Filed: 12/21/2017
  • Issued: 11/26/2019
  • Est. Priority Date: 12/21/2017
  • Status: Active Grant
First Claim

1. A computer-implemented method for training a computerized deep learning system utilized by an optical character recognition system comprising the computer-implemented operations of:

    generating a plurality of synthetic text segments, by programmatically converting each of a plurality of text strings to a corresponding image, each text string and corresponding image forming a synthetic image/text tuple;

    generating a plurality of real-life text segments by processing from a corpus of document images, at least a subset of images from the corpus, with a plurality of OCR programs, each of the OCR programs processing each image from the subset to produce a real-life image/text tuple, and at least some of the OCR programs producing a confidence value corresponding to each real-life image/text tuple, and wherein each OCR program is characterized by a conversion accuracy substantially below a desired accuracy for an identified domain;

    storing the synthetic image/text tuple and the real-life image/text tuple to data storage as training data in a format accessible by the computerized deep learning system for training; and

    training the computerized deep learning system with the training data.
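
The data-preparation operations recited in the claim can be illustrated with a minimal sketch. It is not taken from the patent. Every name in it (`Segment`, `toy_render`, `make_synthetic`, `make_real_life`, `to_jsonl`, `store`) is hypothetical, the "image" is a toy bitmap rather than a rasterized document image, the OCR programs are assumed to be plain callables returning a text and an optional confidence value, and JSON Lines is only one possible storage format. The training operation itself is not sketched.

```python
import json
from dataclasses import dataclass, asdict
from typing import Callable, Dict, List, Optional, Sequence, Tuple

# Assumption: an image is a toy bitmap (rows of 0/1 pixels), not a real scan.
Image = List[List[int]]
# Assumption: an OCR program maps an image to (text, confidence or None).
OcrProgram = Callable[[Image], Tuple[str, Optional[float]]]


@dataclass
class Segment:
    """One image/text tuple, tagged with where it came from."""
    image: Image
    text: str
    source: str                       # "synthetic" or an OCR program name
    confidence: Optional[float] = None


def toy_render(text: str) -> Image:
    """Stand-in for a real rasterizer: one 8-pixel column per character,
    holding the low 8 bits of its code point (least significant bit first)."""
    columns = [[(ord(ch) >> bit) & 1 for bit in range(8)] for ch in text]
    # Transpose the per-character columns into 8 pixel rows.
    return [[col[row] for col in columns] for row in range(8)]


def make_synthetic(strings: Sequence[str],
                   render: Callable[[str], Image] = toy_render) -> List[Segment]:
    """Convert each text string to an image, giving synthetic tuples."""
    return [Segment(render(s), s, "synthetic") for s in strings]


def make_real_life(images: Sequence[Image],
                   ocr_programs: Dict[str, OcrProgram]) -> List[Segment]:
    """Run every OCR program on every image, giving real-life tuples."""
    segments = []
    for image in images:
        for name, program in ocr_programs.items():
            text, confidence = program(image)
            segments.append(Segment(image, text, name, confidence))
    return segments


def to_jsonl(segments: Sequence[Segment]) -> str:
    """Serialize tuples as JSON Lines, one record per line."""
    return "".join(json.dumps(asdict(seg)) + "\n" for seg in segments)


def store(segments: Sequence[Segment], path: str) -> None:
    """Write both kinds of tuples to one training-data file."""
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(to_jsonl(segments))
```

In this sketch the synthetic tuples carry exact labels by construction, while the real-life tuples keep the name of the OCR program that produced them and its confidence value where one is reported, so a later training stage could weight or filter them.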
