Optical character recognition employing deep learning with machine generated training data

US 10,489,682 B1
Filed: 12/21/2017
Issued: 11/26/2019
Est. Priority Date: 12/21/2017
Status: Active Grant



Abstract

An optical character recognition system employs a deep learning system that is trained to process a plurality of images within a particular domain to identify images representing text within each image and to convert the images representing text to textually encoded data. The deep learning system is trained with training data generated from a corpus of real-life text segments that are generated by a plurality of OCR modules. Each of the OCR modules produces a real-life image/text tuple, and at least some of the OCR modules produce a confidence value corresponding to each real-life image/text tuple. Each OCR module is characterized by a conversion accuracy substantially below a desired accuracy for an identified domain. Synthetically generated text segments are produced by programmatically converting text strings to a corresponding image where each text string and corresponding image form a synthetic image/text tuple.
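
As a rough illustration of what "programmatically converting text strings to a corresponding image" can mean, the sketch below renders a digit string into a small binary image and pairs it with its source text. The 3x5 bitmap font `FONT` and the helpers `render` and `make_synthetic_tuple` are hypothetical names invented for this example; the patent does not specify a renderer, and a practical pipeline would presumably use a real font-rendering library.

```python
# Hypothetical 3x5 bitmap font covering digits only (an assumption made
# for illustration; not taken from the patent).
FONT = {
    "0": ["111", "101", "101", "101", "111"],
    "1": ["010", "110", "010", "010", "111"],
    "2": ["111", "001", "111", "100", "111"],
    "3": ["111", "001", "111", "001", "111"],
    "4": ["101", "101", "111", "001", "001"],
    "5": ["111", "100", "111", "001", "111"],
    "6": ["111", "100", "111", "101", "111"],
    "7": ["111", "001", "001", "001", "001"],
    "8": ["111", "101", "111", "101", "111"],
    "9": ["111", "101", "111", "001", "111"],
}
GLYPH_HEIGHT = 5


def render(text, gap=1):
    """Render `text` as a binary image: a list of rows, 1 = ink, 0 = background.

    Raises KeyError for characters missing from FONT.
    """
    rows = [[] for _ in range(GLYPH_HEIGHT)]
    for index, char in enumerate(text):
        glyph = FONT[char]
        for r in range(GLYPH_HEIGHT):
            if index > 0:
                rows[r].extend([0] * gap)  # blank columns between glyphs
            rows[r].extend(int(px) for px in glyph[r])
    return rows


def make_synthetic_tuple(text):
    """Pair the rendered image with its ground-truth text."""
    return (render(text), text)


image, label = make_synthetic_tuple("120")
```

Because the label is the string that produced the image, the ground truth of such a tuple is exact by construction, which is what makes synthetic data useful alongside noisier OCR-labeled segments.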

36 Claims

1. A computer-implemented method for training a computerized deep learning system utilized by an optical character recognition system comprising the computer-implemented operations of:
- generating a plurality of synthetic text segments, by programmatically converting each of a plurality of text strings to a corresponding image, each text string and corresponding image forming a synthetic image/text tuple;
  
  generating a plurality of real-life text segments by processing from a corpus of document images, at least a subset of images from the corpus, with a plurality of OCR programs, each of the OCR programs processing each image from the subset to produce a real-life image/text tuple, and at least some of the OCR programs producing a confidence value corresponding to each real-life image/text tuple, and wherein each OCR program is characterized by a conversion accuracy substantially below a desired accuracy for an identified domain;
  
  storing the synthetic image/text tuple and the real-life image/text tuple to data storage as training data in a format accessible by the computerized deep learning system for training; and
  
  training the computerized deep learning system with the training data.
- Dependent claims (2-17):
  - 2. The computer-implemented method of claim 1 further comprising:
    - augmenting the synthetic image/text tuples and the real-life image/text tuples data by adding noise to image portions of the tuples.
  - 3. The computer-implemented method of claim 2 wherein adding noise to image portions of the tuples comprises:
    - randomly selecting image portions of the tuples and superimposing to the selected image portions, noise selected from the group consisting of random speckled noise, random lines, random binarization threshold, white on black text.
  - 4. The computer-implemented method of claim 2 wherein adding noise to image portions of the tuples comprises:
    - randomly selecting image portions of the tuples and superimposing patterned noise to the selected image portions.
  - 5. The computer-implemented method of claim 1 further comprising processing the image portions of the tuples to format the image portions into a fixed normative input employed by the computerized deep learning system.
  - 6. The computer-implemented method of claim 5 wherein processing the image portions of the tuples to format the image portions into a fixed normative input employed by the computerized deep learning system comprises:
    - scaling the image portion of each of the tuples to fit in a field of view of the computerized deep learning system.
  - 7. The computer-implemented method of claim 5 wherein processing the image portions of the tuples to format the image portions into a fixed normative input employed by the computerized deep learning system comprises:
    - centering the image portion of each of the tuples within a field of view of the computerized deep learning system.
  - 8. The computer-implemented method of claim 1 further comprising:
    - processing, for storage as training data, output of the OCR programs by employing statistical metrics to identify the highest quality tuples generated by the OCR programs.
  - 9. The computer-implemented method of claim 8 wherein employing the statistical metrics comprises:
    - selecting, between confidence metrics of equal value generated by two or more OCR programs, a confidence metric generated from a deep-learning based OCR program over confidence metrics generated from OCR programs not based on computerized deep learning;
      
      selecting segments in order of OCR confidence as indicated by confidence metric generated by an OCR program; and
      
      selecting segments for which the same text is generated by the OCR programs, and if the same text is not generated by the OCR programs then selecting segments having the least edit distance.
  - 10. The computer-implemented method of claim 9 further comprising:
    - identifying a subset of the real-life image/text tuples for labeling by humans, the subset characterized by a range of confidence values and differing outputs among the OCR programs for given segments.
  - 11. The computer-implemented method of claim 1 further comprising modifying a font of the image portion of at least a subset of the synthetic image/text tuples.
  - 12. The computer-implemented method of claim 1 wherein generating a plurality of synthetic text segments comprises randomly selecting sets of consecutive words from a text corpus comprising a set of fully-formed English language sentences.
  - 13. The computer-implemented method of claim 1 wherein generating a plurality of synthetic text segments comprises randomly selecting sets of consecutive words from a text corpus characterized by common text elements in the identified domain.
  - 14. The computer-implemented method of claim 12 further comprising modifying the selected sets of consecutive words to reflect biases of character types that occur in the identified domain.
  - 15. The computer-implemented method of claim 13 further comprising modifying the selected sets of consecutive words to reflect biases of character types that occur in the identified domain.
  - 16. The computer-implemented method of claim 14 further comprising generating the image portion of the synthetic image/text tuple in accordance with a randomly chosen font and font size.
  - 17. The computer-implemented method of claim 15 further comprising generating the image portion of the synthetic image/text tuple in accordance with a randomly chosen font and font size.
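
Claims 8 and 9 describe filtering OCR output with statistical metrics. One possible reading can be sketched as below: outputs that tie on confidence are resolved in favor of a deep-learning based program, and segments are ranked first by how closely the programs agree (smallest edit distance) and then by confidence. The names `OcrOutput`, `pick_output`, `disagreement`, and `rank_segments`, and the specific way the three criteria are combined, are assumptions made for illustration; the claims do not fix a particular ordering or data layout.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class OcrOutput:
    text: str
    confidence: float
    deep_learning: bool  # True if produced by a deep-learning based OCR program


def edit_distance(a, b):
    """Levenshtein distance computed with a rolling dynamic-programming row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # substitution
        prev = curr
    return prev[-1]


def pick_output(outputs):
    """Highest confidence wins; on equal confidence prefer deep learning."""
    return max(outputs, key=lambda o: (o.confidence, o.deep_learning))


def disagreement(outputs):
    """Largest pairwise edit distance; 0 means every program agrees."""
    return max((edit_distance(a.text, b.text)
                for a, b in combinations(outputs, 2)), default=0)


def rank_segments(segments):
    """Map of segment id -> list of OcrOutput, returned best segment first."""
    scored = []
    for seg_id, outputs in segments.items():
        best = pick_output(outputs)
        scored.append((disagreement(outputs), -best.confidence, seg_id, best.text))
    scored.sort()
    return [(seg_id, text) for _, _, seg_id, text in scored]


segments = {
    "a": [OcrOutput("Total", 0.9, False), OcrOutput("Total", 0.9, True)],
    "b": [OcrOutput("lnvoice", 0.95, False), OcrOutput("Invoice", 0.8, True)],
    "c": [OcrOutput("Date", 0.7, False), OcrOutput("Date", 0.6, True)],
}
ranking = rank_segments(segments)
```

Under this reading, segments at the bottom of the ranking (differing outputs, middling confidence) are natural candidates for the human labeling mentioned in claim 10.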

18. A computerized optical character recognition system comprising:
- a computerized deep learning system trained to process a plurality of encoded images within a particular domain to identify images representing text within each encoded image and converting the encoded images representing text to textually encoded data;
  
  data storage for storing the encoded images representing text and textually encoded data;
  
  wherein the computerized deep learning system is trained with training data generated from a corpus of real-life text segments generated by processing from a corpus of encoded document images, at least a subset of encoded images from the corpus, with a plurality of OCR modules, each of the OCR modules processing each encoded image from the corpus to produce a real-life image/text tuple, and at least some of the OCR modules producing a confidence value corresponding to each real-life image/text tuple, and wherein each OCR module is characterized by a conversion accuracy substantially below a desired accuracy for an identified domain; and
  
  synthetically generated text segments, generated by programmatically converting each of a plurality of text strings to a corresponding encoded image, each text string and corresponding encoded image forming a synthetic image/text tuple.
- Dependent claims (19):
  - 19. The computerized optical character recognition system of claim 18 wherein the real-life image/text tuples are processed to fit within a field of view of the computerized deep learning system, and wherein the synthetic image/text tuples are processed to reflect textual characteristics of the identified domain.
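
Fitting a segment to a fixed field of view (claims 5 to 7 and claim 19) can be pictured as a scale-then-center step. The sketch below is a minimal version that assumes the image is a non-empty list of pixel rows and uses nearest-neighbor resampling; `fit_to_field_of_view` and its parameters are hypothetical names, and the patent does not prescribe a resampling method or canvas size.

```python
def fit_to_field_of_view(image, fov_h, fov_w, background=0):
    """Scale `image` to fit inside a fov_h x fov_w canvas, then center it.

    The aspect ratio is preserved and the unused area is filled with
    `background`.
    """
    h, w = len(image), len(image[0])
    scale = min(fov_h / h, fov_w / w)
    new_h = max(1, min(fov_h, round(h * scale)))
    new_w = max(1, min(fov_w, round(w * scale)))

    # Nearest-neighbor resampling: each output pixel copies the source
    # pixel at the proportional position.
    scaled = [
        [image[min(h - 1, r * h // new_h)][min(w - 1, c * w // new_w)]
         for c in range(new_w)]
        for r in range(new_h)
    ]

    # Center the scaled segment on a blank canvas of the target size.
    top = (fov_h - new_h) // 2
    left = (fov_w - new_w) // 2
    canvas = [[background] * fov_w for _ in range(fov_h)]
    for r, row in enumerate(scaled):
        canvas[top + r][left:left + new_w] = row
    return canvas


normalized = fit_to_field_of_view([[1, 1], [1, 1]], fov_h=4, fov_w=8)
```

Here a 2x2 block is doubled to 4x4 (limited by the canvas height) and padded with two background columns on each side, so every segment reaches the network at the same size.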

20. A computerized system for training a computerized deep learning system utilized by an optical character recognition system comprising:
- a processor configured to execute instructions that when executed cause the processor to:
  
  generate a plurality of synthetic text segments, by programmatically converting each of a plurality of text strings to a corresponding image, each text string and corresponding image forming a synthetic image/text tuple; and
  
  generate a plurality of real-life text segments by processing from a corpus of document images, at least a subset of images from the corpus, with a plurality of OCR modules, each of the OCR modules processing each image from the subset to produce a real-life image/text tuple, and at least some of the OCR modules producing a confidence value corresponding to each real-life image/text tuple, and wherein each OCR module is characterized by a conversion accuracy substantially below a desired accuracy for an identified domain; and
  
  data storage, operatively coupled to the processor, for storing the synthetic image/text tuple and the real-life image/text tuple as training data in a format accessible by the deep learning system for training, wherein the computerized system employs the training data to train the deep learning system.
- Dependent claims (21-36):
  - 21. The computerized system of claim 20 wherein the processor is further configured to execute instructions that when executed cause the processor to:
    - augment the synthetic image/text tuples and the real-life image/text tuples data by adding noise to image portions of the tuples.
  - 22. The computerized system of claim 21 wherein adding noise to image portions of the tuples comprises:
    - randomly selecting image portions of the tuples and superimposing to the selected image portions, noise selected from the group consisting of random speckled noise, random lines, random binarization threshold, white on black text.
  - 23. The computerized system of claim 21 wherein adding noise to image portions of the tuples comprises:
    - randomly selecting image portions of the tuples and superimposing patterned noise to the selected image portions.
  - 24. The computerized system of claim 20 wherein the processor is further configured to execute instructions that when executed cause the processor to:
    - process the image portions of the tuples to format the image portions into a fixed normative input employed by the computerized deep learning system.
  - 25. The computerized system of claim 24 wherein the instructions that when executed cause the processor to process the image portions of the tuples to format the image portions into a fixed normative input employed by the deep learning system comprise instructions that when executed cause the processor to:
    - scale the image portion of each of the tuples to fit in a field of view of the deep learning system.
  - 26. The computerized system of claim 24 wherein the instructions that when executed cause the processor to process the image portions of the tuples to format the image portions into a fixed normative input employed by the computerized deep learning system comprise instructions that when executed cause the processor to:
    - center the image portion of each of the tuples within a field of view of the computerized deep learning system.
  - 27. The computerized system of claim 20 wherein the processor is further configured to execute instructions that when executed cause the processor to:
    - process, for storage as training data, output of the OCR programs by employing statistical metrics to identify the highest quality tuples generated by the OCR programs.
  - 28. The computerized system of claim 27 wherein employing the statistical metrics comprises:
    - selecting, between confidence metrics of equal value generated by two or more OCR programs, a confidence metric generated from a deep-learning based OCR program over confidence metrics generated from OCR programs not based on deep learning;
      
      selecting segments in order of OCR confidence as indicated by confidence metric generated by an OCR program; and
      
      selecting segments for which the same text is generated by the OCR programs, and if the same text is not generated by the OCR programs then selecting segments having the least edit distance.
  - 29. The computerized system of claim 28 wherein the processor is further configured to execute instructions that when executed cause the processor to:
    - identify a subset of the real-life image/text tuples for labeling by humans, the subset characterized by a range of confidence values and differing outputs among the OCR programs for given segments.
  - 30. The computerized system of claim 20 wherein the processor is further configured to execute instructions that when executed cause the processor to:
    - modify a font of the image portion of at least a subset of the synthetic image/text tuples.
  - 31. The computerized system of claim 20 wherein the instructions that cause the processor to generate a plurality of synthetic text segments comprise instructions that when executed cause the processor to:
    - randomly select sets of consecutive words from a text corpus comprising a set of fully-formed English language sentences.
  - 32. The computerized system of claim 20 wherein the instructions that cause the processor to generate a plurality of synthetic text segments comprise instructions that when executed cause the processor to:
    - randomly select sets of consecutive words from a text corpus characterized by common text elements in the identified domain.
  - 33. The computerized system of claim 31 further comprising instructions that when executed cause the processor to:
    - modify the selected sets of consecutive words to reflect biases of character types that occur in the identified domain.
  - 34. The computerized system of claim 32 further comprising instructions that when executed cause the processor to:
    - modify the selected sets of consecutive words to reflect biases of character types that occur in the identified domain.
  - 35. The computerized system of claim 33 further comprising instructions that when executed cause the processor to:
    - generate the image portion of the synthetic image/text tuple in accordance with a randomly chosen font and font size.
  - 36. The computerized system of claim 34 further comprising instructions that when executed cause the processor to:
    - generate the image portion of the synthetic image/text tuple in accordance with a randomly chosen font and font size.
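
The noise augmentation of claims 2 to 4 and 21 to 23 can be sketched on binary images as follows. This is an illustration under simplifying assumptions: images are lists of rows with 1 = ink and 0 = background, noise is applied to a whole image rather than to a sub-region, and only three of the listed noise types (speckle, a random line, white-on-black inversion) are shown. All function names and the `fraction` and `speckle_rate` parameters are hypothetical.

```python
import random


def add_speckle(image, rate, rng):
    """Flip each pixel independently with probability `rate`."""
    return [[1 - px if rng.random() < rate else px for px in row]
            for row in image]


def add_random_line(image, rng):
    """Draw one horizontal ink line across a randomly chosen row."""
    out = [row[:] for row in image]
    r = rng.randrange(len(out))
    out[r] = [1] * len(out[r])
    return out


def invert(image):
    """Swap ink and background (white-on-black text)."""
    return [[1 - px for px in row] for row in image]


def augment(image, rng, speckle_rate=0.05):
    """Apply one randomly chosen noise type to a copy of the image."""
    kind = rng.choice(["speckle", "line", "invert"])
    if kind == "speckle":
        return add_speckle(image, speckle_rate, rng)
    if kind == "line":
        return add_random_line(image, rng)
    return invert(image)


def augment_tuples(tuples, fraction, rng):
    """Add noise to a random `fraction` of tuples; text labels are unchanged."""
    out = []
    for image, text in tuples:
        if rng.random() < fraction:
            image = augment(image, rng)
        out.append((image, text))
    return out


rng = random.Random(0)  # seeded so a run can be reproduced
blank = [[0] * 6 for _ in range(4)]
noisy = augment_tuples([(blank, "42")], fraction=1.0, rng=rng)
```

Only the image half of each tuple is perturbed, so the network sees degraded inputs while the training label stays correct.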


Current Assignee: Automation Anywhere, Inc.
Original Assignee: Automation Anywhere, Inc.
Inventors: Kumar, Nishit; Corcoran, Thomas; Selva, Bruno; Chan, Derek S; Kakhandiki, Abhijit
Primary Examiner(s): Bayat, Ali
Application Number: US15/851,617
Time in Patent Office: 705 Days
Field of Search: 382/160
US Class Current
CPC Class Codes

G06F 18/214   Generating training pattern...

G06F 18/41   Interactive pattern learnin...

G06N 3/04   Architecture, e.g. intercon...

G06N 3/044   Recurrent networks, e.g. Ho...

G06N 3/045   Combinations of networks

G06N 3/08   Learning methods

G06N 3/084   Backpropagation, e.g. using...

G06V 30/10   Character recognition

G06V 30/162   Quantising the image signal

G06V 30/19133   Interactive pattern learnin...

G06V 30/19147   Obtaining sets of training ...

G06V 30/224   of printed characters havin...

G06V 30/414   Extracting the geometrical ...

G06V 30/416   Extracting the logical stru...
