Apparatuses, methods, and systems for 3-channel dynamic contextual script recognition using neural network image analytics and 4-tuple machine learning with enhanced templates and context data

US 10,671,892 B1
Filed: 11/05/2019
Issued: 06/02/2020
Est. Priority Date: 03/31/2019
Status: Active Grant

Abstract

In some embodiments, a method includes training a first machine learning model based on multiple documents and multiple templates associated with the multiple documents. The method further includes executing the first machine learning model to generate multiple relevancy masks, the multiple relevancy masks to remove a visual structure of the multiple templates from a visual structure of the multiple documents. The method further includes generating multiple multichannel field images to include the multiple relevancy masks and at least one of the multiple documents or the multiple templates. The method further includes training a second machine learning model based on the multiple multichannel field images and multiple non-native texts associated with the multiple documents. The method further includes executing the second machine learning model to generate multiple non-native texts from the multiple multichannel field images.
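
As an illustration of the data flow in the abstract, the sketch below builds a relevancy mask and a 3-channel field image for a toy form field. It is a simplification under stated assumptions: the patent generates the masks with a trained first machine learning model, whereas this sketch substitutes plain pixel differencing against the blank template. The names `relevancy_mask` and `multichannel_field_image`, the `threshold` value, and the 0 = blank paper / 255 = full ink convention are hypothetical and not taken from the patent.

```python
from typing import List, Tuple

Image = List[List[int]]  # grayscale; assumed convention: 0 = blank paper, 255 = full ink


def relevancy_mask(document: Image, template: Image, threshold: int = 64) -> Image:
    """Mark pixels where the filled-in document differs from the blank template.

    Stand-in for the first model: simple differencing, not a trained network.
    """
    return [
        [255 if abs(d - t) > threshold else 0 for d, t in zip(doc_row, tpl_row)]
        for doc_row, tpl_row in zip(document, template)
    ]


def multichannel_field_image(
    document: Image, template: Image, mask: Image
) -> List[List[Tuple[int, int, int]]]:
    """Stack document, template, and mask into one (document, template, mask) pixel grid."""
    return [
        [(d, t, m) for d, t, m in zip(doc_row, tpl_row, mask_row)]
        for doc_row, tpl_row, mask_row in zip(document, template, mask)
    ]


# Toy 3x4 field: the template has a printed baseline, the document adds handwriting.
template = [
    [0, 0, 0, 0],
    [0, 0, 0, 0],
    [255, 255, 255, 255],
]
document = [
    [0, 200, 0, 0],
    [0, 200, 200, 0],
    [255, 255, 255, 255],
]

mask = relevancy_mask(document, template)
field = multichannel_field_image(document, template, mask)
```

In this toy case the mask is nonzero only where handwriting was added, and the printed baseline shared by the template and the document is masked out. Each pixel of `field` carries all three channels, which is the kind of input the second model would be trained on.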

20 Claims

1. A method, comprising:
- training a first machine learning model based on a plurality of documents and a plurality of templates associated with the plurality of documents;
  
  executing the first machine learning model to generate a plurality of relevancy masks, the plurality of relevancy masks to remove a visual structure of the plurality of templates from a visual structure of the plurality of documents;
  
  generating a plurality of multichannel field images to include the plurality of relevancy masks and at least one of the plurality of documents or the plurality of templates;
  
  training a second machine learning model based on the plurality of multichannel field images and a plurality of non-native texts associated with the plurality of documents; and
  
  executing the second machine learning model to generate the plurality of non-native texts from the plurality of multichannel field images.
  - 2. The method of claim 1, further comprising:
    - executing the first machine learning model to separate a non-native content of a document from a template content of the document; and
      
      executing the second machine learning model to generate a non-native text from the non-native content.
  - 3. The method of claim 1, further comprising:
    - preparing a plurality of prepared documents based on at least one of an image processing technique, a noise reduction technique, a skew correction technique, a normalization technique, a thresholding technique, a filtering technique, or a segmentation technique, the plurality of documents associated with the plurality of prepared documents.
  - 4. The method of claim 1, wherein the plurality of multichannel field images are a plurality of 3-channel field images and associate each document from the plurality of documents to a template from the plurality of templates and a relevancy mask from the plurality of relevancy masks.
  - 5. The method of claim 1, wherein each document from the plurality of documents includes an indication of at least one of a document image, a handwritten document, a printed document, a table, or a webpage.
  - 6. The method of claim 1, wherein at least one of the first machine learning model or the second machine learning model is an artificial neural network (ANN) model.
  - 7. The method of claim 1, wherein at least one of the first machine learning model or the second machine learning model is a Deep Neural Network-Hidden Markov Model (DNN-HMM).
  - 8. The method of claim 1, wherein at least one of the first machine learning model or the second machine learning model is a Long Short-Term Memory network with Connectionist Temporal Classification (LSTM-CTC) model.
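
Claim 8 names an LSTM-CTC model but the claims do not say how its per-frame outputs become text. One common choice is greedy (best-path) CTC decoding: take the highest-scoring label in each frame, collapse consecutive repeats, then drop the blank symbol. The sketch below shows only that decoding step under assumed conventions; the function name `ctc_greedy_decode`, the blank index `BLANK = 0`, and the toy scores are hypothetical and not taken from the patent.

```python
from typing import List, Sequence

BLANK = 0  # assumed index of the CTC blank symbol


def ctc_greedy_decode(frame_scores: Sequence[Sequence[float]], alphabet: str) -> str:
    """Greedy CTC decoding: best label per frame, collapse repeats, drop blanks.

    frame_scores[t][k] is the score of label k at frame t; label k > 0 maps to
    alphabet[k - 1], and label 0 is the blank.
    """
    best_path: List[int] = [
        max(range(len(frame)), key=frame.__getitem__) for frame in frame_scores
    ]
    decoded: List[str] = []
    previous = BLANK
    for label in best_path:
        # Emit a character only when the label is not blank and not a repeat.
        if label != BLANK and label != previous:
            decoded.append(alphabet[label - 1])
        previous = label
    return "".join(decoded)


# Toy output for six frames over the alphabet "ab" (columns: blank, a, b).
scores = [
    [0.1, 0.8, 0.1],    # a
    [0.2, 0.7, 0.1],    # a (repeat, collapsed)
    [0.9, 0.05, 0.05],  # blank (separates the two a's)
    [0.1, 0.6, 0.3],    # a
    [0.1, 0.2, 0.7],    # b
    [0.1, 0.1, 0.8],    # b (repeat, collapsed)
]
text = ctc_greedy_decode(scores, "ab")
```

The blank frame is what lets the decoder emit the same character twice in a row, so the toy path above decodes to `aab` rather than `ab`.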

9. An apparatus, comprising:
- a memory; and
  
  a processor operatively coupled to the memory, the processor configured to execute a first program to generate a plurality of relevancy masks to remove a visual structure of a plurality of templates from a visual structure of a plurality of documents;
  
  the processor configured to generate a plurality of multi-tuple data, each multi-tuple data from the plurality of multi-tuple data including a relevancy mask and at least a document, a template, or a non-native text; and
  
  the processor configured to execute a second program to generate the non-native text for each multi-tuple data from the plurality of multi-tuple data.
  - 10. The apparatus of claim 9, wherein the processor is configured to:
    - execute the first program to separate a non-native content of a document from a template content of the document; and
      
      execute the second program to generate a non-native text from the non-native content.
  - 11. The apparatus of claim 9, wherein the processor is configured to:
    - receive a first plurality of documents, a first plurality of templates, and a plurality of non-native texts;
      
      train the first program based on the first plurality of documents and the first plurality of templates;
      
      generate a first plurality of multi-tuple data based on the plurality of documents, the plurality of templates, the plurality of relevancy masks, and the plurality of non-native texts; and
      
      train the second program based on the first plurality of multi-tuple data.
  - 12. The apparatus of claim 9, wherein the processor is configured to:
    - prepare the plurality of documents to generate a plurality of prepared documents based on at least one of an image processing technique, a noise reduction technique, a skew correction technique, a normalization technique, a thresholding technique, a filtering technique, or a segmentation technique; and
      
      generate a plurality of multi-tuple data, each multi-tuple data from the plurality of multi-tuple data including at least the prepared document, the template, the relevancy mask, or the non-native text.
  - 13. The apparatus of claim 9, wherein at least one of the first program or the second program is an ANN model.
  - 14. The apparatus of claim 9, wherein at least one of the first program or the second program is a DNN-HMM or an LSTM-CTC model.
  - 15. The apparatus of claim 9, wherein the first program is a procedural instruction, and the second program is an optical character recognition (OCR) model.
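
The multi-tuple data of claims 9 to 12 can be pictured as one record per field that bundles a document, its template, its relevancy mask, and (when known) the non-native text. The sketch below is one possible layout, not the patent's data format: the names `FieldTuple` and `build_tuples` are hypothetical, and treating the text as optional (present for training, absent at inference) is an assumption made for illustration.

```python
from typing import List, NamedTuple, Optional, Sequence

Image = List[List[int]]  # grayscale image as rows of pixel values


class FieldTuple(NamedTuple):
    """One multi-tuple record: document, template, relevancy mask, optional text."""
    document: Image
    template: Image
    relevancy_mask: Image
    non_native_text: Optional[str] = None  # assumed: known for training, None at inference


def build_tuples(
    documents: Sequence[Image],
    templates: Sequence[Image],
    masks: Sequence[Image],
    texts: Optional[Sequence[str]] = None,
) -> List[FieldTuple]:
    """Pair up the i-th document, template, mask, and (if given) text."""
    if not (len(documents) == len(templates) == len(masks)):
        raise ValueError("documents, templates, and masks must have equal length")
    if texts is not None and len(texts) != len(documents):
        raise ValueError("texts must have one entry per document")
    labels: Sequence[Optional[str]] = texts if texts is not None else [None] * len(documents)
    return [
        FieldTuple(d, t, m, x)
        for d, t, m, x in zip(documents, templates, masks, labels)
    ]


# One toy field, 1x2 pixels, labeled "7" for training.
docs = [[[0, 200]]]
tpls = [[[0, 0]]]
msks = [[[0, 255]]]

training = build_tuples(docs, tpls, msks, texts=["7"])  # 4-tuples with labels
inference = build_tuples(docs, tpls, msks)              # text left as None
```

Keeping the text slot in the same record for both phases means the second program can be trained on `training` and then asked to fill in `non_native_text` for each record in `inference`.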

16. A non-transitory processor-readable medium storing code representing instructions to be executed by a processor, the code comprising code to cause the processor to:
- receive a plurality of documents associated with a plurality of templates and a plurality of non-native texts;
  
  generate a plurality of relevancy masks based on the plurality of documents or the plurality of templates;
  
  generate a plurality of multichannel field images including the plurality of relevancy masks and at least one of the plurality of documents, the plurality of non-native texts, or the plurality of templates; and
  
  execute an optical character recognition (OCR) model to generate the plurality of non-native texts from the plurality of multichannel field images.
  - 17. The non-transitory processor-readable medium of claim 16, the code further comprising code to cause the processor to:
    - separate a non-native content of a document from a template content of the document; and
      
      generate a non-native text from the non-native content.
  - 18. The non-transitory processor-readable medium of claim 16, the code further comprising code to cause the processor to:
    - execute a procedural program to compute the OCR model from the plurality of multichannel field images.
  - 19. The non-transitory processor-readable medium of claim 16, wherein the OCR model can be an ANN model.
  - 20. The non-transitory processor-readable medium of claim 16, wherein the OCR model can be one of a DNN-HMM or an LSTM-CTC model.

Current Assignee: Hyper Labs, Inc.
Original Assignee: Hyper Labs, Inc.
Inventors: Daskalov, Boris Nikolaev; Balchev, Daniel Biser
Primary Examiner: Tucker, Wesley J
Application Number: US16/674,324
Time in Patent Office: 210 Days
CPC Class Codes

G06F 18/214   Generating training pattern...

G06F 18/217   Validation; Performance eva...

G06F 18/2413   based on distances to train...

G06F 18/295   Markov models or related mo...

G06N 3/044   Recurrent networks, e.g. Ho...

G06N 3/045   Combinations of networks

G06N 3/08   Learning methods

G06N 7/01   Probabilistic graphical mod...

G06V 10/30   Noise filtering

G06V 10/764   using classification, e.g. ...

G06V 10/82   using neural networks

G06V 10/85   Markov-related models; Mark...

G06V 2201/01   Solutions for problems rela...

G06V 30/1478   of characters or characters...

G06V 30/18   Extraction of features or c...

G06V 30/2276   with probabilistic networks...

G06V 30/412   Layout analysis of document...
