Apparatuses, methods, and systems for 3-channel dynamic contextual script recognition using neural network image analytics and 4-tuple machine learning with enhanced templates and context data
First Claim
1. A method, comprising:
training a first machine learning model based on a plurality of documents and a plurality of templates associated with the plurality of documents;
executing the first machine learning model to generate a plurality of relevancy masks, the plurality of relevancy masks to remove a visual structure of the plurality of templates from a visual structure of the plurality of documents;
generating a plurality of multichannel field images to include the plurality of relevancy masks and at least one of the plurality of documents or the plurality of templates;
training a second machine learning model based on the plurality of multichannel field images and a plurality of non-native texts associated with the plurality of documents; and
executing the second machine learning model to generate the plurality of non-native texts from the plurality of multichannel field images.
Abstract
In some embodiments, a method includes training a first machine learning model based on multiple documents and multiple templates associated with the multiple documents. The method further includes executing the first machine learning model to generate multiple relevancy masks, the multiple relevancy masks to remove a visual structure of the multiple templates from a visual structure of the multiple documents. The method further includes generating multiple multichannel field images to include the multiple relevancy masks and at least one of the multiple documents or the multiple templates. The method further includes training a second machine learning model based on the multiple multichannel field images and multiple non-native texts associated with the multiple documents. The method further includes executing the second machine learning model to generate multiple non-native texts from the multiple multichannel field images.
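By way of non-limiting illustration, the data flow between the two models can be sketched as follows. The sketch makes several assumptions that are not taken from the claims: images are small grayscale grids, the first machine learning model is replaced by a simple per-pixel difference threshold, and the multichannel field image uses three channels (document, template, mask). The names `relevancy_mask`, `multichannel_field_image`, and `threshold` are hypothetical.

```python
from typing import List, Tuple

Image = List[List[int]]  # grayscale image, values 0..255, row-major


def relevancy_mask(document: Image, template: Image, threshold: int = 32) -> Image:
    """Stand-in for the first model (assumption): mark pixels where the
    filled-in document differs from the blank template by more than
    `threshold`, so the template's own visual structure is masked out."""
    same_shape = len(document) == len(template) and all(
        len(d) == len(t) for d, t in zip(document, template)
    )
    if not same_shape:
        raise ValueError("document and template must have the same shape")
    return [
        [255 if abs(d - t) > threshold else 0 for d, t in zip(doc_row, tpl_row)]
        for doc_row, tpl_row in zip(document, template)
    ]


def multichannel_field_image(
    document: Image, template: Image, mask: Image
) -> List[List[Tuple[int, int, int]]]:
    """Stack the inputs into an H x W x 3 image whose channels are
    (document, template, mask). The channel choice is an assumption."""
    return [
        [(d, t, m) for d, t, m in zip(doc_row, tpl_row, mask_row)]
        for doc_row, tpl_row, mask_row in zip(document, template, mask)
    ]


# Blank template: white background with one dark ruled line (middle row).
template = [[255, 255, 255], [0, 0, 0], [255, 255, 255]]
# Filled-in document: same ruled line plus one dark handwritten pixel.
document = [[255, 0, 255], [0, 0, 0], [255, 255, 255]]

mask = relevancy_mask(document, template)
field = multichannel_field_image(document, template, mask)
```

In this toy input the ruled line is present in both images and is therefore excluded from the mask, while the handwritten pixel is retained. A second model would then be trained on such field images paired with their associated texts.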
20 Claims
1. A method, comprising:
training a first machine learning model based on a plurality of documents and a plurality of templates associated with the plurality of documents;
executing the first machine learning model to generate a plurality of relevancy masks, the plurality of relevancy masks to remove a visual structure of the plurality of templates from a visual structure of the plurality of documents;
generating a plurality of multichannel field images to include the plurality of relevancy masks and at least one of the plurality of documents or the plurality of templates;
training a second machine learning model based on the plurality of multichannel field images and a plurality of non-native texts associated with the plurality of documents; and
executing the second machine learning model to generate the plurality of non-native texts from the plurality of multichannel field images.
Dependent claims: 2, 3, 4, 5, 6, 7, 8
9. An apparatus, comprising:
a memory; and
a processor operatively coupled to the memory,
the processor configured to execute a first program to generate a plurality of relevancy masks to remove a visual structure of a plurality of templates from a visual structure of a plurality of documents;
the processor configured to generate a plurality of multi-tuple data, each multi-tuple data from the plurality of multi-tuple data including a relevancy mask and at least a document, a template, or a non-native text; and
the processor configured to execute a second program to generate the non-native text for each multi-tuple data from the plurality of multi-tuple data.
Dependent claims: 10, 11, 12, 13, 14, 15
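By way of non-limiting illustration, one possible in-memory layout for the multi-tuple data is sketched below. It assumes a four-field record, in line with the "4-tuple" wording of the title, in which the relevancy mask is always present and the other fields are optional. The names `FieldTuple`, `is_valid`, and `with_text` are hypothetical, and the second program is represented only by a caller-supplied function.

```python
from typing import Callable, List, NamedTuple, Optional

Image = List[List[int]]  # grayscale image, values 0..255, row-major


class FieldTuple(NamedTuple):
    """Hypothetical 4-tuple: a relevancy mask plus optional companions."""
    relevancy_mask: Image
    document: Optional[Image] = None
    template: Optional[Image] = None
    non_native_text: Optional[str] = None

    def is_valid(self) -> bool:
        # The mask must be accompanied by at least one other element.
        return any(
            x is not None
            for x in (self.document, self.template, self.non_native_text)
        )


def with_text(item: FieldTuple, recognize: Callable[[FieldTuple], str]) -> FieldTuple:
    """Return a copy of `item` whose text field is filled in by `recognize`,
    a placeholder for the second program (assumption)."""
    return item._replace(non_native_text=recognize(item))


item = FieldTuple(relevancy_mask=[[0, 255]], document=[[255, 0]])
filled = with_text(item, lambda it: "stub")  # stub recognizer for illustration
```

Because the record is immutable, `with_text` returns a new tuple and leaves the input unchanged, which keeps a training example and its generated output separate.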
16. A non-transitory processor-readable medium storing code representing instructions to be executed by a processor, the code comprising code to cause the processor to:
receive a plurality of documents associated with a plurality of templates and a plurality of non-native texts;
generate a plurality of relevancy masks based on the plurality of documents or the plurality of templates;
generate a plurality of multichannel field images including the plurality of relevancy masks and at least one of the plurality of documents, the plurality of non-native texts, or the plurality of templates; and
execute an optical character recognition (OCR) model to generate the plurality of non-native texts from the plurality of multichannel field images.
Dependent claims: 17, 18, 19, 20
Specification