Systems and methods for recognizing characters in digitized documents

US 10,558,893 B2
Filed: 05/24/2019
Issued: 02/11/2020
Est. Priority Date: 04/11/2016
Status: Active Grant

Abstract

Methods and systems are provided for end-to-end text recognition in digitized documents of handwritten characters over multiple lines without explicit line segmentation. An image is received. Based on the image, one or more feature maps are determined. Each of the one or more feature maps includes one or more feature vectors. Based at least in part on the one or more feature maps, one or more scalar scores are determined. Based on the one or more scalar scores, one or more attention weights are determined. By applying the one or more attention weights to each of the one or more feature vectors, one or more image summary vectors are determined. Based at least in part on the one or more image summary vectors, one or more handwritten characters are determined.
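The scoring and weighting steps in the abstract can be illustrated with a small sketch. It assumes that the attention weights are obtained by a softmax over the scalar scores and that an image summary vector is the weighted sum of the feature vectors; the abstract states neither choice explicitly. The function names `softmax` and `image_summary` and the toy values are hypothetical.

```python
import math

def softmax(scores):
    """Turn scalar scores into weights that sum to 1 (assumed normalization)."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def image_summary(feature_vectors, scores):
    """Weighted sum of feature vectors, one weight per grid position (assumption)."""
    weights = softmax(scores)
    dim = len(feature_vectors[0])
    summary = [0.0] * dim
    for weight, vector in zip(weights, feature_vectors):
        for k in range(dim):
            summary[k] += weight * vector[k]
    return weights, summary

# Toy 2x2 grid flattened to four positions, each with a 3-dimensional feature vector.
features = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
    [1.0, 1.0, 1.0],
]
scores = [0.0, 0.0, 0.0, 0.0]  # equal scores give uniform attention weights
weights, summary = image_summary(features, scores)
```

With equal scores every position receives the same weight, so the summary is simply the mean of the four feature vectors. Unequal scores would shift the summary toward the higher-scoring positions.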

20 Claims

1. A system for recognizing a plurality of handwritten characters over multiple lines in an image, the system comprising:
- a neural network configured to receive the image, the neural network including:
  - a cascade of a plurality of pairs of a first long short-term memory (LSTM) layer and a convolution layer, wherein each first LSTM layer is configured to generate a first output according to a scanning direction, each convolution layer is configured to generate a feature map based on the first output from a corresponding first LSTM layer in the pair, and feature maps generated by a plurality of pairs are inputted to a next plurality of pairs in the cascade;
  - a second LSTM layer configured to generate a second output from a plurality of feature maps generated by a last plurality of pairs in the cascade; and
  - a linear layer configured to generate final feature maps based on the second output, wherein the final feature maps include a feature vector at each grid thereof;
- a weight calculator configured to calculate a weight vector for each grid of the final feature maps to generate an image summary; and
- a decoder configured to determine a probability of each character in the image based on the image summary and the final feature maps.
- Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10)
  - 2. The system according to claim 1, wherein a dimension of the feature maps is less than or equal to a dimension of the image.
  - 3. The system according to claim 1, wherein a number of scanning directions is four.
  - 4. The system according to claim 3, wherein the scanning directions include upward and downward in a vertical direction and backward and forward in a horizontal direction.
  - 5. The system according to claim 1, wherein a number of convolution layers in the plurality of pairs is four.
  - 6. The system according to claim 1, wherein a dimension of the weight vector is a number of the final feature maps.
  - 7. The system according to claim 1, wherein the neural network is further configured to generate an attention vector for each grid based on a corresponding weight vector and a corresponding feature vector of the final feature maps.
  - 8. The system according to claim 7, wherein attention vectors for all grids of the final feature maps are the image summary.
  - 9. The system according to claim 1, wherein the neural network further includes a collapse layer configured to concatenate sequences of attention vectors to generate a concatenated sequence of image vectors.
  - 10. The system according to claim 9, wherein the decoder is further configured to decode the concatenated sequence to identify line beginnings and endings of whole paragraphs in the image.
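Claims 3 and 4 name four scanning directions. As a rough illustration only, the sketch below lists the order in which the positions of a small grid would be visited for each direction. It assumes one possible reading: "forward" and "backward" traverse each row left to right and right to left, while "downward" and "upward" traverse each column top to bottom and bottom to top. The claims do not define the traversal order, and `scan_order` is a hypothetical helper.

```python
DIRECTIONS = ("forward", "backward", "downward", "upward")

def scan_order(height, width, direction):
    """Return (row, col) positions in the order a scan would visit them (assumed reading)."""
    rows = range(height)
    cols = range(width)
    if direction == "forward":    # left to right along each row
        return [(r, c) for r in rows for c in cols]
    if direction == "backward":   # right to left along each row
        return [(r, c) for r in rows for c in reversed(cols)]
    if direction == "downward":   # top to bottom along each column
        return [(r, c) for c in cols for r in rows]
    if direction == "upward":     # bottom to top along each column
        return [(r, c) for c in cols for r in reversed(rows)]
    raise ValueError(f"unknown direction: {direction}")

# Visit orders for a toy grid with 2 rows and 3 columns.
orders = {d: scan_order(2, 3, d) for d in DIRECTIONS}
```

Each direction visits every grid position exactly once; only the order differs, which is what lets a recurrent layer accumulate context from a different side of the image in each scan.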

11. A method for recognizing a plurality of handwritten characters over multiple lines in an image, the method comprising:
- generating, by each of a plurality of first long short-term memory (LSTM) layers, a first output according to a scanning direction, wherein each first LSTM layer is paired with a convolution layer;
- generating, by each of a plurality of convolution layers, a feature map based on the first output from a corresponding first LSTM layer in the pair;
- iterating generating the first output and generating the feature map in a cascade manner;
- generating, by a second LSTM layer, a second output from a plurality of feature maps generated by a plurality of pairs of the first LSTM layers and the convolution layers;
- generating, by a linear layer, final feature maps based on the second output, wherein the final feature maps include a feature vector at each grid thereof;
- calculating a weight vector for each grid of the final feature maps to generate an image summary; and
- determining a probability of each character in the image based on the image summary and the final feature maps.
- Dependent Claims (12, 13, 14, 15, 16, 17, 18, 19, 20)
  - 12. The method according to claim 11, wherein a dimension of the feature maps is less than or equal to a dimension of the image.
  - 13. The method according to claim 11, wherein a number of scanning directions is four.
  - 14. The method according to claim 13, wherein the scanning directions include upward and downward in a vertical direction and backward and forward in a horizontal direction.
  - 15. The method according to claim 11, wherein a number of convolution layers in the plurality of pairs is four.
  - 16. The method according to claim 11, wherein a dimension of the weight vector is a number of the final feature maps.
  - 17. The method according to claim 11, further comprising generating an attention vector for each grid based on a corresponding weight vector and a corresponding feature vector of the final feature maps.
  - 18. The method according to claim 17, wherein attention vectors for all grids of the final feature maps are the image summary.
  - 19. The method according to claim 11, further comprising concatenating sequences of attention vectors to generate a concatenated sequence of image vectors.
  - 20. The method according to claim 19, wherein the concatenated sequence is decoded to identify line beginnings and endings of whole paragraphs in the image.
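Claims 16 to 19 describe a weight vector per grid whose dimension equals the number of final feature maps, an attention vector derived from it, and a concatenation of sequences of attention vectors. The sketch below is one way to picture this, under two assumptions the claims do not state: the attention vector is the element-wise product of the weight vector and the feature vector, and each grid row is one sequence, concatenated top to bottom. The names `attention_vectors` and `collapse` and all values are hypothetical.

```python
def attention_vectors(feature_grid, weight_grid):
    """Element-wise product of weight and feature vectors at every grid (assumption)."""
    return [
        [
            [w * f for w, f in zip(weight_vec, feature_vec)]
            for weight_vec, feature_vec in zip(weight_row, feature_row)
        ]
        for weight_row, feature_row in zip(weight_grid, feature_grid)
    ]

def collapse(rows):
    """Concatenate the row sequences, top to bottom, into one sequence (assumption)."""
    return [vector for row in rows for vector in row]

# Toy 2x2 grid with two final feature maps, so every vector has two entries.
features = [
    [[1.0, 2.0], [3.0, 4.0]],
    [[5.0, 6.0], [7.0, 8.0]],
]
weights = [
    [[1.0, 0.0], [0.5, 0.5]],
    [[0.0, 1.0], [1.0, 1.0]],
]
attended = attention_vectors(features, weights)
sequence = collapse(attended)
```

Here `sequence` holds four vectors, one per grid, in row-by-row order; a decoder would consume such a sequence to emit character probabilities.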

Current Assignee: A2iA SAS (Mitek Systems Incorporated)
Original Assignee: A2iA SAS (Mitek Systems Incorporated)
Inventors: Bluche, Theodore Damien Christian
Primary Examiner(s): Patel, Jayesh A
Application Number: US16/421,952
Publication Number: US 20190279035A1
Time in Patent Office: 263 Days
Field of Search: None

CPC Class Codes
- G06F 18/214   Generating training pattern...
- G06N 3/044   Recurrent networks, e.g. Ho...
- G06N 3/045   Combinations of networks
- G06N 3/082   modifying the architecture,...
- G06N 3/084   Backpropagation, e.g. using...
- G06V 30/10   Character recognition
- G06V 30/18057   Integrating the filters int...
- G06V 30/333   Preprocessing; Feature extr...
- G06V 30/373   using a special pattern or ...
