Systems and methods for recognizing characters in digitized documents
First Claim
1. A system for recognizing a plurality of handwritten characters over multiple lines in an image, the system comprising:
- a neural network configured to receive the image, the neural network including:
a cascade of a plurality of pairs of a first long short-term memory (LSTM) layer and a convolution layer, wherein each first LSTM layer is configured to generate a first output according to a scanning direction, each convolution layer is configured to generate a feature map based on the first output from a corresponding first LSTM layer in the pair, and feature maps generated by a plurality of pairs are inputted to a next plurality of pairs in the cascade;
a second LSTM layer configured to generate a second output from a plurality of feature maps generated by a last plurality of pairs in the cascade; and
a linear layer configured to generate final feature maps based on the second output, wherein the final feature maps include a feature vector at each grid thereof;
a weight calculator configured to calculate a weight vector for each grid of the final feature maps to generate an image summary; and
a decoder configured to determine a probability of each character in the image based on the image summary and the final feature maps.
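To make "a first output according to a scanning direction" concrete, the following is a minimal sketch of visiting a 2D grid in one of four diagonal scan orders. It assumes a two-dimensional recurrence in which each cell sees the previously visited neighbor in its row and in its column; the claim does not state this. The `tanh` update is a deliberately simple stand-in for an LSTM cell, and `directional_scan`, `flip_rows`, `flip_cols`, `a`, and `b` are hypothetical names chosen for illustration.

```python
import math

def directional_scan(grid, flip_rows=False, flip_cols=False, a=0.5, b=0.5):
    """Toy 2D recurrence over `grid`, visited in one of four scan directions.

    NOTE: this is a stand-in for an LSTM cell (assumption), not an LSTM.
    Each output depends on the input at that cell plus the outputs of the
    neighbors already visited along the row axis and the column axis.
    """
    h, w = len(grid), len(grid[0])
    rows = range(h - 1, -1, -1) if flip_rows else range(h)
    cols = range(w - 1, -1, -1) if flip_cols else range(w)
    dr = 1 if flip_rows else -1  # offset to the previously visited row
    dc = 1 if flip_cols else -1  # offset to the previously visited column
    out = [[0.0] * w for _ in range(h)]
    for i in rows:
        for j in cols:
            pi, pj = i + dr, j + dc
            prev_row = out[pi][j] if 0 <= pi < h else 0.0
            prev_col = out[i][pj] if 0 <= pj < w else 0.0
            out[i][j] = math.tanh(grid[i][j] + a * prev_row + b * prev_col)
    return out

# A single "ink" pixel in the top-left corner of a 2 x 2 image.
grid = [[1.0, 0.0],
        [0.0, 0.0]]
fwd = directional_scan(grid)                                   # top-left to bottom-right
bwd = directional_scan(grid, flip_rows=True, flip_cols=True)   # bottom-right to top-left
```

In the forward scan the corner pixel influences every later cell, while in the reverse scan the bottom-right cell is visited first and sees nothing. This is one way to see why several scan directions might be combined before the convolution layer in each pair.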
Abstract
Methods and systems are provided for end-to-end text recognition in digitized documents of handwritten characters over multiple lines without explicit line segmentation. An image is received. Based on the image, one or more feature maps are determined. Each of the one or more feature maps includes one or more feature vectors. Based at least in part on the one or more feature maps, one or more scalar scores are determined. Based on the one or more scalar scores, one or more attention weights are determined. By applying the one or more attention weights to each of the one or more feature vectors, one or more image summary vectors are determined. Based at least in part on the one or more image summary vectors, one or more handwritten characters are determined.
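The chain from scalar scores to attention weights to an image summary vector can be sketched as follows. This is an illustration under stated assumptions, not the patented implementation: the score is taken to be a dot product with a hypothetical vector `score_weights`, and the attention weights are taken to be a softmax over all grid positions. The abstract specifies neither choice.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a flat list of scalar scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def image_summary(feature_map, score_weights):
    """Return (attention weights, summary vector) for one feature map.

    feature_map:   H x W grid whose cells are feature vectors (lists of floats).
    score_weights: hypothetical scoring vector (assumption); the scalar score
                   of a cell is the dot product of this vector and its feature.
    """
    cells = [vec for row in feature_map for vec in row]
    scores = [sum(w * x for w, x in zip(score_weights, vec)) for vec in cells]
    weights = softmax(scores)  # one attention weight per grid position
    dim = len(cells[0])
    # The summary is the attention-weighted sum of all feature vectors.
    summary = [sum(a * vec[d] for a, vec in zip(weights, cells)) for d in range(dim)]
    return weights, summary

feature_map = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[1.0, 1.0], [0.0, 0.0]],
]
weights, summary = image_summary(feature_map, score_weights=[2.0, -1.0])
```

Because the weights sum to one, the summary vector stays in the same space as the individual feature vectors. Positions with high scores dominate it, which is how such a model could focus on one character at a time without segmenting lines first.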
20 Claims
11. A method for recognizing a plurality of handwritten characters over multiple lines in an image, the method comprising:
generating, by each of a plurality of first long short-term memory (LSTM) layers, a first output according to a scanning direction, wherein each first LSTM layer is paired with a convolution layer;
generating, by each of a plurality of convolution layers, a feature map based on the first output from a corresponding first LSTM layer in the pair;
iterating generating the first output and generating the feature map in a cascade manner;
generating, by a second LSTM layer, a second output from a plurality of feature maps generated by a plurality of pairs of the first LSTM layers and the convolution layers;
generating, by a linear layer, final feature maps based on the second output, wherein the final feature maps include a feature vector at each grid thereof;
calculating a weight vector for each grid of the final feature maps to generate an image summary; and
determining a probability of each character in the image based on the image summary and the final feature maps.
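The final step, determining a probability of each character, can be illustrated with a single decoding step. The sketch below assumes a linear projection of the image summary followed by a softmax over a small alphabet. The claim does not specify the decoder's form, and for brevity this simplification uses only the image summary rather than the summary together with the final feature maps. The names `char_probabilities`, `weight_rows`, and `bias`, and all numeric values, are hypothetical.

```python
import math

def char_probabilities(summary, weight_rows, bias, alphabet):
    """Map an image summary vector to a probability for each character.

    weight_rows and bias define a hypothetical linear layer (assumption):
    one weight row and one bias term per character in `alphabet`.
    """
    logits = [
        sum(w * x for w, x in zip(row, summary)) + b
        for row, b in zip(weight_rows, bias)
    ]
    m = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return {ch: e / total for ch, e in zip(alphabet, exps)}

alphabet = ["a", "b", "<eos>"]
summary = [0.73, 0.12]              # illustrative image summary vector
weight_rows = [[2.0, 0.0],          # illustrative, untrained weights
               [0.0, 2.0],
               [-1.0, -1.0]]
bias = [0.0, 0.0, 0.0]

probs = char_probabilities(summary, weight_rows, bias, alphabet)
best = max(probs, key=probs.get)    # greedy choice for this step
```

In a full recognizer this step would presumably repeat, recomputing the attention weights and the image summary before each character until an end-of-sequence symbol is chosen.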
Specification