Handwritten word spotter system using synthesized typed queries
Abstract
A wordspotting system and method are disclosed for processing candidate word images extracted from handwritten documents. In response to a user inputting a selected query string, such as a word to be searched in one or more of the handwritten documents, the system automatically generates at least one computer-generated image based on the query string in a selected font or fonts. A model is trained on the computer-generated image(s) and is thereafter used to score the candidate handwritten word images. The candidate or candidates with the highest scores, and/or the documents containing them, can be presented to the user, tagged, or otherwise processed differently from other candidate word images or documents.
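The train-then-score step described in the abstract can be illustrated with a deliberately small sketch. Everything in it is an assumption made for illustration: the column ink-density feature, the mean-vector "model", the distance-based score, and all function names are hypothetical stand-ins, since this section does not say which features or model the patent uses. Images are assumed to be binary and of equal width.

```python
from statistics import fmean

# Binary image: rows of 0/1 pixels, 1 = ink. (Hypothetical representation.)
Image = list[list[int]]


def column_profile(img: Image) -> list[float]:
    """Hypothetical feature: fraction of ink pixels in each column."""
    height = len(img)
    return [sum(row[x] for row in img) / height for x in range(len(img[0]))]


def train_model(synth_images: list[Image]) -> list[float]:
    """Toy 'model': the mean feature vector over the synthesized images."""
    profiles = [column_profile(img) for img in synth_images]
    return [fmean(col) for col in zip(*profiles)]


def score(model: list[float], candidate: Image) -> float:
    """Higher is better: negative squared distance to the model vector."""
    feats = column_profile(candidate)
    return -sum((m - f) ** 2 for m, f in zip(model, feats))


def top_candidates(model: list[float], candidates: dict[str, Image], k: int) -> list[str]:
    """Return the names of the k highest-scoring candidate word images."""
    ranked = sorted(candidates, key=lambda name: score(model, candidates[name]), reverse=True)
    return ranked[:k]
```

In use, `train_model` would receive the images synthesized from the query string, and `top_candidates` would return the handwritten word images to present or tag.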
24 Claims
1. A method comprising:
receiving a query string;
generating a plurality of computer-generated images based on the query string with a computer typographic font, each of the plurality of computer-generated images being generated by varying the computer typographic font, the font variations having been identified based on a precision for retrieving word images which match at least one selected query string;
training a model based on the plurality of computer-generated images;
scoring candidate handwritten word images of a collection of handwritten word images using the trained model; and
based on the scores, identifying a subset of the word images.
Dependent claims: 2 to 17.
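Claim 1 says the font variations were identified "based on a precision" for retrieving matching word images, without saying in this section how that precision is computed. One plausible reading, offered only as a sketch, is to rank a labeled set of word images once per font, measure average precision for each ranking, and keep the best fonts. The use of average precision, the top-n cutoff, and the names `average_precision` and `select_fonts` are all assumptions.

```python
def average_precision(ranked_relevance: list[bool]) -> float:
    """Average precision of one ranked list.

    Each entry is True if the word image at that rank matches the query.
    Assumes every matching image appears somewhere in the list.
    """
    hits = 0
    total = 0.0
    for rank, relevant in enumerate(ranked_relevance, start=1):
        if relevant:
            hits += 1
            total += hits / rank  # precision at this rank
    return total / hits if hits else 0.0


def select_fonts(rankings_by_font: dict[str, list[bool]], n: int) -> list[str]:
    """Keep the n fonts whose rankings have the highest average precision."""
    scored = {font: average_precision(r) for font, r in rankings_by_font.items()}
    return sorted(scored, key=lambda font: scored[font], reverse=True)[:n]
```

With several selected query strings, the per-font values could be averaged across queries before selecting; that extension is likewise an assumption.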
18. A computer implemented processing system comprising:
a synthesizer which synthesizes a plurality of computer-generated images based on a received query string with a computer typographic font, each of the plurality of computer-generated images being generated by varying the computer typographic font, the font variations having been identified based on a precision for retrieving word images which match at least one selected query string;
a model which is trained on features extracted from the plurality of computer-generated images; and
a scoring component which scores candidate handwritten word images of a collection of candidate handwritten word images against the model and, based on the scores, identifies a subset of the handwritten word images.
Dependent claims: 19 to 22.
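The three components recited in claim 18 (synthesizer, trained model, scoring component) could be wired together roughly as follows. This is a structural sketch only: the class name `WordSpotter`, its fields, and the callable signatures are hypothetical, and the concrete synthesizer, feature extractor, and model are left to be plugged in.

```python
from dataclasses import dataclass
from typing import Callable, Sequence

Image = Sequence[Sequence[int]]   # hypothetical image representation
Features = Sequence[float]        # hypothetical feature vector
Scorer = Callable[[Features], float]


@dataclass
class WordSpotter:
    # Synthesizer: query string -> computer-generated images in varied fonts.
    synthesize: Callable[[str], list[Image]]
    # Feature extractor applied to synthesized and handwritten images alike.
    extract: Callable[[Image], Features]
    # Training step: features of synthesized images -> scoring function.
    fit: Callable[[list[Features]], Scorer]

    def spot(self, query: str, candidates: list[Image], k: int) -> list[int]:
        """Return indices of the k best-scoring candidate word images."""
        synthesized = self.synthesize(query)
        model = self.fit([self.extract(img) for img in synthesized])
        scores = [model(self.extract(c)) for c in candidates]
        order = sorted(range(len(candidates)), key=lambda i: scores[i], reverse=True)
        return order[:k]
```

Passing the components in as callables keeps the scoring logic independent of any particular font renderer or model type.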
23. A method comprising:
-
receiving a query string; generating a plurality of computer-generated images based on the query string, each of the plurality of computer-generated images being generated by varying a selected computer typographic font; training a model based on the plurality of computer-generated images in the different fonts; scoring candidate handwritten word images of a collection of handwritten word images using the trained model; and based on the scores, identifying a subset of the word images. - View Dependent Claims (24)
-
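Claim 23 generates one image per variation of a selected font. A minimal sketch of enumerating such variations is shown below. The variation axes (family, bold, italic) and the names `FontVariant` and `font_variants` are assumptions, since the claim does not list what is varied. Rendering each variant into an actual image would need an imaging library and is omitted.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class FontVariant:
    """One hypothetical rendering configuration for the query string."""
    family: str
    bold: bool
    italic: bool


def font_variants(families, bold_options=(False, True), italic_options=(False, True)):
    """Enumerate every combination of family, weight, and slant."""
    return [
        FontVariant(family, bold, italic)
        for family, bold, italic in product(families, bold_options, italic_options)
    ]
```

Each returned `FontVariant` would correspond to one computer-generated image of the query string used for training.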
Specification