Method for comparing image sections to determine similarity therebetween
First Claim
1. A method performed by a computer for comparing at least two image sections having a plurality of image signals, each image section representing a token, to identify similar tokens, each token representing a unit of semantic understanding, comprising the steps of:
(a) rasterizing, using a source image derivation system, a document to produce an image section representing a token;
(b) storing image signals of an image section representing a first token in a first model memory;
(c) dilating the image signals representing the first token to produce a dilated representation of the first token and storing the dilated representation of the first token in a first image memory;
(d) storing image signals of an image section representing a second token in a second model memory;
(e) dilating the image signals of an image section representing the second token to produce a dilated representation of the second token and storing the dilated representation of the second token in a second image memory;
(f) comparing the image signals stored in the first model memory with the image signals stored in the second image memory to determine a first similarity metric;
(g) comparing the image signals stored in the second model memory with the image signals stored in the first image memory to determine a second similarity metric; and
(h) indicating whether the first token is similar to the second token in response to the first and second similarity metrics.
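The steps of claim 1 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the rasterizing step (a) is skipped and tokens are given as small text bitmaps ('#' for a black pixel) already registered in a common coordinate frame; dilation uses a 3x3 square structuring element; each similarity metric is assumed to be the fraction of one token's model pixels that fall inside the other token's dilated image; and the names `tokens_similar` and `coverage` and the 0.95 `THRESHOLD` are hypothetical.

```python
# Sketch of claim 1's two-way dilated comparison (hypothetical names and values).
from typing import List, Set, Tuple

Pixel = Tuple[int, int]  # (row, col) of a black pixel
THRESHOLD = 0.95  # hypothetical acceptance level for both metrics


def to_pixels(rows: List[str]) -> Set[Pixel]:
    """Convert a text bitmap ('#' = black) into a set of black-pixel coordinates."""
    return {(r, c) for r, line in enumerate(rows)
            for c, ch in enumerate(line) if ch == "#"}


def dilate(pixels: Set[Pixel], radius: int = 1) -> Set[Pixel]:
    """Binary dilation with a (2*radius+1) x (2*radius+1) square element."""
    return {(r + dr, c + dc)
            for r, c in pixels
            for dr in range(-radius, radius + 1)
            for dc in range(-radius, radius + 1)}


def coverage(model: Set[Pixel], dilated: Set[Pixel]) -> float:
    """Fraction of model pixels that land on the dilated image (0.0 if empty)."""
    return len(model & dilated) / len(model) if model else 0.0


def tokens_similar(first: List[str], second: List[str]) -> Tuple[bool, float, float]:
    model1, model2 = to_pixels(first), to_pixels(second)  # model memories
    image1, image2 = dilate(model1), dilate(model2)        # image memories, steps (c), (e)
    m1 = coverage(model1, image2)                          # step (f)
    m2 = coverage(model2, image1)                          # step (g)
    return (m1 >= THRESHOLD and m2 >= THRESHOLD), m1, m2   # step (h)


if __name__ == "__main__":
    H = ["#...#", "#...#", "#####", "#...#", "#...#"]
    H_DAMAGED = ["#...#", "#...#", "####.", "#...#", "#...#"]
    L = ["#....", "#....", "#....", "#....", "#####"]
    print(tokens_similar(H, H_DAMAGED))  # (True, 1.0, 1.0)
    print(tokens_similar(H, L))          # rejected: first metric is 8/13
```

Here an 'H' and a copy missing one crossbar pixel score 1.0 in both directions, while 'H' against 'L' scores about 0.62 and 0.89 and is rejected. Requiring both directions to pass makes the test symmetric: a small token that fits inside a larger one passes only one of the two comparisons.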
3 Assignments
0 Petitions
Abstract
A method for comparing two image sections, each consisting of a plurality of image signals (pixels) and each representing a token (e.g., a character, symbol, glyph, string of components, or similar unit of semantic understanding), to identify when similar tokens are present within the image sections. The invention operates without the need to individually detect and/or identify the components that make up the tokens. In one embodiment, the method relies on the detection of connected components within words to first isolate individual word tokens, and then applies a two-stage process in which dilated images of the tokens are compared with model representations of the tokens to determine the relative similarity between them.
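The connected-component step mentioned in the abstract might be sketched as follows. This is an illustrative assumption rather than the patent's procedure: the input is a single line of text as a binary bitmap ('#' for black), components use 8-connectivity, and components are merged into one word token when the horizontal gap between their bounding boxes is at most a hypothetical `MAX_GAP` pixels. The function names are likewise hypothetical.

```python
# Sketch: isolate word tokens on one text line from connected components.
from collections import deque
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (top, left, bottom, right), inclusive
MAX_GAP = 1  # hypothetical largest gap (in pixels) between characters of a word


def connected_components(rows: List[str]) -> List[Box]:
    """Return the bounding box of every 8-connected black component."""
    h = len(rows)
    seen = set()
    boxes: List[Box] = []
    for r in range(h):
        for c in range(len(rows[r])):
            if rows[r][c] != "#" or (r, c) in seen:
                continue
            seen.add((r, c))
            queue = deque([(r, c)])  # breadth-first flood fill
            top, left, bottom, right = r, c, r, c
            while queue:
                y, x = queue.popleft()
                top, bottom = min(top, y), max(bottom, y)
                left, right = min(left, x), max(right, x)
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < h and 0 <= nx < len(rows[ny])
                                and rows[ny][nx] == "#" and (ny, nx) not in seen):
                            seen.add((ny, nx))
                            queue.append((ny, nx))
            boxes.append((top, left, bottom, right))
    return boxes


def group_into_words(boxes: List[Box], max_gap: int = MAX_GAP) -> List[Box]:
    """Merge components, left to right, whose horizontal gap is <= max_gap."""
    words: List[Box] = []
    for top, left, bottom, right in sorted(boxes, key=lambda b: b[1]):
        if words and left - words[-1][3] - 1 <= max_gap:
            t, l, b, r = words[-1]
            words[-1] = (min(t, top), l, max(b, bottom), max(r, right))
        else:
            words.append((top, left, bottom, right))
    return words
```

On a line containing a 'U' followed, four blank columns later, by two strokes separated by one blank column, this yields three components and two word tokens. Fixing the gap is a simplification; it would more plausibly be estimated from the spacing statistics of the page.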
178 Citations
21 Claims
12. A method performed in a programmable computer for comparing at least two image sections, each image section consisting of a plurality of image signals, wherein a first image section represents a token from an unknown image object and a second image section represents a token from a known image object stored in a dictionary of images and where a token represents a unit of semantic understanding, to identify the unknown token as matching the token previously stored in the dictionary of images, comprising the steps of:
(a) storing image signals of an image section representing an unknown token in a first model memory;
(b) dilating the image signals representing the unknown token to produce a dilated representation of the unknown token and storing the dilated representation of the unknown token in a first image memory;
(c) replicating image signals of an image section representing a known token stored in a dictionary of images to a second model memory;
(d) dilating the image signals representing the known token to produce a dilated representation of the known token and storing the dilated representation of the known token in a second image memory;
(e) comparing the image signals stored in the first model memory with the image signals stored in the second image memory to determine a first similarity metric;
(f) comparing the image signals stored in the second model memory with the image signals stored in the first image memory to determine a second similarity metric; and
(g) indicating whether the unknown token is similar to the known token in response to the first and second similarity metrics.
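The dictionary lookup of claim 12 can be sketched by running a two-way dilated comparison against every entry in a dictionary of known tokens. Assumptions, none taken from the patent: bitmaps are text rows ('#' for black) registered to a common origin, dilation uses a 3x3 square element, each similarity metric is the fraction of one token's model pixels covered by the other token's dilated image, and `match_token`, the 0.95 `THRESHOLD`, and the rule of returning the entry with the best worst-case score are hypothetical.

```python
# Sketch of claim 12: match an unknown token against a dictionary of known tokens.
from typing import Dict, List, Optional, Set, Tuple

Pixel = Tuple[int, int]  # (row, col) of a black pixel
THRESHOLD = 0.95  # hypothetical acceptance level for both metrics


def to_pixels(rows: List[str]) -> Set[Pixel]:
    """Convert a text bitmap ('#' = black) into a set of black-pixel coordinates."""
    return {(r, c) for r, line in enumerate(rows)
            for c, ch in enumerate(line) if ch == "#"}


def dilate(pixels: Set[Pixel]) -> Set[Pixel]:
    """Binary dilation with a 3x3 square structuring element."""
    return {(r + dr, c + dc) for r, c in pixels
            for dr in (-1, 0, 1) for dc in (-1, 0, 1)}


def coverage(model: Set[Pixel], dilated: Set[Pixel]) -> float:
    """Fraction of model pixels that land on the dilated image (0.0 if empty)."""
    return len(model & dilated) / len(model) if model else 0.0


def match_token(unknown: List[str],
                dictionary: Dict[str, List[str]]) -> Optional[str]:
    """Return the label of the best-matching known token, or None if none passes."""
    model1 = to_pixels(unknown)                # step (a): first model memory
    image1 = dilate(model1)                    # step (b): first image memory
    best_label, best_score = None, -1.0
    for label, bitmap in dictionary.items():
        model2 = to_pixels(bitmap)             # step (c): copy known token
        image2 = dilate(model2)                # step (d): second image memory
        m1 = coverage(model1, image2)          # first similarity metric
        m2 = coverage(model2, image1)          # second similarity metric
        # step (g): both metrics must pass; keep the best worst-case score
        if m1 >= THRESHOLD and m2 >= THRESHOLD and min(m1, m2) > best_score:
            best_label, best_score = label, min(m1, m2)
    return best_label
```

With a dictionary holding an 'H' and an 'L', a damaged 'H' matches "H", an exact 'L' matches "L", and a 'T' matches nothing. Scoring each candidate by the smaller of its two metrics is one plausible tie-break; the claim itself only requires an indication based on both metrics.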
Specification