Method for comparing word shapes
3 Assignments
0 Petitions
Abstract
A method for determining the relative equivalency or match between two or more character strings represented in an array of image data, including the steps of determining page orientation, isolating character strings from adjacent character strings, and establishing a set of boundaries or reference lines about the character strings. Subsequently, the boundaries are used to represent the character string images as word shape contours or signals, which are generated from the image data within the boundaries. The word shape contours are then compared using one of the described comparison methods to determine the relative equivalency or similarity of the contours.
103 Citations
46 Claims
1. A method of determining the equivalency of a plurality of symbol strings which form word objects within data defining an image to determine a relative measure of the similarity between the symbol strings, comprising the steps of:
detecting a first discrete symbol string in the data defining the image, thereby isolating a word object represented by the first symbol string;
deriving a first contour signal representative of a shape of the first symbol string;
measuring a characteristic dimension for the first symbol string;
detecting a second discrete symbol string in the data defining the image, thereby isolating a word object represented by the second symbol string;
deriving a second contour signal representative of a shape of the second symbol string;
measuring the characteristic dimension for the second symbol string;
calculating a scaling ratio by dividing the characteristic dimension for the first symbol string by the characteristic dimension for the second symbol string;
scaling, in two dimensions, the second contour signal in accordance with the scaling ratio, to produce a new second contour signal for subsequent comparison to the first contour signal;
determining a difference signal representative of the difference between the first and second contour signals over a range in which both signals are defined; and
evaluating the difference signal over a portion of the defined range to arrive at a difference measure indicative of the relative similarity between the first and second strings.

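The scaling and comparison steps of claim 1 can be sketched in Python. This is a minimal sketch under stated assumptions, not the patented implementation: it assumes each contour is a list of heights sampled at unit spacing along x, that the characteristic dimension is a single number (for example an x-height in pixels; the claim does not say which dimension), that "scaling in two dimensions" means resampling along x with linear interpolation and multiplying the heights by the same ratio, and that the difference signal is evaluated as a mean squared difference. The names `scale_contour`, `difference_measure`, and `compare_word_shapes` are hypothetical.

```python
from typing import List, Sequence


def scale_contour(contour: Sequence[float], ratio: float) -> List[float]:
    """Scale a sampled contour by `ratio` along both the x and y axes."""
    if not contour or ratio <= 0:
        raise ValueError("need a non-empty contour and a positive ratio")
    last = len(contour) - 1
    n_out = max(1, round(len(contour) * ratio))  # new width in samples
    scaled = []
    for i in range(n_out):
        x = min(i / ratio, last)  # where output sample i falls in the input
        j = int(x)
        k = min(j + 1, last)
        t = x - j  # linear-interpolation weight between samples j and k
        y = contour[j] * (1 - t) + contour[k] * t
        scaled.append(y * ratio)  # heights scale by the same ratio
    return scaled


def difference_measure(f: Sequence[float], g: Sequence[float]) -> float:
    """Mean squared difference over the range where both contours are defined."""
    n = min(len(f), len(g))
    if n == 0:
        raise ValueError("contours do not overlap")
    return sum((a - b) ** 2 for a, b in zip(f, g)) / n


def compare_word_shapes(f, dim_f, g, dim_g):
    """Scale g by the ratio dim_f / dim_g, then compare it with f."""
    ratio = dim_f / dim_g
    return difference_measure(f, scale_contour(g, ratio))
```

For example, `compare_word_shapes([0, 1, 2, 3, 4, 5, 6, 6], 10, [0, 1, 2, 3], 5)` doubles the second contour in both directions and finds no difference. Averaging over the overlap (rather than summing) keeps the measure comparable between word pairs of different lengths.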
10. The method of claim 10 wherein the step of calculating a difference value representative of the difference between the value of the first contour signal and the value of the second contour signal includes the steps of:
(a) calculating the value of a difference measure used to represent the relative difference between f(x_1) and g(x_2); and
(b) determining whether f(x_1) is greater than g(x_2), and if so, multiplying the value determined in step (a) by a predetermined hill-to-valley ratio to form a product which is returned as the difference signal, otherwise returning the calculated value as the difference signal.

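Steps (a) and (b) describe an asymmetric difference: a point where the first contour lies above the second (a "hill") is weighted differently from a point where it lies below (a "valley"). A minimal sketch, assuming the step (a) difference measure is the absolute difference and that the two contours are compared at aligned sample positions; `weighted_difference`, `asymmetric_distance`, and the ratio value used in the test are hypothetical.

```python
from typing import Sequence


def weighted_difference(f_val: float, g_val: float, hill_to_valley_ratio: float) -> float:
    """Steps (a) and (b): difference at one pair of points, weighted when f lies above g."""
    diff = abs(f_val - g_val)  # step (a), assumed to be the absolute difference
    if f_val > g_val:          # step (b): the first contour forms a "hill" over the second
        return diff * hill_to_valley_ratio
    return diff


def asymmetric_distance(
    f: Sequence[float], g: Sequence[float], hill_to_valley_ratio: float
) -> float:
    """Sum of weighted differences over the positions where both contours are defined."""
    return sum(weighted_difference(a, b, hill_to_valley_ratio) for a, b in zip(f, g))
```

A ratio below 1 makes the measure more tolerant of places where the first contour rises above the second than of places where it falls below; the claim leaves the value to be fixed in advance.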
17. An apparatus for determining the equivalency of a plurality of strings of symbols which form word objects within data defining an image to determine a relative measure of the similarity between the symbol strings, comprising:
isolation means for detecting first and second discrete symbol strings within the data defining the image and isolating the discrete symbol strings into word objects represented by the strings;
means for deriving a first contour signal representative of a shape of the first symbol string and a second contour signal representative of a shape of the second symbol string, wherein the means for deriving contour signals includes measurement means for determining the magnitude of a dimension common to all symbols of a symbol string, said measurement means operating on both the first and second symbol strings to produce both a first common dimension measurement and a second common dimension measurement;
means for scaling one of the contour signals in proportion to the ratio of the first common dimension measurement with respect to the second common dimension measurement;
means for determining a difference signal representative of the difference between a magnitude of the first contour signal and a magnitude of the second contour signal over a defined range; and
arithmetic processing means for evaluating the difference signal over a portion of the defined range to arrive at a difference measure indicative of the relative similarity between the first and second strings.

26. A method for comparing a first string of symbols which form a word object within data defining a first image with a second string of symbols which form a word object within data defining a second image to determine a relative measure of the similarity between the first and second strings of symbols, comprising the steps of:
detecting a first discrete symbol string in the data defining the first image, thereby isolating a word object represented by the first symbol string;
deriving a first contour signal representative of a shape of the first symbol string;
detecting a second discrete symbol string in the data defining the second image, thereby isolating a word object represented by the second symbol string;
deriving a second contour signal representative of a shape of the second symbol string;
determining a difference signal representative of the difference between the first contour signal and the second contour signal over a range in which both signals are defined, wherein the step of determining a difference signal further includes the steps of finding a center of gravity for an arc formed by the first contour, finding a center of gravity for an arc formed by the second contour, aligning the centers of gravity for the first and second contours to determine a relative shift therebetween, identifying, at the extremities of the contours, non-overlapping regions of the contours having only one contour defined therein, defining, within a non-overlapping region, the difference signal as the difference between the contour signal defined therein and a predetermined constant value, and defining, within the overlapping region where both contours are defined, the difference signal as the difference between the contour signals defined therein; and
evaluating the difference signal over a portion of the defined range to arrive at a difference measure indicative of the relative similarity between the first and second strings.

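The center-of-gravity alignment in claim 26 can be sketched as follows. This is a minimal sketch under stated assumptions: contours are lists of non-negative heights sampled at unit spacing, the "center of gravity for an arc" is taken to be the horizontal centroid (each position x weighted by its height), the relative shift is rounded to whole samples, and the predetermined constant for the non-overlapping regions defaults to 0. The names `center_of_gravity`, `aligned_difference`, `sum_of_squares`, and the `fill` parameter are hypothetical.

```python
from typing import List, Sequence


def center_of_gravity(contour: Sequence[float]) -> float:
    """Horizontal centroid of the contour, weighting each position by its height."""
    total = sum(contour)
    if total == 0:
        return (len(contour) - 1) / 2  # flat contour: fall back to the midpoint
    return sum(x * y for x, y in enumerate(contour)) / total


def aligned_difference(f: Sequence[float], g: Sequence[float], fill: float = 0.0) -> List[float]:
    """Difference signal after aligning the centers of gravity of f and g."""
    # g[j] is placed at position j + shift in f's coordinate system.
    shift = round(center_of_gravity(f) - center_of_gravity(g))
    diff = []
    for x in range(min(0, shift), max(len(f), len(g) + shift)):
        in_f = 0 <= x < len(f)
        in_g = 0 <= x - shift < len(g)
        if in_f and in_g:  # overlapping region: contour minus contour
            diff.append(f[x] - g[x - shift])
        elif in_f:         # only f defined: contour minus the constant
            diff.append(f[x] - fill)
        elif in_g:         # only g defined: contour minus the constant
            diff.append(g[x - shift] - fill)
    return diff


def sum_of_squares(diff: Sequence[float]) -> float:
    """Evaluate the difference signal as a sum of squares (one possible choice)."""
    return sum(d * d for d in diff)
```

Aligning centroids before differencing means two shapes that differ only by a horizontal offset, such as `[0, 1, 2, 1, 0]` and `[0, 0, 1, 2, 1, 0]`, produce an all-zero difference signal.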
36. A method for comparing a first string of symbols which form a word object within data defining an image with a known string of symbols which form a word object in order to determine a relative measure of the similarity between the first and known strings, comprising the steps of:
detecting a first discrete symbol string in the data defining the first image, thereby isolating a word object represented by the first string of symbols;
deriving a first contour signal representative of a shape of the first string of symbols;
deriving a second contour signal representative of a shape of the known string of symbols;
determining a difference signal representative of the difference between the first and second contour signals over a range in which both signals are defined, wherein the step of determining a difference signal further includes the steps of finding a center of gravity for an arc formed by the first contour, finding a center of gravity for an arc formed by the second contour, aligning the centers of gravity for the first and second contours to determine a relative shift therebetween, identifying, at the extremities of the contours, non-overlapping regions of the contours having only one contour defined therein, defining, within the non-overlapping regions, the difference signal as the difference between the contour signal defined therein and a predetermined constant value, and defining, within the overlapping region where both contours are defined, the difference signal as the difference between the contour signals defined therein; and
evaluating the difference signal over a portion of the defined range to arrive at a difference measure indicative of the relative similarity between the first and second strings.

40. A method for comparing a first string of symbols which form a word object within data defining an image with a known string of symbols which form a word object in order to determine a relative measure of the similarity between the first and second strings, comprising the steps of:
(a) detecting a first discrete symbol string in the data defining the image, thereby isolating a word object represented by the first string of symbols;
(b) deriving a first contour signal f(x) representative of a shape of the first string of symbols, wherein the first contour signal f(x) is a one-dimensional signal based upon a single independent variable x;
(c) deriving a second contour signal g(x) representative of a shape of the known string of symbols, wherein the second contour signal g(x) is also a one-dimensional signal based upon the single independent variable x;
(d) determining a difference signal d(x) representative of the difference between the first and second contour signals over a range in which both signals are defined, including the steps of:
(i) evaluating the value of the first contour signal, f(x_1), where the single independent variable x_1 lies within the domain x_a through x_b;
(ii) evaluating the value of the second contour signal, g(x_2), where the single independent variable x_2 lies within the domain x_c through x_d;
(iii) calculating a difference value representative of the difference between the value of the first contour signal and the value of the second contour signal;
(iv) storing the difference value;
(v) repeating steps (ii) through (iv) until signal g(x_2) has been evaluated at all single independent variable points x_2 lying within the domain x_c through x_d;
(vi) setting the value of the difference signal at location x_1, d(x_1), equal to the smallest stored difference value;
(vii) repeating steps (i) through (vi) for all single independent variable points x_1 lying within the domain x_a through x_b, thereby defining difference signal d(x) over the domain x_a through x_b; and
(e) evaluating the difference signal over a portion of the defined range to arrive at a difference measure indicative of the relative similarity between the first and second strings.

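Steps (d)(i) through (d)(vii) set each d(x_1) to the smallest difference between f(x_1) and any g(x_2) in the search domain. A minimal sketch, assuming contours are lists sampled at unit spacing, that the domain x_c through x_d is a window of plus or minus `window` samples around x_1 (passing `window=None` searches all of g), that step (iii) uses the absolute difference, and that step (e) averages d(x). The names `min_difference_signal`, `difference_measure`, and `window` are hypothetical.

```python
from typing import List, Optional, Sequence


def min_difference_signal(
    f: Sequence[float], g: Sequence[float], window: Optional[int] = None
) -> List[float]:
    """d(x1) = smallest |f(x1) - g(x2)| over the search domain for x2."""
    if not g:
        raise ValueError("g must be non-empty")
    d = []
    for x1, f_val in enumerate(f):  # steps (i) and (vii)
        if window is None:
            lo, hi = 0, len(g)  # search all of g
        else:
            lo, hi = max(0, x1 - window), min(len(g), x1 + window + 1)
        candidates = g[lo:hi] or g[-1:]  # window past the end of g: use its last sample
        d.append(min(abs(f_val - g_val) for g_val in candidates))  # steps (ii)-(vi)
    return d


def difference_measure(d: Sequence[float]) -> float:
    """Step (e): evaluate d(x) as its mean value (one possible choice)."""
    return sum(d) / len(d)
```

Taking the minimum over a window makes d(x) insensitive to small horizontal misalignments: with `window=1`, `f = [0, 1, 2]` and `g = [0, 2, 1]` give an all-zero difference signal, whereas `window=0` reduces to a point-by-point comparison.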
41. A method for comparing a first string of symbols which form a word object within data defining an image with a known string of symbols which form a word object in order to determine a relative measure of the similarity between the first and second strings, comprising the steps of:
(a) detecting a first discrete symbol string in the data defining the image, thereby isolating a word object represented by the first string of symbols;
(b) deriving a first contour signal f(x) representative of the shape of the first string of symbols, wherein the first contour signal f(x) is a one-dimensional signal based upon a single independent variable x;
(c) deriving a second contour signal g(x) representative of the shape of the known string of symbols, wherein the second contour signal g(x) is also a one-dimensional signal based upon the single independent variable x;
(d) determining a difference signal d(f(x),g(x)) representative of the difference between the first and second contour signals, including the steps of:
(i) evaluating the value of the first contour signal, f(x_m), where x_m lies within the domain x_a through x_b, over which the first contour signal is defined;
(ii) evaluating the value of the second contour signal, g(x_n), where x_n lies within a subset of the domain x_c through x_d, over which the second contour signal is defined;
(iii) evaluating the difference signal, d(f(x_m),g(x_n)), representative of the difference between the value of the first contour signal f(x_m) and the value of the second contour signal g(x_n), where both x_m and x_n lie within a range over which the first and second contours are defined, to determine a difference value;
(iv) weighting the difference value in accordance with a predefined weighting factor, the selection of which is determined by the relationship between the single independent variable values x_m and x_n;
(v) storing, in a unique memory location (m,n), the weighted difference value;
(vi) repeating steps (ii) through (v) until signal g(x_n) has been evaluated at all points x_n lying within the subset of the domain x_c through x_d, so as to generate an array of stored difference values;
(vii) repeating steps (i) through (vi) for all values of x_m lying within the domain x_a through x_b, thereby evaluating the difference signal, d(f(x_m),g(x_n)), over the domain x_a through x_b;
(viii) systematically traversing the array of stored difference values, by beginning at array position (x_a,x_c) and finishing at position (x_b,x_d), while accumulating the stored difference values of all array locations along a traversal path therebetween; and
(ix) repeating step (viii) for all possible traversal paths, wherein the path resulting in the smallest accumulated difference value is an optimized traversal path, the accumulated value of which is a measure of the difference between the contour signals, indicating the relative similarity between the first and second strings.

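The array-and-path procedure of claim 41 closely resembles what is generally known as dynamic time warping. A minimal sketch, under several assumptions: paths move one cell at a time down, right, or diagonally; the weighting factor of step (iv) is 1 when m equals n and a hypothetical `off_diagonal_weight` otherwise; and the "subset" of step (ii) is a band of half-width `band` around the diagonal. Rather than enumerating every path as step (ix) literally describes, the sketch uses dynamic programming, which yields the same minimum accumulated value for this set of moves. The name `dtw_distance` is hypothetical.

```python
import math
from typing import Optional, Sequence


def dtw_distance(
    f: Sequence[float],
    g: Sequence[float],
    off_diagonal_weight: float = 1.0,
    band: Optional[int] = None,
) -> float:
    """Smallest accumulated weighted difference over paths from (0, 0) to (M-1, N-1)."""
    M, N = len(f), len(g)
    if M == 0 or N == 0:
        raise ValueError("contours must be non-empty")
    INF = math.inf
    # Steps (i)-(vii): array of weighted differences; cells outside the band stay infinite.
    cost = [[INF] * N for _ in range(M)]
    for m in range(M):
        for n in range(N):
            if band is not None and abs(m - n) > band:
                continue
            weight = 1.0 if m == n else off_diagonal_weight
            cost[m][n] = weight * abs(f[m] - g[n])
    # Steps (viii)-(ix): acc[m][n] is the cheapest accumulated cost of any path ending at (m, n).
    acc = [[INF] * N for _ in range(M)]
    acc[0][0] = cost[0][0]
    for m in range(M):
        for n in range(N):
            if m == 0 and n == 0:
                continue
            best_prev = min(
                acc[m - 1][n] if m > 0 else INF,                # step down
                acc[m][n - 1] if n > 0 else INF,                # step right
                acc[m - 1][n - 1] if m > 0 and n > 0 else INF,  # diagonal step
            )
            acc[m][n] = cost[m][n] + best_prev
    return acc[M - 1][N - 1]
```

The warping lets a stretched shape such as `[0, 1, 1, 2]` match `[0, 1, 2]` exactly; if the band is too narrow to reach the final cell, the result is infinite.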
46. A method for determining the relevancy, defined by the number of common words appearing in both, of two printed text documents, said documents being represented as arrays of image data, comprising the steps of:
generating word shape contours for a plurality of symbol strings within a first array of image data representing a first document, based upon a difference between distances separating adjacent symbols within the strings and distances separating adjacent symbol strings therein;
generating word shape contours for a plurality of symbol strings within a second array of image data representing a second document, based upon a difference between distances separating adjacent symbols within the strings and distances separating adjacent symbol strings therein;
comparing all of the word shape contours of the first array with all of the word shape contours of the second array to produce difference measures for all pairs of word shape contours compared;
determining the number of word shape pairs having difference measures below a predetermined threshold; and
based upon said number, determining what proportion of the words contained in the two documents are similar, thereby producing an indication of the relevance of the two documents.

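The comparison and counting steps of claim 46 can be sketched as follows, starting from word shape contours that have already been generated (contour generation from the image arrays is outside this sketch). Assumptions: each contour is a list of sampled heights, the pairwise difference measure defaults to a mean absolute difference, and the count is turned into a proportion by dividing by the word count of the shorter document and capping at 1, which is only one possible reading of "what proportion of the words ... are similar". The names `mean_abs_difference` and `document_relevance` are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple

Contour = Sequence[float]


def mean_abs_difference(f: Contour, g: Contour) -> float:
    """Mean absolute difference over the range where both contours are defined."""
    n = min(len(f), len(g))
    if n == 0:
        return float("inf")  # no overlap: treat as maximally different
    return sum(abs(a - b) for a, b in zip(f, g)) / n


def document_relevance(
    doc1: List[Contour],
    doc2: List[Contour],
    threshold: float,
    diff: Callable[[Contour, Contour], float] = mean_abs_difference,
) -> Tuple[int, float]:
    """Count word-shape pairs below the threshold and turn the count into a proportion."""
    if not doc1 or not doc2:
        return 0, 0.0
    # Compare every word shape in doc1 with every word shape in doc2.
    similar_pairs = sum(1 for f in doc1 for g in doc2 if diff(f, g) < threshold)
    # One possible normalization: matches per word of the shorter document, capped at 1.
    proportion = min(1.0, similar_pairs / min(len(doc1), len(doc2)))
    return similar_pairs, proportion
```

The pairwise comparison is passed in as `diff`, so any contour difference measure with the same signature can replace the default. Comparing all pairs costs `len(doc1) * len(doc2)` evaluations.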
Specification