Method and apparatus for similarity matching of handwritten data objects
First Claim
1. In a database having a plurality of handwritten strings, apparatus for determining a distance between two of the plurality of strings, comprising:
- (a) a processor for extracting a plurality of global features from each one of the plurality of handwritten strings in the database, comprising;
(1) means for dividing the handwritten string into a plurality of strokes;
(2) means for identifying a plurality of bounding boxes, each bounding box containing a respectively different one of the plurality of strokes;
(3) means for extracting from the string;
(A) a number of points in the string,(B) a maximum angle between a first one of the points in the string and a corner of a tallest one of the plurality of bounding boxes,(C) a number of positive inversions in the string, and(D) a number of negative inversions in the string;
(b) means for storing the extracted global features in a storage medium; and
(c) means for calculating the distance between the two handwritten strings based on all of the numbers of points, maximum angles, numbers of positive inversions and numbers of negative inversions extracted by the extracting means.
3 Assignments
0 Petitions
Accused Products
Abstract
Apparatus for determining a distance between two handwritten strings in a database. A processor extracts global features from each string. The processor divides the string into strokes, and identifies a plurality of bounding boxes. Each box contains a different stroke. The processor extracts global features from the suing, including: (1) a number of points; (2) a maximum angle between a first point in the string and a corner of the tallest bounding box; (3) a number of positive inversions; and (4) a number of negative inversions. The apparatus calculates the distance between the strings based on all of the numbers of points, maximum angles, numbers of positive inversions and numbers of negative inversions. A fixed query tree index may be formed. The tree has leaves and internal nodes belonging to multiple levels. A different key is associated with each level. Each key is a handwritten string. Each string is associated with one of the leaves, such that each child of each internal node in any of the levels between the one leaf and the root node is a root of a respective subtree. Each string associated with any leaf in the subtree which includes the one leaf is equally distant from the key associated with the one level, based on the global features. The tree is queried to search for a subset of the strings, such that each string in the subset is within a threshold distance of an input string, according to the distance function.
278 Citations
30 Claims
-
1. In a database having a plurality of handwritten strings, apparatus for determining a distance between two of the plurality of strings, comprising:
-
(a) a processor for extracting a plurality of global features from each one of the plurality of handwritten strings in the database, comprising; (1) means for dividing the handwritten string into a plurality of strokes; (2) means for identifying a plurality of bounding boxes, each bounding box containing a respectively different one of the plurality of strokes; (3) means for extracting from the string; (A) a number of points in the string, (B) a maximum angle between a first one of the points in the string and a corner of a tallest one of the plurality of bounding boxes, (C) a number of positive inversions in the string, and (D) a number of negative inversions in the string; (b) means for storing the extracted global features in a storage medium; and (c) means for calculating the distance between the two handwritten strings based on all of the numbers of points, maximum angles, numbers of positive inversions and numbers of negative inversions extracted by the extracting means. - View Dependent Claims (2, 3, 4, 5, 6, 7)
-
-
8. A method for indexing and querying a database having a plurality of electronic handwritten strings, comprising the steps of:
-
(a) extracting a plurality of global features from each one of the plurality of electronic handwritten strings; (b) forming a fixed query tree index having a plurality of leaves and a plurality of internal nodes which belong to a plurality of levels, including the steps of; (1) associating a respectively different single key with each one of the plurality of levels, each key being a handwritten string, (2) associating each string with one of the plurality of leaves, such that; (A) each child of each internal node in any one of the plurality of levels between the one leaf and a root node of the index is a root of a respective subtree, and (B) each string associated with any leaf in the subtree which includes the one leaf is equally distant from the single key associated with the one level, using a distance function based on the global features; and (c) querying the fixed query tree to search for a first subset of the strings, such that each string in the first subset is within a threshold distance of an input string, according to the distance function used in step (b) (2) (B). - View Dependent Claims (9, 10, 11, 12, 13)
-
-
14. A method for indexing a database having a plurality of electronic handwritten strings, comprising the steps of:
-
(a) performing, for each one of the plurality of electronic handwritten strings, the steps of; (1) dividing the handwritten string into a plurality of strokes; (2) identifying a plurality of bounding boxes, each bounding box containing a respectively different one of the plurality of strokes; (3) extracting a plurality of global features from the string, the plurality of global features including; (A) a number of points in the string, (B) a maximum angle between a first one of the points in the string and a corner of a tallest one of the plurality of bounding boxes, (C) a number of positive inversions in the string, and (D) a number of negative inversions in the string; (b) forming a fixed query tree index having a plurality of leaves and a plurality of internal nodes which belong to a plurality of levels, including the steps of; (1) associating a respectively different key with each one of the plurality of levels, each key being a handwritten string, (2) calculating a respective distance between one of the plurality of strings and one or more of the keys, based on all of the numbers of points, maximum angles, numbers of positive inversions and numbers of negative inversions extracted in step (a) (3), (3) associating the one string with one of the plurality of leaves, such that; (A) each child of each internal node in any one of the plurality of levels between the one leaf and a root node of the index is a root of a respective subtree, and (B) each string associated with any leaf in the subtree which includes the one leaf is equally distant from the key associated with the one level, using the distances calculated in step (b) (2); and (4) repeating steps (b) (2) and (b) (3) each time a respectively different one of the plurality of electronic handwritten strings is added to the database. - View Dependent Claims (15, 16, 17, 18, 19, 20, 21)
-
-
22. A method for indexing a database having a plurality of electronic handwritten strings, comprising the steps of:
-
(a) performing, for each one of the plurality of electronic handwritten strings, the steps of; (1) dividing the handwritten string into a plurality of strokes; (2) extracting a plurality of features from each stroke of the string; (b) forming an R-tree index having a plurality of leaves and a plurality of internal nodes, each one of the plurality of internal nodes having at least one child node, including the steps of; (1) providing, for each respective one of the child nodes, a pointer to the child node and an identification of a minimum bounding contour associated with the child node, the minimum bounding contour containing a plurality of further minimum bounding contours pointed to by respective ones of a corresponding plurality of entries in the child, if the child node is an internal node, each minimum bounding contour defining a respective range of values within which the value of one of the features of one of the plurality of strokes lies; (2) storing a plurality of entries in each leaf node, each entry comprising; (A) a feature vector representing a respective one of the plurality of strokes, and (B) a pointer which points to the one of the plurality of electronic handwritten strings containing the one stroke, the one electronic handwritten string being a member of the subset, wherein all the feature vectors stored in any one leaf node differ from one another by less than a threshold distance. - View Dependent Claims (23)
-
-
24. A method for indexing a database having a plurality of electronic handwritten strings, comprising the steps of:
-
(a) performing, for each one of the plurality of electronic handwritten strings, the steps of; (1) dividing the handwritten string into a plurality of strokes; (2) extracting a plurality of features from each stroke of the string; (b) forming an R-tree index having a plurality of leaves and a plurality of internal nodes, each one of the plurality of internal node having at least one child node, including the steps of; (1) providing, for each respective one of the child nodes, a pointer to the child node and an identification of a minimum bounding contour associated with the child node, the minimum bounding contour containing a plurality of further minimum bounding contours pointed to by respective ones of a corresponding plurality of entries in the child node, if the child node is an internal node, (2) storing a plurality of entries in each leaf node, (A) each entry comprising an additional minimum bounding contour which is associated with a respective range of values for each one of the plurality of features, (B) the additional minimum bounding contour being associated with a plurality of pointers, each pointer pointing to a respective one of the plurality of electronic handwritten strings for which the features of at least one stroke thereof are within the respective range of values of the additional minimum bounding contour, (C) wherein all the additional minimum bounding contours stored in any one leaf node differ from one another by less than a threshold distance. - View Dependent Claims (25, 26)
-
-
27. A method for indexing a database having a plurality of electronic handwritten strings, comprising the steps of:
-
(a) performing, for each one of the plurality of electronic handwritten strings, the steps of; (1) dividing the handwritten string into a plurality of strokes; (2) identifying a plurality of bounding boxes, each bounding box containing a respectively different one of the plurality of strokes; (3) extracting a plurality of global features from the string, the plurality of global features including; (A) a number of points in the string, (B) a maximum angle between a first one of the points in the string and a corner of a tallest one of the plurality of bounding boxes, (C) a number of positive inversions in the string, and (D) a number of negative inversions in the string; (b) forming an R-tree index having a plurality of leaves and a plurality of internal nodes, (1) each internal node having at least one child node, and having, for each respective child node, a pointer to the child node and an identification of a minimum bounding contour associated with the child node, the minimum bounding contour containing a plurality of further minimum bounding contours pointed to by respective ones of a plurality of entries in the child node, if the child node is an internal node, (2) each one of the leaf nodes having a plurality of pointers, each pointer of a given one of the leaf nodes pointing to a respective one of a subset of the plurality of handwritten strings, for which subset the global features extracted in step (a) (3) differ from one another by less than a threshold value. - View Dependent Claims (28, 29, 30)
-
Specification