Spatial recognition and grouping of text and graphics
First Claim
1. A computer readable medium having stored thereon executable software comprising:
- a component that receives input comprising at least one entity comprising one or more strokes, wherein strokes associated with the input are generated in a non-linear sequence in time; and
a recognition component that utilizes a simultaneous segmentation and recognition process to recognize the at least one entity in the input, the recognition component recognizes the at least one entity based at least in part on a predetermined minimum threshold distance between convex hulls of respective strokes associated with the input, wherein strokes are associated with the at least one entity when distance between convex hulls of respective strokes is less than the predetermined minimum threshold distance, and a predetermined maximum number of strokes per entity, and further wherein the simultaneous segmentation and recognition process comprises;
linking strokes of the input in a graph to form connected subgraphs;
applying a search method over the connected subgraphs of a proximity graph to determine an optimal decomposition; and
recognizing the optimal decomposition via employment of a classifier, wherein the search method employs an optimization method comprising;
utilizing a cost optimization function to facilitate in segmentation and recognition to determine an optimum grouping and labeling according to Equation (1);
C([V.sub.i]))=.PHI.(R(V.sub.i),R(V.sub.2), . . . , R(V.sub.n));
(Eq.
1)where V.sub.i is a subset of vertices which form a decomposition of an input, R is a best recognition result for the set of vertices, the function .PHI. is a combination cost, and C represents an overall cost of a particular grouping [V.sub.i],wherein the segmentation and recognition process further provides for subsequent correction of the one or more strokes associated with the input generated in a non-linear sequence in time.
2 Assignments
0 Petitions
Accused Products
Abstract
The present invention leverages spatial relationships to provide a systematic means to recognize text and/or graphics. This allows augmentation of a sketched shape with its symbolic meaning, enabling numerous features including smart editing, beautification, and interactive simulation of visual languages. The spatial recognition method obtains a search-based optimization over a large space of possible groupings from simultaneously grouped and recognized sketched shapes. The optimization utilizes a classifier that assigns a class label to a collection of strokes. The overall grouping optimization assumes the properties of the classifier so that if the classifier is scale and rotation invariant the optimization will be as well. Instances of the present invention employ a variant of AdaBoost to facilitate in recognizing/classifying symbols. Instances of the present invention employ dynamic programming and/or A-star search to perform optimization. The present invention applies to both hand-sketched shapes and printed handwritten text, and even heterogeneous mixtures of the two.
131 Citations
44 Claims
-
1. A computer readable medium having stored thereon executable software comprising:
-
a component that receives input comprising at least one entity comprising one or more strokes, wherein strokes associated with the input are generated in a non-linear sequence in time; and a recognition component that utilizes a simultaneous segmentation and recognition process to recognize the at least one entity in the input, the recognition component recognizes the at least one entity based at least in part on a predetermined minimum threshold distance between convex hulls of respective strokes associated with the input, wherein strokes are associated with the at least one entity when distance between convex hulls of respective strokes is less than the predetermined minimum threshold distance, and a predetermined maximum number of strokes per entity, and further wherein the simultaneous segmentation and recognition process comprises; linking strokes of the input in a graph to form connected subgraphs; applying a search method over the connected subgraphs of a proximity graph to determine an optimal decomposition; and recognizing the optimal decomposition via employment of a classifier, wherein the search method employs an optimization method comprising; utilizing a cost optimization function to facilitate in segmentation and recognition to determine an optimum grouping and labeling according to Equation (1);
C([V.sub.i]))=.PHI.(R(V.sub.i),R(V.sub.2), . . . , R(V.sub.n));
(Eq.
1)where V.sub.i is a subset of vertices which form a decomposition of an input, R is a best recognition result for the set of vertices, the function .PHI. is a combination cost, and C represents an overall cost of a particular grouping [V.sub.i], wherein the segmentation and recognition process further provides for subsequent correction of the one or more strokes associated with the input generated in a non-linear sequence in time. - View Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 44)
-
-
24. A method for facilitating recognition, comprising:
-
receiving at least one P-dimensional input comprising a plurality of strokes that are generated in a non-linear sequence in time; determining at least two potential groupings of strokes associated with the input based at least in part on a predetermined proximity threshold between respective strokes associated with the input and a predetermined maximum number of strokes per entity, to facilitate recognizing at least one entity; determining a respective best recognition result for each subset of strokes associated with each potential grouping of the at least two potential groupings, wherein each potential grouping can comprise one or more subsets of strokes corresponding to one or more entities of the at least one entity; calculating a combination cost relating to the cost of combining the respective best recognition results associated with each potential grouping of the at least two potential groupings; calculating a respective total cost for each potential grouping of the at least two potential groupings as a function of respective combination costs of each potential grouping of the at least two potential groupings; selecting a potential grouping of the at least two potential groupings that has a best cost as a best grouping of strokes; utilizing the best grouping of strokes to facilitate recognizing the at least one entity in the input; and employing a simultaneous segmentation and recognition process to recognize the at least one entity in the input based at least in part on a predetermined proximity threshold between respective strokes associated with the input, wherein strokes are associated with the at least one entity when distance between convex hulls of respective strokes is less than the predetermined proximity threshold, and a predetermined number of strokes per entity, and further wherein the simultaneous segmentation and recognition process comprises; linking strokes of the input in a graph to form connected subgraphs; applying a search method over the connected subgraphs of the proximity graph to determine an optimal decomposition; and recognizing the optimal decomposition via employment of a classifier, wherein the search method employs an optimization method comprising; utilizing a cost optimization function to facilitate in segmentation and recognition to determine an optimum grouping and labeling according to;
C({Vi})=Φ
(R(V1),R(V2), . . . , R(Vn));where Vi is a subset of vertices which form a decomposition of an input, R is a best recognition result for the set of vertices, the function Φ
is a combination cost, and C represents an overall cost of a particular grouping {Vi},wherein the segmentation and recognition process further provides for subsequent correction of the one or more strokes associated with the input generated in a non-linear sequence in time. - View Dependent Claims (25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 43)
-
-
42. A system that facilitates recognition comprising:
-
means for receiving at least one input comprising, at least in part, a P-dimensional space of at least one entity, each entity comprising one or more strokes, wherein strokes are generated in a non-linear sequence in time; and means for utilizing a simultaneous segmentation and recognition process to recognize the at least one entity from the input as a function of a predetermined proximity threshold between convex hulls of respective strokes, wherein strokes are associated with the at least one entity when distance between convex hulls of respective strokes is less than the predetermined proximity threshold, and a predetermined maximum number of strokes per entity, and further wherein the simultaneous segmentation and recognition process comprising; linking strokes of the input in a graph to form connected subgraphs; applying a search method over the connected subgraphs of a proximity graph to determine an optimal decomposition; and recognizing the optimal decomposition via employment of a classifier, the search method employs an optimization method comprising; utilizing a cost optimization function to facilitate in segmentation and recognition to determine an optimum grouping and labeling according to Equation (1);
C({Vi})=Φ
(R(V1),R(V2), . . . , R(Vn));where Vi is a subset of vertices which form a decomposition of an input, R is a best recognition result for the set of vertices, the function Φ
is a combination cost, and C represents an overall cost of a particular grouping {Vi},wherein the segmentation and recognition process further provides for subsequent correction of the one or more strokes associated with the input generated in a non-linear sequence in time.
-
Specification