Clustering
3 Assignments
0 Petitions
Abstract
Systems and methods for performing clustering of a document image are disclosed. A property of a mark extracted from a document is compared with the corresponding properties of the existing clusters. If the property of the mark fails to match the property of every existing cluster, the mark is added to the existing clusters as a new cluster. One property that can be used is the x size and y size (the width and height) of the existing clusters. Another is ink size, the ratio of black pixels to total pixels in a cluster. Yet another is a reduced mark or image, a version of the bitmap of the mark and/or cluster reduced in pixel size. These properties can be used to identify mismatches and thereby reduce the number of bit-by-bit comparisons performed.
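The property checks described in the abstract can be sketched as follows. This is a minimal Python illustration, assuming bitmaps are lists of rows of 0/1 integers (1 = black). The function names, the OR-based 2x2 reduction used for the reduced mark, and the tolerances `size_tol` and `ink_tol` are hypothetical choices, not taken from the patent. `may_match` rejects a cluster as soon as any cheap property differs, so the costly bit-by-bit comparison only runs on the clusters that survive.

```python
# Hypothetical sketch: bitmaps are lists of rows of 0/1 ints (1 = black pixel).
# Names, tolerances, and the OR-reduction rule are assumptions for illustration.

def box_size(bitmap):
    # x size (width) and y size (height) of the mark's bounding box
    return len(bitmap[0]), len(bitmap)

def ink_ratio(bitmap):
    # "Ink size": ratio of black pixels to total pixels
    w, h = box_size(bitmap)
    return sum(map(sum, bitmap)) / (w * h)

def reduce_bitmap(bitmap, factor=2):
    # Reduced mark: each factor x factor block becomes 1 if any pixel in it is black
    w, h = box_size(bitmap)
    return [
        [int(any(bitmap[y][x]
                 for y in range(by, min(by + factor, h))
                 for x in range(bx, min(bx + factor, w))))
         for bx in range(0, w, factor)]
        for by in range(0, h, factor)
    ]

def may_match(mark, cluster, size_tol=1, ink_tol=0.1):
    """Return False as soon as a cheap property mismatches, so the
    bit-by-bit comparison can be skipped for this cluster."""
    (mw, mh), (cw, ch) = box_size(mark), box_size(cluster)
    if abs(mw - cw) > size_tol or abs(mh - ch) > size_tol:
        return False
    if abs(ink_ratio(mark) - ink_ratio(cluster)) > ink_tol:
        return False
    # Reduced images are compared only when the dimensions agree exactly
    if (mw, mh) == (cw, ch) and reduce_bitmap(mark) != reduce_bitmap(cluster):
        return False
    return True
```

The checks are ordered from cheapest (two integer comparisons) to most expensive (building reduced images), which mirrors the abstract's goal of discarding mismatches early.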
132 Citations
32 Claims
1. A clustering system comprising:
a mark extractor to extract a mark from a document;
a match component operative to compare at least one property of the mark to match properties of existing clusters of marks so as to identify matching existing clusters;
a two dimensional table that stores the existing clusters according to box size; and
a match symbol component operative to compare the mark to the matching existing clusters and identify a matching cluster. (Dependent claims: 2-19.)
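One way to picture the two dimensional table of claim 1 is a sparse grid keyed by (width, height). The sketch below assumes that layout and a hypothetical per-axis tolerance of one pixel; the class name `BoxSizeTable` and its methods are illustrative only. A mark is checked only against clusters in neighbouring cells, which is how box size can prune the search before any bitmap comparison.

```python
from collections import defaultdict

class BoxSizeTable:
    """Hypothetical 2-D table of cluster bitmaps indexed by (width, height)."""

    def __init__(self, tolerance=1):
        self.tolerance = tolerance       # assumed +/- pixel tolerance per axis
        self.cells = defaultdict(list)   # (w, h) -> list of cluster bitmaps

    def add(self, bitmap):
        # Store the cluster in the cell for its exact box size
        self.cells[(len(bitmap[0]), len(bitmap))].append(bitmap)

    def candidates(self, bitmap):
        """Yield clusters whose box size lies within tolerance of the mark's."""
        w, h = len(bitmap[0]), len(bitmap)
        t = self.tolerance
        for dw in range(-t, t + 1):
            for dh in range(-t, t + 1):
                # .get avoids creating empty cells in the defaultdict
                yield from self.cells.get((w + dw, h + dh), [])
```

Using `.get` rather than indexing the `defaultdict` keeps lookups from inserting empty cells, so the table only grows when a cluster is actually added.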
20. A method of clustering comprising:
locating a mark within a document;
comparing a first property of the mark with first properties of existing clusters to identify matching and mismatching clusters;
on a match of the first property, comparing a bitmap of the mark with bitmaps of the matching clusters to find a matched cluster of the matching clusters; and
on a mismatch of the first property and on a mismatch of the bitmap, adding the mark as the new cluster to the existing clusters. (Dependent claims: 21-28.)
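The steps of claim 20 can be sketched as a single loop. This version assumes the marks have already been located, uses box size as the first property and a count of differing pixels (`hamming`) for the bitmap comparison, and accepts the first cluster within a hypothetical threshold `max_diff`; `cluster_marks` is an illustrative name. In this sketch, a mark with no size-matching cluster, or whose bitmap differs too much from every size-matching cluster, becomes a new cluster.

```python
def hamming(a, b):
    # Number of differing pixels between two same-sized bitmaps
    return sum(pa != pb for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))

def cluster_marks(marks, max_diff=0):
    """Assign each mark to a cluster; returns (cluster bitmaps, labels)."""
    clusters = []  # representative bitmap of each cluster
    labels = []    # cluster index assigned to each mark
    for mark in marks:
        # First property (assumed here): exact box size
        size = (len(mark[0]), len(mark))
        matching = [i for i, c in enumerate(clusters)
                    if (len(c[0]), len(c)) == size]
        # Bitmap comparison only against the property-matching clusters
        found = next((i for i in matching
                      if hamming(mark, clusters[i]) <= max_diff), None)
        if found is None:
            clusters.append(mark)  # mismatch: add the mark as a new cluster
            found = len(clusters) - 1
        labels.append(found)
    return clusters, labels
```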
29. A document encoding system comprising:
a mask separator operative to generate a binary mask from a document image, the binary mask including textual information;
a background foreground segmenter operative to segment a foreground image and a background image from the document image according to the binary mask; and
a clustering system operative to identify clusters in the mask in a computationally efficient manner.
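A toy version of the claim 29 pipeline is sketched below. A fixed global threshold stands in for the mask separator, pixels outside a layer are set to `None` as don't-care values, and the 4-connected components of the mask serve as the marks that would be handed to a clustering system. All function names and the threshold of 128 are assumptions for illustration; this section of the patent does not describe how its separator or segmenter work.

```python
def separate_mask(gray, threshold=128):
    # Stand-in mask separator: dark pixels (< threshold) are treated as text (1)
    return [[1 if p < threshold else 0 for p in row] for row in gray]

def segment(gray, mask):
    """Split the image into foreground/background layers using the mask.
    Pixels that do not belong to a layer are None ("don't care")."""
    fg = [[p if m else None for p, m in zip(pr, mr)] for pr, mr in zip(gray, mask)]
    bg = [[None if m else p for p, m in zip(pr, mr)] for pr, mr in zip(gray, mask)]
    return fg, bg

def extract_marks(mask):
    """Return each 4-connected component of 1-pixels as a cropped bitmap."""
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    marks = []
    for sy in range(h):
        for sx in range(w):
            if not mask[sy][sx] or seen[sy][sx]:
                continue
            seen[sy][sx] = True
            stack, pixels = [(sy, sx)], []
            while stack:  # iterative flood fill
                y, x = stack.pop()
                pixels.append((y, x))
                for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                    if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        stack.append((ny, nx))
            ys = [p[0] for p in pixels]
            xs = [p[1] for p in pixels]
            y0, x0 = min(ys), min(xs)
            # Crop to the component's bounding box, keeping only its own pixels
            bmp = [[0] * (max(xs) - x0 + 1) for _ in range(max(ys) - y0 + 1)]
            for y, x in pixels:
                bmp[y - y0][x - x0] = 1
            marks.append(bmp)
    return marks
```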
30. A data packet adapted to be transmitted between at least two computer processes, comprising:
a data field comprising information associated with a property of clusters, the property of the clusters being efficiently comparable to a similar property of a mark to identify a mismatch, the mismatch indicating that the mark is a new cluster and avoiding a bit by bit comparison of the mark to the clusters.
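Claim 30 does not specify a wire format. As a hedged illustration, the sketch below uses Python's `struct` module to pack one hypothetical fixed-size record per cluster (id, width, height, ink ratio) after a count header, and shows how a receiver could decide from those fields alone that a mark is a new cluster, with no bit-by-bit comparison. The record layout and tolerances are invented for the example.

```python
import struct

# Hypothetical record layout (not from the patent), little-endian:
# cluster id (uint32), width (uint16), height (uint16), ink ratio (float32)
RECORD = struct.Struct("<IHHf")

def pack_properties(props):
    """props: list of (cluster_id, width, height, ink_ratio) tuples."""
    return struct.pack("<I", len(props)) + b"".join(RECORD.pack(*p) for p in props)

def unpack_properties(data):
    (count,) = struct.unpack_from("<I", data, 0)
    return [RECORD.unpack_from(data, 4 + i * RECORD.size) for i in range(count)]

def is_new_cluster(mark_props, packet, size_tol=1, ink_tol=0.1):
    """True if the mark mismatches every cluster's properties in the packet,
    i.e. it is a new cluster and no bitmap comparison is needed."""
    w, h, ink = mark_props
    for _, cw, ch, cink in unpack_properties(packet):
        if abs(w - cw) <= size_tol and abs(h - ch) <= size_tol and abs(ink - cink) <= ink_tol:
            return False
    return True
```

Note that the ink ratio travels as a 32-bit float, so values that are not exactly representable come back slightly rounded; the tolerance comparison absorbs that.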
31. A computer readable medium storing computer executable components operable to perform a method of clustering, comprising:
a component for locating a mark;
a component for comparing a first property of the mark with first properties of existing clusters to identify matching and mismatching clusters;
on a match of the first property, a component for comparing a bitmap of the mark with bitmaps of the matching clusters to find a most matched cluster of the matching clusters; and
on a mismatch of the first property and on a mismatch of the bitmap, a component to add the mark as a new cluster to the existing clusters.
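Claim 31 asks for the most matched cluster rather than any match. A simple reading, assumed here, is the same-size candidate with the fewest differing pixels, accepted only if that count stays within a hypothetical threshold `max_diff`. A `None` result corresponds to the bitmap mismatch that makes the mark a new cluster. The function names are illustrative.

```python
def hamming(a, b):
    # Number of differing pixels between two same-sized bitmaps
    return sum(pa != pb for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))

def most_matched(mark, candidates, max_diff):
    """Index of the candidate with the fewest differing pixels, or None
    if even the best candidate differs by more than max_diff."""
    best, best_diff = None, max_diff + 1
    for i, c in enumerate(candidates):
        if (len(c[0]), len(c)) != (len(mark[0]), len(mark)):
            continue  # this sketch only compares same-size bitmaps
        d = hamming(mark, c)
        if d < best_diff:
            best, best_diff = i, d
    return best
```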
32. A computer readable medium storing computer executable instructions operable to perform a method of clustering, comprising:
for each page of at least one page of a document;
a component for finding at least one mark;
a component for comparing a first property of the at least one mark with first properties of existing clusters to identify matching and mismatching clusters;
on a match of the first property, a component for comparing a bitmap of the at least one mark with bitmaps of the matching clusters to find a most matched cluster of the matching clusters; and
on a mismatch of the first property and on a mismatch of the bitmap, a component for adding the at least one mark as a new cluster to the existing clusters; and
a component for updating a global library.
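Claim 32 adds a page loop and a global library without defining what the library holds. The sketch below assumes the library is the list of cluster bitmaps shared across all pages, extended once each page has been processed, and uses exact bitmap equality as a stand-in for the property and bitmap matching of the earlier claims. `process_document` and `freeze` are hypothetical names.

```python
def freeze(bitmap):
    # Hashable form of a bitmap: a tuple of row tuples
    return tuple(map(tuple, bitmap))

def process_document(pages):
    """pages: list of pages, each a list of already-extracted mark bitmaps.
    Returns (global library of clusters, per-page cluster labels)."""
    library = []  # global library: cluster bitmaps shared by all pages
    index = {}    # frozen bitmap -> position in the library
    all_labels = []
    for marks in pages:
        new_clusters = []  # clusters first seen on this page
        labels = []
        for mark in marks:
            key = freeze(mark)
            if key not in index:
                # Exact equality stands in for property + bitmap matching
                index[key] = len(library) + len(new_clusters)
                new_clusters.append(key)
            labels.append(index[key])
        library.extend(new_clusters)  # update the global library after the page
        all_labels.append(labels)
    return library, all_labels
```

Deferring the library update to the end of each page keeps the page loop and the library update as separate steps, matching the order in which the claim lists them.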
Specification