Device for outputting character recognition results, character recognition device, and program therefor

  • US 7,558,426 B2
  • Filed: 08/21/2008
  • Issued: 07/07/2009
  • Est. Priority Date: 04/19/2004
  • Status: Expired due to Fees
First Claim

1. A device for outputting results of character recognition processing, comprising:

  • a category classifier for performing character recognition processing to classify image data of a plurality of characters to be recognized, thereby associating each of the characters with one of a plurality of categories recognized in the character recognition processing, and storing the image data of the characters associated with each one of the plurality of categories in storage means;

    a clustering processor for reading out the stored image data, for the characters associated with each one of the plurality of categories, and further classifying each of the characters into one of a plurality of clusters within the category, the clustering processor configured for:

    determining feature values related to shapes of each of the characters read out from the stored image data, further comprising, for each of the characters:

    normalizing a size of the image data of the character;

    dividing the normalized image data into a particular number of regions oriented in a vertical direction and again into the particular number of regions in a horizontal direction;

    determining, for each of a first plurality of pixel lines extending across a width of each of the regions oriented in the vertical direction, a count of a number of pixels encountered when starting from an upper edge of that pixel line in the region, until a color of the normalized image data changes from white to black;

    determining, for each of a second plurality of pixel lines extending across a depth of each of the regions oriented in the horizontal direction, the count of the number of pixels encountered when starting from a leftmost edge of that pixel line in the region, until the color of the normalized image data changes from white to black;

    summing, for each of the regions, the count of the number of encountered pixels to obtain a feature value for the region; and

    storing each of the feature values in a feature vector associated with the character;

    computing a nucleus for a first cluster within the category as an average value of the feature values in the feature vectors associated with the characters associated with the category, wherein the first cluster initially represents all of the characters associated with the category; and

    subdividing the first cluster into a plurality of clusters, until reaching a configured maximum number of clusters, by:

    selecting a pair comprising two arbitrary ones of the feature vectors associated with the characters associated with the category;

    establishing the two arbitrary ones as temporary nuclei for subdividing the first cluster;

    assigning each of the other feature vectors associated with the characters associated with the category to a nearest one of the temporary nuclei, thereby creating a pair of temporary clusters;

    for each of the temporary clusters, determining an average value of distances between the temporary nucleus for that temporary cluster and each of the assigned feature vectors in that temporary cluster and summing the determined average values;

    repeating, for each remaining combination of two of the feature vectors, the selecting, the establishing, the assigning, the determining an average value, and the summing the determined average values; and

    determining, from the pairs of temporary clusters, which pair exhibits a minimum value for the determined sum and establishing that pair as a new subdivision of the first cluster; and

    a screen creator for displaying the image data for each of the characters on a confirmation screen, the screen creator configured for:

    sorting, within each of the categories and each of the clusters into which the category is subdivided, the image data for each of the characters associated with that category and that cluster into a sorted order determined using the feature vector for that character;

    displaying, for each of the categories and each of the clusters into which the category is subdivided, the image data for the characters associated with that category and cluster on the confirmation screen in the sorted order, such that the displayed image data is visually grouped by cluster within category; and

    displaying, for each of the clusters, a cluster identifier in association with the image data displayed for each of the characters associated with that cluster, thereby visually emphasizing when one of the clusters ends and another of the clusters begins.
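
The feature-extraction and cluster-subdivision steps recited in claim 1 can be illustrated with a minimal Python sketch. It is one possible reading of the claim language, not the patentee's implementation. The function names (`run_until_black`, `feature_vector`, `best_split`) are hypothetical, the image is assumed to be already size-normalized to a square binary grid (0 = white, 1 = black) whose side is divisible by the number of regions, and Euclidean distance is an assumption because the claim does not name a distance measure.

```python
from itertools import combinations
from math import dist


def run_until_black(line):
    """Count white pixels (0) from the start of `line` up to the first black pixel (1)."""
    count = 0
    for pixel in line:
        if pixel == 1:
            break
        count += 1
    return count


def feature_vector(image, n_regions):
    """Build one feature per vertical strip and one per horizontal strip.

    Assumes `image` is a square list of lists, already normalized in size,
    with a side length divisible by `n_regions`.
    """
    size = len(image)
    step = size // n_regions
    features = []
    # Vertical strips: scan every pixel column in the strip from the top edge.
    for r in range(n_regions):
        total = 0
        for x in range(r * step, (r + 1) * step):
            column = [image[y][x] for y in range(size)]
            total += run_until_black(column)
        features.append(total)
    # Horizontal strips: scan every pixel row in the strip from the left edge.
    for r in range(n_regions):
        total = 0
        for y in range(r * step, (r + 1) * step):
            total += run_until_black(image[y])
        features.append(total)
    return features


def best_split(vectors):
    """Try every pair of vectors as temporary nuclei and keep the best pair.

    Returns (score, indices_in_cluster_a, indices_in_cluster_b), where score is
    the sum over both temporary clusters of the average nucleus-to-member distance.
    """
    best = None
    for i, j in combinations(range(len(vectors)), 2):
        nuclei = (vectors[i], vectors[j])
        clusters = ([], [])
        for k, v in enumerate(vectors):
            if k in (i, j):
                continue  # the temporary nuclei themselves are not reassigned
            nearest = 0 if dist(v, nuclei[0]) <= dist(v, nuclei[1]) else 1
            clusters[nearest].append(k)
        score = 0.0
        for nucleus, members in zip(nuclei, clusters):
            if members:  # assumption: an empty temporary cluster contributes 0
                score += sum(dist(vectors[k], nucleus) for k in members) / len(members)
        if best is None or score < best[0]:
            best = (score, [i] + clusters[0], [j] + clusters[1])
    return best


glyph = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
]
print(feature_vector(glyph, 2))
print(best_split([[0, 0], [0, 1], [10, 10], [10, 11]]))
```

Because every pair of feature vectors is tried, the split is quadratic in the number of pairs and cubic overall in the number of characters in the category. Repeating `best_split` until the configured maximum number of clusters is reached would follow the "subdividing" step of the claim. The claim does not state which cluster is split next, so that choice is left open here.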
