Method of identifying script of line of text

  • US 7,020,338 B1
  • Filed: 04/08/2002
  • Issued: 03/28/2006
  • Est. Priority Date: 04/08/2002
  • Status: Expired due to Fees
First Claim

1. A method of script identification, comprising the steps of:

    (a) assigning a weight for each of a user-definable number of n-grams in a user-definable number of documents of known scripts, where each of the user-definable number of documents of known scripts is assigned a score equal to the sum of the weights of the n-grams contained therein;

    (b) identifying a line of text in a document of unknown script, where the line of text includes pixels;

    (c) cropping the line of text identified in step (b);

    (d) rescaling the line of text cropped in step (c);

    (e) replacing the line of text rescaled in step (d) with at least one number associated with k-mean cluster centroids of script components to which at least one portion of the line of text most closely matches;

    (f) scoring the line of text replaced in step (e) against the user-definable number of documents of known scripts using the n-gram weights assigned in step (a);

    (g) identifying the highest score attained in step (f);

    (h) identifying the user-definable document of known script against which the highest score in step (f) was attained;

    (i) declaring the line of text identified in step (b) as having been written in the script identified in step (h); and

    (j) returning to step (b) if another line of text of unknown script is desired to be processed.
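
The following Python sketch illustrates one possible reading of steps (a) and (e) through (i). It is not the patented implementation. The claim does not say how the n-gram weights are computed, so relative n-gram frequency is used here purely as an assumption. The function names, the 2-D feature vectors, and the centroid values are hypothetical. Cropping and rescaling (steps (c) and (d)) are omitted; the sketch assumes each script component has already been reduced to a feature vector.

```python
from collections import Counter
from math import dist


def nearest_centroid(vec, centroids):
    """Step (e): index of the k-means centroid closest to a component vector."""
    return min(range(len(centroids)), key=lambda i: dist(vec, centroids[i]))


def ngrams(symbols, n):
    """All contiguous n-grams of a sequence of centroid indices."""
    return [tuple(symbols[i:i + n]) for i in range(len(symbols) - n + 1)]


def train_weights(sequences_by_script, n):
    """Step (a), ASSUMED weighting: relative n-gram frequency per known script."""
    weights = {}
    for script, sequences in sequences_by_script.items():
        counts = Counter(g for seq in sequences for g in ngrams(seq, n))
        total = sum(counts.values())
        weights[script] = {g: c / total for g, c in counts.items()}
    return weights


def identify_script(components, centroids, weights, n):
    """Steps (e)-(i): quantize the line, score it per script, pick the best."""
    symbols = [nearest_centroid(v, centroids) for v in components]
    grams = ngrams(symbols, n)
    # Step (f): score = sum of the weights of the n-grams found in the line.
    scores = {
        script: sum(w.get(g, 0.0) for g in grams)
        for script, w in weights.items()
    }
    # Steps (g)-(i): the highest-scoring known script is declared the result.
    best = max(scores, key=scores.get)
    return best, scores


# Hypothetical toy data: three centroids and two known "scripts".
centroids = [(0.0, 0.0), (1.0, 1.0), (2.0, 0.0)]
training = {
    "script_A": [[0, 1, 0, 1, 0, 1]],
    "script_B": [[2, 2, 1, 2, 2, 1]],
}
weights = train_weights(training, n=2)
line = [(0.1, 0.0), (0.9, 1.1), (0.0, 0.2), (1.1, 0.9)]
best, scores = identify_script(line, centroids, weights, n=2)
print(best, scores)
```

Step (j) corresponds to calling `identify_script` once per line of unknown script, reusing the same centroids and weights.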
