Automatic language identification by stroke geometry analysis

  • US 6,064,767 A
  • Filed: 01/16/1998
  • Issued: 05/16/2000
  • Est. Priority Date: 01/16/1998
  • Status: Expired due to Fees
First Claim

1. A computer automated method for identifying an unknown language used to create a document, including the steps of:

  • defining a set of training documents in a variety of known languages and formed from a variety of text styles;

  • forming black and white pixel images of text material defining said training documents and said document in said unknown language;

  • locating a plurality of seed black pixels from a region growing algorithm;

  • progressively locating black pixels having a selected relationship with said seed pixels to define a plurality of line stroke segments that connect to form a line stroke;

  • identifying black pixels to define a head and a tail black pixel for each said line stroke;

  • extracting point features from said line stroke segments, where the point features include a vertical position and slope of individual line stroke segments, and locally-averaged radius of curvature that are effective to characterize each of said languages;

  • forming feature profiles from said point features for an unknown language and each of said known languages; and

  • comparing said feature profile from said unknown language with each of said feature profiles from said known languages to identify one of said known languages that best represents said unknown language.
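
The steps of the claim can be illustrated with a minimal Python sketch. It is not the patented implementation, and every function name in it is hypothetical. It rests on several simplifying assumptions that the claim does not state: the "selected relationship" is taken to be 8-connectivity, each connected group of black pixels is treated as a single stroke rather than being split into segments, the head and tail are taken to be the leftmost and rightmost pixels, the radius-of-curvature feature is omitted, a feature profile is a pair of normalized histograms, and profiles are compared by Euclidean distance.

```python
from collections import deque
import math


def grow_region(image, seed):
    """Collect the black pixels (value 1) that are 8-connected to `seed`."""
    rows, cols = len(image), len(image[0])
    seen = {seed}
    queue = deque([seed])
    while queue:
        r, c = queue.popleft()
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                nr, nc = r + dr, c + dc
                if (0 <= nr < rows and 0 <= nc < cols
                        and image[nr][nc] == 1 and (nr, nc) not in seen):
                    seen.add((nr, nc))
                    queue.append((nr, nc))
    return seen


def find_strokes(image):
    """Use every unvisited black pixel as a seed; return one pixel set per stroke."""
    strokes, visited = [], set()
    for r, row in enumerate(image):
        for c, value in enumerate(row):
            if value == 1 and (r, c) not in visited:
                region = grow_region(image, (r, c))
                visited |= region
                strokes.append(region)
    return strokes


def stroke_features(stroke, height):
    """Return (normalized vertical position, slope angle) for one stroke."""
    head = min(stroke, key=lambda p: (p[1], p[0]))  # assumed head: leftmost pixel
    tail = max(stroke, key=lambda p: (p[1], p[0]))  # assumed tail: rightmost pixel
    # Row indices grow downward, so the row difference is negated.
    angle = math.atan2(-(tail[0] - head[0]), tail[1] - head[1])
    mean_row = sum(r for r, _ in stroke) / len(stroke)
    return mean_row / height, angle


def histogram(values, low, high, bins):
    """Normalized histogram of `values` over the range [low, high]."""
    counts = [0] * bins
    for v in values:
        index = min(int((v - low) / (high - low) * bins), bins - 1)
        counts[index] += 1
    total = sum(counts) or 1
    return [n / total for n in counts]


def feature_profile(image, bins=4):
    """Concatenate the vertical-position and slope histograms of all strokes."""
    features = [stroke_features(s, len(image)) for s in find_strokes(image)]
    positions = histogram([f[0] for f in features], 0.0, 1.0, bins)
    slopes = histogram([f[1] for f in features], -math.pi / 2, math.pi / 2, bins)
    return positions + slopes


def identify_language(unknown_image, training_images):
    """Return the language whose mean training profile is nearest the unknown's.

    `training_images` maps a language name to a list of binary images.
    """
    unknown = feature_profile(unknown_image)
    best_language, best_distance = None, float("inf")
    for language, images in training_images.items():
        profiles = [feature_profile(img) for img in images]
        mean = [sum(column) / len(profiles) for column in zip(*profiles)]
        distance = math.dist(unknown, mean)
        if distance < best_distance:
            best_language, best_distance = language, distance
    return best_language
```

Images here are lists of rows holding 0 (white) or 1 (black). A call such as `identify_language(page, {"A": [sample_a], "B": [sample_b]})` returns the key whose averaged profile lies closest to the profile of `page`. A fuller treatment would split strokes into segments and add the locally-averaged radius of curvature as a third feature.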
