Method and apparatus for automatic language determination of European script documents

  • US 5,377,280 A
  • Filed: 04/19/1993
  • Issued: 12/27/1994
  • Est. Priority Date: 04/19/1993
  • Status: Expired due to Term
First Claim

1. An automatic language determining apparatus for determining a language of a text portion of a document having a known script-type, comprising:

    input means for inputting a digital data signal representative of the text portion of the document, the text portion being in an unknown language;

    word token generating means for converting the digital data signal to a plurality of word tokens, each word token comprising at least one of a limited number of abstract-coded character classes, each abstract-coded character class representing a group of characters of the known script-type;

    feature determining means for determining at least one word token occurrence value of word tokens occurring within the plurality of word tokens and corresponding to at least one predetermined word token; and

    language determining means for determining the language of the text portion of the document based on the at least one word token occurrence value.
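
The four means recited in the claim can be read as a small pipeline: input, word-token generation, token counting, and a language decision. The following is a minimal sketch of that pipeline, not the patented implementation. It assumes plain text stands in for the digital data signal (the claim itself does not specify the form of the signal), and the four character classes (`A`, `g`, `i`, `x`), the `PREDETERMINED_TOKENS` table, and all function names are hypothetical choices made for illustration only.

```python
from collections import Counter

# Hypothetical abstract character classes (an assumption, not taken from the claim):
#   "A" = tall characters (capitals, digits, ascenders)
#   "g" = characters with descenders
#   "i" = dotted characters
#   "x" = everything else (x-height characters)
def char_class(ch):
    if ch.isupper() or ch.isdigit() or ch in "bdfhklt":
        return "A"
    if ch in "gpqy":
        return "g"
    if ch in "ij":
        return "i"
    return "x"

def word_tokens(text):
    """Convert text into word tokens, one class code per letter."""
    tokens = []
    for word in text.split():
        letters = [c for c in word if c.isalpha()]
        if letters:
            tokens.append("".join(char_class(c) for c in letters))
    return tokens

# Hypothetical predetermined word token per language, chosen for this sketch:
# "the" -> AAx, "le"/"la"/"de" -> Ax, "die" -> Aix.
PREDETERMINED_TOKENS = {
    "English": "AAx",
    "French": "Ax",
    "German": "Aix",
}

def occurrence_values(tokens):
    """Relative frequency of each language's predetermined token."""
    counts = Counter(tokens)
    total = len(tokens) or 1  # avoid division by zero on empty input
    return {lang: counts[tok] / total
            for lang, tok in PREDETERMINED_TOKENS.items()}

def determine_language(text):
    """Pick the language whose predetermined token occurs most often."""
    values = occurrence_values(word_tokens(text))
    best = max(values, key=values.get)
    return best if values[best] > 0 else None
```

For example, under these assumed classes `determine_language("the cat sat on the mat and the dog")` selects English because the token `AAx` (the shape of "the") occurs three times. A practical system would likely use several predetermined tokens per language and a statistical comparison rather than a single maximum.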
