
Language identification process using coded language words

  • US 5,548,507 A
  • Filed: 03/14/1994
  • Issued: 08/20/1996
  • Est. Priority Date: 03/14/1994
  • Status: Expired due to Fees
First Claim

1. A machine process for identifying a human language used in a computer coded document from text in the document, comprising the steps of
    reading a sequence of words from the document,
    comparing each word obtained by the reading step to words in a plurality of Word Frequency Tables (WFTs) respectively associated with languages of interest, each WFT containing a set of most frequently used words in an associated language, and each word in a WFT having an associated numerical value representing a previously determined frequency of occurrence (FO) value for the word in a sample of documents written in the associated language,
    associating a Word frequency Accumulator (WFA) with each WFT, and resetting each WFA to a predetermined WFA value prior to reading each document by the reading step,
    outputting the FO value associated with each word matched by the comparing step with a word read by the reading step,
    inputting each FO value provided by the outputting step to the associated WFA,
    adding each FO value to a current sum contained in the associated WFA to generate an accumulated amount,
    detecting which of the plural WFAs has the largest accumulated amount, and
    identifying the human language associated with the WFA detected to have the largest accumulated value.
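
The steps recited in claim 1 can be illustrated with a minimal Python sketch. This is an illustration under stated assumptions, not the patented implementation: the function name `identify_language`, the dictionary representation of the WFTs and WFAs, the whitespace tokenization, and the FO values in `TOY_WFTS` are all hypothetical and were made up for the example rather than measured from any document sample.

```python
def identify_language(words, wfts, initial_value=0):
    """Return (language, accumulators) for a sequence of words.

    wfts maps a language name to its Word Frequency Table (WFT), itself a
    mapping from a frequently used word to its frequency-of-occurrence (FO)
    value. The dict layout is an assumption made for this sketch.
    """
    # Associate one Word Frequency Accumulator (WFA) with each WFT and
    # reset it to a predetermined value before reading the document.
    wfas = {language: initial_value for language in wfts}

    # Read the words in sequence and compare each one to every WFT.
    for word in words:
        for language, table in wfts.items():
            fo = table.get(word)
            if fo is not None:
                # A match outputs the FO value, which is added to the
                # current sum held in the associated WFA.
                wfas[language] += fo

    # Detect the WFA with the largest accumulated amount and identify its
    # language. On a tie, max() returns the first such language in wfts.
    best = max(wfas, key=wfas.get)
    return best, wfas


# Hypothetical tables: the FO values below are invented for illustration.
TOY_WFTS = {
    "English": {"the": 69, "of": 36, "and": 28},
    "French": {"de": 50, "la": 35, "et": 25},
}

language, totals = identify_language("the cat and the dog".split(), TOY_WFTS)
print(language, totals)  # English {'English': 166, 'French': 0}
```

Words that appear in no table, such as "cat" and "dog" here, contribute nothing to any accumulator. The sketch assumes at least one WFT is supplied; with an empty `wfts`, `max()` raises `ValueError`.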
