
Natural language determination using correlation between common words

  • US 6,023,670 A
  • Filed: 12/20/1996
  • Issued: 02/08/2000
  • Est. Priority Date: 08/19/1996
  • Status: Expired due to Fees
First Claim

1. A method for identifying the language in which a computer document is written, comprising the steps of:

    comparing a plurality of words from the document to a word list associated with a candidate language, wherein words in the word list are a selection of a small number of the most frequently used words in the candidate language;

    accumulating a count of matches between words in the document and words in the word list for each word in the word list to produce a sample count for each word in the word list;

    correlating the sample count to a reference count for each word in the word list for the candidate language to produce a correlation score for the candidate language, wherein the correlation score is a statistical measure of a collective strength of association between the sample counts and reference counts; and

    identifying the language of the document based on the correlation score.
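
The four steps of the claim (compare, count, correlate, identify) can be illustrated with a minimal sketch. It is not the patented implementation. The claim only requires "a statistical measure" of association, so the use of the Pearson correlation coefficient here is an assumption. The `REFERENCE` word lists and their frequency values are hypothetical, made up for illustration, and all function names are invented for this example.

```python
import math
import re

# Hypothetical reference data: a small number of common words per candidate
# language, with made-up relative frequencies (illustrative values only).
REFERENCE = {
    "english": {"the": 6.9, "of": 3.6, "and": 2.9, "to": 2.6, "a": 2.3, "in": 2.1},
    "german": {"der": 3.4, "die": 3.3, "und": 2.9, "in": 1.9, "den": 1.2, "von": 1.1},
    "french": {"de": 3.8, "la": 2.5, "le": 2.1, "et": 2.0, "les": 1.8, "des": 1.6},
}


def sample_counts(text, word_list):
    """Steps 1 and 2: compare document words to the word list and count matches."""
    counts = dict.fromkeys(word_list, 0)
    for token in re.findall(r"[^\W\d_]+", text.lower()):
        if token in counts:
            counts[token] += 1
    return counts


def pearson(xs, ys):
    """Pearson correlation coefficient (assumed statistical measure)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # no variation (e.g. no matches at all): treat as uncorrelated
    return cov / math.sqrt(var_x * var_y)


def correlation_score(text, reference):
    """Step 3: correlate sample counts with reference counts for one language."""
    words = list(reference)
    counts = sample_counts(text, words)
    return pearson([counts[w] for w in words], [reference[w] for w in words])


def identify_language(text, references=REFERENCE):
    """Step 4: pick the candidate language with the highest correlation score."""
    scores = {lang: correlation_score(text, ref) for lang, ref in references.items()}
    return max(scores, key=scores.get), scores


if __name__ == "__main__":
    doc = "The cat sat in the garden and the dog ran to the end of the road."
    language, scores = identify_language(doc)
    print(language, scores)
```

Because the score is computed per word rather than from a single total of matches, a word shared between lists (such as "in" above) contributes to every language that contains it, but only raises the score where its relative frequency fits the reference profile.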
