Identifying language origin of words

  • US 8,185,376 B2
  • Filed: 03/20/2006
  • Issued: 05/22/2012
  • Est. Priority Date: 03/20/2006
  • Status: Active Grant
First Claim

1. A method for determining a language of origin of a word comprising analyzing non-uniform letter sequence portions of the word, wherein analyzing comprises:

    using one or more processors of a computing system, segmenting the word into strings of letter chunks based on different criteria, the letter chunks being of non-uniform length of one or more letters;

    using one or more processors of a computing system, ascertaining a probability of the word belonging to a selected language by using a plurality of N-gram models based directly on the letter chunks segmented with the different criteria for each of a plurality of different languages, and providing results from using the plurality of N-gram models based directly on letter chunks extracted with the different criteria to a combined classifier that merges the results from the plurality of N-gram models to provide a hypothesis of the language of origin, wherein the combined classifier comprises a plurality of Gaussian mixture models wherein scores from multiple letter chunks models are treated as an eigenvector of a word and a Gaussian mixture model is provided for each of the plurality of different languages, and wherein the results from the plurality of N-gram models are scored by each of the Gaussian mixture models; and

    outputting the hypothesis of the language of origin of the word provided by the combined classifier.
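
The steps recited in the claim can be illustrated with a minimal sketch. It is not the patented implementation: the two segmentation criteria, the add-one smoothed bigram models, the toy word lists, and every function and class name below are assumptions chosen for illustration. For brevity, each language's Gaussian mixture model is reduced to a single diagonal Gaussian, which is a one-component special case of a mixture. The vector of N-gram scores plays the role of what the claim calls the "eigenvector" of the word.

```python
import math
from collections import Counter

VOWELS = "aeiou"

def seg_letters(word):
    # Hypothetical criterion A: every letter is its own chunk.
    return list(word)

def seg_vowel_groups(word):
    # Hypothetical criterion B: runs of vowels and runs of consonants
    # become chunks of non-uniform length.
    chunks, cur = [], ""
    for ch in word:
        if cur and (ch in VOWELS) != (cur[-1] in VOWELS):
            chunks.append(cur)
            cur = ""
        cur += ch
    if cur:
        chunks.append(cur)
    return chunks

SEGMENTERS = [seg_letters, seg_vowel_groups]

class BigramModel:
    """Add-one smoothed bigram (N=2) model over letter chunks."""
    def __init__(self, words, segment):
        self.segment = segment
        self.bigrams, self.contexts = Counter(), Counter()
        vocab = set()
        for w in words:
            chunks = ["<s>"] + segment(w) + ["</s>"]
            vocab.update(chunks)
            for a, b in zip(chunks, chunks[1:]):
                self.bigrams[(a, b)] += 1
                self.contexts[a] += 1
        self.v = len(vocab) + 1  # one extra slot for unseen chunks

    def score(self, word):
        # Length-normalised log-probability of the word's chunk sequence.
        chunks = ["<s>"] + self.segment(word) + ["</s>"]
        lp = sum(math.log((self.bigrams[(a, b)] + 1) / (self.contexts[a] + self.v))
                 for a, b in zip(chunks, chunks[1:]))
        return lp / (len(chunks) - 1)

class DiagGaussian:
    """Single diagonal Gaussian, standing in for a per-language mixture model."""
    def __init__(self, vectors):
        n, dims = len(vectors), range(len(vectors[0]))
        self.mean = [sum(v[d] for v in vectors) / n for d in dims]
        # A small floor keeps the variance strictly positive.
        self.var = [sum((v[d] - self.mean[d]) ** 2 for v in vectors) / n + 1e-3
                    for d in dims]

    def loglik(self, x):
        return sum(-0.5 * (math.log(2 * math.pi * var) + (xi - mu) ** 2 / var)
                   for xi, mu, var in zip(x, self.mean, self.var))

def features(word, models):
    # One score per (language, segmentation criterion) N-gram model.
    return [m.score(word) for m in models]

def train(corpora):
    langs = sorted(corpora)
    models = [BigramModel(corpora[l], seg) for l in langs for seg in SEGMENTERS]
    gaussians = {l: DiagGaussian([features(w, models) for w in corpora[l]])
                 for l in langs}
    return models, gaussians

def classify(word, models, gaussians):
    # Combined classifier: every language's Gaussian scores the same vector,
    # and the best-scoring language is the hypothesis.
    x = features(word.lower(), models)
    return max(gaussians, key=lambda l: gaussians[l].loglik(x))

# Toy training data, made up for this sketch.
CORPORA = {
    "english": ["water", "house", "mother", "light", "friend",
                "street", "thought", "window"],
    "italian": ["casa", "amore", "bella", "pizza", "strada",
                "ragazzo", "cielo", "giorno"],
}

models, gaussians = train(CORPORA)
print(classify("finestra", models, gaussians))
```

With word lists this small the output is not reliable; the sketch only shows how the scores from several chunk-based N-gram models are merged into one vector and scored by a per-language model.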
