Named entity transliteration using comparable corpora

  • US 8,560,298 B2
  • Filed: 10/21/2008
  • Issued: 10/15/2013
  • Est. Priority Date: 10/21/2008
  • Status: Active Grant
First Claim

1. A method of mining multilingual named entity transliteration comprising:

  • obtaining a document in a first language;

    obtaining a plurality of additional documents, each additional document being in a second language that is different than the first language;

    calculating a first probability distribution of the document based on words in the document in the first language;

    for each additional document of the plurality of additional documents, calculating a second probability distribution of the additional document based on words in the additional document in the second language; and

    calculating a cross language similarity score based on the first probability distribution of the document in the first language and the second probability distribution of the additional document in the second language;

    selecting at least one of the additional documents based on a comparison of the cross language similarity scores calculated for the plurality of additional documents;

    selecting a named entity in the document;

    searching the selected additional document to identify a word in the selected additional document as a corresponding named entity by comparing the named entity to one or more words in the selected additional document; and

    storing the named entity and the identified word as named entity transliterations.
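
The steps of claim 1 can be illustrated with a minimal Python sketch. The claim does not say how the probability distributions, the similarity score, or the word comparison are computed, so every concrete choice here is an assumption made for illustration: unigram relative frequencies as the distributions, cosine similarity after projecting second-language words through a small bilingual lexicon, and a romanized string-similarity ratio for comparing the named entity to candidate words. The names `word_distribution`, `cross_language_similarity`, `mine_transliteration`, `lexicon`, `romanize`, and `CYRILLIC` are hypothetical and do not appear in the patent.

```python
import math
from collections import Counter
from difflib import SequenceMatcher

# Hypothetical, deliberately tiny romanization table (assumption, not from the patent).
CYRILLIC = {"п": "p", "у": "u", "т": "t", "и": "i", "н": "n", "о": "o", "с": "s",
            "е": "e", "л": "l", "м": "m", "к": "k", "в": "v", "г": "g", "д": "d",
            "я": "ya", "а": "a", "х": "kh"}


def romanize(word):
    """Map each character through the table; unknown characters pass through."""
    return "".join(CYRILLIC.get(ch, ch) for ch in word)


def word_distribution(words):
    """Unigram probability distribution over the words of one document."""
    counts = Counter(words)
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}


def cross_language_similarity(p_first, p_second, lexicon):
    """Cosine similarity after projecting p_second into the first language.

    `lexicon` maps second-language words to first-language words (assumed);
    words missing from it are ignored.
    """
    projected = Counter()
    for word, prob in p_second.items():
        if word in lexicon:
            projected[lexicon[word]] += prob
    dot = sum(p * projected.get(w, 0.0) for w, p in p_first.items())
    norm_a = math.sqrt(sum(p * p for p in p_first.values()))
    norm_b = math.sqrt(sum(p * p for p in projected.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def mine_transliteration(doc, additional_docs, named_entity, lexicon):
    """Pick the most similar additional document, then the closest word in it."""
    p_first = word_distribution(doc)
    scores = [cross_language_similarity(p_first, word_distribution(d), lexicon)
              for d in additional_docs]
    selected = additional_docs[max(range(len(scores)), key=scores.__getitem__)]
    # Compare the named entity to each word of the selected document.
    candidate = max(selected, key=lambda w: SequenceMatcher(
        None, named_entity.lower(), romanize(w).lower()).ratio())
    return named_entity, candidate


# Toy data: one English document and two Russian documents (already tokenized).
doc = ["putin", "visited", "moscow", "today"]
additional_docs = [["путин", "посетил", "москву", "сегодня"],
                   ["погода", "сегодня", "холодная"]]
lexicon = {"посетил": "visited", "сегодня": "today",
           "погода": "weather", "холодная": "cold"}

transliterations = [mine_transliteration(doc, additional_docs, "putin", lexicon)]
```

In this toy run the first Russian document scores higher because two of its words project onto words of the English document, and the pair stored in `transliterations` stands in for the final storing step. A real system would need tokenization, named entity recognition, and a far more robust comparison than a character-level ratio.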

  • 2 Assignments