Mining transliterations for out-of-vocabulary query terms

  • US 8,332,205 B2
  • Filed: 01/09/2009
  • Issued: 12/11/2012
  • Est. Priority Date: 01/09/2009
  • Status: Active Grant
First Claim

1. A method, implemented using electrical data processing functionality, for retrieving information, comprising:

  • receiving a query in a source language, the query having one or more query terms;

    determining whether each of the query terms is present in a translation dictionary, the translation dictionary mapping terms from the source language to a target language, each query term that is present in the translation dictionary comprising an in-vocabulary term, and each query term that is not present in the translation dictionary comprising an out-of-vocabulary (OOV) term;

    translating each in-vocabulary term to a translated term in the target language using the translation dictionary, to provide a set of one or more translated terms;

    identifying at least one document that is selected from a collection of documents in the target language based on the set of translated terms, said at least one document including a plurality of candidate words in the target language;

    performing mining analysis to attempt to extract a viable transliteration of each OOV term of the query from said at least one document by:

    (a) selecting an OOV term in the query for analysis, to provide a selected OOV term;

    (b) determining, after said selecting of the OOV term, whether the OOV term is a qualifying OOV term;

    (c) selecting a candidate word in said at least one document, to provide a selected candidate word;

    (d) determining, after said selecting of the candidate word, whether the selected candidate word is a qualifying candidate word;

    (e) determining a transliteration measure between the selected candidate word and the selected OOV term, without first having generated a transliteration for the selected OOV term, said determining of the transliteration measure being performed when the selected candidate word is a qualifying candidate word and the selected OOV term is a qualifying OOV term;

    (f) determining whether the selected candidate word is a viable transliteration of the selected OOV term based on the transliteration measure; and

    performing operations (a) through (f) for each possible other pairing of an OOV term in the query and a candidate word in said at least one document;

    updating the translation dictionary to include each viable transliteration identified by the mining analysis, to provide an updated translation dictionary and an updated set of translated terms for the query; and

    repeating said identifying, performing the mining analysis, and updating, at least one time, said receiving, determining whether each of the query terms is present in a translation dictionary, translating, identifying, performing the mining analysis, updating, and repeating being performed by the electrical data processing functionality.
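
The flow recited in claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation. The claim does not specify the retrieval method, the qualifying tests, or the transliteration measure, so everything below is an assumption made for illustration: the function names (`split_query`, `retrieve`, `is_qualifying`, `transliteration_measure`, `mine_transliterations`) are hypothetical, retrieval is a simple term-overlap count, a word "qualifies" if it is alphabetic and at least three characters long, and the transliteration measure is a string-similarity ratio from the standard library's `difflib.SequenceMatcher`. What the sketch does mirror from the claim is that each candidate word is scored directly against the OOV term without first generating a transliteration, and that the identify, mine, and update steps are repeated at least once.

```python
from difflib import SequenceMatcher


def split_query(terms, dictionary):
    """Separate query terms into in-vocabulary and OOV terms."""
    in_vocab = [t for t in terms if t in dictionary]
    oov = [t for t in terms if t not in dictionary]
    return in_vocab, oov


def retrieve(translated, collection, top_k=2):
    """Assumed retrieval: rank documents by how many translated terms they contain."""
    scored = [(sum(w in translated for w in doc.split()), doc) for doc in collection]
    scored = [pair for pair in scored if pair[0] > 0]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in scored[:top_k]]


def is_qualifying(word, min_len=3):
    """Assumed qualifying test for both OOV terms and candidate words."""
    return word.isalpha() and len(word) >= min_len


def transliteration_measure(oov_term, candidate):
    """Assumed stand-in measure: plain string similarity in [0, 1]."""
    return SequenceMatcher(None, oov_term.lower(), candidate.lower()).ratio()


def mine_transliterations(terms, dictionary, collection, threshold=0.7, passes=2):
    """Run the identify / mine / update cycle `passes` times (at least one repeat)."""
    dictionary = dict(dictionary)  # work on a copy; the caller's dictionary is untouched
    for _ in range(passes):
        in_vocab, oov = split_query(terms, dictionary)
        translated = {dictionary[t] for t in in_vocab}
        docs = retrieve(translated, collection)
        for term in oov:                      # step (a)
            if not is_qualifying(term):       # step (b)
                continue
            best_word, best_score = None, threshold
            for doc in docs:
                for cand in doc.split():      # step (c)
                    if not is_qualifying(cand):   # step (d)
                        continue
                    score = transliteration_measure(term, cand)  # step (e)
                    if score >= best_score:   # step (f)
                        best_word, best_score = cand, score
            if best_word is not None:
                dictionary[term] = best_word  # update the translation dictionary
    return dictionary


# Toy data: an Italian-to-English dictionary that lacks "berlino".
base = {"visita": "visit", "museo": "museum"}
docs = ["visit the museum in berlin", "rain is expected tomorrow"]
updated = mine_transliterations(["visita", "museo", "berlino"], base, docs)
print(updated)
```

On the second pass the newly mined transliteration is already in the dictionary, so it joins the set of translated terms used for retrieval. That is the point of the repeating step: a richer translated query can surface documents that were missed the first time.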
