Word breaker from cross-lingual phrase table

  • US 9,330,087 B2
  • Filed: 04/11/2013
  • Issued: 05/03/2016
  • Est. Priority Date: 04/11/2013
  • Status: Active Grant
First Claim

1. A computer-implemented process, comprising:

    receiving a parallel corpus of a source language and a target language;

    applying a machine translation training process to the parallel corpus to generate a cross-lingual phrase table comprising a plurality of source language phrases, each source language phrase having at least one target language translation;

    applying a blocking operation to the cross-lingual phrase table to group phrases of the source language into blocks by searching the cross-lingual phrase table to find blocks of two or more source language phrases that share similar translations in the target language;

    searching each of the different source language phrases in each block to identify a stem of a word of the source language, the stem in each block comprising a same sequence of characters occurring in each of the different source language phrases of that block;

    searching each of the different source language phrases in each block to find a plurality of affixes of the stem of that block, each affix in each block comprising a sequence of characters preceding or following the characters comprising the stem in any of the different source language phrases in that block;

    generating a set of morphemes comprising the stems and affixes of words of the source language;

    in response to receipt of a user query in the source language, applying the set of morphemes to automatically create one or more different forms of one or more words of the user query; and

    performing an expanded query search using the automatically created different forms of the words of the user query.
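
The blocking, stem, affix, and query-expansion steps recited above can be illustrated with a minimal Python sketch. It is not the patented implementation: the function names, the toy phrase table, the use of an exactly shared translation as a stand-in for "similar translations", and the choice of the longest common substring as the stem are all assumptions made for illustration. The phrase table is taken as given, so the machine translation training step is not shown, and each "phrase" in the toy data is a single word.

```python
from collections import defaultdict


def build_blocks(phrase_table):
    """Group source phrases that share a target translation.

    Assumption: "similar translations" is approximated by an identical
    target string; a real system could use a looser similarity test.
    """
    by_target = defaultdict(set)
    for source, targets in phrase_table.items():
        for target in targets:
            by_target[target].add(source)
    # A block needs two or more source phrases.
    return [sorted(group) for group in by_target.values() if len(group) >= 2]


def longest_common_substring(words):
    """Return the longest character sequence that occurs in every word."""
    shortest = min(words, key=len)
    for length in range(len(shortest), 0, -1):
        for start in range(len(shortest) - length + 1):
            candidate = shortest[start:start + length]
            if all(candidate in word for word in words):
                return candidate
    return ""


def extract_morphemes(blocks):
    """Map each stem to (prefixes, suffixes) observed around it in its block."""
    morphemes = {}
    for block in blocks:
        stem = longest_common_substring(block)
        if not stem:
            continue
        prefixes, suffixes = morphemes.setdefault(stem, (set(), set()))
        for word in block:
            index = word.find(stem)
            prefixes.add(word[:index])               # characters before the stem
            suffixes.add(word[index + len(stem):])   # characters after the stem
    return morphemes


def expand_query(query, morphemes):
    """Return, for each query word, the sorted list of generated word forms."""
    expanded = []
    for word in query.split():
        forms = {word}
        for stem, (prefixes, suffixes) in morphemes.items():
            if stem in word:
                forms.update(p + stem + s for p in prefixes for s in suffixes)
        expanded.append(sorted(forms))
    return expanded


# Hypothetical toy phrase table: source phrase -> target translations.
phrase_table = {
    "walk": ["gehen"],
    "walked": ["gehen"],
    "walking": ["gehen"],
    "cat": ["Katze"],
}

blocks = build_blocks(phrase_table)
morphemes = extract_morphemes(blocks)
print(expand_query("walked home", morphemes))
# prints [['walk', 'walked', 'walking'], ['home']]
```

In this sketch the expanded forms would then be passed to a search backend, which is outside its scope. Matching a stem by plain substring containment is deliberately naive and would over-generate for short stems.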
