Selection of domain-adapted translation subcorpora

  • US 8,838,433 B2
  • Filed: 02/08/2011
  • Issued: 09/16/2014
  • Est. Priority Date: 02/08/2011
  • Status: Active Grant
First Claim

1. A computer-implemented selection system, comprising:

  • linguistic data corpora that include an in-domain corpus and an out-domain corpus for domain adaptation for machine translation model training, the in-domain corpus and the out-domain corpus including multi-lingual data translated to the corpora in parallel;

  • a relevance component that selects relevant multi-lingual data from the out-domain corpus based on a similarity measure, the similarity measure considering a difference of cross-entropy scores according to an in-domain language model and an out-domain language model, the relevant multi-lingual data utilized in combination with the in-domain corpus or in isolation without the in-domain corpus; and

  • a processor that executes computer-executable instructions associated with at least the relevance component.
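The relevance component's similarity measure can be illustrated with a toy sketch. For each out-domain sentence s, it computes score(s) = H_in(s) - H_out(s), where H is the per-token cross-entropy under the in-domain and out-domain language models. A lower score means the sentence looks more like in-domain text than like general out-domain text. The claim does not specify the language model type, so this sketch assumes add-one-smoothed unigram models as a stand-in for real n-gram LMs. It also assumes top-k ranking instead of a score threshold, and an optional "bilingual" mode that sums the source-side and target-side differences. All function names (`select_relevant`, `build_training_set`, `UnigramLM`) are hypothetical.

```python
import math
from collections import Counter
from typing import Iterable, List, Set, Tuple

Pair = Tuple[str, str]  # (source sentence, target sentence) from a parallel corpus


class UnigramLM:
    """Add-one-smoothed unigram LM (hypothetical toy stand-in for a real n-gram LM)."""

    def __init__(self, sentences: Iterable[str], vocab: Set[str]) -> None:
        self.counts = Counter(tok for s in sentences for tok in s.split())
        self.total = sum(self.counts.values())
        self.denom = self.total + len(vocab) + 1  # +1 reserves mass for unseen tokens

    def cross_entropy(self, sentence: str) -> float:
        """Per-token cross-entropy in bits: -(1/N) * sum(log2 P(w))."""
        tokens = sentence.split()
        if not tokens:
            return 0.0
        log_prob = sum(math.log2((self.counts[t] + 1) / self.denom) for t in tokens)
        return -log_prob / len(tokens)


def _side_scorer(in_domain: List[Pair], out_domain: List[Pair], side: int):
    """Builds in/out-domain LMs for one side (0 = source, 1 = target) and returns
    a function computing H_in - H_out for that side of a sentence pair."""
    vocab = {tok for pair in in_domain + out_domain for tok in pair[side].split()}
    lm_in = UnigramLM((p[side] for p in in_domain), vocab)
    lm_out = UnigramLM((p[side] for p in out_domain), vocab)
    return lambda pair: lm_in.cross_entropy(pair[side]) - lm_out.cross_entropy(pair[side])


def select_relevant(in_domain: List[Pair], out_domain: List[Pair],
                    top_k: int, bilingual: bool = False) -> List[Pair]:
    """Ranks out-domain pairs by cross-entropy difference (lower = more
    in-domain-like) and keeps the top_k pairs. bilingual=True (an assumed
    variant) sums the source-side and target-side differences."""
    scorers = [_side_scorer(in_domain, out_domain, 0)]
    if bilingual:
        scorers.append(_side_scorer(in_domain, out_domain, 1))
    ranked = sorted(out_domain, key=lambda pair: sum(f(pair) for f in scorers))
    return ranked[:top_k]


def build_training_set(in_domain: List[Pair], selected: List[Pair],
                       combine: bool = True) -> List[Pair]:
    """Uses the selected data with the in-domain corpus, or in isolation."""
    return in_domain + selected if combine else list(selected)


if __name__ == "__main__":
    in_domain = [("the patient received a dose", "le patient a reçu une dose"),
                 ("dose of the drug", "dose du médicament")]
    out_domain = [("the stock market fell", "la bourse a chuté"),
                  ("the patient took the drug", "le patient a pris le médicament"),
                  ("market prices rose", "les prix ont augmenté")]
    selected = select_relevant(in_domain, out_domain, top_k=1)
    print(selected)  # → [('the patient took the drug', 'le patient a pris le médicament')]
```

Ranking by a difference, instead of by in-domain cross-entropy alone, discounts sentences that score well only because they are generic or short. A realistic pipeline would replace the unigram models with higher-order smoothed LMs and would typically train the out-domain LM on a sample of comparable size to the in-domain data.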

  • 2 Assignments