Estimation of parameters for machine translation without in-domain parallel data

  • US 9,652,453 B2
  • Filed: 04/14/2014
  • Issued: 05/16/2017
  • Est. Priority Date: 04/14/2014
  • Status: Expired due to Fees
First Claim

1. A method for estimating parameters for features of a translation scoring function and for scoring candidate translations in a target domain comprising:

    receiving a monolingual source corpus for a target domain and deriving n-gram counts from the monolingual source corpus or receiving n-gram counts derived only from the monolingual source corpus, the monolingual source corpus comprising sentences in a source language;

    generating a multi-model for the target domain based on a phrase table for each of a set of comparative domains and a measure of similarity between the n-gram counts derived only from the source corpus for the target domain and the phrase tables for the comparative domains, each of the phrase tables storing a value for each of a set of features for each of a set of biphrases, the generated target domain multi-model being a weighted combination of two or more of the phrase tables for the comparative domains;

    for the target domain, computing a measure of similarity between the monolingual source corpus and the target domain multi-model;

    for each of a plurality of the comparative domains, computing a measure of similarity between a source corpus for the comparative domain and a respective comparative domain multi-model that is derived from phrase tables for others of the set of the comparative domains, each of the plurality of comparative domains being associated with parameters for at least some of the features of the translation scoring function;

    estimating the parameters of the translation scoring function for the target domain based on the computed measure of similarity between the source corpus and the target domain multi-model, the computed measures of similarity for the comparative domains, and the parameters for the scoring function for the comparative domains; and

    with a statistical machine translation component, scoring a translation with the translation scoring function, wherein the generating of the target domain multi-model, computing the measure of similarity between the source corpus and the target domain multi-model, computing the measure of similarity between a source corpus for the comparative domains and the respective comparative domain multi-models, and the estimating the parameters for the translation scoring function are performed with a computer processor.
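
The steps recited in claim 1 can be pictured with a small Python sketch. It is illustrative only and is not the patented implementation: the claim does not fix a particular similarity measure, weighting scheme, or estimator. The sketch assumes that (a) similarity is the share of n-gram occurrences that appear as a source phrase in a phrase table, (b) mixing weights are proportional to that similarity, and (c) parameters are estimated as a kernel-weighted average of the comparative domains' parameters. All function names and the phrase-table layout (`{(source, target): [feature values]}`) are hypothetical.

```python
from collections import Counter


def ngram_counts(sentences, max_n=2):
    """Count all n-grams up to max_n in whitespace-tokenized sentences."""
    counts = Counter()
    for sentence in sentences:
        tokens = sentence.split()
        for n in range(1, max_n + 1):
            for i in range(len(tokens) - n + 1):
                counts[" ".join(tokens[i:i + n])] += 1
    return counts


def coverage(counts, phrase_table):
    """Assumed similarity: fraction of n-gram occurrences found as a source phrase."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    sources = {src for src, _tgt in phrase_table}
    return sum(c for ng, c in counts.items() if ng in sources) / total


def build_multi_model(counts, phrase_tables):
    """Weighted combination of phrase tables; weights proportional to coverage (assumption)."""
    sims = {d: coverage(counts, t) for d, t in phrase_tables.items()}
    norm = sum(sims.values())
    if norm == 0:
        weights = {d: 1.0 / len(phrase_tables) for d in phrase_tables}
    else:
        weights = {d: s / norm for d, s in sims.items()}
    model = {}
    for d, table in phrase_tables.items():
        for biphrase, feats in table.items():
            acc = model.setdefault(biphrase, [0.0] * len(feats))
            for k, value in enumerate(feats):
                acc[k] += weights[d] * value
    return model, weights


def leave_one_out_similarity(domain, corpus_counts, phrase_tables):
    """Similarity of a comparative domain to a multi-model built from the other domains."""
    others = {d: t for d, t in phrase_tables.items() if d != domain}
    model, _ = build_multi_model(corpus_counts[domain], others)
    return coverage(corpus_counts[domain], model)


def estimate_parameters(target_sim, comparative):
    """Kernel-weighted average of comparative-domain parameters (assumed estimator).

    comparative: list of (similarity, parameter_vector) pairs.
    Domains whose similarity score is close to the target's get more weight.
    """
    kernel = [1.0 / (abs(target_sim - s) + 1e-6) for s, _ in comparative]
    z = sum(kernel)
    dim = len(comparative[0][1])
    return [
        sum(k * params[j] for k, (_, params) in zip(kernel, comparative)) / z
        for j in range(dim)
    ]
```

In this sketch, the target-domain similarity would come from `coverage(counts, model)` on the target multi-model, and each comparative domain would contribute a pair of its `leave_one_out_similarity` score and its already tuned parameters to `estimate_parameters`. The resulting vector would then serve as the feature weights of the scoring function in the final scoring step.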
