PARALLEL FRAGMENT EXTRACTION FROM NOISY PARALLEL CORPORA

  • US 20090299729A1
  • Filed: 06/02/2008
  • Published: 12/03/2009
  • Est. Priority Date: 06/02/2008
  • Status: Active Grant
First Claim

1. A method of extracting parallel fragments from a first corpus in a first language and a second corpus in a second language, comprising:

  • for respective elements of the first corpus, calculating:

    a monolingual probability of the element with respect to preceding elements of the first corpus, and
    a bilingual probability of the element with respect to an aligned element of the second corpus;

  • aligning elements of the first corpus with elements of the second corpus to identify candidate fragments comprising a sequence of first corpus elements having a greater bilingual probability than a monolingual probability; and

  • extracting parallel fragments respectively comprising:

    the first corpus elements of a candidate fragment, and
    the second corpus elements aligned with the first corpus elements of the candidate fragment.
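
The three steps of the claim can be illustrated with a minimal sketch. The claim does not say how either probability is modeled, so everything concrete here is an assumption made for illustration: a bigram table stands in for the monolingual model, a word-to-word lexical table stands in for the bilingual model, each first-corpus word is aligned to its single best-scoring second-corpus word, and the names `monolingual_prob`, `best_alignment`, `extract_fragments`, `FLOOR`, and `min_len` are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical toy models; a real system would estimate these from data.
Bigram = Dict[Tuple[str, str], float]   # P(word | previous word)
Lexicon = Dict[Tuple[str, str], float]  # P(first-language word | second-language word)

FLOOR = 1e-6  # assumed probability for events missing from either table


def monolingual_prob(words: List[str], i: int, lm: Bigram) -> float:
    """Probability of words[i] given the preceding element (bigram assumption)."""
    prev = words[i - 1] if i > 0 else "<s>"
    return lm.get((prev, words[i]), FLOOR)


def best_alignment(word: str, target: List[str],
                   lex: Lexicon) -> Tuple[Optional[int], float]:
    """Index and probability of the target word that best explains `word`."""
    best_j, best_p = None, FLOOR
    for j, t in enumerate(target):
        p = lex.get((word, t), FLOOR)
        if p > best_p:
            best_j, best_p = j, p
    return best_j, best_p


def extract_fragments(source: List[str], target: List[str], lm: Bigram,
                      lex: Lexicon, min_len: int = 2
                      ) -> List[Tuple[List[str], List[str]]]:
    # Steps 1 and 2: score each source element both ways and keep its
    # alignment only when the bilingual probability is the greater one.
    aligned: List[Optional[int]] = []
    for i, w in enumerate(source):
        j, p_bi = best_alignment(w, target, lex)
        p_mono = monolingual_prob(source, i, lm)
        aligned.append(j if j is not None and p_bi > p_mono else None)

    # Step 3: each maximal run of kept elements is a candidate fragment;
    # pair it with the target elements it is aligned to.
    fragments = []
    start: Optional[int] = None
    for i, j in enumerate(aligned + [None]):  # trailing None closes the last run
        if j is not None and start is None:
            start = i
        elif j is None and start is not None:
            if i - start >= min_len:
                tgt_idx = sorted(set(aligned[start:i]))
                fragments.append((source[start:i], [target[k] for k in tgt_idx]))
            start = None
    return fragments


# Toy example: only the first three source words have a translation in the target.
source = ["the", "red", "house", "is", "old"]
target = ["la", "casa", "roja", "brilla"]
lex = {("the", "la"): 0.6, ("red", "roja"): 0.7, ("house", "casa"): 0.8}
lm = {("<s>", "the"): 0.2, ("the", "red"): 0.1, ("red", "house"): 0.3,
      ("house", "is"): 0.4, ("is", "old"): 0.2}
fragments = extract_fragments(source, target, lm, lex)
```

In this toy run, "is" and "old" have no entry in the lexical table, so their bilingual probability does not exceed the monolingual one and they fall outside the fragment. The `min_len` cutoff is a design choice of the sketch, not something the claim states; it only suppresses single-word fragments.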
