Parallel fragment extraction from noisy parallel corpora

  • US 8,504,354 B2
  • Filed: 06/02/2008
  • Issued: 08/06/2013
  • Est. Priority Date: 06/02/2008
  • Status: Active Grant
First Claim

1. A method of extracting parallel fragments from a first corpus in a first language and a second corpus in a second language on a computer having a processor, the method comprising:

  • executing on the processor instructions configured to:

    for respective elements of the first corpus, calculate:

    a monolingual probability of the element with respect to preceding elements of the first corpus, and
    a bilingual probability of the element with respect to an aligned element of the second corpus;

    for respective elements of the first corpus, identify candidate fragments of the first corpus comprising respective elements of the first corpus having a greater bilingual probability of the element with aligned elements of the second corpus than only the monolingual probability of the element with respect to preceding elements of the first corpus to align elements of the first corpus with elements of the second corpus; and

    extract parallel fragments respectively comprising:

    the first corpus elements of a candidate fragment, and
    the second corpus elements aligned with the first corpus elements of the candidate fragment.
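
The steps recited in the claim can be illustrated with a minimal Python sketch. It is only one possible reading, not the patented implementation: the function name `extract_parallel_fragments`, the dictionary-based `alignment`, the callables `mono_prob` and `bi_prob`, and all toy probabilities are assumptions introduced here for illustration. The sketch treats each maximal run of consecutive first-corpus elements whose bilingual probability exceeds their monolingual probability as a candidate fragment.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Fragment = Tuple[List[str], List[str]]


def extract_parallel_fragments(
    first: Sequence[str],
    second: Sequence[str],
    alignment: Dict[int, int],  # hypothetical: first-corpus index -> second-corpus index
    mono_prob: Callable[[Sequence[str], str], float],  # P(element | preceding elements)
    bi_prob: Callable[[str, str], float],  # P(element | aligned element)
) -> List[Fragment]:
    fragments: List[Fragment] = []
    run: List[int] = []  # indices of the candidate fragment being built

    def flush() -> None:
        # Emit the current run together with its aligned second-corpus elements.
        if run:
            src = [first[i] for i in run]
            tgt_idx = sorted({alignment[i] for i in run})
            fragments.append((src, [second[j] for j in tgt_idx]))
            run.clear()

    for i, element in enumerate(first):
        j = alignment.get(i)
        p_mono = mono_prob(first[:i], element)
        # Assumption: an unaligned element gets bilingual probability 0.
        p_bi = bi_prob(element, second[j]) if j is not None else 0.0
        if p_bi > p_mono:
            run.append(i)  # element is better explained bilingually
        else:
            flush()  # element ends the current candidate fragment
    flush()
    return fragments


# Toy data; every value below is invented for illustration.
first = ["the", "cat", "sat", "quietly", "today"]
second = ["le", "chat", "assis", "hier"]
alignment = {0: 0, 1: 1, 2: 2, 4: 3}
table = {("the", "le"): 0.9, ("cat", "chat"): 0.8,
         ("sat", "assis"): 0.5, ("today", "hier"): 0.01}

result = extract_parallel_fragments(
    first,
    second,
    alignment,
    mono_prob=lambda history, element: 0.2,  # constant stand-in for a language model
    bi_prob=lambda element, aligned: table.get((element, aligned), 0.0),
)
print(result)
```

In this toy run, "quietly" has no aligned element and "today" is poorly supported by "hier", so only the first three elements form a fragment. A real system would replace the constant `mono_prob` with a language model and `bi_prob` with a translation model.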
