Method for building parallel corpora

  • US 20080262826A1
  • Filed: 04/20/2007
  • Published: 10/23/2008
  • Est. Priority Date: 04/20/2007
  • Status: Active Grant
First Claim

1. A method for identifying documents for enriching a statistical translation tool, comprising:

    retrieving at least one source document which is responsive to a source language query;

    for each retrieved source document:

    extracting a set of text segments from the retrieved source document;

    translating the extracted text segments into target language segments with a statistical translation tool to be enriched;

    formulating target language queries based on the target language segments;

    for each of a plurality of the target language queries, retrieving a set of target documents responsive to the target language query;

    filtering the sets of retrieved target documents that are responsive to the target language queries, the filtering including identifying candidate documents which meet a selection criterion that is based on co-occurrence of a target document in a plurality of the sets; and

    comparing the candidate documents with the retrieved source document for determining whether any of the candidate documents match the source document.
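
The per-document steps recited in claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation: the names `candidate_documents`, `find_matches`, `extract_segments`, `translate`, `search_target`, `is_match`, and `min_sets` are all hypothetical, the search engine and translation tool are passed in as plain callables, and each translated segment is assumed to be used directly as a target language query. The selection criterion is assumed to be a simple threshold on how many result sets a target document appears in.

```python
from collections import Counter
from typing import Callable, Iterable, List, Sequence


def candidate_documents(result_sets: Sequence[Iterable[str]],
                        min_sets: int = 2) -> List[str]:
    """Keep target documents that co-occur in at least `min_sets` result sets.

    `min_sets` is an assumed threshold; the claim only requires a criterion
    based on co-occurrence of a target document in a plurality of the sets.
    """
    counts: Counter = Counter()
    for docs in result_sets:
        counts.update(set(docs))  # count each document once per result set
    return sorted(doc for doc, n in counts.items() if n >= min_sets)


def find_matches(source_doc: str,
                 extract_segments: Callable[[str], List[str]],
                 translate: Callable[[str], str],
                 search_target: Callable[[str], Iterable[str]],
                 is_match: Callable[[str, str], bool],
                 min_sets: int = 2) -> List[str]:
    """Run the claimed steps for one retrieved source document."""
    # Extract text segments from the source document.
    segments = extract_segments(source_doc)
    # Translate them with the (to-be-enriched) translation tool.
    # Assumption: each translated segment serves directly as a query.
    queries = [translate(segment) for segment in segments]
    # Retrieve one set of target documents per target language query.
    result_sets = [list(search_target(query)) for query in queries]
    # Filter by co-occurrence across the result sets.
    candidates = candidate_documents(result_sets, min_sets)
    # Compare each candidate with the source document.
    return [doc for doc in candidates if is_match(source_doc, doc)]
```

In this sketch, a target document returned for only one query is discarded, on the assumption that a true translation of the source document should be retrieved by several of its translated segments. The comparison step is left to the caller through `is_match`, since claim 1 does not specify how the match is determined.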
