Discovery of parallel text portions in comparable collections of corpora and training using comparable texts

  • US 8,296,127 B2
  • Filed: 03/22/2005
  • Issued: 10/23/2012
  • Est. Priority Date: 03/23/2004
  • Status: Active Grant
First Claim

1. A method, comprising:

    obtaining, via a processing module that is executable by a processor, a collection of texts which are not parallel texts;

    determining sentences within the collection of texts, whose meaning is substantially the same, by comparing a plurality of sentences within the collection of texts, and determining at least one parameter indicative of a sentence in the first document and a sentence in the second document, and using said at least one parameter to determine sentences which have similar meanings; and

    using said sentences which have similar meanings to create training data for a machine translation system.
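
The three steps of the claim (obtain non-parallel texts, score sentence pairs with at least one parameter, keep the pairs judged similar in meaning as training data) can be illustrated with a minimal sketch. This is not the patent's implementation. The claim does not say what the parameter is, so the sketch assumes a simple word-overlap score computed against a tiny, hypothetical bilingual lexicon. The names TOY_LEXICON, overlap_score and extract_pairs, the threshold of 0.6, and the sample sentences are all invented for illustration.

```python
from itertools import product

# Hypothetical toy lexicon (source word -> possible target words).
# Assumption: a real system would use a much larger lexicon or a learned model.
TOY_LEXICON = {
    "le": {"the"},
    "chat": {"cat"},
    "dort": {"sleeps"},
    "marché": {"market"},
    "ferme": {"closes"},
}


def overlap_score(src_sentence, tgt_sentence, lexicon):
    """Fraction of source words that have a lexicon translation in the target sentence.

    This stands in for the claim's "at least one parameter"; it is an assumed,
    illustrative measure, not the one described in the patent.
    """
    src_words = src_sentence.lower().split()
    tgt_words = set(tgt_sentence.lower().split())
    if not src_words:
        return 0.0
    hits = sum(1 for w in src_words if lexicon.get(w, set()) & tgt_words)
    return hits / len(src_words)


def extract_pairs(src_doc, tgt_doc, lexicon, threshold=0.6):
    """Compare every source sentence with every target sentence and keep
    the pairs whose score reaches the (assumed) threshold."""
    pairs = []
    for src, tgt in product(src_doc, tgt_doc):
        score = overlap_score(src, tgt, lexicon)
        if score >= threshold:
            pairs.append((src, tgt, score))
    return pairs


# Two comparable (not parallel) documents: related content, no sentence alignment.
src_doc = ["le chat dort", "le marché ferme"]
tgt_doc = [
    "stocks fell sharply today",
    "the cat sleeps",
    "the market closes early",
]

# The retained pairs would serve as training data for a translation system.
training_pairs = extract_pairs(src_doc, tgt_doc, TOY_LEXICON)
for src, tgt, score in training_pairs:
    print(f"{score:.2f}\t{src}\t{tgt}")
```

Comparing all sentence pairs is quadratic in document length, which is acceptable for a sketch but would likely need a candidate-filtering step on real collections.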

  • 1 Assignment