
Discovery of parallel text portions in comparable collections of corpora and training using comparable texts

  • US 20050228643A1
  • Filed: 03/22/2005
  • Published: 10/13/2005
  • Est. Priority Date: 03/23/2004
  • Status: Active Grant
First Claim

1. A method, comprising:

    obtaining a collection of texts which are not parallel texts;

    determining sentence portions within the collection of texts, whose meaning is substantially the same, by comparing a plurality of sentence portions within the collection of texts, and determining at least one parameter indicative of a sentence portion in the first document and a sentence portion in the second document, and using said at least one parameter to determine sentence portions which have similar meanings; and

    using said sentence portions which have similar meanings to create training data for a machine translation system.
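
The steps recited in claim 1 can be illustrated with a minimal Python sketch. It is not the method disclosed in the patent: the claim does not specify the "at least one parameter", so a simple lexicon word-overlap score is assumed here in its place. The names `TOY_LEXICON`, `overlap_score`, `extract_pairs`, and the `threshold` value are all hypothetical and chosen only for illustration.

```python
from itertools import product

# Hypothetical toy bilingual lexicon (source word -> possible target words).
# This is an assumption for illustration, not data from the patent.
TOY_LEXICON = {
    "la": {"the"},
    "el": {"the"},
    "casa": {"house", "home"},
    "es": {"is"},
    "roja": {"red"},
    "perro": {"dog"},
}


def overlap_score(src_sentence, tgt_sentence, lexicon):
    """Assumed similarity parameter: the fraction of source words that have
    at least one lexicon translation present in the target sentence."""
    src_words = src_sentence.lower().split()
    tgt_words = set(tgt_sentence.lower().split())
    if not src_words:
        return 0.0
    covered = sum(1 for w in src_words if lexicon.get(w, set()) & tgt_words)
    return covered / len(src_words)


def extract_pairs(src_doc, tgt_doc, lexicon, threshold=0.75):
    """Compare every source sentence with every target sentence and keep
    the pairs whose score reaches the (assumed) threshold."""
    pairs = []
    for s, t in product(src_doc, tgt_doc):
        score = overlap_score(s, t, lexicon)
        if score >= threshold:
            pairs.append((s, t, score))
    return pairs


# Two non-parallel documents on loosely related topics.
src_doc = ["la casa es roja", "el perro ladra"]
tgt_doc = ["the weather is nice", "the house is red"]

# The retained pairs would serve as training data for a translation system.
training_data = extract_pairs(src_doc, tgt_doc, TOY_LEXICON)
```

The exhaustive pairwise comparison keeps the sketch short; a practical system would likely narrow the candidate pairs first and use a richer parameter than word overlap.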
