Methods for automatic generation of parallel corpora

  • US 9,881,006 B2
  • Filed: 12/31/2014
  • Issued: 01/30/2018
  • Est. Priority Date: 02/28/2014
  • Status: Active Grant
First Claim

1. A computer implemented method comprising:

    receiving a first set of item listings for the sale of products or services in a first language and a second set of item listings for the sale of products or services in a second language, each of the item listings in the first and second sets of item listings comprising one or more descriptions and metadata identifying the products or services corresponding to the respective item listing;

    collecting the metadata from the first and second sets of item listings and aligning, using the collected metadata identifying the products or services, a first item listing of the first set of item listings with a second item listing of the second set of item listings in which the first item listing and the second item listing are aligned based on the first item listing and the second item listing being directed toward the same products or services;

    mapping the first item listing to the second item listing based on the aligning of the first item listing with the second item listing;

    fetching a first description of the first item listing and a second description of the second item listing;

    measuring the structural similarity of the fetched first description with respect to the fetched second description to assess whether the first description and the second description are likely to be translations of each other; and

    in response to the first description and the second description being structurally similar, forming the first description into a first sentence in the first language as a translation of the second description into the first language and forming the second description into a second sentence in the second language as a translation of the first description into the second language.
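
As an illustration only, and not the patented implementation, the steps of claim 1 can be sketched in Python. Everything named here is an assumption made for the example: the `Listing` class, the `product_id` metadata field, the signature heuristic (sentence count plus digit runs) used as a stand-in for "structural similarity", and the `max_length_ratio` threshold. The claim itself does not specify how similarity is measured.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Listing:
    product_id: str   # hypothetical metadata field identifying the product
    description: str


def align_by_metadata(first, second):
    """Pair listings from the two sets that share a product identifier."""
    second_by_id = {item.product_id: item for item in second}
    return [
        (a, second_by_id[a.product_id])
        for a in first
        if a.product_id in second_by_id
    ]


def structure_signature(text):
    """Crude structural fingerprint: sentence count plus the digit runs."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    numbers = tuple(re.findall(r"\d+", text))
    return len(sentences), numbers


def structurally_similar(a, b, max_length_ratio=2.0):
    """Assumed heuristic: identical signature and comparable length."""
    if structure_signature(a) != structure_signature(b):
        return False
    longer, shorter = max(len(a), len(b)), min(len(a), len(b))
    return shorter > 0 and longer / shorter <= max_length_ratio


def build_parallel_corpus(first, second):
    """Keep description pairs that are aligned and structurally similar."""
    return [
        (a.description, b.description)
        for a, b in align_by_metadata(first, second)
        if structurally_similar(a.description, b.description)
    ]


# Made-up example data: only sku-1 is both aligned and structurally similar.
english = [
    Listing("sku-1", "Red cotton shirt. Size 42."),
    Listing("sku-2", "Blue mug. Holds 300 ml."),
    Listing("sku-3", "Leather wallet."),
]
german = [
    Listing("sku-1", "Rotes Baumwollhemd. Größe 42."),
    Listing("sku-2", "Versand in 3 Tagen. Rückgabe möglich. Jetzt kaufen."),
    Listing("sku-4", "Holzstuhl."),
]

pairs = build_parallel_corpus(english, german)
```

In this sketch the metadata match does the alignment and mapping, and the signature check filters out aligned listings whose descriptions are unlikely to be translations of each other, such as the `sku-2` pair, where one side describes the product and the other describes shipping terms.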
