Methods for automatic generation of parallel corpora
First Claim
1. A computer implemented method comprising:
receiving a first set of item listings for the sale of products or services in a first language and a second set of item listings for the sale of products or services in a second language, each of the item listings in the first and second sets of item listings comprising one or more descriptions and metadata identifying the products or services corresponding to the respective item listing;
collecting the metadata from the first and second sets of item listings and aligning, using the collected metadata identifying the products or services, a first item listing of the first set of item listings with a second item listing of the second set of item listings in which the first item listing and the second item listing are aligned based on the first item listing and the second item listing being directed toward the same products or services;
mapping the first item listing to the second item listing based on the aligning of the first item listing with the second item listing;
fetching a first description of the first item listing and a second description of the second item listing;
measuring the structural similarity of the fetched first description with respect to the fetched second description to assess whether the first description and the second description are likely to be translations of each other; and
in response to the first description and the second description being structurally similar, forming the first description into a first sentence in the first language as a translation of the second description into the first language and forming the second description into a second sentence in the second language as a translation of the first description into the second language.
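The steps of the claim can be sketched in code. The following is a minimal illustration only, not the patented implementation: the `Listing` type, the `product_id` metadata key, and the structural-similarity heuristic (equal sentence count, matching numeric tokens, bounded length ratio) are all assumptions made for the example, since the claim does not specify how structural similarity is measured.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Listing:
    # Hypothetical fields: the claim only requires "metadata identifying
    # the products or services" and "one or more descriptions".
    product_id: str
    description: str


def structure_signature(text):
    """Crude structural fingerprint: sentence count, numeric tokens, word count."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    numbers = tuple(sorted(re.findall(r"\d+(?:[.,]\d+)?", text)))
    return len(sentences), numbers, len(text.split())


def structurally_similar(first, second, max_len_ratio=2.0):
    """Assumed heuristic: same sentence count, same numbers, comparable length."""
    s1, n1, w1 = structure_signature(first)
    s2, n2, w2 = structure_signature(second)
    if s1 != s2 or n1 != n2:
        return False
    shorter, longer = min(w1, w2), max(w1, w2)
    return shorter > 0 and longer / shorter <= max_len_ratio


def build_parallel_pairs(first_listings, second_listings):
    """Align listings by shared metadata, then keep structurally similar descriptions."""
    second_by_id = {item.product_id: item for item in second_listings}
    pairs = []
    for item in first_listings:
        match = second_by_id.get(item.product_id)  # alignment and mapping step
        if match is not None and structurally_similar(item.description, match.description):
            pairs.append((item.description, match.description))
    return pairs
```

In this sketch a pair is emitted only when both the metadata alignment and the structural check succeed, which mirrors the "in response to ... being structurally similar" condition of the final step.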
Abstract
A method of forming parallel corpora comprises receiving sets of items in a first language and a second language, each of the items having one or more associated descriptions and metadata. The metadata is collected from the two sets of items, and the items are aligned using the metadata. The aligned items are mapped from the first language to the second language. The descriptions of two mapped items are fetched, and the structural similarity of the descriptions is measured to assess whether the two descriptions are likely to be translations of each other. For mapped items with structurally similar descriptions, the item descriptions are formed into respective sentences in the first language and in the second language. The sentences form parallel corpora, which may be used to translate an item from the first language to the second language and to train a machine translation system.
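As an illustration of how the resulting sentence pairs might be stored for training a machine translation system, the sketch below writes them to two line-aligned text files, where line N of one file corresponds to line N of the other. The abstract does not specify a storage format; the line-aligned layout, the function names, and the file paths are assumptions made for this example.

```python
from pathlib import Path


def to_single_line(text):
    """Collapse all whitespace so each description occupies exactly one line."""
    return " ".join(text.split())


def write_parallel_corpus(pairs, first_path, second_path):
    """Write (first_language, second_language) pairs to two line-aligned files.

    Pairs with an empty side are skipped so the files stay aligned.
    Returns the number of pairs written.
    """
    first_lines, second_lines = [], []
    for first_desc, second_desc in pairs:
        a, b = to_single_line(first_desc), to_single_line(second_desc)
        if a and b:
            first_lines.append(a)
            second_lines.append(b)
    Path(first_path).write_text(
        "".join(line + "\n" for line in first_lines), encoding="utf-8")
    Path(second_path).write_text(
        "".join(line + "\n" for line in second_lines), encoding="utf-8")
    return len(first_lines)
```

Normalizing whitespace before writing matters here because a description that contains a line break would otherwise shift every later pair out of alignment.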
20 Claims
1. A computer implemented method comprising:
receiving a first set of item listings for the sale of products or services in a first language and a second set of item listings for the sale of products or services in a second language, each of the item listings in the first and second sets of item listings comprising one or more descriptions and metadata identifying the products or services corresponding to the respective item listing;
collecting the metadata from the first and second sets of item listings and aligning, using the collected metadata identifying the products or services, a first item listing of the first set of item listings with a second item listing of the second set of item listings in which the first item listing and the second item listing are aligned based on the first item listing and the second item listing being directed toward the same products or services;
mapping the first item listing to the second item listing based on the aligning of the first item listing with the second item listing;
fetching a first description of the first item listing and a second description of the second item listing;
measuring the structural similarity of the fetched first description with respect to the fetched second description to assess whether the first description and the second description are likely to be translations of each other; and
in response to the first description and the second description being structurally similar, forming the first description into a first sentence in the first language as a translation of the second description into the first language and forming the second description into a second sentence in the second language as a translation of the first description into the second language.
Dependent claims: 2, 3, 4, 5, 6, 7
8. One or more computer-readable hardware storage device having embedded therein a set of instructions which, in response to being executed by one or more processors of a system, causes the system to execute operations comprising:
receiving a first set of item listings for the sale of products or services in a first language and a second set of item listings for the sale of products or services in a second language, each of the item listings in the first and second sets of item listings comprising one or more descriptions and metadata identifying the products or services corresponding to the respective item listing;
collecting the metadata from the first and second sets of item listings and aligning, using the collected metadata identifying the products or services, a first item listing of the first set of item listings with a second item listing of the second set of item listings in which the first item listing and the second item listing are aligned based on the first item listing and the second item listing being directed toward the same products or services;
mapping the first item listing to the second item listing based on the aligning of the first item listing with the second item listing;
fetching a first description of the first item listing and a second description of the second item listing;
measuring the structural similarity of the fetched first description with respect to the fetched second description to assess whether the first description and the second description are likely to be translations of each other; and
in response to the first description and the second description being structurally similar, forming the first description into a first sentence in the first language as a translation of the second description into the first language and forming the second description into a second sentence in the second language as a translation of the first description into the second language.
Dependent claims: 9, 10, 11, 12, 13, 14
15. A system comprising:
one or more computer-readable hardware storage devices having embedded therein a set of instructions; and
one or more hardware processors communicatively coupled to the one or more computer-readable hardware storage devices, and configured to, in response to execution of the set of instructions, cause the system to perform operations, the operations comprising:
receiving a first set of item listings for the sale of products or services in a first language and a second set of item listings for the sale of products or services in a second language, each of the item listings in the first and second sets of item listings comprising one or more descriptions and metadata identifying the products or services corresponding to the respective item listing;
collecting the metadata from the first and second sets of item listings and aligning, using the collected metadata identifying the products or services, a first item listing of the first set of item listings with a second item listing of the second set of item listings in which the first item listing and the second item listing are aligned based on the first item listing and the second item listing being directed toward the same products or services;
mapping the first item listing to the second item listing based on the aligning of the first item listing with the second item listing;
fetching a first description of the first item listing and a second description of the second item listing;
measuring the structural similarity of the fetched first description with respect to the fetched second description to assess whether the first description and the second description item listings are likely to be translations of each other; and
in response to the first description and the second description being structurally similar, forming the first description into a first sentence in the first language as a translation of the second description into the first language and forming the second description into a second sentence in the second language as a translation of the first description into the second language.
Dependent claims: 16, 17, 18, 19, 20
Specification