METHODS FOR AUTOMATIC GENERATION OF PARALLEL CORPORA
Abstract
A method of forming parallel corpora comprises receiving sets of items in a first language and a second language, each of the sets having one or more associated descriptions and metadata. Metadata is collected from the two sets, and the items in the two sets are aligned using the metadata. The aligned metadata are mapped from the first language to the second language for each of the sets. The descriptions of two items are fetched, and the structural similarity of the descriptions is measured to assess whether the two items are likely to be translations of each other. For mapped items with structurally similar descriptions, the item descriptions are formed into respective sentences in the first language and in the second language. These sentences form parallel corpora, which may be used to translate an item from the first language to the second language and to train a machine translation system.
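The abstract does not say how listings are keyed for alignment or how structural similarity is scored. The Python sketch below fills both gaps with stated assumptions and is not the patented implementation: listings are aligned when hypothetical `brand` and `model` metadata values match, and structural similarity is approximated by comparing language-neutral token shapes (numbers, words, punctuation) with `difflib.SequenceMatcher`. The names `ItemListing`, `alignment_key`, `shape`, `structural_similarity`, and `build_parallel_corpus`, and the 0.8 threshold, are illustrative and do not come from the patent.

```python
import difflib
import re
from dataclasses import dataclass, field


@dataclass
class ItemListing:
    # Hypothetical record; the patent only says items have descriptions and metadata.
    item_id: str
    description: str
    metadata: dict = field(default_factory=dict)


def alignment_key(item, keys=("brand", "model")):
    # Assumption: listings for the same product share language-independent
    # metadata values (here, brand and model). Returns None if any is missing.
    values = tuple(str(item.metadata.get(k, "")).strip().lower() for k in keys)
    return values if all(values) else None


def _token_class(token):
    # Numbers -> "N", words -> "W", punctuation is kept as-is.
    if token[0].isdigit():
        return "N"
    if re.match(r"\w", token):
        return "W"
    return token


def shape(description):
    # Language-neutral structure of a description: a sequence of token classes.
    tokens = re.findall(r"\d+(?:[.,]\d+)?|\w+|[^\w\s]", description)
    return [_token_class(t) for t in tokens]


def structural_similarity(desc_a, desc_b):
    # Similarity in [0, 1] between the two shape sequences.
    matcher = difflib.SequenceMatcher(None, shape(desc_a), shape(desc_b), autojunk=False)
    return matcher.ratio()


def normalize(text):
    # Collapse whitespace so each description becomes a single sentence line.
    return " ".join(text.split())


def build_parallel_corpus(items_l1, items_l2, threshold=0.8):
    # Index second-language listings by their alignment key.
    index = {}
    for item in items_l2:
        key = alignment_key(item)
        if key is not None:
            index.setdefault(key, []).append(item)

    pairs = []
    for item in items_l1:
        # A None key is never indexed, so unaligned items find no candidates.
        for candidate in index.get(alignment_key(item), []):
            if structural_similarity(item.description, candidate.description) >= threshold:
                pairs.append((normalize(item.description), normalize(candidate.description)))
    return pairs


if __name__ == "__main__":
    english = [ItemListing("en-1", "Apple iPhone 12, 64 GB, black.", {"brand": "Apple", "model": "iPhone 12"})]
    spanish = [
        ItemListing("es-1", "Apple iPhone 12, 64 GB, negro.", {"brand": "apple", "model": "iphone 12"}),
        ItemListing("es-2", "Smartphone con 64 GB", {"brand": "Apple", "model": "iPhone 12"}),
    ]
    print(build_parallel_corpus(english, spanish))
    # [('Apple iPhone 12, 64 GB, black.', 'Apple iPhone 12, 64 GB, negro.')]
```

In this example both Spanish listings align with the English one on metadata, but only the listing whose description keeps the same layout of numbers, words, and punctuation passes the similarity threshold. The shape comparison rests on the assumption that translated product descriptions tend to preserve numbers, units, and punctuation; a production system would likely need a richer measure.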
14 Citations
21 Claims
1. (canceled)
2. A computer implemented method comprising:
aligning a first item listing in a first language with a second item listing in a second language in which the first item listing and the second item listing are aligned based on the first item listing and the second item listing being directed toward the same products or services; measuring, based on the aligning of the first item listing with the second item listing, a structural similarity of a first description of the first item listing with respect to a second description of the second item listing; and in response to the first description and the second description being structurally similar, forming the first description into a first sentence in the first language as a translation of the second description into the first language.
10. A system comprising:
one or more processors; a memory to store instructions that, in response to being executed by the one or more processors, cause the system to perform operations comprising: aligning a first item listing in a first language with a second item listing in a second language in which the first item listing and the second item listing are aligned based on the first item listing and the second item listing being directed toward the same products or services; measuring, based on the aligning of the first item listing with the second item listing, a structural similarity of a first description of the first item listing with respect to a second description of the second item listing; in response to the first description and the second description being structurally similar, forming the first description into a first sentence in the first language and forming the second description into a second sentence in the second language in which the first sentence and the second sentence are parallel corpora; and using the parallel corpora to perform one or more operations selected from a group of operations consisting of:
translating another item listing from the first language to the second language; and
training a machine translation system.
16. One or more non-transitory computer-readable media embodying instructions that, in response to being executed by one or more processors of a system, cause the system to perform operations comprising:
measuring a structural similarity of a first description of a first item listing in a first language with respect to a second description of a second item listing in a second language based on the first item listing and the second item listing being directed toward the same products or services; and in response to the first description and the second description being structurally similar, forming the first description into a first sentence in the first language as a translation of the second description into the first language.
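Claim 10 lists translating another item listing as one use of the parallel corpora but does not say how the corpora are applied. One simple option, shown here purely as an illustration and not as the claimed method, is a translation memory: an exact lookup of the first-language sentence, falling back to the closest stored sentence found by `difflib.get_close_matches`. The function names `build_memory` and `translate_listing` and the 0.9 cutoff are hypothetical.

```python
import difflib


def build_memory(pairs):
    # Translation memory: first-language sentence -> second-language sentence.
    return {source: target for source, target in pairs}


def translate_listing(description, memory, cutoff=0.9):
    # Exact match first.
    if description in memory:
        return memory[description]
    # Otherwise reuse the closest stored sentence, if it is similar enough.
    close = difflib.get_close_matches(description, list(memory), n=1, cutoff=cutoff)
    return memory[close[0]] if close else None


if __name__ == "__main__":
    memory = build_memory([("Apple iPhone 12, 64 GB, black.", "Apple iPhone 12, 64 GB, negro.")])
    print(translate_listing("Apple iPhone 12, 64 GB, black", memory))
    # Apple iPhone 12, 64 GB, negro.
    print(translate_listing("Samsung Galaxy S21", memory))
    # None
```

A fuzzy hit returns the stored translation verbatim, so any difference between the query and the stored sentence (for example, a different storage size) is not carried into the output; the high cutoff is meant to limit that risk. Training a statistical or neural machine translation system on the same pairs is the other use the claim names.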
Specification