Learning apparatus, translation apparatus, learning method, and translation method
First Claim
1. A learning apparatus, comprising:
one or more non-transitory storage media; and
a computer, wherein:
the one or more non-transitory storage media include:
a parallel corpus in which one or more pairs of original and translated sentences are stored, each of the one or more pairs having a source language sentence and a target language sentence that is a result obtained by translating the source language sentence;
an element pair storage unit in which one or more element pairs are stored, each of the one or more element pairs being a pair of a source language element and a target language element; and
a program, wherein the program, when executed by the computer, causes the computer to function as:
a parser unit that parses the target language sentence contained in the one or more pairs of original and translated sentences, thereby acquiring a binary tree of the target language sentence, the binary tree having one or more target language partial structures, each of which indicates the order of two or more elements forming the target language sentence and contains a parent node having a phrase label, and two child nodes, each of which is a child node of the parent node and has a target language phrase label or a target language element;
a source language element acquiring unit that acquires, from the one or more element pairs in the element pair storage unit, one or more elements forming the source language sentence corresponding to the target language sentence, the one or more elements being one or more source language elements corresponding to target language elements that are child nodes at terminal ends of the one or more target language partial structures contained in the binary tree of the target language sentence;
a source language partial structure acquiring unit that applies a structure indicated by the one or more target language partial structures contained in the binary tree of the target language sentence, to the one or more source language elements forming the source language sentence, thereby acquiring one or more source language partial structures, each of which indicates the order of two or more elements forming the source language sentence and contains a parent node having a phrase label, and two child nodes, each of which is a child node of the parent node and has a phrase label or a source language POS tag;
a labeling unit that provides the one or more source language partial structures with a reordering label that is a label that makes it possible to distinguish a source language partial structure in which the order of two child nodes contained in a source language partial structure corresponding to a target language partial structure is different from the order of two child nodes contained in the target language partial structure from a source language partial structure in which the order of two child nodes contained in a source language partial structure corresponding to a target language partial structure is the same as the order of two child nodes contained in the target language partial structure, thereby acquiring one or more labeled source language partial structures;
a model building unit that builds one or more parsing models each having appearance probability information regarding a labeled source language partial structure, using the one or more labeled source language partial structures;
an accumulating unit that causes the one or more non-transitory storage media to accumulate the one or more parsing models built by the model building unit; and
a reordering unit that acquires, using the one or more source language partial structures each indicating the order of two or more elements forming the source language sentence, one or more source language partial structures, in each of which the two or more elements forming the source language sentence are reordered such that the order is close enough to satisfy a predetermined condition with respect to the order of the elements in the target language sentence,
wherein the labeling unit provides the reordering label to the one or more source language partial structures reordered by the reordering unit.
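The projection and labeling recited in claim 1 can be illustrated with a minimal Python sketch. It is not the claimed implementation: the names `Leaf`, `Node`, and `project_and_label`, the label suffixes `_ST` (same order) and `_SW` (swapped order), and the toy sentence are all assumptions made for illustration. The sketch further assumes that every target language element is aligned to exactly one source language element, and it uses source words rather than POS tags at the leaves.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class Leaf:
    target_word: str
    source_index: int  # position of the aligned source element (assumed one-to-one)


@dataclass
class Node:
    label: str
    left: "Tree"
    right: "Tree"


Tree = Union[Leaf, Node]


def project_and_label(tree, source_words):
    """Project a target-side binary tree onto the source sentence.

    Returns (labeled_source_tree, covered_source_positions). Children are
    emitted in source order; the parent label records whether that order
    matches the target order (_ST) or is reversed (_SW).
    """
    if isinstance(tree, Leaf):
        return source_words[tree.source_index], [tree.source_index]
    left, left_pos = project_and_label(tree.left, source_words)
    right, right_pos = project_and_label(tree.right, source_words)
    if max(left_pos) < min(right_pos):
        # Source order equals target order.
        labeled = (tree.label + "_ST", left, right)
    elif max(right_pos) < min(left_pos):
        # Source order is the reverse of target order.
        labeled = (tree.label + "_SW", right, left)
    else:
        # Interleaved spans are outside the scope of this sketch.
        raise ValueError("child spans overlap in the source sentence")
    return labeled, left_pos + right_pos


# Hypothetical toy pair: a verb-final source sentence and "he read book".
source = ["kare", "hon", "yonda"]
target_tree = Node("S", Leaf("he", 0), Node("VP", Leaf("read", 2), Leaf("book", 1)))
labeled, _ = project_and_label(target_tree, source)
```

In this toy pair the verb phrase is the only partial structure whose child order differs between the two languages, so it alone receives the swapped label.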
Abstract
To address the conventional problem that accurate translation cannot be achieved, a learning apparatus includes: a parser unit that parses a target language sentence, thereby acquiring a binary tree of the target language sentence; a source language element acquiring unit that acquires one or more source language elements; a source language partial structure acquiring unit that acquires one or more source language partial structures, each containing a parent node having a phrase label and two child nodes each having a phrase label or a source language element; a labeling unit that provides a reordering label to the one or more source language partial structures; a model building unit that builds one or more parsing models, each having appearance probability information regarding a labeled source language partial structure; and an accumulating unit that accumulates a binary tree of a source language sentence having the one or more parsing models.
8 Claims
4. A learning method using a computer and one or more non-transitory storage media, the one or more non-transitory storage media including:
a parallel corpus in which one or more pairs of original and translated sentences are stored, each of the one or more pairs having a source language sentence and a target language sentence that is a result obtained by translating the source language sentence; and
an element pair storage unit in which one or more element pairs are stored, each of the one or more element pairs being a pair of a source language element and a target language element;
the learning method, which is realized by the computer functioning as a parser unit, a source language element acquiring unit, a source language partial structure acquiring unit, a labeling unit, a model building unit, an accumulating unit, and a reordering unit, comprising:
a parsing step of the parser unit parsing the target language sentence contained in the one or more pairs of original and translated sentences, thereby acquiring a binary tree of the target language sentence, the binary tree having one or more target language partial structures, each of which indicates the order of two or more elements forming the target language sentence and contains a parent node having a phrase label, and two child nodes, each of which is a child node of the parent node and has a phrase label or a target language element;
a source language element acquiring step of the source language element acquiring unit acquiring, from the one or more element pairs in the element pair storage unit, one or more elements forming the source language sentence corresponding to the target language sentence, the one or more elements being one or more source language elements corresponding to target language elements that are child nodes at terminal ends of the one or more target language partial structures contained in the binary tree of the target language sentence;
a source language partial structure acquiring step of the source language partial structure acquiring unit applying a structure indicated by the one or more target language partial structures contained in the binary tree of the target language sentence, to the one or more source language elements forming the source language sentence, thereby acquiring one or more source language partial structures, each of which indicates the order of two or more elements forming the source language sentence and contains a parent node having a phrase label, and two child nodes, each of which is a child node of the parent node and has a phrase label or a source language element;
a labeling step of the labeling unit providing the one or more source language partial structures with a reordering label that is a label that makes it possible to distinguish a source language partial structure in which the order of two child nodes contained in a source language partial structure corresponding to a target language partial structure is different from the order of two child nodes contained in the target language partial structure from a source language partial structure in which the order of two child nodes contained in a source language partial structure corresponding to a target language partial structure is the same as the order of two child nodes contained in the target language partial structure, thereby acquiring one or more labeled source language partial structures;
a model building step of the model building unit building one or more parsing models each having appearance probability information regarding a labeled source language partial structure, using the one or more labeled source language partial structures;
an accumulating step of the accumulating unit causing the one or more non-transitory storage media to accumulate the one or more parsing models built in the model building step; and
a reordering step of the reordering unit acquiring, using the one or more source language partial structures each indicating the order of two or more elements forming the source language sentence, one or more source language partial structures, in each of which the two or more elements forming the source language sentence are reordered such that the order is close enough to satisfy a predetermined condition with respect to the order of the elements in the target language sentence,
wherein, in the labeling step, the reordering label is provided to the one or more source language partial structures reordered by the reordering unit.
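The model building step can be pictured with a short Python sketch. The claim does not say how the appearance probability information is estimated; the sketch below assumes plain relative-frequency estimates over binary rules read off the labeled source language partial structures. The function names, the tuple encoding `(parent, left, right)`, the `_ST`/`_SW` label suffixes, and the toy trees are hypothetical.

```python
from collections import Counter


def symbol(tree):
    """Return the symbol that represents a subtree in its parent's rule."""
    return tree if isinstance(tree, str) else tree[0]


def extract_rules(tree):
    """Yield one (parent, left, right) rule per binary partial structure."""
    if isinstance(tree, str):
        return  # a leaf contributes no rule
    parent, left, right = tree
    yield (parent, symbol(left), symbol(right))
    yield from extract_rules(left)
    yield from extract_rules(right)


def build_model(labeled_trees):
    """Estimate P(left right | parent) by relative frequency (an assumption)."""
    rule_counts = Counter(
        rule for tree in labeled_trees for rule in extract_rules(tree)
    )
    parent_counts = Counter()
    for (parent, _, _), count in rule_counts.items():
        parent_counts[parent] += count
    return {
        rule: count / parent_counts[rule[0]]
        for rule, count in rule_counts.items()
    }


# Hypothetical labeled source-side trees with POS tags at the leaves.
trees = [
    ("S_ST", "PRP", ("VP_SW", "NN", "VB")),
    ("S_ST", "PRP", ("VP_ST", "VB", "NN")),
    ("S_ST", "NN", ("VP_SW", "NN", "VB")),
]
model = build_model(trees)
```

Because the reordering label is part of the parent symbol, a parser trained on such a model predicts the structure and the reordering decision together.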
5. A non-transitory computer-accessible storage medium in which a program is stored,
the program, when executed by a computer:
causes one or more non-transitory storage media to have:
a parallel corpus in which one or more pairs of original and translated sentences are stored, each of the one or more pairs having a source language sentence and a target language sentence that is a result obtained by translating the source language sentence; and
an element pair storage unit in which one or more element pairs are stored, each of the one or more element pairs being a pair of a source language element and a target language element; and
causes the computer to function as:
a parser unit that parses the target language sentence contained in the one or more pairs of original and translated sentences, thereby acquiring a binary tree of the target language sentence, the binary tree having one or more target language partial structures, each of which indicates the order of two or more elements forming the target language sentence and contains a parent node having a phrase label, and two child nodes, each of which is a child node of the parent node and has a phrase label or a target language element;
a source language element acquiring unit that acquires, from the one or more element pairs in the element pair storage unit, one or more elements forming the source language sentence corresponding to the target language sentence, the one or more elements being one or more source language elements corresponding to target language elements that are child nodes at terminal ends of the one or more target language partial structures contained in the binary tree of the target language sentence;
a source language partial structure acquiring unit that applies a structure indicated by the one or more target language partial structures contained in the binary tree of the target language sentence, to the one or more source language elements forming the source language sentence, thereby acquiring one or more source language partial structures, each of which indicates the order of two or more elements forming the source language sentence and contains a parent node having a phrase label, and two child nodes, each of which is a child node of the parent node and has a phrase label or a source language element;
a labeling unit that provides the one or more source language partial structures with a reordering label that is a label that makes it possible to distinguish a source language partial structure in which the order of two child nodes contained in a source language partial structure corresponding to a target language partial structure is different from the order of two child nodes contained in the target language partial structure from a source language partial structure in which the order of two child nodes contained in a source language partial structure corresponding to a target language partial structure is the same as the order of two child nodes contained in the target language partial structure, thereby acquiring one or more labeled source language partial structures;
a model building unit that builds one or more parsing models each having appearance probability information regarding a labeled source language partial structure, using the one or more labeled source language partial structures; and
an accumulating unit that causes the one or more non-transitory storage media to accumulate the one or more parsing models built by the model building unit,
wherein the one or more non-transitory storage media further include:
a statistical model storage unit in which a CFG rule statistical model is stored, the statistical model containing a parent node having a phrase label, and two child nodes each of which is a child node of the parent node and has a phrase label or a source language part-of-speech tag, and
the source language partial structure acquiring unit includes:
a source language partial structure acquiring part that applies a structure indicated by the one or more target language partial structures contained in the binary tree of the target language sentence, to the one or more source language elements forming the source language sentence, thereby acquiring one or more source language partial structures, each of which indicates the order of two or more elements forming the source language sentence and contains a parent node having a phrase label, and two child nodes, each of which is a child node of the parent node and has a phrase label or a source language element; and
a partial structure complementing part that, in a case where there is an incomplete source language partial structure among the one or more source language partial structures acquired by the source language partial structure acquiring part, applies the statistical model to the source language partial structure, thereby acquiring a complete source language partial structure.
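The partial structure complementing part can be sketched in Python under stated assumptions. The claim does not define what makes a source language partial structure incomplete; this sketch assumes the parent phrase label is missing (`None`) and fills it with the most probable parent that the CFG rule statistical model offers for the two children. The function name `complement`, the dictionary layout of the model, and the probabilities are hypothetical.

```python
def complement(structure, cfg_model):
    """Fill in a missing parent label using rule probabilities.

    structure: (parent_or_None, left_symbol, right_symbol)
    cfg_model: {(parent, left_symbol, right_symbol): probability}
    """
    parent, left, right = structure
    if parent is not None:
        return structure  # already complete, nothing to do
    # Collect every parent label the model allows for these two children.
    candidates = {
        p: prob
        for (p, l, r), prob in cfg_model.items()
        if (l, r) == (left, right)
    }
    if not candidates:
        raise LookupError(f"no rule covers children {left!r}, {right!r}")
    best_parent = max(candidates, key=candidates.get)
    return (best_parent, left, right)


# Hypothetical CFG rule statistics over phrase labels and POS tags.
cfg_model = {
    ("NP", "DT", "NN"): 0.6,
    ("VP", "DT", "NN"): 0.1,
    ("VP", "VB", "NP"): 0.7,
}
completed = complement((None, "DT", "NN"), cfg_model)
```

A structure that already carries a parent label is returned unchanged, so the function can be applied to every acquired partial structure without a separate completeness check.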
Specification