Machine translation device and machine translation method in which a syntax conversion model and a word translation model are combined
Abstract
The present invention relates to statistical machine translation (SMT) and provides a machine translation device and a machine translation method. A creation probability for a target language is acquired from a single corpus, while syntax conversion knowledge and word translation knowledge, together with their respective conversion probabilities, are extracted from a parallel corpus. A translation model learning device learns from each item of conversion knowledge and each probability to build a weighted translation model. The translation model is then applied to a source sentence input in real time, and a target sentence is generated through the decoding processes of a syntax converter and a word translator. This resolves disadvantages of the existing phrase-based SMT and syntax-based SMT and combines their advantages.
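As an illustration of how a creation probability from a target-language corpus might be combined with a syntax conversion probability to pick a translation, here is a minimal Python sketch. It is not taken from the patent: the add-one-smoothed bigram model, the log-linear weighting, and all names (`train_bigram_lm`, `creation_log_prob`, `best_candidate`, `w_syntax`, `w_lm`) are assumptions chosen for brevity.

```python
import math
from collections import Counter


def train_bigram_lm(sentences):
    """Count bigrams over a target-language-only corpus (assumed whitespace-tokenized)."""
    history, bigrams = Counter(), Counter()
    vocab = {"</s>"}
    for sent in sentences:
        tokens = ["<s>"] + sent.split() + ["</s>"]
        vocab.update(tokens[1:])
        history.update(tokens[:-1])              # counts of each word as a bigram history
        bigrams.update(zip(tokens, tokens[1:]))
    return history, bigrams, len(vocab)


def creation_log_prob(sentence, lm):
    """Log creation probability of a target string under the add-one-smoothed bigram model."""
    history, bigrams, vocab_size = lm
    tokens = ["<s>"] + sentence.split() + ["</s>"]
    return sum(
        math.log((bigrams[(a, b)] + 1) / (history[a] + vocab_size))
        for a, b in zip(tokens, tokens[1:])
    )


def best_candidate(candidates, lm, w_syntax=1.0, w_lm=1.0):
    """Pick the target string with the highest weighted combined log probability.

    candidates: list of (target_string, syntax_conversion_probability) pairs.
    The weights are hypothetical stand-ins for learned model weights.
    """
    def score(candidate):
        text, p_syntax = candidate
        return w_syntax * math.log(p_syntax) + w_lm * creation_log_prob(text, lm)

    return max(candidates, key=score)[0]


if __name__ == "__main__":
    lm = train_bigram_lm(["the cat sat", "the cat ran", "a dog sat"])
    print(best_candidate([("the cat sat", 0.4), ("cat the sat", 0.4)], lm))
```

With equal syntax conversion probabilities, the language model alone decides between the two candidate word orders; with unequal ones, the two weights control the trade-off.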
6 Claims
1. A statistical machine translation device, comprising:
a language model generator configured to generate a language model by extracting a creation probability of a language from a single corpus configured by a target language;
a syntax conversion knowledge extractor configured to:
extract syntax conversion knowledge for the target language by using word reordering information between a source language and the target language in a plurality of parallel corpora that does not include the single corpus, and syntax analysis information of the source language, and
calculate a syntax conversion probability with respect to the syntax conversion knowledge corresponding to the plurality of parallel corpora that does not include the single corpus;
a word translation knowledge extractor configured to:
extract word translation knowledge by using the word reordering information and the syntax analysis information, and
calculate a word translation probability with respect to the word translation knowledge based on a feature function in which a predetermined constraint condition is defined in the word reordering information and the syntax analysis information;
a translation model learning device configured to generate a syntax conversion model and a word translation model by learning the syntax conversion knowledge, the word translation knowledge, the syntax conversion probability and the word translation probability; and
a translated sentence generator configured to:
decode a source sentence into the target sentence by applying the syntax conversion model and the word translation model; and
generate a target vocabulary string having a high probability into a final translation sentence by combining the syntax conversion probability and the creation probability,
wherein the syntax conversion knowledge extractor includes:
a tree generator configured to generate a target tree of the target language by using the word reordering information and the syntax analysis information,
a tree node reorderer configured to reorder nodes based on the target tree and a source tree depending on the syntax analysis information of the source language,
a tree conversion knowledge extractor configured to extract the syntax conversion knowledge of a sub-tree at each reordered node of the target tree and the source tree, and
a probability calculator configured to calculate the syntax conversion probability with respect to the syntax conversion knowledge,
wherein the feature function is a function configured to constrain, from a syntax of the target language and a syntax of the source language, and intersyntax arrangement information between the syntax of the target language and the syntax of the source language:
a part of speech string of the target language, and
a translation order of words included in the source language,
and output the constrained part of speech string and translation order as a feature.

2. A translation sentence generating apparatus, comprising:
a syntax converter configured to:
analyze a syntax of a source sentence,
extract syntax conversion knowledge of a target sentence from the analyzed syntax of the source sentence, and
calculate a syntax conversion probability with respect to the syntax conversion knowledge based on a plurality of parallel corpora that does not include a first corpus of a target language;
a feature extractor configured to extract a feature constraining a part of speech string, a constraint of the word order and a translation order of the source sentence based on word reordering information between a source language and the target language, syntax analysis information of the source sentence, and the syntax conversion knowledge of the target sentence;
a translation option constraining device configured to constrain a translation option from the part of speech string and the translation order;
a translation distortion constraining device configured to rearrange the translation order by the constraint of the word order;
a hypothesis searcher configured to search a hypothesis by reflecting the constrained translation option and the rearranged translation order to the feature;
a tracker configured to select a target vocabulary string having a creation probability for creation of same target vocabulary string with respect to the hypothesis, wherein the creation probability is calculated based on the first corpus of the target language;
a probability calculator configured to:
generate a combined probability by combining the creation probability of the selected target vocabulary string with the syntax conversion probability, and
generate a target vocabulary string having the highest combined probability into a translation sentence,
wherein the syntax converter includes:
a tree generator configured to generate a target tree of the target language by using the word reordering information and the syntax analysis information,
a tree node reorderer configured to reorder nodes based on the target tree and a source tree depending on the syntax analysis information,
a tree conversion knowledge extractor configured to extract the syntax conversion knowledge of a sub-tree at each reordered node of the target tree and the source tree.

3. A method for constructing a translation model, the method performed by a machine translation device including a processor and comprising:
generating a language model by extracting a creation probability of a language from a single corpus configured by a target language;
generating a syntax tree of the target language by using word reordering information between a source language and the target language in a plurality of parallel corpora that does not include the single corpus;
arranging nodes based on the syntax tree and syntax analysis information of the source language;
extracting syntax conversion knowledge of a sub-tree at each node of the arranged nodes;
extracting word translation knowledge by using the word reordering information and the syntax analysis information;
calculating a syntax conversion probability with respect to the syntax conversion knowledge and a word translation probability with respect to the word translation knowledge by applying a feature function in which a predetermined constraint condition is defined in the word reordering information and syntax analysis information of the target language;
generating a target vocabulary string having a high probability into a final translation sentence by combining the syntax conversion probability and the creation probability;
making a weight to be learned with respect to the syntax conversion probability and word translation probability,
wherein the feature function uses a function:
constraining a part of speech string of the target language and a translation order of words included in the source language from a syntax of the target language and a syntax of the source language, and intersyntax arrangement information, and
outputting the constrained part of speech string and translation order as the feature.

4. A non-transitory computer-readable recording medium recording a computer program for translating a source sentence, the computer program comprising:
analyzing a syntax of the source sentence;
generating a syntax tree of a target language by using word reordering information between a source language and the target language in the plurality of parallel corpora that does not include a first corpus;
arranging nodes based on the syntax tree and the analyzed syntax analysis information of the source sentence;
extracting syntax conversion knowledge of a sub-tree at each node of the arranged nodes;
calculating a syntax conversion probability with respect to the syntax conversion knowledge;
extracting a feature to constrain a part of speech string, a constraint of the word order and a translation order of the source sentence based on the word reordering information between the source language and the target language, the syntax analysis information of the source sentence, and the syntax conversion knowledge;
generating a constraint of a translation option from the part of speech string and the translation order;
rearranging the translation order by the constraint of the word order;
searching a hypothesis by reflecting the constraint of the translation option and the rearranged translation order to the feature;
selecting a target vocabulary string having a creation probability for creation of same target vocabulary string with respect to the hypothesis from the first corpus configured by a target language;
combining the creation probability with the syntax conversion probability; and
generating a target vocabulary string having the highest probability among the creation probability and the syntax conversion probability into a translation sentence.

5. A method for translating a source sentence, the method performed by a machine translation device including a processor and comprising:
analyzing a syntax of the source sentence;
generating a syntax tree of a target language by using word reordering information between a source language and the target language in the plurality of parallel corpora that does not include a first corpus;
arranging nodes based on the syntax tree and the analyzed syntax analysis information of the source sentence;
extracting syntax conversion knowledge of a sub-tree at each node of the arranged nodes;
calculating a syntax conversion probability with respect to the syntax conversion knowledge;
extracting a feature to constrain a part of speech string, a constraint of the word order and a translation order of the source sentence based on word reordering information between the source language and the target language, the syntax analysis information of the source sentence, and the syntax conversion knowledge;
generating a constraint of a translation option from the part of speech string and the translation order;
rearranging the translation order by the constraint of the word order;
searching a hypothesis by reflecting the constraint of the translation option and the rearranged translation order to the feature;
selecting a target vocabulary string having a creation probability for creation of same target vocabulary string with respect to the hypothesis from the first corpus configured by a target language;
combining the creation probability with the syntax conversion probability; and
generating a target vocabulary string having the highest probability into a translation sentence.
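
The node arranging step recited in the claims can be illustrated with a small Python sketch. The claims do not say how nodes are ordered, so this version makes its own assumptions: the word reordering information is taken to be a word alignment (source word index to target word positions), and the children of each source-tree node are sorted by the mean target position of the words they cover. `Node`, `leaves`, `target_position` and `reorder` are hypothetical names, not terms from the patent.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Node:
    label: str
    children: List["Node"] = field(default_factory=list)
    src_index: Optional[int] = None  # position in the source sentence; set on leaves only


def leaves(node: Node) -> List[Node]:
    """Return the leaf nodes under `node`, left to right."""
    if not node.children:
        return [node]
    return [leaf for child in node.children for leaf in leaves(child)]


def target_position(node: Node, alignment: Dict[int, List[int]]) -> float:
    """Mean aligned target position of the words under `node` (assumed ordering key)."""
    positions = [t for leaf in leaves(node) for t in alignment.get(leaf.src_index, [])]
    # Subtrees with no aligned word sort last; this is an arbitrary choice for the sketch.
    return sum(positions) / len(positions) if positions else float("inf")


def reorder(node: Node, alignment: Dict[int, List[int]]) -> Node:
    """Return a copy of the source tree with each node's children in target-side order."""
    children = [reorder(child, alignment) for child in node.children]
    children.sort(key=lambda child: target_position(child, alignment))  # stable sort
    return Node(node.label, children, node.src_index)


if __name__ == "__main__":
    # Source "I eat apples" (subject-verb-object), aligned to a subject-object-verb target.
    tree = Node("S", [
        Node("NP", [Node("PRP", src_index=0)]),
        Node("VP", [Node("VB", src_index=1), Node("NP", [Node("NNS", src_index=2)])]),
    ])
    alignment = {0: [0], 1: [2], 2: [1]}
    print([leaf.src_index for leaf in leaves(reorder(tree, alignment))])
```

Under these assumptions, each pair of an original node and its reordered counterpart is a candidate unit from which sub-tree conversion knowledge could be read off.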
Specification