Method and apparatus for developing a transfer dictionary used in transfer-based machine translation system
Abstract
A transfer dictionary for use in a transfer-based machine translation system is generated. A pair of source/target language sentences is received. The source language sentence comprises at least one marked idiom, at least one argument and at least one marked collocation. The target language sentence comprises the target language translation for the idiom and the source language word(s) for the argument. The source language sentence is parsed to generate a source language syntactic tree. Nodes corresponding to the idiom, the collocation and the argument are extracted from the source language syntactic tree. A least common ancestor node of the extracted nodes is calculated, and source language structure information, starting from the least common ancestor node, is generated based on the source language syntactic tree data structure. Target language structure information is generated by adding part-of-speech information to each morpheme in the target language sentence and by replacing each source language word in the target language sentence with the corresponding syntactic information within the source language syntactic tree.
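The target-side step described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the function name `build_target_structure`, the `token/POS` output format, the tag names, and the example sentence are all assumptions made for illustration. Each target-language morpheme is annotated with its part of speech, and each source-language word left in the target sentence (an argument) is replaced by the syntactic label it carries in the source tree.

```python
def build_target_structure(target_tokens, target_pos, argument_syntax):
    """Build a target language structure string (hypothetical format).

    target_tokens   -- tokens of the target sentence, in order
    target_pos      -- dict: target morpheme -> part-of-speech tag
                       (assumed output of some tagger)
    argument_syntax -- dict: source-language word appearing in the target
                       sentence -> its syntactic label in the source tree
    """
    parts = []
    for token in target_tokens:
        if token in argument_syntax:
            # Source-language argument word: replace it with its
            # syntactic information from the source tree.
            parts.append(argument_syntax[token])
        else:
            # Target-language morpheme: attach its part of speech.
            parts.append(f"{token}/{target_pos[token]}")
    return " ".join(parts)


# Illustrative input: "John" is an argument kept in the source language,
# "mourir" is the target translation of a marked idiom.
structure = build_target_structure(
    ["John", "mourir"],
    {"mourir": "VERB"},
    {"John": "NP:subj"},
)
```

With these illustrative inputs, `structure` is the string `NP:subj mourir/VERB`.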
27 Citations
9 Claims
-
1. A method of generating a transfer dictionary used in a transfer-based translation machine system, the transfer dictionary storing a pair of the source/target language sentences and corresponding source/target language structure information, the method comprising the steps of:
-
receiving from a user a pair of source/target language sentences, wherein said source language sentence comprises at least one idiom, at least one argument and at least one collocation and the collocation and the idiom therein are marked, respectively, and said target language sentence comprises a target language translation for the idiom and the source language word(s) for the argument;
parsing the source language sentence to generate a corresponding source language syntactic tree data structure including a part-of-speech and syntactic information of each word in the source language sentence;
extracting, from said source language syntactic tree data structure, nodes corresponding to the idiom, the collocation and the argument, respectively, in said source language sentence;
calculating a least common ancestor node of the extracted nodes;
generating source language structure information based on said source language syntactic tree data structure, wherein said language structure information starts from the information of the least common ancestor node;
generating target language structure information by adding the part-of-speech information to each morpheme in said target language sentence and by replacing each source language word in said target language with the corresponding syntactic information within said source language syntactic tree data structure; and
storing, as a new entry, the received source/target language sentences and the generated source/target language structure information in the transfer dictionary.
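The least-common-ancestor step of claim 1 can be sketched as follows. This is a minimal illustration, assuming a syntactic tree whose nodes keep a pointer to their parent. The `Node` class, its field names, and the example tree are hypothetical and do not come from the patent. The sketch compares root-to-node paths level by level and keeps the deepest node shared by all of them.

```python
class Node:
    """A syntactic tree node with a parent pointer (hypothetical layout)."""

    def __init__(self, label):
        self.label = label
        self.parent = None
        self.children = []

    def add(self, child):
        child.parent = self
        self.children.append(child)
        return child


def path_from_root(node):
    """Return the list of nodes from the root down to `node`."""
    path = []
    while node is not None:
        path.append(node)
        node = node.parent
    return path[::-1]


def least_common_ancestor(nodes):
    """Deepest node that lies on the root path of every node in `nodes`."""
    paths = [path_from_root(n) for n in nodes]
    lca = None
    for level in zip(*paths):
        if all(n is level[0] for n in level):
            lca = level[0]
        else:
            break
    return lca


# Illustrative tree: S -> NP, VP ; VP -> V, NP
s = Node("S")
subject = s.add(Node("NP"))
vp = s.add(Node("VP"))
verb = vp.add(Node("V"))
obj = vp.add(Node("NP"))

idiom_lca = least_common_ancestor([verb, obj])          # the VP node
full_lca = least_common_ancestor([subject, verb, obj])  # the S node
```

Under this reading, the source language structure information would be generated from the subtree rooted at the returned node, so idiom nodes alone yield the `VP` subtree while adding the subject argument widens it to `S`.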
-
-
2. The method of claim 1, further comprising the step of determining a validity of said source language structure information by checking if it forms a phrase.
-
3. The method of claim 2, wherein the step of determining includes the steps of checking whether a parent node of each argument node extracted from the source language syntactic tree is an idiom node or not and checking whether there are more than two idiom nodes of which the parent node is not an idiom node.
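The two checks named in claim 3 can be sketched in Python. The claim does not state how the two results combine into a validity decision, so the rule in `forms_phrase` is an assumption: the structure is treated as valid when every argument node hangs directly off an idiom node and there are not more than two idiom nodes whose parent lies outside the idiom. The parent map, the function names, and the example words are likewise hypothetical.

```python
def argument_parents_are_idioms(parent_of, argument_nodes, idiom_nodes):
    """First check: is the parent of each argument node an idiom node?"""
    return all(parent_of.get(a) in idiom_nodes for a in argument_nodes)


def count_idiom_nodes_with_outside_parent(parent_of, idiom_nodes):
    """Count idiom nodes whose parent is not itself an idiom node."""
    return sum(1 for n in idiom_nodes if parent_of.get(n) not in idiom_nodes)


def forms_phrase(parent_of, argument_nodes, idiom_nodes):
    """Assumed combination of the two checks (not stated in the claim)."""
    outside = count_idiom_nodes_with_outside_parent(parent_of, idiom_nodes)
    return (
        argument_parents_are_idioms(parent_of, argument_nodes, idiom_nodes)
        and outside <= 2  # i.e. NOT "more than two"
    )


# Illustrative parent map (child -> parent); the root has no entry.
parent_of = {"bucket": "kick", "John": "kick"}
idiom_nodes = {"kick", "bucket"}

valid = forms_phrase(parent_of, {"John"}, idiom_nodes)

# An argument attached to a non-idiom word fails the first check.
detached = dict(parent_of, John="yesterday")
invalid = forms_phrase(detached, {"John"}, idiom_nodes)
```

Here `valid` is `True` and `invalid` is `False`. The threshold follows the literal "more than two" wording of the claim.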
-
4. The method of claim 1, further comprising the step of checking a redundancy of the generated source/target language structure information by retrieving all the entries having the same idiom with that of the generated source language structure information and checking if there already exists the same target language structure information with the generated target language structure information.
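The redundancy check of claim 4 can be sketched as a lookup keyed by idiom. This is a minimal illustration, assuming entries are indexed by their idiom string. The `TransferDictionary` class, its method names, and the example structure strings are hypothetical. An entry is treated as redundant when an existing entry with the same idiom already carries the same target language structure information.

```python
from collections import defaultdict


class TransferDictionary:
    """In-memory stand-in for the transfer dictionary (hypothetical)."""

    def __init__(self):
        # idiom -> list of (source_structure, target_structure) pairs
        self._by_idiom = defaultdict(list)

    def is_redundant(self, idiom, target_structure):
        """Retrieve all entries with this idiom and compare target sides."""
        entries = self._by_idiom.get(idiom, [])
        return any(target == target_structure for _, target in entries)

    def add(self, idiom, source_structure, target_structure):
        """Store a new entry unless it is redundant; report what happened."""
        if self.is_redundant(idiom, target_structure):
            return False
        self._by_idiom[idiom].append((source_structure, target_structure))
        return True


dictionary = TransferDictionary()
first = dictionary.add("kick the bucket", "VP(V NP)", "NP:subj mourir/VERB")
second = dictionary.add("kick the bucket", "VP(V NP)", "NP:subj mourir/VERB")
third = dictionary.add("kick the bucket", "VP(V NP)", "NP:subj décéder/VERB")
```

The first and third calls store an entry and return `True`. The second returns `False` because the same idiom already maps to an identical target structure.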
-
5. An apparatus for generating a transfer dictionary used in a transfer-based translation machine system, the transfer dictionary storing a pair of the source/target language sentences and corresponding source/target language structure information, comprising:
-
a receiving unit for receiving from a user a pair of source/target language sentences, wherein said source language sentence comprises at least one idiom, at least one argument and at least one collocation and the collocation and the idiom therein are marked, respectively, and said target language sentence comprises a target language translation for the idiom and the source language word(s) for the argument;
a parsing unit for parsing the source language sentence to generate a corresponding source language syntactic tree data structure including a part-of-speech and syntactic information of each word in the source language sentence;
an extracting unit for extracting, from said source language syntactic tree data structure, nodes corresponding to the idiom, the collocation and the argument, respectively, in said source language sentence;
a calculating unit for calculating a least common ancestor node of the extracted nodes;
a source language structure generating unit for generating source language structure information based on said source language syntactic tree data structure, wherein said language structure information starts from the information of the least common ancestor node;
a target language structure generating unit for generating target language structure information by adding the part-of-speech information to each morpheme in said target language sentence and by replacing each source language word in said target language with the corresponding syntactic information within said source language syntactic tree data structure; and
a transfer dictionary for storing, as a new entry, the received source/target language sentences and the generated source/target language structure information.
-
-
6. The apparatus of claim 5, further comprising a determining unit for determining a validity of said source language structure information by checking if it forms a phrase.
-
7. The apparatus of claim 6, wherein said determining unit includes a first checking unit for checking whether a parent node of each argument node extracted from the source language syntactic tree is an idiom node or not and a second checking unit for checking whether there are more than two idiom nodes of which the parent node is not an idiom node.
-
8. The apparatus of claim 5, further comprising a redundancy-checking unit for checking a redundancy of the generated source/target language structure information by retrieving all the entries having the same idiom with that of the generated source language structure information and checking if there already exists the same target language structure information with the generated target language structure information.
-
9. A computer-readable medium stored thereon program instructions executable by the computer to perform the method of generating a transfer dictionary used in a transfer-based translation machine system, the transfer dictionary storing a pair of the source/target language sentences and corresponding source/target language structure information, the method comprising the steps of:
-
receiving from a user a pair of source/target language sentences, wherein said source language sentence comprises at least one idiom, at least one argument and at least one collocation and the collocation and the idiom therein are marked, respectively, and said target language sentence comprises a target language translation for the idiom and the source language word(s) for the argument;
parsing the source language sentence to generate a corresponding source language syntactic tree data structure including a part-of-speech and syntactic information of each word in the source language sentence;
extracting, from said source language syntactic tree data structure, nodes corresponding to the idiom, the collocation and the argument, respectively, in said source language sentence;
calculating a least common ancestor node of the extracted nodes;
generating source language structure information based on said source language syntactic tree data structure, wherein said language structure information starts from the information of the least common ancestor node;
generating target language structure information by adding the part-of-speech information to each morpheme in said target language sentence and by replacing each source language word in said target language with the corresponding syntactic information within said source language syntactic tree data structure; and
storing, as a new entry, the received source/target language sentences and the generated source/target language structure information in the transfer dictionary.
-
Specification