Example based machine translation system
First Claim
1. A method of performing machine translation of a source language (SL) input to a translation output in a target language (TL), comprising:
matching fragments of the SL input to SL fragments of examples in an example base;
identifying all matched blocks in the SL input as blocks of terms in the SL input that are matched by one or more SL fragments in an example;
selecting block combinations of the matched blocks to cover one or more fragments of the SL input;
for each block in the selected block combinations, identifying an example associated with the block;
aligning TL portions of the identified example with SL portions of the identified example that match the one or more fragments of the SL input; and
providing the translation output based on the aligned portions, wherein identifying an example associated with a block comprises:
calculating a block score corresponding to each example containing the block by calculating the block score as follows:
where TFIDF is term frequency inverse document frequency;
K = a total number of common terms included both in example j and the SL input;
TFIDF_kj = term k's TF/IDF weight in example j; and
Similarity_j = matching weight between the example j and the SL input; and
identifying the example associated with the block based on the block score.
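The scoring equation itself does not appear in the claim text above. As a minimal sketch, assuming the block score for example j is simply the sum of the TF/IDF weights of the K terms shared by example j and the SL input (an assumption inferred from the definitions in the claim, not a confirmed formula), example selection could look like this. The function names and sample weights are hypothetical.

```python
from typing import Dict, List, Tuple


def similarity(input_terms: List[str], example_weights: Dict[str, float]) -> float:
    """Sum the TF/IDF weights of the terms common to the example and the SL input.

    Assumption: the matching weight is the plain sum over the K common terms.
    """
    common = set(input_terms) & set(example_weights)
    return sum(example_weights[term] for term in common)


def best_example(
    input_terms: List[str], candidates: Dict[str, Dict[str, float]]
) -> Tuple[str, float]:
    """Score every candidate example and return the highest-scoring one."""
    scored = {ex_id: similarity(input_terms, w) for ex_id, w in candidates.items()}
    best = max(scored, key=scored.get)
    return best, scored[best]


# Hypothetical per-example TF/IDF weights, keyed by term.
candidates = {
    "ex1": {"machine": 0.5, "translation": 0.75, "system": 0.25},
    "ex2": {"machine": 0.5, "learning": 1.0},
}
print(best_example(["machine", "translation", "output"], candidates))
```

Here "ex1" shares two terms with the input and "ex2" shares one, so "ex1" is selected.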
Abstract
The present invention performs machine translation by matching fragments of a source language sentence to be translated to source language portions of an example in an example base. When all relevant examples have been identified in the example base, the examples are subjected to phrase alignment in which fragments of the target language sentence in each example are aligned against the matched fragments of the source language sentence in the same example. A translation component then substitutes the aligned target language phrases from the matched examples for the matched fragments in the source language sentence.
108 Citations
21 Claims
1. A method of performing machine translation of a source language (SL) input to a translation output in a target language (TL), comprising:
matching fragments of the SL input to SL fragments of examples in an example base;
identifying all matched blocks in the SL input as blocks of terms in the SL input that are matched by one or more SL fragments in an example;
selecting block combinations of the matched blocks to cover one or more fragments of the SL input;
for each block in the selected block combinations, identifying an example associated with the block;
aligning TL portions of the identified example with SL portions of the identified example that match the one or more fragments of the SL input; and
providing the translation output based on the aligned portions, wherein identifying an example associated with a block comprises:
calculating a block score corresponding to each example containing the block by calculating the block score as follows:
where TFIDF is term frequency inverse document frequency;
K = a total number of common terms included both in example j and the SL input;
TFIDF_kj = term k's TF/IDF weight in example j; and
Similarity_j = matching weight between the example j and the SL input; and
identifying the example associated with the block based on the block score.
where,
ConL is the translation confidence level;
c1, c2, ..., c4 are constants;
AlignCon is an alignment confidence level;
TransPercent is a weighted translation percentage;
Example_num is an employed example number identifying the identified example;
Valid_block_num is a fragment number in a possible TL translation under consideration;
PhrSL is a SL phrase that relates to a given input string;
PhrTL is a TL correspondence in the possible translation of the SL input;
|PhrTL| is a word number of PhrTL;
C_i...j is a connection between SL word i and TL word j; and
Conf(C_i...j) is the translation confidence level of word alignment.
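The confidence equation is not shown in the text above, so the following is only an illustration of how the listed quantities might combine. It assumes (hypothetically) that the alignment confidence is the total confidence of the word connections divided by the TL phrase length, and that the overall level is a weighted sum in which fewer employed examples and fewer fragments raise confidence. The actual equation may differ; every function name and default constant here is invented for the sketch.

```python
from typing import List, Sequence


def align_con(link_confidences: List[float], tl_phrase_len: int) -> float:
    """Assumed form: sum of word-connection confidences divided by |PhrTL|."""
    if tl_phrase_len <= 0:
        raise ValueError("tl_phrase_len must be positive")
    return sum(link_confidences) / tl_phrase_len


def translation_confidence(
    align: float,
    trans_percent: float,
    example_num: int,
    valid_block_num: int,
    c: Sequence[float] = (1.0, 1.0, 1.0, 1.0),  # hypothetical constants c1..c4
) -> float:
    """Assumed weighted sum; not the patent's confirmed formula."""
    c1, c2, c3, c4 = c
    return (
        c1 * align
        + c2 * trans_percent
        + c3 / example_num       # fewer employed examples -> higher confidence
        + c4 / valid_block_num   # fewer fragments -> higher confidence
    )


a = align_con([1.0, 0.5], tl_phrase_len=2)
print(a, translation_confidence(a, 0.5, example_num=2, valid_block_num=4))
```

A score of this kind could be used to rank several candidate translations, with the lowest-scoring ones surfaced for review.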
5. The method of claim 3 and further comprising:
identifying portions of the translation output that require a user'"'"'s attention.
6. The method of claim 1 wherein matching fragments of the SL input to fragments of examples comprises:
identifying bi-terms in the SL input; and
accessing a bi-term index of the example base that includes example identifiers identifying examples that contain indexed bi-terms.
7. The method of claim 6 wherein accessing a bi-term index comprises:
accessing a bi-term index of the example base that includes word position information indicative of a word position in the example where the bi-term resides.
8. The method of claim 7 wherein accessing a bi-term index comprises:
accessing a bi-term index of the example base that includes a score indicative of a term frequency/inverse document frequency (TF/IDF) score for the bi-term in the example.
9. The method of claim 8 wherein accessing a bi-term index comprises:
accessing a bi-term index of the example base that includes a corpus score indicative of a representative TF/IDF score for the bi-term across the example base.
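The bi-term index described in claims 6 through 9 can be pictured as an inverted index. The sketch below assumes a bi-term is a pair of adjacent words (the claims do not define it) and stores, per posting, the example identifier, the word position, and a per-example score. The score is a placeholder constant rather than a real TF/IDF computation, and all names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple, Set, Tuple

BiTerm = Tuple[str, str]


class Posting(NamedTuple):
    example_id: int
    position: int   # word position of the bi-term's first word in the example
    score: float    # placeholder for the bi-term's TF/IDF score in the example


def bi_terms(words: List[str]) -> List[BiTerm]:
    """Assumption: a bi-term is a pair of adjacent words."""
    return list(zip(words, words[1:]))


def build_index(examples: Dict[int, List[str]]) -> Dict[BiTerm, List[Posting]]:
    index: Dict[BiTerm, List[Posting]] = defaultdict(list)
    for ex_id, words in examples.items():
        for pos, bt in enumerate(bi_terms(words)):
            index[bt].append(Posting(ex_id, pos, 1.0))
    return index


def lookup(index: Dict[BiTerm, List[Posting]], sl_input: List[str]) -> Set[int]:
    """Return identifiers of examples sharing at least one bi-term with the input."""
    hits: Set[int] = set()
    for bt in bi_terms(sl_input):
        hits.update(p.example_id for p in index.get(bt, []))
    return hits


examples = {1: ["the", "red", "car"], 2: ["a", "red", "car", "stops"]}
index = build_index(examples)
print(lookup(index, ["my", "red", "car"]))
```

A corpus-level score per bi-term, as in claim 9, could be kept in a second dictionary keyed by the same bi-term tuples.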
10. The method of claim 1 wherein aligning TL portions of the example with the SL portions comprises:
performing word alignment to identify anchor alignment points between the SL portion and the TL portion of the example;
finding all continuous alignments between the TL portion and the SL portion based on the anchor alignment points; and
finding all non-continuous alignments between the TL portion and the SL portion based on the anchor alignment points.
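One way to picture the distinction between continuous and non-continuous alignments is the consistency check commonly used for phrase extraction in statistical machine translation, swapped in here as a stand-in since the claim does not spell out a procedure. Given anchor links between SL and TL word positions, the sketch finds the TL span covered by a matched SL span and reports whether that span is continuous, meaning no TL word inside it is anchored to an SL word outside the matched span. Names and sample links are hypothetical.

```python
from typing import Optional, Set, Tuple

Link = Tuple[int, int]  # (SL word index, TL word index)


def tl_span_for(
    sl_start: int, sl_end: int, links: Set[Link]
) -> Optional[Tuple[int, int, bool]]:
    """Return (tl_start, tl_end, is_continuous) for an inclusive SL span.

    Returns None when no anchor point falls inside the SL span.
    """
    tl_positions = [j for i, j in links if sl_start <= i <= sl_end]
    if not tl_positions:
        return None
    lo, hi = min(tl_positions), max(tl_positions)
    # Continuous: every anchor landing in [lo, hi] originates inside the SL span.
    continuous = all(sl_start <= i <= sl_end for i, j in links if lo <= j <= hi)
    return lo, hi, continuous


# Hypothetical anchor points: SL words 1 and 2 swap order in the TL sentence.
links = {(0, 0), (1, 2), (2, 1), (3, 3)}
print(tl_span_for(1, 2, links))
print(tl_span_for(0, 1, links))
```

In the second call the TL span covering SL words 0 and 1 also contains a word anchored to SL word 2, so it is flagged as non-continuous.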
11. The method of claim 1 wherein selecting block combinations comprises:
calculating a block combination score for different combinations of the identified blocks; and
identifying N best block combinations based on the block combination scores.
12. The method of claim 11 wherein calculating a block combination score comprises:
where,
i = an “edge” (block) index number in the SL input;
m = a word indexing number of the “edge” i's starting point;
n = a word indexing number of the “edge” i's ending point;
k = a word indexing number of the “edge” i's each term;
TFIDF_k = term k's average TF/IDF weight in the example base; and
EdgeLen_i = a weight of block i.
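The combination score equation is not shown in the text above. As a rough sketch, assume each block contributes its block weight (EdgeLen) multiplied by the sum of the average TF/IDF weights of the words from its starting point m to its ending point n, and that a combination's score is the sum over its blocks. That form is an assumption drawn from the variable definitions, and the exhaustive enumeration below is chosen for brevity rather than efficiency. All names and sample values are hypothetical.

```python
import heapq
from itertools import combinations
from typing import Dict, List, Sequence, Tuple

Block = Tuple[int, int]  # (m, n): inclusive start and end word indices in the SL input


def overlaps(a: Block, b: Block) -> bool:
    return a[0] <= b[1] and b[0] <= a[1]


def combo_score(
    combo: Sequence[Block], tfidf: List[float], edge_len: Dict[Block, float]
) -> float:
    """Assumed form: sum over blocks of EdgeLen_i * sum of TFIDF_k for k in m..n."""
    return sum(edge_len[b] * sum(tfidf[b[0]: b[1] + 1]) for b in combo)


def n_best(
    blocks: List[Block], tfidf: List[float], edge_len: Dict[Block, float], n: int
) -> List[Tuple[float, Tuple[Block, ...]]]:
    """Score every non-overlapping combination of blocks and keep the top n."""
    candidates = []
    for size in range(1, len(blocks) + 1):
        for combo in combinations(blocks, size):
            if all(not overlaps(a, b) for a, b in combinations(combo, 2)):
                candidates.append((combo_score(combo, tfidf, edge_len), combo))
    return heapq.nlargest(n, candidates, key=lambda c: c[0])


tfidf = [1.0, 2.0, 1.0, 0.5]                 # average TF/IDF weight per SL word
blocks = [(0, 1), (1, 2), (2, 3)]            # matched blocks in a 4-word input
edge_len = {b: 2.0 for b in blocks}          # hypothetical block weights
print(n_best(blocks, tfidf, edge_len, n=1))
```

With these values the only valid two-block combination, (0, 1) with (2, 3), outscores every single block, so it is returned as the best combination.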
13. A method of performing machine translation of a source language (SL) input to a translation output in a target language (TL), comprising:
matching fragments of the SL input to SL fragments of examples in an example base;
identifying all matched blocks in the SL input as blocks of terms in the SL input that are matched by one or more SL fragments in an example;
selecting block combinations of the matched blocks to cover one or more fragments of the SL input;
for each block in the selected block combinations, calculating a block score corresponding to each example containing the block, and identifying an example associated with the block based on the block score;
aligning TL portions of the identified example with SL portions of the identified example that match the one or more fragments of the SL input;
providing the translation output as a plurality of possible translation outputs based on the aligned portions; and
calculating a confidence measure for each translation output, as a translation confidence level, as follows:
where,
ConL is the translation confidence level;
c1, c2, ..., c4 are constants;
AlignCon is an alignment confidence level;
TransPercent is a weighted translation percentage;
Example_num is an employed example number identifying the identified example;
Valid_block_num is a fragment number in a possible TL translation under consideration;
PhrSL is a SL phrase that relates to a given input string;
PhrTL is a TL correspondence in the possible translation of the SL input;
|PhrTL| is a word number of PhrTL;
C_i...j is a connection between SL word i and TL word j; and
Conf(C_i...j) is the translation confidence level of word alignment.
where,
i = an “edge” (block) index number in the SL input;
m = a word indexing number of the “edge” i's starting point;
n = a word indexing number of the “edge” i's ending point;
k = a word indexing number of the “edge” i's each term;
TFIDF_k = term k's average TF/IDF weight in the example base; and
EdgeLen_i = a weight of block i.
Specification