Method and apparatus for evaluating machine translation quality
Abstract
Quality of machine translation of natural language is determined by computing a sequence kernel that provides a measure of similarity between a first sequence of symbols representing a machine translation in a target natural language and a second sequence of symbols representing a reference translation in the target natural language. The measure of similarity takes into account the existence of non-contiguous subsequences shared by the first sequence of symbols and the second sequence of symbols. When the similarity measure does not meet an acceptable threshold level, the translation model of the machine translator may be adjusted to improve subsequent translations performed by the machine translator.
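The similarity measure and threshold test described in the abstract can be sketched as follows. This is a minimal illustration that assumes the gap-weighted subsequence kernel known from the string-kernel literature, computed by dynamic programming; it is not necessarily the formulation given in the specification. The function names, the cosine-style normalization, the decay value of 0.5, and the threshold of 0.8 are all hypothetical choices made for the example.

```python
import math

def gap_weighted_kernel(s, t, n, decay):
    """Gap-weighted subsequence kernel of order n (n >= 1) between symbol lists s and t.

    Each shared subsequence contributes decay ** (span in s + span in t),
    so occurrences stretched over gaps count less than contiguous ones.
    """
    ls, lt = len(s), len(t)
    # kp[i][a][b] is the auxiliary kernel K'_i on the prefixes s[:a] and t[:b].
    kp = [[[0.0] * (lt + 1) for _ in range(ls + 1)] for _ in range(n)]
    for a in range(ls + 1):
        for b in range(lt + 1):
            kp[0][a][b] = 1.0
    for i in range(1, n):
        for a in range(1, ls + 1):
            kpp = 0.0  # running value of K''_i for the current prefix of s
            for b in range(1, lt + 1):
                match = s[a - 1] == t[b - 1]
                kpp = decay * kpp + (decay * decay * kp[i - 1][a - 1][b - 1] if match else 0.0)
                kp[i][a][b] = decay * kp[i][a - 1][b] + kpp
    total = 0.0
    for a in range(1, ls + 1):
        for b in range(1, lt + 1):
            if s[a - 1] == t[b - 1]:
                total += decay * decay * kp[n - 1][a - 1][b - 1]
    return total

def similarity(mt, ref, n=2, decay=0.5):
    """Normalized kernel in [0, 1]; the normalization is an assumption of this sketch."""
    k_mm = gap_weighted_kernel(mt, mt, n, decay)
    k_rr = gap_weighted_kernel(ref, ref, n, decay)
    if k_mm == 0.0 or k_rr == 0.0:
        return 0.0  # a sequence shorter than n has no subsequences of length n
    return gap_weighted_kernel(mt, ref, n, decay) / math.sqrt(k_mm * k_rr)

def needs_adjustment(mt, ref, threshold=0.8, n=2, decay=0.5):
    """True when the similarity falls below the (hypothetical) acceptance threshold."""
    return similarity(mt, ref, n, decay) < threshold

mt = "the black cat sat".split()
ref = "the cat sat".split()
print(similarity(mt, ref), needs_adjustment(mt, ref))
```

Here the machine translation shares the word pairs "the cat", "the sat", and "cat sat" with the reference, but two of them are interrupted by "black", so the normalized score drops below 1 and the check flags the translation under the assumed threshold.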
22 Claims
1. A method for computing machine translation performance, comprising:
receiving a sequence of natural language data in a first language;
translating the sequence of natural language data to a second language to define a machine translation of the sequence of natural language data comprising symbols;
receiving a reference translation of the sequence of natural language data in the second language comprising symbols;
with a computer processor, computing a sequence kernel that provides a similarity measure between the machine translation and the reference translation based on occurrences of subsequences that are shared by the machine translation and the reference translation, for a selected subsequence length, including performing an inner product in a feature space of all possible subsequences of a selected subsequence length;
outputting a signal indicating the similarity measure;
wherein the similarity measure accounts for non-contiguous occurrences of subsequences of the selected subsequence length that are shared between the machine translation and the reference translation, in which the non-contiguous subsequences share symbols and comprise a gap of at least one symbol which has been determined not to match a symbol in the gap of the other.
Dependent claims: 2-18.
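The "inner product in a feature space of all possible subsequences of a selected subsequence length" recited in claim 1 can be made concrete by building the feature vectors explicitly. The sketch below is an illustration only: the span-based weighting with a single decay factor is an assumption borrowed from the standard gap-weighted subsequence kernel, and the names `feature_map` and `sequence_kernel` are hypothetical. Explicit enumeration is exponential in the subsequence length and is shown for clarity, not efficiency.

```python
from collections import defaultdict
from itertools import combinations

def feature_map(symbols, n, decay):
    """Map a symbol list to weights over every length-n subsequence it contains.

    A subsequence may skip symbols; each occurrence is weighted by
    decay ** span, where span counts the matched symbols plus the gaps.
    """
    phi = defaultdict(float)
    for idx in combinations(range(len(symbols)), n):
        span = idx[-1] - idx[0] + 1
        phi[tuple(symbols[i] for i in idx)] += decay ** span
    return phi

def sequence_kernel(s, t, n, decay=0.5):
    """Inner product of the two feature vectors; only shared subsequences contribute."""
    phi_s = feature_map(s, n, decay)
    phi_t = feature_map(t, n, decay)
    return sum(w * phi_t[u] for u, w in phi_s.items() if u in phi_t)

ref = "the cat sat".split()
mt = "the black cat sat".split()
print(sequence_kernel(ref, mt, n=2))  # 0.1015625
```

In this example the pair ("the", "cat") is contiguous in the reference but spans a one-word gap in the machine translation, so it still contributes to the score, only with a smaller weight.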
19. A method for computing machine translation performance, comprising:
receiving into computer readable memory a sequence of natural language data in a first language;
translating the sequence of natural language data to a second language to define a machine translation of the sequence of natural language data;
receiving a reference translation of the sequence of natural language data in the second language;
computing a sequence kernel that provides a similarity measure between the machine translation and the reference translation which includes performing an inner product in a feature space of all possible subsequences of a selected subsequence length;
outputting a signal indicating the similarity measure;
wherein the similarity measure accounts for non-contiguous occurrences of subsequences shared between the machine translation and the reference translation and wherein non-consecutive sequences with gaps, the gaps comprising at least one symbol, are scored differently depending on what symbol or symbols occupies the gaps.
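One way to score gaps "differently depending on what symbol or symbols occupies the gaps", as claim 19 recites, is to give each gap symbol its own decay factor. The following sketch assumes that approach; the per-symbol decay table, the default values, and the function names are hypothetical and are not taken from the claims.

```python
from collections import defaultdict
from itertools import combinations

def weighted_features(symbols, n, match_decay, gap_decay, default_gap_decay):
    """Weight each length-n subsequence by its matched symbols and by what fills its gaps."""
    phi = defaultdict(float)
    for idx in combinations(range(len(symbols)), n):
        weight = match_decay ** n
        for pos in range(idx[0], idx[-1] + 1):
            if pos not in idx:  # a skipped position inside the span, i.e. a gap symbol
                weight *= gap_decay.get(symbols[pos], default_gap_decay)
        phi[tuple(symbols[i] for i in idx)] += weight
    return phi

def gap_sensitive_kernel(s, t, n, match_decay=0.5, gap_decay=None, default_gap_decay=0.5):
    """Inner product of the feature vectors built with symbol-dependent gap penalties."""
    gap_decay = gap_decay or {}
    phi_s = weighted_features(s, n, match_decay, gap_decay, default_gap_decay)
    phi_t = weighted_features(t, n, match_decay, gap_decay, default_gap_decay)
    return sum(w * phi_t[u] for u, w in phi_s.items() if u in phi_t)

ref = ["press", "button"]
light_gap = ["press", "the", "button"]  # gap filled by a word given a mild penalty
heavy_gap = ["press", "no", "button"]   # gap filled by a word given the default penalty
penalties = {"the": 0.9}                # hypothetical per-symbol gap decay
print(gap_sensitive_kernel(ref, light_gap, 2, gap_decay=penalties, default_gap_decay=0.1))
print(gap_sensitive_kernel(ref, heavy_gap, 2, gap_decay=penalties, default_gap_decay=0.1))
```

With these assumed penalties, an inserted article costs the translation far less than an inserted negation, even though both candidates contain the same shared subsequence with a gap of one symbol.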
20. A method for computing machine translation performance, comprising:
receiving a sequence of natural language data in a first language;
translating the sequence of natural language data to a second language to define a machine translation of the sequence of natural language data comprising symbols;
receiving a reference translation of the sequence of natural language data in the second language comprising symbols;
computing a sequence kernel that provides a similarity measure between the machine translation and the reference translation;
outputting a signal indicating the similarity measure;
wherein the similarity measure accounts for non-contiguous occurrences of subsequences shared between the machine translation and the reference translation, in which the non-contiguous subsequences share symbols and comprise a gap of at least one symbol which has been determined not to match a symbol in the gap of the other, and wherein the sequence kernel is given by the following equation;
21. A system for computing machine translation performance, comprising:
means for receiving a sequence of natural language data in a first language;
means for translating the sequence of natural language data to a second language to define a machine translation of the sequence of natural language data comprising symbols;
means for receiving a reference translation of the sequence of natural language data in the second language comprising symbols;
means including a computer processor for computing a sequence kernel that provides a similarity measure between the machine translation and the reference translation which includes performing an inner product in a feature space of all possible subsequences of a selected subsequence length;
means for outputting a signal indicating the similarity measure;
wherein the similarity measure computed by said computing means accounts for non-contiguous occurrences of subsequences shared between the machine translation and the reference translation in which the non-contiguous subsequences share symbols and comprise a gap of at least one symbol which has been determined not to match a symbol in the gap of the other.
22. An article of manufacture for use in a machine comprising:
a) a memory;
b) instructions stored in the memory for a method of computing machine translation performance, the method comprising;
receiving a sequence of natural language data in a first language;
translating the sequence of natural language data to a second language to define a machine translation of the sequence of natural language data comprising symbols;
receiving a reference translation of the sequence of natural language data in the second language comprising symbols;
computing a sequence kernel that provides a similarity measure between the machine translation and the reference translation which includes performing an inner product in a feature space of all possible subsequences of a selected subsequence length;
outputting a signal indicating the similarity measure;
wherein the similarity measure accounts for non-contiguous occurrences of subsequences shared between the machine translation and the reference translation, in which the non-contiguous subsequences share symbols and comprise a gap of at least one symbol which has been determined not to match a symbol in the gap of the other.
Specification