Statistical method and apparatus for learning translation relationships among phrases
Abstract
The present invention learns phrase translation relationships by receiving a parallel aligned corpus with phrases to be learned identified in a source language. Candidate phrases in a target language are generated and an inside score is calculated based on word association scores for words inside the source language phrase and candidate phrase. An outside score is calculated based on word association scores for words outside the source language phrase and candidate phrase. The inside and outside scores are combined to obtain a joint score.
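The abstract does not say how the word association scores are aggregated inside each region or how the inside and outside scores are combined. The sketch below is one possible reading, not the patented method. It assumes a Model-1-style rule: each source word is scored by the average of its association strengths with the target words in the same region (inside or outside), and the inside and outside log scores are added, which is the same as multiplying the two scores. The association table `assoc`, the smoothing constant `FLOOR`, and the function names `region_log_score` and `joint_log_score` are hypothetical. For brevity it scores only source-to-target associations.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical association table: (source word, target word) -> strength in (0, 1].
Assoc = Dict[Tuple[str, str], float]

FLOOR = 1e-6  # assumed smoothing value for word pairs missing from the table


def region_log_score(src_words: List[str], tgt_words: List[str], assoc: Assoc) -> float:
    """Sum over source words of log(average association with the region's target words)."""
    if not src_words:
        return 0.0
    if not tgt_words:
        # Source words with no target words left to explain them get the floor value.
        return len(src_words) * math.log(FLOOR)
    total = 0.0
    for s in src_words:
        avg = sum(assoc.get((s, t), FLOOR) for t in tgt_words) / len(tgt_words)
        total += math.log(avg)
    return total


def joint_log_score(src: List[str], tgt: List[str], src_span: Tuple[int, int],
                    tgt_span: Tuple[int, int], assoc: Assoc) -> float:
    """Inside score plus outside score; spans are half-open (start, end) index pairs."""
    si, sj = src_span
    ti, tj = tgt_span
    inside = region_log_score(src[si:sj], tgt[ti:tj], assoc)
    outside = region_log_score(src[:si] + src[sj:], tgt[:ti] + tgt[tj:], assoc)
    return inside + outside  # adding logs = multiplying the two scores


# Toy example: the source phrase "file menu" occupies span (2, 4).
src = ["open", "the", "file", "menu"]
tgt = ["ouvrir", "le", "menu", "fichier"]
assoc = {("open", "ouvrir"): 0.9, ("the", "le"): 0.8,
         ("file", "fichier"): 0.9, ("menu", "menu"): 0.9}

# Treat every contiguous target span as a candidate and keep the best joint score.
spans = [(i, j) for i in range(len(tgt)) for j in range(i + 1, len(tgt) + 1)]
best = max(spans, key=lambda sp: joint_log_score(src, tgt, (2, 4), sp, assoc))
print(best, tgt[best[0]:best[1]])  # (2, 4) ['menu', 'fichier']
```

The outside component matters here. The candidate "le menu fichier" scores well inside, but it leaves "the" with no partner outside, so its joint score drops far below that of "menu fichier".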
33 Claims
1. A method of identifying a translation relationship between a phrase in a source language and a phrase in a target language, comprising:
receiving access to an aligned pair of multiple word units, one being a source unit in the source language and another being a target unit in the target language, the source language phrase being identified in the source unit;
generating at least one candidate phrase in the target unit, the candidate phrase being a hypothesized translation of the source language phrase;
calculating a score for each candidate phrase including an inside component based on associations between words inside the source language phrase and candidate phrase and an outside component based on associations between words outside the source language phrase and candidate phrase; and
identifying the translation relationship between the source language phrase and candidate phrase based on the score.
29. A system for identifying phrase translations in multi-word target units for identified source language phrases in multi-word source units, comprising:
an individual word association model configured to generate one or more candidate phrases and a score for each candidate phrase based on word associations between words inside the source and target language phrases and word associations between words outside the source and target language phrases.
32. A method of generating candidate phrases in multi-word target units in a target language as hypothesized translations of an identified phrase in a multi-word source unit in a source language, comprising:
identifying first target language words in the target unit that are most strongly associated with a word in the source language phrase;
identifying second target language words in the target unit that have a word in the source language phrase that is most strongly associated with it; and
generating the candidate phrases as phrases that begin and end with first or second target language words.
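The candidate-generation steps of claim 32 can be sketched as follows, under stated assumptions. "First" words are read as the target words that each source-phrase word is most strongly associated with. "Second" words are read as the target words whose single most strongly associated source word (searched over the whole source unit) falls inside the phrase. Candidates are then all contiguous target spans that start and end on such a word. Requiring a positive association, breaking ties by the first index, and the names `anchor_positions`, `candidate_spans` and `assoc` are illustrative assumptions, not details from the claim.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical association table: (source word, target word) -> strength.
Assoc = Dict[Tuple[str, str], float]


def anchor_positions(src: List[str], tgt: List[str],
                     span: Tuple[int, int], assoc: Assoc) -> Set[int]:
    """Target positions holding 'first' or 'second' target language words."""
    si, sj = span
    anchors: Set[int] = set()
    # First words: for each source-phrase word, the target word it is most strongly associated with.
    for s in src[si:sj]:
        k = max(range(len(tgt)), key=lambda pos: assoc.get((s, tgt[pos]), 0.0))
        if assoc.get((s, tgt[k]), 0.0) > 0.0:
            anchors.add(k)
    # Second words: target words whose most strongly associated source word lies in the phrase.
    for k, t in enumerate(tgt):
        m = max(range(len(src)), key=lambda pos: assoc.get((src[pos], t), 0.0))
        if si <= m < sj and assoc.get((src[m], t), 0.0) > 0.0:
            anchors.add(k)
    return anchors


def candidate_spans(src: List[str], tgt: List[str],
                    span: Tuple[int, int], assoc: Assoc) -> List[Tuple[int, int]]:
    """All half-open target spans that begin and end on an anchor position."""
    anchors = sorted(anchor_positions(src, tgt, span, assoc))
    return [(i, j + 1) for a, i in enumerate(anchors) for j in anchors[a:]]


src = ["open", "the", "file", "menu"]
tgt = ["ouvrir", "le", "menu", "fichier"]
assoc = {("open", "ouvrir"): 0.9, ("the", "le"): 0.8,
         ("file", "fichier"): 0.9, ("menu", "menu"): 0.9}
print(candidate_spans(src, tgt, (2, 4), assoc))  # [(2, 3), (2, 4), (3, 4)]
```

Restricting candidates to spans bounded by anchor words keeps the candidate set much smaller than the set of all target spans, which can then be ranked by a score such as the inside/outside score of claim 1.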
Specification