METHOD FOR ALIGNING SENTENCES AT THE WORD LEVEL ENFORCING SELECTIVE CONTIGUITY CONSTRAINTS
1 Assignment
0 Petitions
Abstract
An alignment method includes, for a source sentence in a source language, identifying whether the sentence includes at least one candidate term comprising a contiguous subsequence of words of the source sentence. A target sentence in a target language is aligned with the source sentence. This includes developing a probabilistic model which models conditional probability distributions for alignments between words of the source sentence and words of the target sentence, and generating an optimal alignment based on that model. Where the source sentence includes at least one candidate term, generating the alignment includes enforcing a contiguity constraint which requires that all the words of the target sentence which are aligned with an identified candidate term form a contiguous subsequence of the target sentence.
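The contiguity constraint can be stated operationally as a simple test on a finished alignment. The sketch below is a hypothetical helper, not taken from the patent. It assumes an alignment is stored as a list mapping each target position to the source position it is aligned with (-1 for an unaligned target word), and that a candidate term is a half-open span `(start, end)` of source positions.

```python
from typing import Sequence, Tuple


def satisfies_contiguity(alignment: Sequence[int], span: Tuple[int, int]) -> bool:
    """True if the target positions aligned into source span [start, end) are contiguous.

    Hypothetical helper: alignment[j] is the source position aligned with target
    position j, or -1 if target word j is unaligned.
    """
    start, end = span
    # Target positions whose aligned source position falls inside the candidate term.
    hits = [j for j, i in enumerate(alignment) if start <= i < end]
    # hits is sorted and duplicate-free, so it is contiguous iff it has no gaps.
    # A term aligned with no target word trivially satisfies the constraint.
    return not hits or hits[-1] - hits[0] + 1 == len(hits)
```

For the source words `["red", "car", "brakes"]` with the term "red car" at span `(0, 2)`, the alignment `[1, 2, 0]` fails the test because the target word at position 1 is aligned outside the term, while `[1, 1, 0]` passes.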
149 Citations
23 Claims
1. An alignment method comprising:
for a source sentence in a source language, identifying whether the sentence includes at least one candidate term comprising a contiguous subsequence of words of the source sentence;
aligning a target sentence in a target language with the source sentence, comprising:
developing a probabilistic model which models conditional probability distributions for alignments between words of the source sentence and words of the target sentence; and
generating an optimal alignment based on the probabilistic model, including, where the source sentence includes the at least one candidate term, enforcing a contiguity constraint which requires that all the words of the target sentence which are aligned with an identified candidate term form a contiguous subsequence of the target sentence.
Dependent claims: 2-20.
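One way to generate an optimal alignment while enforcing the constraint is a Viterbi-style dynamic program over target positions. The sketch below is illustrative only and swaps in a deliberately simple model: a factorized lexical table `lex_prob[(target_word, source_word)]` in the style of IBM Model 1, with no transition scores, which may differ from the probabilistic model the patent actually uses. The names `align_with_contiguity`, `term_of`, `lex_prob`, `floor`, and `NULL` are hypothetical. The DP state records which term the previous target word was aligned into and which terms have already been "closed"; re-entering a closed term is forbidden, which is exactly the contiguity constraint.

```python
import math
from typing import Dict, FrozenSet, List, Sequence, Tuple

NULL = -1  # hypothetical marker: target word aligned with no source word


def term_of(spans: Sequence[Tuple[int, int]], i: int) -> int:
    """Index of the candidate term whose half-open span covers source position i, else -1."""
    for k, (start, end) in enumerate(spans):
        if start <= i < end:
            return k
    return -1


def align_with_contiguity(
    source: Sequence[str],
    target: Sequence[str],
    spans: Sequence[Tuple[int, int]],
    lex_prob: Dict[Tuple[str, str], float],
    floor: float = 1e-6,
) -> List[int]:
    """Best alignment (source position or NULL per target word) under an assumed
    factorized lexical model, subject to: target words aligned into each
    candidate-term span form one contiguous block."""
    labels = list(range(len(source))) + [NULL]

    def log_p(j: int, i: int) -> float:
        src_word = source[i] if i != NULL else "<null>"
        return math.log(lex_prob.get((target[j], src_word), floor))

    # State: (term the previous target word was aligned into, or -1; closed terms).
    State = Tuple[int, FrozenSet[int]]
    best: Dict[State, float] = {(-1, frozenset()): 0.0}
    backptrs: List[Dict[State, Tuple[State, int]]] = []
    for j in range(len(target)):
        nxt: Dict[State, float] = {}
        ptr: Dict[State, Tuple[State, int]] = {}
        for (cur, closed), acc in best.items():
            for i in labels:
                t = term_of(spans, i) if i != NULL else -1
                if t != -1 and t in closed:
                    continue  # re-entering a closed term would break its contiguity
                # Moving from the current term to anything else (another term,
                # a non-term word, or NULL) closes the current term.
                new_closed = closed | {cur} if cur != -1 and t != cur else closed
                state = (t, new_closed)
                val = acc + log_p(j, i)
                if val > nxt.get(state, -math.inf):
                    nxt[state] = val
                    ptr[state] = ((cur, closed), i)
        best = nxt  # never empty: the NULL label is always permitted
        backptrs.append(ptr)

    # Trace back from the highest-scoring final state.
    state = max(best, key=lambda s: best[s])
    alignment: List[int] = []
    for ptr in reversed(backptrs):
        state, i = ptr[state]
        alignment.append(i)
    return alignment[::-1]
```

In a contrived toy case (source `["red", "car", "brakes"]`, target `["voiture", "freine", "rouge"]`), the unconstrained best alignment is `[1, 2, 0]`, which splits the term "red car" around "freine". With the span `(0, 2)` enforced, the search instead returns `[1, 1, 0]`, provided the table gives "freine" some probability given "car". The set of closed terms grows as 2^K for K candidate terms per sentence, which is acceptable only because sentences usually contain few terms. A model with transition scores (HMM-style) would also need the previous source position in the state.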

21. A system comprising:
a sentence aligner which aligns sentences of a target document in a target language with respective sentences of a source document in a source language;
a source term tagger which tags terms of each source sentence which meet criteria for at least one class of candidate terms, each of the candidate terms comprising a contiguous subsequence of words of the source sentence;
a word aligner which, for a pair of sentences aligned by the sentence aligner, generates an alignment between the words of a target sentence and the words of the source sentence, the word aligner using a probabilistic model which models conditional probability distributions for alignments between words of the source sentence and words of the target sentence and generating an optimal alignment based on the probabilistic model, the word aligner enforcing a contiguity constraint which requires that all the words of the target sentence which are aligned with one of the candidate terms identified by the term tagger form a contiguous subsequence of the target sentence.
Dependent claim: 22.
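The claim leaves the term tagger's criteria open. The sketch below shows one plausible class of candidate terms, simple noun phrases matching the part-of-speech pattern ADJ* NOUN+. It assumes POS tags are already supplied by some upstream tagger using Universal-POS-style labels ("ADJ", "NOUN"). The function `tag_candidate_terms`, the pattern, and the `min_len` threshold are all hypothetical choices for illustration. Each tag is mapped to one character so that regular-expression match offsets coincide with token positions.

```python
import re
from typing import List, Sequence, Tuple

# Hypothetical one-character codes; any other tag maps to "x".
TAG_CODE = {"ADJ": "A", "NOUN": "N"}


def tag_candidate_terms(pos_tags: Sequence[str], min_len: int = 2) -> List[Tuple[int, int]]:
    """Return half-open token spans (start, end) matching ADJ* NOUN+ with >= min_len tokens."""
    codes = "".join(TAG_CODE.get(tag, "x") for tag in pos_tags)
    # One character per token, so character offsets are token offsets.
    return [m.span() for m in re.finditer(r"A*N+", codes) if m.end() - m.start() >= min_len]
```

For the tags `["DET", "ADJ", "NOUN", "VERB", "NOUN"]` (e.g. "the red car passes trucks"), the default call returns `[(1, 3)]`. With `min_len=1`, the single noun at position 4 is also returned as a span `(4, 5)`.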

23. A method of generating a terminological lexicon comprising:
providing a parallel corpus, the corpus comprising source sentences in a source language and target sentences in a target language;
providing for the identifying of noun phrases in the source sentences but not in the target sentences;
for each of a plurality of source sentences, generating an alignment in which words of a respective target sentence are aligned with words of the source sentence, whereby words of the target sentence which are aligned with a selected noun phrase are identified, wherein in generating the alignment, a contiguity constraint is enforced which requires that all the words of the target sentence which are aligned with the selected noun phrase form a contiguous subsequence of words of the target sentence;
optionally, where a plurality of contiguous subsequences of aligned target sentences are aligned with a common noun phrase, filtering the contiguous subsequences to remove contiguous subsequences which are less probable translations of the noun phrase; and
incorporating the noun phrase together with at least one identified contiguous subsequence which has been aligned with the noun phrase in a terminological lexicon.
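The claim does not say here how "less probable" translations are identified in the optional filtering step. A minimal sketch, assuming a simple relative-frequency criterion, is shown below. The function `build_lexicon` and the `min_share` threshold are hypothetical. The sketch takes the (noun phrase, aligned contiguous target subsequence) pairs collected across the corpus, counts them, and keeps only translations that account for at least `min_share` of a noun phrase's occurrences.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Tuple


def build_lexicon(
    pairs: Iterable[Tuple[str, str]], min_share: float = 0.2
) -> Dict[str, List[str]]:
    """Map each source noun phrase to its retained target translations.

    pairs: (noun phrase, contiguous target subsequence aligned with it), one per
    occurrence in the corpus. A translation is kept if its relative frequency
    among that phrase's alignments is at least min_share (assumed criterion).
    """
    counts: Dict[str, Counter] = defaultdict(Counter)
    for phrase, translation in pairs:
        counts[phrase][translation] += 1

    lexicon: Dict[str, List[str]] = {}
    for phrase, counter in counts.items():
        total = sum(counter.values())
        # most_common() lists translations from most to least frequent.
        kept = [t for t, n in counter.most_common() if n / total >= min_share]
        if kept:
            lexicon[phrase] = kept
    return lexicon
```

If "red car" is aligned three times with "voiture rouge" and once with "voiture", a threshold of 0.5 keeps only "voiture rouge", while the default 0.2 keeps both, most frequent first. A probability-based filter (for example, one using the alignment model's own scores) could replace the raw counts without changing the overall structure.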
Specification