Apparatus and methods for aligning words in bilingual sentences
Abstract
Methods are disclosed for performing proper word alignment that satisfies the constraints of coverage and transitive closure. First, a translation matrix is computed that defines word association measures between the source and target words of a corpus of bilingual translations of source and target sentences. In a first method, the association measures in the translation matrix are factorized and orthogonalized to produce cepts for the source and target words; the resulting matrix factors may then optionally be multiplied to produce an alignment matrix. In a second method, the association measures in the translation matrix are thresholded and then closed by transitivity to produce an alignment matrix, which may then optionally be factorized to produce cepts. The resulting cepts or alignment matrices may then be used by any number of natural language applications to identify words that are properly aligned.
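The two procedures summarized in the abstract can be illustrated with a minimal pure-Python sketch. It rests on several assumptions and is not the patented implementation. The translation matrix `M` is taken to be a nonnegative list of lists of shape I x J. The threshold value and the fallback that links an unlinked word to its strongest partner (so that coverage holds) are illustrative choices. The factorization that produces the cept factors `F` and `E` is taken as given rather than computed, and orthogonalization is done by keeping only each word's strongest cept. All function names are hypothetical.

```python
from typing import List, Sequence, Tuple

Matrix = List[List[float]]
Binary = List[List[int]]
Cept = Tuple[Tuple[int, ...], Tuple[int, ...]]  # (source indices, target indices)


def _argmax(values: Sequence[float]) -> int:
    """Index of the largest value (the first one on ties)."""
    return max(range(len(values)), key=lambda k: values[k])


def threshold_and_close(M: Matrix, threshold: float) -> Binary:
    """Second method (sketch): threshold M, then close the links by transitivity."""
    I, J = len(M), len(M[0])
    A = [[1 if M[i][j] >= threshold else 0 for j in range(J)] for i in range(I)]

    # Assumption: a word left without any link by the threshold is linked
    # to its strongest partner in M, so that coverage holds.
    for i in range(I):
        if not any(A[i]):
            A[i][_argmax(M[i])] = 1
    for j in range(J):
        if not any(A[i][j] for i in range(I)):
            A[_argmax([M[i][j] for i in range(I)])][j] = 1

    # Transitive closure via union-find: nodes 0..I-1 are source words,
    # nodes I..I+J-1 are target words; every link merges two nodes.
    parent = list(range(I + J))

    def find(x: int) -> int:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i in range(I):
        for j in range(J):
            if A[i][j]:
                parent[find(i)] = find(I + j)

    # Link every source word to every target word in its connected component.
    return [[1 if find(i) == find(I + j) else 0 for j in range(J)] for i in range(I)]


def cepts_from_alignment(A: Binary) -> List[Cept]:
    """Optional factorization of a closed alignment matrix into cepts.

    After closure, source words in the same cept have identical rows, so
    grouping rows by their set of linked targets recovers the cepts.
    """
    groups: dict = {}
    for i, row in enumerate(A):
        targets = tuple(j for j, a in enumerate(row) if a)
        groups.setdefault(targets, []).append(i)
    return [(tuple(sources), targets) for targets, sources in groups.items()]


def alignment_from_factors(F: Matrix, E: Matrix) -> Binary:
    """First method (sketch): orthogonalize cept factors, then multiply them.

    F[i][k] and E[j][k] are nonnegative strengths linking source word i and
    target word j to cept k; the factorization of M is assumed to be given.
    """
    src_cept = [_argmax(row) for row in F]  # keep only each word's strongest cept
    tgt_cept = [_argmax(row) for row in E]
    # Product of the two 0/1 factors: words are linked iff they share a cept.
    return [[1 if src_cept[i] == tgt_cept[j] else 0 for j in range(len(E))]
            for i in range(len(F))]
```

Take M = [[0.9, 0.1, 0.0], [0.2, 0.1, 0.7], [0.0, 0.6, 0.8]] with a threshold of 0.5. Here `threshold_and_close` returns [[1, 0, 0], [0, 1, 1], [0, 1, 1]]. The closure step adds the link between source word 1 and target word 1 (0-based), which thresholding alone misses. `cepts_from_alignment` then yields the two cepts ((0,), (0,)) and ((1, 2), (1, 2)). The orthogonalization sketch does not repair a cept that ends up with words on only one side. Such words would be left unaligned, violating coverage.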
21 Claims
1. A method for aligning words of natural language sentences, comprising:
receiving a corpus of aligned source sentences f=f1 . . . fi . . . fI composed of source words f1, . . . fI, and target sentences e=e1 . . . ej . . . eJ composed of target words e1, . . . eJ;
the source sentences being in a first natural language and the target sentences being in a second natural language;
producing a translation matrix M with association measures mij;
each association measure mij in the translation matrix providing a valuation of association strength between each source word fi and each target word ej;
producing one or more of an alignment matrix A and cepts that link aligned source and target words;
the alignment matrix and cepts defining a proper N:M alignment between source and target words by satisfying coverage and transitive closure;
wherein coverage is satisfied when each source word is aligned with at least one target word and each target word is aligned to at least one source word; and
wherein transitive closure is satisfied when, whenever source word fi is aligned to target words ej and el and source word fk is aligned to target word el, source word fk is also aligned to target word ej.
13. An apparatus for aligning words of natural language sentences, comprising:
a memory for storing natural language processing instructions of the apparatus; and
a processor coupled to the memory for executing the natural language processing instructions of the apparatus;
the processor, in executing the natural language processing instructions, performing:
receiving a corpus of aligned source sentences f=f1 . . . fi . . . fI composed of source words f1, . . . fI, and target sentences e=e1 . . . ej . . . eJ composed of target words e1, . . . eJ;
the source sentences being in a first natural language and the target sentences being in a second natural language;
producing a translation matrix M with association measures mij;
each association measure mij in the translation matrix providing a valuation of association strength between each source word fi and each target word ej;
producing one or more of an alignment matrix A and cepts that link aligned source and target words;
the alignment matrix and cepts defining a proper N:M alignment between source and target words by satisfying coverage and transitive closure;
wherein coverage is satisfied when each source word is aligned with at least one target word and each target word is aligned to at least one source word; and
wherein transitive closure is satisfied when, whenever source word fi is aligned to target words ej and el and source word fk is aligned to target word el, source word fk is also aligned to target word ej.
21. An article of manufacture for use in a machine, comprising:
a memory;
instructions stored in the memory for performing a method for aligning words of natural language sentences, the method comprising:
receiving a corpus of aligned source sentences f=f1 . . . fi . . . fI composed of source words f1, . . . fI, and target sentences e=e1 . . . ej . . . eJ composed of target words e1, . . . eJ;
the source sentences being in a first natural language and the target sentences being in a second natural language;
producing a translation matrix M with association measures mij;
each association measure mij in the translation matrix providing a valuation of association strength between each source word fi and each target word ej;
producing one or more of an alignment matrix A and cepts that link aligned source and target words;
the alignment matrix and cepts defining a proper N:M alignment between source and target words by satisfying coverage and transitive closure;
wherein coverage is satisfied when each source word is aligned with at least one target word and each target word is aligned to at least one source word; and
wherein transitive closure is satisfied when, whenever source word fi is aligned to target words ej and el and source word fk is aligned to target word el, source word fk is also aligned to target word ej.
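The coverage and transitive-closure conditions stated in the claims can be checked directly on a 0/1 alignment matrix A, where A[i][j] = 1 means that source word fi is aligned to target word ej. The following Python sketch is a brute-force transcription of the claim wording for illustration only. It is not code from the patent, and the function names are hypothetical.

```python
from itertools import product
from typing import List

Binary = List[List[int]]  # A[i][j] == 1 iff source word i is aligned to target word j


def satisfies_coverage(A: Binary) -> bool:
    """Each source word (row) and each target word (column) has at least one link."""
    rows_ok = all(any(row) for row in A)
    cols_ok = all(any(row[j] for row in A) for j in range(len(A[0])))
    return rows_ok and cols_ok


def satisfies_transitive_closure(A: Binary) -> bool:
    """If fi links ej and el, and fk links el, then fk must also link ej."""
    I, J = len(A), len(A[0])
    for i, k, j, l in product(range(I), range(I), range(J), range(J)):
        if A[i][j] and A[i][l] and A[k][l] and not A[k][j]:
            return False
    return True


def is_proper_alignment(A: Binary) -> bool:
    """Proper N:M alignment: coverage and transitive closure both hold."""
    return satisfies_coverage(A) and satisfies_transitive_closure(A)
```

The four nested loops mirror the claim's quantifiers over fi, fk, ej and el literally, at a cost of I²J² checks. An equivalent test is that every group of words connected by links forms a complete block, in which each of its source words is linked to each of its target words.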
Specification