Methods and systems for alignment of parallel text corpora
First Claim
1. A computer-implemented method of aligning fragments of a first text in a first language with corresponding fragments of a second text, which is a translation of the first text into a second language, comprising:
preliminarily dividing the first and second texts into fragments;
generating a hypothesis about correspondence between at least first fragment in the first text and at least second fragment in the second text;
determining estimations reflecting correspondence between the first and the second fragments, wherein each estimation is based at least on a ratio between:
(a) a number of words in at least one of the first segment or the second segment; and
(b) a number of words in the first fragment that have a corresponding translation in the second fragment according to a normalized one-to-one dictionary;
determining a degree of correspondence between the first and the second fragments based on the estimations, including adjusting the estimations by using weight coefficients selected on the basis of heuristics or training; and
comparing the degree of correspondence to a predetermined threshold.
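The dictionary-based steps of claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation: the toy dictionary, the `normalize` stand-in for real lemmatization, the equal weights, and the threshold of 0.5 are all assumptions made for illustration. It computes two estimations (matched words relative to the word count of each fragment), combines them with weight coefficients, and compares the result to a threshold.

```python
# Hypothetical toy dictionary: one normalized source word maps to one target word.
TOY_DICTIONARY = {"cat": "chat", "black": "noir", "sleeps": "dort"}


def normalize(word: str) -> str:
    """Stand-in for real normalization (e.g. lemmatization): lowercase, strip punctuation."""
    return word.lower().strip(".,;:!?\"'()")


def tokenize(fragment: str) -> list[str]:
    """Split on whitespace and drop tokens that normalize to nothing."""
    return [w for w in (normalize(t) for t in fragment.split()) if w]


def estimations(src: str, tgt: str, dictionary: dict[str, str]) -> tuple[float, float]:
    """Return two ratios: matched words over source length and over target length."""
    src_words, tgt_words = tokenize(src), tokenize(tgt)
    if not src_words or not tgt_words:
        return 0.0, 0.0
    tgt_set = set(tgt_words)
    # (b): source words whose dictionary translation occurs in the target fragment.
    matched = sum(1 for w in src_words if dictionary.get(w) in tgt_set)
    # (a): word count of the first or of the second fragment; ratios clamped to 1.0.
    return min(1.0, matched / len(src_words)), min(1.0, matched / len(tgt_words))


def degree_of_correspondence(ests: tuple[float, ...], weights: tuple[float, ...]) -> float:
    """Weighted average of the estimations (weights assumed chosen by heuristics or training)."""
    return sum(w * e for w, e in zip(weights, ests)) / sum(weights)


def is_aligned(src: str, tgt: str, dictionary: dict[str, str],
               weights: tuple[float, float] = (0.5, 0.5), threshold: float = 0.5) -> bool:
    """Accept the hypothesis when the degree of correspondence reaches the threshold."""
    return degree_of_correspondence(estimations(src, tgt, dictionary), weights) >= threshold
```

With these assumptions, `is_aligned("The black cat sleeps.", "Le chat noir dort.", TOY_DICTIONARY)` accepts the pair, because three of the four words on each side are covered by the dictionary.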
Abstract
Computer-implemented systems and methods align fragments of a first text with corresponding fragments of a second text, which is a translation of the first text. One preferred embodiment preliminarily divides the first and second texts into fragments; generates a hypothesis about the correspondence between the fragments of the first and second texts; performs a lexico-morphological analysis of the fragments using linguistic descriptions; performs a syntactic analysis of the fragments using linguistic descriptions and generates syntactic structures for the fragments; generates semantic structures for the fragments; and estimates the degree of correspondence between the semantic structures.
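The first two steps of the abstract, preliminary division and hypothesis generation, might be sketched as follows. The punctuation-based splitter and the positional window are assumptions chosen for brevity; the source does not say how fragments are delimited or how candidate pairs are proposed.

```python
import re


def split_into_fragments(text: str) -> list[str]:
    """Preliminary division: split after sentence-final punctuation followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def generate_hypotheses(n_src: int, n_tgt: int, window: int = 1) -> list[tuple[int, int]]:
    """Propose (source index, target index) pairs.

    Assumption: a source fragment is paired only with target fragments that lie
    within `window` positions of its proportional position in the target text.
    """
    hypotheses = []
    for i in range(n_src):
        expected = round(i * n_tgt / n_src)
        low = max(0, expected - window)
        high = min(n_tgt, expected + window + 1)
        for j in range(low, high):
            hypotheses.append((i, j))
    return hypotheses
```

Each proposed pair would then be scored by the later analysis steps, and only pairs that pass the correspondence check would be kept.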
28 Claims
1. Independent method claim, recited in full under First Claim above.
Dependent claims: 2, 3, 4, 5, 6, 7
8. A computer-implemented method of aligning fragments of a first text in a first language with corresponding fragments of a second text, which is a translation of the first text into a second language, comprising:
preliminarily dividing the first and second texts into fragments;
generating a hypothesis about correspondence between at least first fragment in the first text and at least second fragment in the second text;
performing a lexico-morphological analysis of the first and the second fragments using linguistic descriptions;
performing a syntactic analysis of the first and the second fragments using linguistic descriptions and generating a syntactic structure for the first fragment and a syntactic structure for the second fragment;
generating a semantic structure for the first fragment and a semantic structure for the second fragment, wherein the semantic structures are directional acyclic graphs with nodes that are assigned elements of semantic hierarchy;
estimating the degree of correspondence between the semantic structure for the first fragment and the semantic structure for the second fragment, wherein estimating the degree of correspondence between the semantic structures includes identifying correspondence of tree structure, deep slots, non-tree links, and semantic classes; and
if the degree of correspondence between the semantic structure for the first fragment and the semantic structure for the second fragment satisfies a predetermined threshold, saving the generated syntactic and semantic structures for the first fragment in connection with the first fragment; and saving the generated syntactic and semantic structures for the second fragment in connection with the second fragment.
Dependent claims: 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22
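The comparison of semantic structures recited in claim 8 can be illustrated with a small sketch. Everything here is hypothetical: the `SemanticStructure` layout, the use of Jaccard overlap as the per-aspect estimate, the equal weights, and the threshold of 0.8 are assumptions, and the claim does not prescribe any of them. Only the four compared aspects (tree structure, deep slots, non-tree links, semantic classes) and the save-on-threshold step come from the claim; the sketch stores semantic structures only, whereas the claim also saves syntactic ones.

```python
from dataclasses import dataclass, field


@dataclass
class SemanticStructure:
    """Toy directed acyclic graph whose nodes carry semantic classes."""
    classes: dict[int, str]                                           # node id -> semantic class
    tree: set[tuple[int, str, int]] = field(default_factory=set)     # (parent, deep slot, child)
    links: set[tuple[int, str, int]] = field(default_factory=set)    # non-tree (from, kind, to)


def jaccard(a: set, b: set) -> float:
    """Set overlap in [0, 1]; two empty sets are treated as a perfect match."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def features(s: SemanticStructure) -> dict[str, set]:
    """Describe a structure by class labels so node ids need not agree across languages."""
    cls = s.classes
    return {
        "semantic_classes": set(cls.values()),
        "deep_slots": {slot for _, slot, _ in s.tree},
        "tree_structure": {(cls[p], cls[c]) for p, _, c in s.tree},
        "non_tree_links": {(cls[a], kind, cls[b]) for a, kind, b in s.links},
    }


def correspondence(a: SemanticStructure, b: SemanticStructure) -> float:
    """Unweighted mean of the four per-aspect overlaps (an assumed scoring rule)."""
    fa, fb = features(a), features(b)
    return sum(jaccard(fa[k], fb[k]) for k in fa) / len(fa)


def align_and_store(src_text: str, src: SemanticStructure, tgt_text: str,
                    tgt: SemanticStructure, store: dict, threshold: float = 0.8) -> float:
    """Save each structure with its fragment when the score satisfies the threshold."""
    score = correspondence(src, tgt)
    if score >= threshold:
        store[src_text] = src
        store[tgt_text] = tgt
    return score
```

Comparing by semantic class rather than by node id reflects the idea that the semantic structures of a sentence and its translation should coincide even though their surface words differ.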
23. A computer system for aligning fragments of a first text in a first language with corresponding fragments of a second text, which is a translation of the first text into a second language, comprising:
a processor; and
a memory coupled to the processor, the memory storing instructions, which when executed by the computer system for aligning fragments cause the system to perform a method comprising:
preliminarily dividing the first and second texts into fragments;
generating a hypothesis about correspondence between at least first fragment in the first text and at least second fragment in the second text;
performing a lexico-morphological analysis of the first and the second fragments using linguistic descriptions;
performing a syntactic analysis of the first and the second fragments using linguistic descriptions and generating a syntactic structure for the first fragment and a syntactic structure for the second fragment;
generating a semantic structure for the first fragment and a semantic structure for the second fragment, wherein the semantic structures are directional acyclic graphs with nodes that are assigned elements of semantic hierarchy;
estimating the degree of correspondence between the semantic structure for the first fragment and the semantic structure for the second fragment, wherein estimating the degree of correspondence between the semantic structures includes identifying correspondence of tree structure, deep slots, non-tree links, and semantic classes; and
if the degree of correspondence between the semantic structure for the first fragment and the semantic structure for the second fragment satisfies a predetermined threshold, saving the generated syntactic and semantic structures for the first fragment in connection with the first fragment; and saving the generated syntactic and semantic structures for the second fragment in connection with the second fragment.
Dependent claims: 24, 25
26. A computer-readable medium having stored thereon a sequence of instructions, which, when executed by a computer system for aligning fragments of a first text in a first language with corresponding fragments of a second text, which is a translation of the first text into a second language, causes the system to perform a method, comprising:
preliminarily dividing the first and second texts into fragments;
generating a hypothesis about correspondence between at least first fragment in the first text and at least second fragment in the second text;
performing a lexico-morphological analysis of the first and the second fragments using linguistic descriptions;
performing a syntactic analysis of the first and the second fragments using linguistic descriptions and generating a syntactic structure for the first fragment and a syntactic structure for the second fragment;
generating a semantic structure for the first fragment and a semantic structure for the second fragment, wherein the semantic structures are directional acyclic graphs with nodes that are assigned elements of semantic hierarchy;
estimating the degree of correspondence between the semantic structure for the first fragment and the semantic structure for the second fragment, wherein estimating the degree of correspondence between the semantic structures includes identifying correspondence of tree structure, deep slots, non-tree links, and semantic classes; and
if the degree of correspondence between the semantic structure for the first fragment and the semantic structure for the second fragment satisfies a predetermined threshold, saving the generated syntactic and semantic structures for the first fragment in connection with the first fragment; and saving the generated syntactic and semantic structures for the second fragment in connection with the second fragment.
Dependent claims: 27, 28
Specification