Method of computer-based automatic extraction of translation pairs of words from a bilingual text
First Claim
1. A method for automatic extraction of a translation pair of words which comprises a word of a first language and a word of a second language corresponding thereto, comprising steps executed by a computer, the steps including:
extracting a plurality of words of the first language occurring in a first text described in the first language, from first text data which represents said first text;
extracting, in correspondence to each occurrent word of the first language, a set of a plurality of co-occurrent words of the first language for said each occurrent word of the first language, each co-occurrent word of the first language being a word which occurs in a neighborhood of at least one of a plurality of positions within said first text where said each occurrent word of the first language occurs, and which fulfills at the same time a first predetermined condition related to said each occurrent word of the first language, said extracting being done from said first text data;
extracting a plurality of words of the second language occurring in a second text corresponding to the first text and described in the second language, from second text data which represents said second text;
extracting, in correspondence to each occurrent word of the second language, a set of a plurality of co-occurrent words of the second language for said each occurrent word of the second language, each co-occurrent word of the second language being a word which occurs in a neighborhood of at least one of a plurality of positions within said second text where said each occurrent word of the second language occurs, and which fulfills at the same time a second predetermined condition related to said each occurrent word of the second language, said extracting being done from said second text data;
calculating a correlation between each occurrent word of the first language and each occurrent word of the second language, said calculating being done based on said set of a plurality of co-occurrent words of the first language extracted in correspondence to said each occurrent word of the first language and said set of a plurality of co-occurrent words of the second language extracted in correspondence to said each occurrent word of the second language; and
selecting, as a translation pair of words, at least one pair of words from a plurality of pairs of words, each pair of words comprising one of said plurality of occurrent words of the first language and one of said plurality of occurrent words of the second language, said at least one pair of words being a pair of words between which a correlation satisfies a predetermined condition related to a translation pair of words, said selecting being done based upon a plurality of pairs of words.
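The two co-occurrence extraction steps recited above can be illustrated with a minimal sketch. It rests on assumptions that the claim leaves open: the "neighborhood" is taken to be the sentence, the "predetermined condition" is taken to be a minimum co-occurrence count, and the text is assumed to be already split into sentences and words. The names `cooccurrence_sets` and `min_count` are hypothetical and do not come from the patent.

```python
from collections import Counter, defaultdict


def cooccurrence_sets(sentences, min_count=1):
    """Map each occurring word to a Counter of its co-occurring words.

    sentences: iterable of sentences, each a list of word strings.
    Assumptions (not fixed by the claim): the neighborhood is the
    sentence, and the predetermined condition is "co-occurs in at
    least min_count sentences".
    """
    cooc = defaultdict(Counter)
    for sentence in sentences:
        words = set(sentence)  # count each word once per sentence
        for w in words:
            row = cooc[w]  # creates an entry even if w has no neighbors
            for v in words:
                if v != w:
                    row[v] += 1
    # Keep only the co-occurring words that satisfy the condition.
    return {
        w: Counter({v: n for v, n in row.items() if n >= min_count})
        for w, row in cooc.items()
    }


if __name__ == "__main__":
    # Made-up toy text, applied to one language at a time.
    text = [["cat", "sat", "mat"], ["cat", "ran"]]
    for word, neighbors in sorted(cooccurrence_sets(text).items()):
        print(word, dict(neighbors))
```

The same function would be run once on the first text and once on the second text, giving the two families of co-occurrent word sets that the correlation step compares.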
Abstract
For each word occurring in a Japanese text, the set of words co-occurring with it and their co-occurrence frequencies are extracted; two words are regarded as co-occurring when they occur in the same sentence. Likewise, for each word occurring in an English text that corresponds to the Japanese text, the set of words co-occurring with it and their co-occurrence frequencies are extracted. A correlation between a Japanese word and an English word is calculated from the co-occurrent word set of the Japanese word and that of the English word, with the assistance of a Japanese-English bilingual dictionary of basic words. The correlation is defined as the ratio of the number of possible correspondences between the two co-occurrent word sets to the total of the co-occurrence frequencies in the two co-occurrent word sets. Pairs of words whose correlation is mutually maximal are selected as candidate translation pairs and displayed on a display device. Finally, the pairs selected by the user are registered in the bilingual dictionary, so that the dictionary is augmented incrementally.
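The correlation and selection steps described in the abstract can be sketched as follows. This is one possible reading, not the patented formula: a co-occurring word is counted as "corresponding" when the seed dictionary links it to at least one word in the other set, and the frequencies of such words on both sides are summed and divided by the total co-occurrence frequency. The function names, the representation of the dictionary as a set of word pairs, and the tie-breaking behavior are all assumptions made for illustration.

```python
from collections import Counter


def correlation(cooc_j, cooc_e, seed_dict):
    """Share of co-occurrence frequency that the seed dictionary can link.

    cooc_j, cooc_e: Counter mapping co-occurring word -> frequency.
    seed_dict: set of (japanese_word, english_word) basic translation pairs.
    Assumed reading of the abstract; the exact counting rule is a guess.
    """
    total = sum(cooc_j.values()) + sum(cooc_e.values())
    if total == 0:
        return 0.0
    links = [(j, e) for j, e in seed_dict if j in cooc_j and e in cooc_e]
    linked_j = {j for j, _ in links}
    linked_e = {e for _, e in links}
    matched = sum(cooc_j[j] for j in linked_j) + sum(cooc_e[e] for e in linked_e)
    return matched / total


def mutual_best_pairs(cooc_ja, cooc_en, seed_dict):
    """Return sorted (ja, en) pairs in which each word is the other's best match."""
    best_for_j, best_for_e = {}, {}
    for j, cj in cooc_ja.items():
        for e, ce in cooc_en.items():
            s = correlation(cj, ce, seed_dict)
            if s <= 0:
                continue  # ignore pairs with no linked co-occurrences
            if s > best_for_j.get(j, (None, 0.0))[1]:
                best_for_j[j] = (e, s)
            if s > best_for_e.get(e, (None, 0.0))[1]:
                best_for_e[e] = (j, s)
    return sorted(
        (j, e) for j, (e, _) in best_for_j.items() if best_for_e[e][0] == j
    )


if __name__ == "__main__":
    # Made-up romanized data, for illustration only.
    ja = {"keisanki": Counter({"puroguramu": 2, "jikkou": 1}),
          "inu": Counter({"hoeru": 1})}
    en = {"computer": Counter({"program": 2, "run": 1}),
          "dog": Counter({"bark": 1})}
    seed = {("puroguramu", "program"), ("jikkou", "run"), ("hoeru", "bark")}
    print(mutual_best_pairs(ja, en, seed))
```

In the workflow the abstract describes, the returned candidates would be shown to a user, and only the pairs the user accepts would be added to the seed dictionary before the next pass.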
39 Claims
1. (Independent claim; text as given under First Claim.) Dependent claims: 2 to 39.
Specification