Bilingual corpus update method, bilingual corpus update apparatus, and recording medium storing bilingual corpus update program
First Claim
1. A method for updating a bilingual corpus, the bilingual corpus including a plurality of pairs, each composed of a sentence described in a first language and a translated sentence described in a second language, the bilingual corpus including a first sentence described in the first language and a second sentence described in the second language as a pair, the second sentence being a translated sentence corresponding to the first sentence, the method comprising:
inputting a third sentence obtained by replacing a first phrase among a plurality of phrases constituting the first sentence with a second phrase;
judging whether a third phrase is included in a first database, the third phrase including at least the second phrase and a fourth phrase immediately anterior to the second phrase in the third sentence or the second phrase and a fifth phrase immediately posterior to the second phrase in the third sentence, the first database including at least a phrase used in written text;
calculating, on the basis of the first database, a first evaluation value in the first database for a seventh phrase obtained by replacing the second phrase of the third phrase with a sixth phrase if it is judged that the third phrase is not included in the first database, the sixth phrase being different from the second phrase;
judging whether the third phrase is included in a second database and judging whether a second evaluation value calculated on the basis of the first evaluation value satisfies a predetermined condition, the second database including at least a phrase used in spoken text, the phrase used in the spoken text being associated with an occurrence frequency in the second database of the phrase used in the spoken text; and
adding the third sentence and the second sentence as a pair to the bilingual corpus if it is judged that the third phrase is included in the second database and that the second evaluation value satisfies the predetermined condition.
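The "third sentence" and "third phrase" recited above can be illustrated with a minimal sketch. It assumes each phrase is a single token and that the sentence is already segmented; the function names `build_third_sentence` and `third_phrase_candidates` and the example sentence are hypothetical and do not come from the claim.

```python
from typing import List, Tuple


def build_third_sentence(first_sentence: List[str], index: int,
                         second_phrase: str) -> List[str]:
    """Replace the first phrase (at `index`) with the second phrase."""
    third_sentence = list(first_sentence)  # copy so the original pair is untouched
    third_sentence[index] = second_phrase
    return third_sentence


def third_phrase_candidates(third_sentence: List[str],
                            index: int) -> List[Tuple[str, str]]:
    """Pair the phrase at `index` with each immediate neighbour.

    Yields (fourth phrase, second phrase) when a preceding phrase exists and
    (second phrase, fifth phrase) when a following phrase exists.
    """
    candidates = []
    if index > 0:
        candidates.append((third_sentence[index - 1], third_sentence[index]))
    if index < len(third_sentence) - 1:
        candidates.append((third_sentence[index], third_sentence[index + 1]))
    return candidates


first_sentence = ["I", "want", "to", "go", "to", "the", "station"]
third_sentence = build_third_sentence(first_sentence, 3, "walk")
print(third_sentence)
print(third_phrase_candidates(third_sentence, 3))
```

Here the first phrase is "go", the second phrase is "walk", and the two candidates correspond to the anterior and posterior variants of the third phrase.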
1 Assignment
0 Petitions
Abstract
A third sentence obtained by replacing a first phrase of a first sentence with a second phrase is input, and it is judged whether a third phrase is included in a first database including at least a phrase used in written text. If the third phrase is not included in the first database, a first evaluation value in the first database is calculated for a seventh phrase obtained by replacing the second phrase of the third phrase with a sixth phrase. It is judged whether the third phrase is included in a second database including at least a phrase used in spoken text and whether a second evaluation value calculated from the first evaluation value satisfies a predetermined condition. If the third phrase is included in the second database and the second evaluation value satisfies the predetermined condition, the third sentence and the second sentence are added as a pair to a bilingual corpus.
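The decision flow summarized in the abstract can be sketched as follows. The text does not say how the evaluation values are computed or what the predetermined condition is, so this sketch makes assumptions: both databases are modeled as dictionaries mapping a phrase tuple to an occurrence count, the first evaluation value is taken to be the written-text count of the seventh phrase, the second evaluation value is the maximum of those counts over the candidate sixth phrases, and the condition is a simple threshold. All names are hypothetical.

```python
from typing import Dict, Iterable, Optional, Tuple

Phrase = Tuple[str, ...]


def first_evaluation_value(seventh_phrase: Phrase,
                           written_db: Dict[Phrase, int]) -> int:
    # Assumption: the evaluation value is the phrase's count in the written-text database.
    return written_db.get(seventh_phrase, 0)


def should_add_pair(third_phrase: Phrase, position: int,
                    sixth_phrases: Iterable[str],
                    written_db: Dict[Phrase, int],
                    spoken_db: Dict[Phrase, int],
                    threshold: int) -> Optional[bool]:
    """Return True to add the (third sentence, second sentence) pair.

    `position` is the index of the second phrase inside `third_phrase`.
    Returns None when the third phrase is already in the written-text
    database, because the abstract does not describe that branch.
    """
    if third_phrase in written_db:
        return None

    scores = []
    for sixth in sixth_phrases:
        if sixth == third_phrase[position]:
            continue  # the sixth phrase must differ from the second phrase
        seventh = third_phrase[:position] + (sixth,) + third_phrase[position + 1:]
        scores.append(first_evaluation_value(seventh, written_db))

    # Assumption: the second evaluation value is the best first evaluation value.
    second_value = max(scores, default=0)
    return third_phrase in spoken_db and second_value >= threshold


written_db = {("to", "go"): 120, ("to", "travel"): 40}
spoken_db = {("to", "walk"): 7}
decision = should_add_pair(("to", "walk"), 1, ["go", "travel", "walk"],
                           written_db, spoken_db, threshold=100)
print(decision)
```

Under these assumptions the third phrase ("to", "walk") is absent from the written-text database but present in the spoken-text database, and the alternative "to go" scores above the threshold, so the pair would be added.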
7 Citations
20 Claims
1. A method for updating a bilingual corpus, the bilingual corpus including a plurality of pairs, each composed of a sentence described in a first language and a translated sentence described in a second language, the bilingual corpus including a first sentence described in the first language and a second sentence described in the second language as a pair, the second sentence being a translated sentence corresponding to the first sentence, the method comprising:
inputting a third sentence obtained by replacing a first phrase among a plurality of phrases constituting the first sentence with a second phrase;
judging whether a third phrase is included in a first database, the third phrase including at least the second phrase and a fourth phrase immediately anterior to the second phrase in the third sentence or the second phrase and a fifth phrase immediately posterior to the second phrase in the third sentence, the first database including at least a phrase used in written text;
calculating, on the basis of the first database, a first evaluation value in the first database for a seventh phrase obtained by replacing the second phrase of the third phrase with a sixth phrase if it is judged that the third phrase is not included in the first database, the sixth phrase being different from the second phrase;
judging whether the third phrase is included in a second database and judging whether a second evaluation value calculated on the basis of the first evaluation value satisfies a predetermined condition, the second database including at least a phrase used in spoken text, the phrase used in the spoken text being associated with an occurrence frequency in the second database of the phrase used in the spoken text; and
adding the third sentence and the second sentence as a pair to the bilingual corpus if it is judged that the third phrase is included in the second database and that the second evaluation value satisfies the predetermined condition.
Dependent claims: 2 to 18.
19. An apparatus for updating a bilingual corpus, the bilingual corpus including a plurality of pairs, each composed of a sentence described in a first language and a translated sentence described in a second language, the bilingual corpus including a first sentence described in the first language and a second sentence described in the second language as a pair, the second sentence being a translated sentence corresponding to the first sentence, the apparatus comprising:
an inputter which inputs a third sentence obtained by replacing a first phrase among a plurality of phrases constituting the first sentence with a second phrase;
a first database judge which judges whether a third phrase is included in a first database, the third phrase including at least the second phrase and a fourth phrase immediately anterior to the second phrase in the third sentence or the second phrase and a fifth phrase immediately posterior to the second phrase in the third sentence, the first database including at least a phrase used in written text;
a calculator which calculates, on the basis of the first database, a first evaluation value in the first database for a seventh phrase obtained by replacing the second phrase of the third phrase with a sixth phrase if it is judged that the third phrase is not included in the first database, the sixth phrase being different from the second phrase;
a second database judge which judges whether the third phrase is included in a second database and judges whether a second evaluation value calculated on the basis of the first evaluation value satisfies a predetermined condition, the second database including at least a phrase used in spoken text, the phrase used in the spoken text being associated with an occurrence frequency in the second database of the phrase used in the spoken text; and
an outputter which adds the third sentence and the second sentence as a pair to the bilingual corpus if it is judged that the third phrase is included in the second database and that the second evaluation value satisfies the predetermined condition.
20. A non-transitory recording medium storing a program for causing a computer to function as an apparatus for updating a bilingual corpus, the bilingual corpus including a plurality of pairs, each composed of a sentence described in a first language and a translated sentence described in a second language, the bilingual corpus including a first sentence described in the first language and a second sentence described in the second language as a pair, the second sentence being a translated sentence corresponding to the first sentence, the program causing the computer to execute:
inputting a third sentence obtained by replacing a first phrase among a plurality of phrases constituting the first sentence with a second phrase;
judging whether a third phrase is included in a first database, the third phrase including at least the second phrase and a fourth phrase immediately anterior to the second phrase in the third sentence or the second phrase and a fifth phrase immediately posterior to the second phrase in the third sentence, the first database including at least a phrase used in written text;
calculating, on the basis of the first database, a first evaluation value in the first database for a seventh phrase obtained by replacing the second phrase of the third phrase with a sixth phrase if it is judged that the third phrase is not included in the first database, the sixth phrase being different from the second phrase;
judging whether the third phrase is included in a second database and judging whether a second evaluation value calculated on the basis of the first evaluation value satisfies a predetermined condition, the second database including at least a phrase used in spoken text, the phrase used in the spoken text being associated with an occurrence frequency in the second database of the phrase used in the spoken text; and
adding the third sentence and the second sentence as a pair to the bilingual corpus if it is judged that the third phrase is included in the second database and that the second evaluation value satisfies the predetermined condition.
Specification