Method for generating paraphrases for use in machine translation system
First Claim
1. A paraphrase generation method, comprising:
receiving an original text;
generating, using a processor, one or more paraphrase candidate sentences by paraphrasing one or more of a plurality of fragments included in the original text into another expression in a language of the original text, the plurality of fragments being obtained by dividing the original text in accordance with a predetermined rule;
determining whether the paraphrasing of one or more of the plurality of fragments in each of the one or more paraphrase candidate sentences is acceptable by comparing each of a cumulative total of paraphrase acceptability scores with a first acceptable limit, each of the cumulative total of paraphrase acceptability scores being calculated by adding each of paraphrase acceptability scores in each of the one or more paraphrase candidate sentences, each of the paraphrase acceptability scores being assigned to a paraphrase pair including a first fragment and a second fragment that represents another expression of the first fragment, each of the paraphrase acceptability scores indicating a degree to which paraphrasing from the first fragment into the second fragment is accepted;
determining whether each of the one or more paraphrase candidate sentences is acceptable by comparing each of linguistic acceptability scores with a second acceptable limit, each of the linguistic acceptability scores indicating a degree to which the paraphrase candidate sentence is accepted as having a linguistically correct meaning, and each of the linguistic acceptability scores being obtained from a language model; and
outputting each of the one or more paraphrase candidate sentences when (i) each of the cumulative total of paraphrase acceptability scores is within the first acceptable limit, and (ii) each of the linguistic acceptability scores is within the second acceptable limit.
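The two acceptability checks and the output step recited above can be sketched as follows. This is a minimal illustration, not the patented implementation. The `Candidate` class, the `select_candidates` function, the toy scores, and both limit values are hypothetical. The claim does not say in which direction a score counts as "within" a limit, so the sketch assumes that paraphrase acceptability scores behave like penalties (the cumulative total must not exceed the first limit) and that the language model returns a higher-is-better score (it must not fall below the second limit).

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Candidate:
    """A paraphrase candidate sentence plus the paraphrase pairs applied to build it."""
    sentence: str
    # Each entry: (first fragment, second fragment, paraphrase acceptability score).
    applied_pairs: List[Tuple[str, str, float]]


def cumulative_paraphrase_score(candidate: Candidate) -> float:
    """Add up the scores of every paraphrase pair used in the candidate."""
    return sum(score for _, _, score in candidate.applied_pairs)


def select_candidates(
    candidates: List[Candidate],
    lm_score: Callable[[str], float],
    first_limit: float,
    second_limit: float,
) -> List[str]:
    """Return the sentences that pass both checks.

    Assumption: cumulative paraphrase score <= first_limit is acceptable,
    and lm_score(sentence) >= second_limit is acceptable.
    """
    accepted = []
    for candidate in candidates:
        if cumulative_paraphrase_score(candidate) > first_limit:
            continue  # too much paraphrasing accumulated in this sentence
        if lm_score(candidate.sentence) < second_limit:
            continue  # the language model rates the sentence as implausible
        accepted.append(candidate.sentence)
    return accepted


# Toy data: a lookup table stands in for a real language model.
toy_lm_scores = {
    "please show me the way": -2.0,
    "please show I the way": -3.0,
    "way the me show please": -9.0,
}
candidates = [
    Candidate("please show me the way", [("tell me", "show me", 0.2)]),
    Candidate("please show I the way", [("tell me", "show me", 0.2), ("me", "I", 0.9)]),
    Candidate("way the me show please", [("tell me", "show me", 0.2)]),
]
accepted = select_candidates(candidates, toy_lm_scores.__getitem__, 1.0, -5.0)
print(accepted)  # ['please show me the way']
```

In this toy run the second candidate is rejected by the first check (cumulative score above 1.0) and the third by the second check (language model score below -5.0), so only candidates passing both conditions are output.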
Abstract
A paraphrase generation method according to the present disclosure generates one or more paraphrases of an original text by paraphrasing, within an acceptable limit for accepting paraphrasing, one or more of a plurality of fragments included in the original text into another expression in the language of the original text, the plurality of fragments being obtained by dividing the original text in accordance with a predetermined rule.
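The generation step described in the abstract can be illustrated with a short sketch. It rests on several assumptions not stated in the text: the "predetermined rule" is taken to be whitespace splitting, the paraphrase pairs live in a hypothetical dictionary `table` mapping a fragment to its alternative expressions and scores, and the names `split_into_fragments` and `generate_candidates` are invented for illustration.

```python
from itertools import product
from typing import Dict, List, Tuple

# Hypothetical paraphrase table: fragment -> [(alternative expression, score), ...]
ParaphraseTable = Dict[str, List[Tuple[str, float]]]


def split_into_fragments(text: str) -> List[str]:
    """Divide the original text into fragments (assumed rule: split on whitespace)."""
    return text.split()


def generate_candidates(text: str, table: ParaphraseTable) -> List[Tuple[str, float]]:
    """Return (candidate sentence, cumulative paraphrase score) pairs.

    Every fragment may either stay as it is or be replaced by one of its
    alternatives; combinations that replace nothing are skipped. The number of
    combinations grows exponentially, so this is only suitable for short texts.
    """
    fragments = split_into_fragments(text)
    # For each fragment: the original (score None) followed by its alternatives.
    options = [
        [(fragment, None)] + list(table.get(fragment, []))
        for fragment in fragments
    ]
    results = []
    for combination in product(*options):
        scores = [score for _, score in combination if score is not None]
        if not scores:
            continue  # no fragment was paraphrased, so this is the original text
        sentence = " ".join(fragment for fragment, _ in combination)
        results.append((sentence, sum(scores)))
    return results


table = {"tell": [("show", 0.2)], "way": [("route", 0.3), ("path", 0.4)]}
generated = generate_candidates("tell me the way", table)
for sentence, _ in generated:
    print(sentence)
```

Each returned pair carries the cumulative score of the paraphrases applied, which is the quantity that would be compared against an acceptable limit.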
7 Claims
1. A paraphrase generation method, comprising:
receiving an original text;
generating, using a processor, one or more paraphrase candidate sentences by paraphrasing one or more of a plurality of fragments included in the original text into another expression in a language of the original text, the plurality of fragments being obtained by dividing the original text in accordance with a predetermined rule;
determining whether the paraphrasing of one or more of the plurality of fragments in each of the one or more paraphrase candidate sentences is acceptable by comparing each of a cumulative total of paraphrase acceptability scores with a first acceptable limit, each of the cumulative total of paraphrase acceptability scores being calculated by adding each of paraphrase acceptability scores in each of the one or more paraphrase candidate sentences, each of the paraphrase acceptability scores being assigned to a paraphrase pair including a first fragment and a second fragment that represents another expression of the first fragment, each of the paraphrase acceptability scores indicating a degree to which paraphrasing from the first fragment into the second fragment is accepted;
determining whether each of the one or more paraphrase candidate sentences is acceptable by comparing each of linguistic acceptability scores with a second acceptable limit, each of the linguistic acceptability scores indicating a degree to which the paraphrase candidate sentence is accepted as having a linguistically correct meaning, and each of the linguistic acceptability scores being obtained from a language model; and
outputting each of the one or more paraphrase candidate sentences when (i) each of the cumulative total of paraphrase acceptability scores is within the first acceptable limit, and (ii) each of the linguistic acceptability scores is within the second acceptable limit.
Dependent claims: 2, 3, 7
4. An apparatus, comprising:
a processor; and
a medium having a computer program stored thereon, the computer program causing the processor to execute operations including:
receiving an original text;
generating, using the processor, one or more paraphrase candidate sentences by paraphrasing one or more of a plurality of fragments included in the original text into another expression in a language of the original text, the plurality of fragments being obtained by dividing the original text in accordance with a predetermined rule;
determining whether the paraphrasing of one or more of the plurality of fragments in each of the one or more paraphrase candidate sentences is acceptable by comparing each of a cumulative total of paraphrase acceptability scores with a first acceptable limit, each of the cumulative total of paraphrase acceptability scores being calculated by adding each of paraphrase acceptability scores in each of the one or more paraphrase candidate sentences, each of the paraphrase acceptability scores being assigned to a paraphrase pair including a first fragment and a second fragment that represents another expression of the first fragment, each of the paraphrase acceptability scores indicating a degree to which paraphrasing from the first fragment into the second fragment is accepted;
determining whether each of the one or more paraphrase candidate sentences is acceptable by comparing each of linguistic acceptability scores with a second acceptable limit, each of the linguistic acceptability scores indicating a degree to which the paraphrase candidate sentence is accepted as having a linguistically correct meaning, and each of the linguistic acceptability scores being obtained from a language model; and
outputting each of the one or more paraphrase candidate sentences when (i) each of the cumulative total of paraphrase acceptability scores is within the first acceptable limit, and (ii) each of the linguistic acceptability scores is within the second acceptable limit.
5. A non-transitory recording medium having a computer program stored thereon, the computer program causing a processor to execute operations comprising:
receiving an original text;
generating, using the processor, one or more paraphrase candidate sentences by paraphrasing one or more of a plurality of fragments included in the original text into another expression in a language of the original text, the plurality of fragments being obtained by dividing the original text in accordance with a predetermined rule;
determining whether the paraphrasing of one or more of the plurality of fragments in each of the one or more paraphrase candidate sentences is acceptable by comparing each of a cumulative total of paraphrase acceptability scores with a first acceptable limit, each of the cumulative total of paraphrase acceptability scores being calculated by adding each of paraphrase acceptability scores in each of the one or more paraphrase candidate sentences, each of the paraphrase acceptability scores being assigned to a paraphrase pair including a first fragment and a second fragment that represents another expression of the first fragment, each of the paraphrase acceptability scores indicating a degree to which paraphrasing from the first fragment into the second fragment is accepted;
determining whether each of the one or more paraphrase candidate sentences is acceptable by comparing each of linguistic acceptability scores with a second acceptable limit, each of the linguistic acceptability scores indicating a degree to which the paraphrase candidate sentence is accepted as having a linguistically correct meaning, and each of the linguistic acceptability scores being obtained from a language model; and
outputting each of the one or more paraphrase candidate sentences when (i) each of the cumulative total of paraphrase acceptability scores is within the first acceptable limit, and (ii) each of the linguistic acceptability scores is within the second acceptable limit.
6. A machine translation system, comprising:
a processor; and
a medium having a computer program stored therein, the computer program causing the processor to execute operations including:
receiving an original text;
generating, using the processor, one or more paraphrase candidate sentences by paraphrasing one or more of a plurality of fragments included in the original text into another expression in a language of the original text, the plurality of fragments being obtained by dividing the original text in accordance with a predetermined rule;
determining whether the paraphrasing of one or more of the plurality of fragments in each of the one or more paraphrase candidate sentences is acceptable by comparing each of a cumulative total of paraphrase acceptability scores with a first acceptable limit, each of the cumulative total of paraphrase acceptability scores being calculated by adding each of paraphrase acceptability scores in each of the one or more paraphrase candidate sentences, each of the paraphrase acceptability scores being assigned to a paraphrase pair including a first fragment and a second fragment that represents another expression of the first fragment, each of the paraphrase acceptability scores indicating a degree to which paraphrasing from the first fragment into the second fragment is accepted;
determining whether each of the one or more paraphrase candidate sentences is acceptable by comparing each of linguistic acceptability scores with a second acceptable limit, each of the linguistic acceptability scores indicating a degree to which the paraphrase candidate sentence is accepted as having a linguistically correct meaning, and each of the linguistic acceptability scores being obtained from a language model;
outputting each of the one or more paraphrase candidate sentences when (i) each of the cumulative total of paraphrase acceptability scores is within the first acceptable limit, and (ii) each of the linguistic acceptability scores is within the second acceptable limit;
creating a translation corpus including a collection of a plurality of text pairs, the plurality of text pairs each including a first text in a first language paired with a second text in a second language different from the first language; and
translating a source text between the first language and the second language using the translation corpus, the source text representing a text to be translated, wherein the creating of the translation corpus creates one or more new text pairs, the one or more new text pairs each including the second text and the one or more paraphrase candidate sentences, the created one or more new text pairs forming a new part of the translation corpus.
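The corpus-creation step of claim 6 can be sketched as below. This is an illustration under stated assumptions rather than the claimed system: the `TextPair` alias, the `augment_corpus` function, and the `toy_paraphrase` stand-in are hypothetical, and the sketch assumes that the first text of each pair is the original text being paraphrased, so that each accepted paraphrase is paired with the unchanged second text. The translating step is not shown.

```python
from typing import Callable, List, Tuple

# (first text in the first language, second text in the second language)
TextPair = Tuple[str, str]


def augment_corpus(
    corpus: List[TextPair],
    paraphrase: Callable[[str], List[str]],
) -> List[TextPair]:
    """Return the corpus extended with new pairs built from paraphrases.

    `paraphrase` is assumed to return only candidate sentences that already
    passed both acceptability checks. Each candidate is paired with the second
    text of the pair it was derived from; duplicate pairs are skipped.
    """
    augmented = list(corpus)
    seen = set(corpus)
    for first_text, second_text in corpus:
        for candidate in paraphrase(first_text):
            new_pair = (candidate, second_text)
            if new_pair not in seen:
                seen.add(new_pair)
                augmented.append(new_pair)
    return augmented


def toy_paraphrase(text: str) -> List[str]:
    """Hypothetical stand-in for the paraphrase generation method."""
    known = {"tell me the way": ["show me the way", "tell me the route"]}
    return known.get(text, [])


corpus = [("tell me the way", "indiquez-moi le chemin")]
new_corpus = augment_corpus(corpus, toy_paraphrase)
for pair in new_corpus:
    print(pair)
```

The original pairs are kept and the new pairs are appended, matching the claim's wording that the new text pairs form a new part of the translation corpus.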
Specification