System and method for automatic, unsupervised paraphrase generation using a novel framework that learns syntactic construct while retaining semantic meaning
Abstract
A system includes a question answering system executed by a computer, a processor, and a memory coupled to the processor. The memory is encoded with instructions that when executed cause the processor to provide a training system for training the question answering system. The training system is configured to receive a first phrase and a second phrase, the first and second phrases being paraphrases of each other, convert the first phrase into a first logical form and the second phrase into a second logical form, generate a phrasal edit that includes a difference between the first logical form and the second logical form, convert the phrasal edit into a disjunctive logical form in two directions, and generate a first plurality of paraphrases of the first and second phrases based on the disjunctive logical form.
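The steps named in the abstract can be illustrated with a toy sketch. Everything in it is an assumption made for illustration, not the patented implementation: a "logical form" is approximated by a tuple of lowercase tokens (a real system would use a semantic parser), a "phrasal edit" is a span where the two forms differ, and "two directions" is read as letting each differing span take either phrase's wording. The function names are hypothetical.

```python
from difflib import SequenceMatcher
from itertools import product


def to_logical_form(phrase):
    # Stand-in only: a real system would produce a semantic representation.
    return tuple(phrase.lower().split())


def phrasal_edits(lf_a, lf_b):
    """Return (span_from_a, span_from_b) pairs where the two forms differ."""
    matcher = SequenceMatcher(a=lf_a, b=lf_b, autojunk=False)
    return [
        (lf_a[i1:i2], lf_b[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


def disjunctive_form(lf_a, lf_b):
    """Build a list of slots; each slot lists the alternatives allowed there."""
    matcher = SequenceMatcher(a=lf_a, b=lf_b, autojunk=False)
    slots = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            slots.append([lf_a[i1:i2]])  # shared material: one option
        else:
            # Assumed reading of "two directions": either phrase's wording
            # may stand in for the other's at this position.
            slots.append([lf_a[i1:i2], lf_b[j1:j2]])
    return slots


def generate_paraphrases(slots):
    """Expand every combination of alternatives into a candidate phrase."""
    candidates = []
    for choice in product(*slots):
        tokens = [tok for span in choice for tok in span]
        candidates.append(" ".join(tokens))
    return candidates
```

Given "the dog chased the cat quickly" and "the dog pursued the cat rapidly", this sketch finds two edits and expands them into four candidates, two of which are the input phrases themselves. A later scoring step would be needed to decide which candidates to keep.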
17 Claims
1. A method, comprising:
receiving, by a paraphrase generation system, a first phrase and a second phrase, the first and second phrases being paraphrases of each other;
converting, by the paraphrase generation system, the first phrase into a first logical form and the second phrase into a second logical form;
generating, by the paraphrase generation system, a plurality of phrasal edits that includes differences between the first logical form and the second logical form;
converting, by the paraphrase generation system, the plurality of phrasal edits into a plurality of disjunctive logical forms in two directions;
generating, by the paraphrase generation system based on the plurality of disjunctive logical forms in two directions, a first plurality of paraphrases of the first and second phrases;
determining a first score for a first paraphrase of the first plurality of paraphrases;
determining a second score for a second paraphrase of the first plurality of paraphrases, wherein the first score is higher than the second score based in part on a first syntactic variation between the first paraphrase and the first phrase and the second phrase being greater than a second syntactic variation between the second paraphrase and the first phrase and the second phrase; and
pruning, by the paraphrase generation system, the first plurality of paraphrases to yield a second plurality of paraphrases including grammatical alternatives to the first and second phrases, wherein the first and second paraphrases are pruned based on the first score and the second score such that the first paraphrase is included in the second plurality of paraphrases and the second paraphrase is not included in the second plurality of paraphrases based on the first score being higher than the second score.
Dependent claims: 2, 3, 4, 5
6. A system, comprising:
a question answering system executed by a computer;
a processor; and
a memory coupled to the processor, the memory encoded with instructions that when executed cause the processor to provide training for the question answering system at least in part by causing the processor to:
receive a first phrase and a second phrase, the first and second phrases being paraphrases of each other;
convert the first phrase into a first logical form and the second phrase into a second logical form;
generate a phrasal edit that includes a difference between the first logical form and the second logical form;
convert the phrasal edit into a disjunctive logical form in two directions;
generate a first plurality of paraphrases of the first and second phrases based on the disjunctive logical form;
determine a first score for a first paraphrase of the first plurality of paraphrases;
determine a second score for a second paraphrase of the first plurality of paraphrases, wherein the first score is higher than the second score based in part on a first syntactic variation between the first paraphrase and the first phrase and the second phrase being greater than a second syntactic variation between the second paraphrase and the first phrase and the second phrase; and
prune the first plurality of paraphrases to yield a second plurality of paraphrases including grammatical alternatives to the first and second phrases, wherein the first and second paraphrases are pruned based on the first score and the second score such that the first paraphrase is included in the second plurality of paraphrases and the second paraphrase is not included in the second plurality of paraphrases based on the first score being higher than the second score.
Dependent claims: 7, 8, 9, 10, 11
12. A computer program product for generating paraphrases, the computer program product comprising a computer readable storage medium having program instructions embodied therewith, the program instructions executable by a computer to cause the computer to:
receive a first phrase and a second phrase, the first and second phrases being paraphrases of each other;
convert the first phrase into a first logical form and the second phrase into a second logical form;
generate a plurality of phrasal edits that include differences between the first logical form and the second logical form;
convert the plurality of phrasal edits into a plurality of disjunctive logical forms in two directions;
generate a first plurality of paraphrases of the first and second phrases based on the plurality of disjunctive logical forms;
determine a first score for a first paraphrase of the first plurality of paraphrases;
determine a second score for a second paraphrase of the first plurality of paraphrases, wherein the first score is higher than the second score based in part on a first syntactic variation between the first paraphrase and the first phrase and the second phrase being greater than a second syntactic variation between the second paraphrase and the first phrase and the second phrase; and
prune the first plurality of paraphrases to yield a second plurality of paraphrases containing grammatical alternatives to the first and second phrases, wherein the first and second paraphrases are pruned based on the first score and the second score such that the first paraphrase is included in the second plurality of paraphrases and the second paraphrase is not included in the second plurality of paraphrases based on the first score being higher than the second score.
Dependent claims: 13, 14, 15, 16, 17
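The scoring and pruning elements shared by the independent claims can be sketched as follows. This is a minimal illustration under stated assumptions, not the claimed method: "syntactic variation" is approximated by token-sequence dissimilarity (one minus the `difflib` similarity ratio), the score is the mean variation against both source phrases, and pruning keeps the top `keep` candidates. The claims say the score depends only "in part" on syntactic variation and that the surviving paraphrases are grammatical alternatives; neither the other scoring factors nor any grammaticality check is modeled here. All function names and the `keep` parameter are hypothetical.

```python
from difflib import SequenceMatcher


def syntactic_variation(candidate, reference):
    """Assumed proxy: 0.0 for identical token sequences, up to 1.0 for disjoint."""
    a = candidate.lower().split()
    b = reference.lower().split()
    return 1.0 - SequenceMatcher(a=a, b=b, autojunk=False).ratio()


def score(candidate, first_phrase, second_phrase):
    """Higher when the candidate differs more from both source phrases."""
    return (
        syntactic_variation(candidate, first_phrase)
        + syntactic_variation(candidate, second_phrase)
    ) / 2.0


def prune(candidates, first_phrase, second_phrase, keep=2):
    """Keep the `keep` highest-scoring candidates (hypothetical cutoff rule)."""
    ranked = sorted(
        candidates,
        key=lambda c: score(c, first_phrase, second_phrase),
        reverse=True,
    )
    return ranked[:keep]
```

With source phrases "alice gave bob the book" and "alice handed bob the book", the reordered candidate "alice gave the book to bob" scores higher than a candidate that merely repeats the first phrase, so with `keep=1` only the reordered candidate survives pruning.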
Specification