System for identifying paraphrases using machine translation
First Claim
1. A method of training a paraphrase processing system, comprising:
i. accessing a plurality of documents;
ii. identifying, from the plurality of documents, a cluster of related texts that are written by different authors about a common subject, wherein the related texts are further identified as being from different news agencies and about a common event;
iii. receiving the cluster of related texts;
iv. selecting a set of text segments from the cluster, wherein selecting comprises grouping desired text segments of the related texts into a set of related text segments;
v. using textual alignment to identify paraphrase relationships between texts in the text segments included in the set of related text segments; and
vi. wherein textual alignment comprises:
using statistical textual alignment to align words in the text segments in the set; and
identifying the paraphrase relationships based on the aligned words.
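Steps ii through iv of claim 1 can be sketched as follows. This is a minimal illustration that rests on assumptions the claim does not state. Each article is assumed to already carry a hypothetical `event_id` label, because this excerpt does not say how the system decides that two articles cover the same event. The "desired text segments" are assumed to be each article's first sentence. `Article`, `cluster_by_event`, `first_sentence`, and `related_segment_pairs` are hypothetical names and are not taken from the patent.

```python
import re
from collections import defaultdict
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Article:
    agency: str    # news agency that published the article
    event_id: str  # ASSUMPTION: hypothetical, precomputed label for the event covered
    text: str


def cluster_by_event(articles):
    """Steps i-iii (sketch): group articles by event and keep only the
    clusters that contain articles from at least two different agencies."""
    clusters = defaultdict(list)
    for article in articles:
        clusters[article.event_id].append(article)
    return {
        event: members
        for event, members in clusters.items()
        if len({a.agency for a in members}) >= 2
    }


def first_sentence(text):
    """Naive splitter (ASSUMPTION: first sentence = desired segment): text up to
    the first '.', '!' or '?' that is followed by whitespace or end of string."""
    match = re.match(r"(.+?[.!?])(?:\s|$)", text.strip(), flags=re.S)
    return match.group(1) if match else text.strip()


def related_segment_pairs(cluster):
    """Step iv (sketch): take each article's first sentence and pair up
    sentences that come from different agencies."""
    segments = [(a.agency, first_sentence(a.text)) for a in cluster]
    return [
        (seg1, seg2)
        for (agency1, seg1), (agency2, seg2) in combinations(segments, 2)
        if agency1 != agency2
    ]


if __name__ == "__main__":
    articles = [
        Article("AgencyA", "quake-1", "A strong earthquake struck the city on Monday. More to follow."),
        Article("AgencyB", "quake-1", "An earthquake hit the city Monday. Details later."),
        Article("AgencyA", "vote-7", "Lawmakers passed the budget."),
    ]
    for cluster in cluster_by_event(articles).values():
        for pair in related_segment_pairs(cluster):
            print(pair)
```

In this toy input only the "quake-1" cluster survives, since "vote-7" has a single agency. Its one segment pair contrasts "struck" with "hit", which is the kind of pair the alignment step then works on. A sentence splitter this naive would break on abbreviations such as "U.S."; a real system would need a proper tokenizer.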
2 Assignments
0 Petitions
Abstract
The present invention obtains a set of text segments from a cluster of different articles written about a common event. The set of text segments is then subjected to textual alignment techniques to identify paraphrases among the text segments. The invention can also be used to generate paraphrases.
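The abstract does not name a specific alignment model. As one example of statistical word alignment, the sketch below trains IBM Model 1 with expectation-maximization (EM) on pairs of related segments. It then links each word in one segment to its most probable counterpart in the other and reports linked words that differ as candidate paraphrases. Using Model 1, omitting the NULL source word, and the names `train_model1`, `align`, and `word_paraphrases` are all assumptions for illustration. This is not presented as the patented method.

```python
from collections import defaultdict


def train_model1(pairs, iterations=10):
    """Estimate IBM Model 1 word-translation probabilities t(e | f) with EM.

    pairs: list of (src_tokens, tgt_tokens) built from related segments.
    ASSUMPTION/simplification: no NULL source word, so every target word
    must align to some real source word.
    """
    tgt_vocab = {e for _, tgt in pairs for e in tgt}
    if not tgt_vocab:
        return defaultdict(float)
    uniform = 1.0 / len(tgt_vocab)
    t = defaultdict(lambda: uniform)  # keys are (target_word, source_word)
    for _ in range(iterations):
        count = defaultdict(float)  # expected counts c(e, f)
        total = defaultdict(float)  # expected counts c(f)
        for src, tgt in pairs:
            if not src:
                continue  # nothing to align to
            for e in tgt:
                # E-step: share one unit of count for e across the source words
                norm = sum(t[(e, f)] for f in src)
                for f in src:
                    share = t[(e, f)] / norm
                    count[(e, f)] += share
                    total[f] += share
        # M-step: renormalise so that t(. | f) sums to 1 for every source word f
        t = defaultdict(float, {(e, f): c / total[f] for (e, f), c in count.items()})
    return t


def align(src, tgt, t):
    """Most probable Model 1 alignment: link each target word e to the source
    word f that maximises t(e | f). Returns (src_index, tgt_index) links."""
    return [
        (max(range(len(src)), key=lambda i: t[(e, src[i])]), j)
        for j, e in enumerate(tgt)
    ]


def word_paraphrases(pairs, t):
    """Aligned word pairs whose surface forms differ are paraphrase candidates."""
    candidates = set()
    for src, tgt in pairs:
        for i, j in align(src, tgt, t):
            if src[i] != tgt[j]:
                candidates.add((src[i], tgt[j]))
    return candidates


if __name__ == "__main__":
    segments = [
        ("quake struck", "quake hit"),
        ("storm struck", "storm hit"),
        ("fire struck", "fire hit"),
    ]
    pairs = [(a.lower().split(), b.lower().split()) for a, b in segments]
    print(word_paraphrases(pairs, train_model1(pairs)))
```

In the toy data, "struck" is the only source word present in every pair that contains "hit". Each EM iteration therefore shifts probability toward t(hit | struck), and the demo yields the single candidate pair ("struck", "hit"). A practical system would need far more data, a NULL word, and filtering of spurious links.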
38 Citations
7 Claims
Specification