System for identifying paraphrases using machine translation

US 7,412,385 B2
Filed: 11/12/2003
Issued: 08/12/2008
Est. Priority Date: 11/12/2003
Status: Active Grant

2 Assignments

0 Petitions

Abstract

The present invention obtains a set of text segments from a cluster of different articles written about a common event. The set of text segments is then subjected to textual alignment techniques to identify paraphrases from the text segments in the text. The invention can also be used to generate paraphrases.
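
The abstract refers to "textual alignment techniques" without naming one. As a rough illustration only, the sketch below uses IBM Model 1, a well-known statistical word-alignment model from machine translation, applied to monolingual sentence pairs. The choice of Model 1, the function names `train_model1` and `align`, the iteration count, and the toy sentences are all assumptions made for this example; none of them come from the patent.

```python
def train_model1(pairs, iterations=10):
    """Estimate t(target_word | source_word) with IBM Model 1 EM.

    pairs: list of (source_tokens, target_tokens) tuples.
    This is an assumed stand-in for "statistical textual alignment".
    """
    # Start every co-occurring word pair at the same (unnormalized) weight.
    t = {}
    for src, tgt in pairs:
        for w_t in tgt:
            for w_s in src:
                t[(w_t, w_s)] = 1.0

    for _ in range(iterations):
        count = {}   # expected count of (target, source) links
        total = {}   # expected count of links per source word
        # E-step: distribute each target word over the source words.
        for src, tgt in pairs:
            for w_t in tgt:
                norm = sum(t[(w_t, w_s)] for w_s in src)
                for w_s in src:
                    frac = t[(w_t, w_s)] / norm
                    count[(w_t, w_s)] = count.get((w_t, w_s), 0.0) + frac
                    total[w_s] = total.get(w_s, 0.0) + frac
        # M-step: renormalize the expected counts per source word.
        for (w_t, w_s), c in count.items():
            t[(w_t, w_s)] = c / total[w_s]
    return t


def align(src, tgt, t):
    """Link each target word to its most probable source word."""
    return [
        (max(src, key=lambda w_s: t.get((w_t, w_s), 0.0)), w_t)
        for w_t in tgt
    ]


# Toy monolingual "parallel" data (hypothetical sentences).
pairs = [
    (["big", "house"], ["large", "home"]),
    (["big", "car"], ["large", "auto"]),
    (["small", "car"], ["little", "auto"]),
]
table = train_model1(pairs)
links = align(["big", "house"], ["large", "home"], table)
```

Aligned word pairs such as these are the kind of evidence from which paraphrase relationships could then be read off; how the patent actually does so is described in its specification, not here.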

38 Citations

7 Claims

1. A method of training a paraphrase processing system, comprising:
   - i. accessing a plurality of documents;
   - ii. identifying, from the plurality of documents, a cluster of related texts that are written by different authors about a common subject, wherein the related texts are further identified as being from different news agencies and about a common event;
   - iii. receiving the cluster of related texts;
   - iv. selecting a set of text segments from the cluster, wherein selecting comprises grouping desired text segments of the related texts into a set of related text segments;
   - v. using textual alignment to identify paraphrase relationships between texts in the text segments included in the set of related text segments; and
   - vi. wherein textual alignment comprises:
     - using statistical textual alignment to align words in the text segments in the set; and
     - identifying the paraphrase relationships based on the aligned words.
2. The method of claim 1 and further comprising:
   - calculating an alignment model based on the paraphrase relationships identified.
3. The method of claim 2 and further comprising:
   - receiving an input text; and
   - generating a paraphrase of the input text based on the alignment model.
4. The method of claim 1 and wherein selecting a set of text segments comprises:
   - selecting text segments for the set based on a number of shared words in the text segments.
5. The method of claim 1 wherein identifying a cluster of related texts comprises identifying texts written within a predetermined time of one another.
6. The method of claim 1 wherein grouping desired text segments comprises grouping a first predetermined number of sentences of each news article in each cluster into the set of related text segments.
7. The method of claim 6 wherein selecting a set of text segments comprises:
   - pairing each sentence in a given set of related text segments with each other sentence in the given set.
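
The selection steps in claims 4, 6 and 7 can be pictured with a minimal sketch. It assumes each article in a cluster is already split into sentences; the function name `candidate_pairs` and the thresholds `first_n` and `min_shared` are hypothetical, since the claims speak only of "a first predetermined number of sentences" and "a number of shared words" without giving values.

```python
from itertools import combinations


def candidate_pairs(cluster, first_n=2, min_shared=3):
    """Build candidate sentence pairs from one cluster of related articles.

    cluster: list of articles, each a list of sentence strings.
    first_n, min_shared: illustrative thresholds, not taken from the claims.
    """
    # Claim 6: group the first `first_n` sentences of each article into one set.
    segments = [s for article in cluster for s in article[:first_n]]

    pairs = []
    # Claim 7: pair each sentence with each other sentence in the set.
    for a, b in combinations(segments, 2):
        shared = set(a.lower().split()) & set(b.lower().split())
        # Claim 4: select pairs based on the number of shared words.
        if len(shared) >= min_shared:
            pairs.append((a, b))
    return pairs


# Hypothetical two-article cluster about the same event.
cluster = [
    ["Storm hits the coast on Monday", "Officials urged residents to leave"],
    ["A storm struck the coast Monday", "Thousands lost power"],
]
selected = candidate_pairs(cluster, first_n=2, min_shared=3)
```

With these toy inputs only the two lead sentences share enough words to be kept. Whitespace tokenization and lowercasing are simplifications chosen for brevity.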

Current Assignee: Microsoft Technology Licensing LLC (Microsoft Corporation)
Original Assignee: Microsoft Corporation
Inventors: Brockett, Christopher J.; Dolan, William B.; Quirk, Christopher B.
Primary Examiner(s): Hudspeth; David R.
Assistant Examiner(s): Rider; Justin W
Application Number: US10/706,102
Publication Number: US 20050102614A1
Time in Patent Office: 1,735 Days
Field of Search: 704/10, 704/245
US Class Current: 704/245
CPC Class Codes:
G06F 40/253   Grammatical analysis; Style...
G06F 40/279   Recognition of textual enti...
G06F 40/44   Statistical methods, e.g. p...
G06F 40/58   Use of machine translation,...
Y10S 707/99931   Database or file accessing
