Identifying documents which form translated pairs, within a document collection

  • US 7,813,918 B2
  • Filed: 08/03/2005
  • Issued: 10/12/2010
  • Est. Priority Date: 08/03/2005
  • Status: Active Grant
First Claim

1. A method for identifying documents that represent similar information to train a text-to-text application, the method comprising:

  • obtaining a group of documents;

  • determining reduced size versions of the documents, wherein the reduced size versions summarize information about words contained in the documents and the determining is performed by a processor;

  • changing an order of information within the reduced size versions;

  • sorting the reduced size versions;

  • comparing the reduced size versions to determine documents that represent similar information, wherein the comparing is performed by a processor; and

  • using the documents that represent similar information for training for the text-to-text application.
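
The claimed steps can be illustrated with a minimal Python sketch. It is not the patented implementation: the claim does not say how a reduced size version is built, how its order is changed, or how versions are compared, so every choice below is an assumption made for illustration. Here the reduced version is a short tuple of a document's alphabetically smallest distinct words, the reordering is a rotation of that tuple, and the comparison is a set-overlap score between neighbours in the sorted list. The names `reduced_version`, `rotate`, `overlap`, and `find_similar`, and the `size` and `threshold` parameters, are hypothetical.

```python
def reduced_version(text, size=4):
    """Summarize a document as a short tuple of words.

    Assumption: keep the `size` alphabetically smallest distinct words.
    The claim only requires that the summary reflect the words present.
    """
    return tuple(sorted(set(text.lower().split()))[:size])


def rotate(sig, k):
    """Change the order of information in a reduced version by rotating it."""
    if not sig:
        return sig
    k %= len(sig)
    return sig[k:] + sig[:k]


def overlap(a, b):
    """Fraction of shared entries between two reduced versions (Jaccard)."""
    sa, sb = set(a), set(b)
    return len(sa & sb) / max(len(sa | sb), 1)


def find_similar(docs, size=4, threshold=0.5):
    """Return sorted (name, name) pairs whose reduced versions are similar.

    docs maps a document name to its text.
    """
    sigs = {name: reduced_version(text, size) for name, text in docs.items()}
    pairs = set()
    for k in range(size):
        # Reorder each reduced version, then sort the whole collection so
        # that similar versions tend to end up next to each other.
        ordered = sorted((rotate(sig, k), name) for name, sig in sigs.items())
        # Compare only neighbours in the sorted order, not all pairs.
        for (_, n1), (_, n2) in zip(ordered, ordered[1:]):
            if overlap(sigs[n1], sigs[n2]) >= threshold:
                pairs.add(tuple(sorted((n1, n2))))
    return sorted(pairs)


docs = {
    "a": "the cat sat on the mat",
    "b": "on the mat the cat sat",
    "c": "stock prices fell sharply today",
}
print(find_similar(docs))
```

Repeating the sort under several rotations lets two versions that differ in their leading entries still become neighbours in at least one pass, while each pass compares only adjacent entries. The final claimed step, using the matched documents as training data, is outside this sketch; the returned pairs would be its input.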
