
System and method for detecting duplicate and similar documents

  • US 7,139,756 B2
  • Filed: 01/22/2002
  • Issued: 11/21/2006
  • Est. Priority Date: 01/22/2002
  • Status: Expired due to Fees
First Claim

1. A method for processing data representing documents, comprising:

  • for individual documents of a set of documents, executing a software program to obtain a list of terms found in each document;

    comparing the list of terms for a first document to the list of terms for a second document;

    declaring the first document to be substantially identical to, or substantially similar to, the second document if some predetermined number of terms are found in each of the lists of the first document and the second document; and

    wherein the step of comparing includes a preliminary step of sorting the documents into a document list in order of increasing size, and where the step of comparing compares the list of terms for a given document with the list of terms for the next larger documents in the document list.
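
The steps recited in claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation. The claim does not specify several details, so the following are assumptions made for illustration: a "term" is a lowercase alphanumeric word, document "size" is its character count, and the "next larger documents" are a fixed window of neighbors in the sorted list. The names `term_list`, `find_similar`, `min_shared_terms`, and `lookahead` are hypothetical.

```python
import re


def term_list(text):
    """Return the set of distinct terms in a document.

    Assumption: a term is a run of lowercase letters or digits.
    """
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def find_similar(docs, min_shared_terms, lookahead=1):
    """Return (smaller, larger) name pairs judged similar.

    docs: dict mapping a document name to its text.
    min_shared_terms: the "predetermined number" of terms that must
        appear in both term lists.
    lookahead: how many next-larger documents each document is compared
        with (an assumption; the claim does not fix this number).
    """
    # Preliminary step: sort documents in order of increasing size.
    # Size is taken as character count; ties are broken by name.
    ordered = sorted(docs, key=lambda name: (len(docs[name]), name))
    terms = {name: term_list(docs[name]) for name in ordered}

    pairs = []
    for i, name in enumerate(ordered):
        # Compare only with the next larger documents in the sorted list.
        for other in ordered[i + 1 : i + 1 + lookahead]:
            shared = terms[name] & terms[other]
            if len(shared) >= min_shared_terms:
                pairs.append((name, other))
    return pairs


docs = {
    "a": "the quick brown fox",
    "b": "the quick brown fox jumps",
    "c": "completely unrelated text about patents and claims here",
}
similar = find_similar(docs, min_shared_terms=3)
```

Sorting by size first means each document is compared only with a few neighbors rather than with every other document, which is presumably the motivation for the preliminary sorting step; the trade-off is that similar documents of very different sizes are not compared in this sketch.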
