System and method for detecting duplicate and similar documents

  • US 20030172066A1
  • Filed: 01/22/2002
  • Published: 09/11/2003
  • Est. Priority Date: 01/22/2002
  • Status: Active Grant
First Claim

1. A method for processing data representing documents, comprising:

  • for individual documents of a set of documents, executing a software program to obtain a list of terms found in each document;

  • comparing the list of terms for a first document to the list of terms for a second document; and

  • declaring the first document to be substantially identical to, or substantially similar to, the second document if some predetermined number of terms are found in each of the lists of the first document and the second document.
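
The three steps of the claim can be illustrated with a minimal Python sketch. It is not the patent's implementation: the claim does not say how terms are extracted or what the predetermined number is, so the regex tokenizer, the use of sets, the threshold values, and the function names `term_list`, `shared_term_count`, and `is_similar` are all assumptions made for illustration.

```python
import re


def term_list(text):
    """Step 1: obtain the terms found in a document.

    Assumption: a "term" is a lowercase run of letters or digits, and
    duplicates are collapsed into a set. The claim does not specify this.
    """
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def shared_term_count(terms_a, terms_b):
    """Step 2: compare two term lists by counting the terms they share."""
    return len(terms_a & terms_b)


def is_similar(doc_a, doc_b, min_shared_terms):
    """Step 3: declare two documents similar when at least
    min_shared_terms terms appear in both term lists.

    min_shared_terms stands in for the claim's "predetermined number";
    its value here is hypothetical.
    """
    shared = shared_term_count(term_list(doc_a), term_list(doc_b))
    return shared >= min_shared_terms


if __name__ == "__main__":
    first = "the quick brown fox"
    second = "a quick brown dog"
    # The two documents share exactly the terms "quick" and "brown".
    print(is_similar(first, second, min_shared_terms=2))
    print(is_similar(first, second, min_shared_terms=3))
```

A fixed count of shared terms favors long documents, which naturally share more terms. A practical system might normalize by document length, but that goes beyond what this claim states.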
