Identifying duplicate documents from search results without comparing document content

  • US 5,913,208 A
  • Filed: 07/09/1996
  • Issued: 06/15/1999
  • Est. Priority Date: 07/09/1996
  • Status: Expired due to Term
First Claim

1. A method of automatically determining duplicate documents on a hit-list containing one or more duplicate documents and document instances, the hit-list having a hit-list record for each instance of the documents, each hit-list record having one or more attribute fields, each attribute field containing one or more attributes of the documents, the method comprising the steps of:

  • selecting one or more of the attributes that are intrinsic attributes, the intrinsic attributes being established at a time of document creation and that are invariant with a location and replication of the document;

  • generating a pair of the hit-list records associated with the documents and intrinsic attributes;

  • comparing one or more of the intrinsic attributes of the pair of hit-list records;

  • using the comparison of the intrinsic attributes of the pair of hit-list records to determine if the documents are instances of the same document.
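As an informal illustration only, the four claimed steps (select intrinsic attributes, generate record pairs, compare, decide) might be sketched as follows. This is not the patent's implementation: the attribute names (`title`, `author`, `created`, `url`), the function name `find_duplicate_pairs`, and the exact-equality rule are all assumptions made for the example, since the claim names no specific attributes or comparison test.

```python
from itertools import combinations

# Hypothetical intrinsic attributes: fixed at document creation and unchanged
# when the document is copied or moved. The claim does not name specific ones.
INTRINSIC_ATTRIBUTES = ("title", "author", "created")


def find_duplicate_pairs(hit_list, intrinsic=INTRINSIC_ATTRIBUTES):
    """Return index pairs of hit-list records judged to be the same document."""
    duplicates = []
    # Generate every unordered pair of hit-list records.
    for (i, a), (j, b) in combinations(enumerate(hit_list), 2):
        # Compare only the intrinsic attributes; location-dependent fields
        # such as "url" are ignored, and document content is never read.
        if all(a.get(k) is not None and a.get(k) == b.get(k) for k in intrinsic):
            duplicates.append((i, j))
    return duplicates


# Illustrative hit-list: the first two records are copies at different locations.
hit_list = [
    {"title": "Annual Report", "author": "J. Doe", "created": "1996-01-05",
     "url": "http://a.example/report"},
    {"title": "Annual Report", "author": "J. Doe", "created": "1996-01-05",
     "url": "http://b.example/mirror/report"},
    {"title": "Annual Report", "author": "R. Roe", "created": "1996-02-11",
     "url": "http://c.example/report"},
]

print(find_duplicate_pairs(hit_list))  # [(0, 1)]
```

Requiring every selected attribute to match exactly is the strictest possible reading. The claim only requires that the comparison be used to make the determination, so a looser rule (for example, a match on a subset of the attributes) would fit the same outline.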
