
Method for identifying near duplicate pages in a hyperlinked database

  • US 6,138,113 A
  • Filed: 08/10/1998
  • Issued: 10/24/2000
  • Est. Priority Date: 08/10/1998
  • Status: Expired due to Term
First Claim

1. A method for identifying pages that are near duplicates in a linked database, the pages in the database having incoming links and outgoing links, comprising the steps of:

  • selecting a first page and a second page;
  • determining the outgoing links for the first page and the second page;
  • determining the number of outgoing links that are common for the first page and the second page;
  • marking the first page and the second page as near duplicate pages based on the number of common outgoing links.
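
The steps of claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation. The claim says only that marking is "based on the number of common outgoing links", so the fixed threshold `min_common`, the function names, and the sample page data are all assumptions made for illustration.

```python
from itertools import combinations


def common_outlinks(links_a, links_b):
    """Count the outgoing links shared by two pages."""
    return len(set(links_a) & set(links_b))


def near_duplicates(outlinks, min_common):
    """Return the page pairs that share at least `min_common` outgoing links.

    `outlinks` maps a page identifier to an iterable of its outgoing links.
    The threshold test is an assumed decision rule, not taken from the claim.
    """
    marked = []
    # Select every pair of pages (first page, second page).
    for first, second in combinations(sorted(outlinks), 2):
        shared = common_outlinks(outlinks[first], outlinks[second])
        # Mark the pair based on the number of common outgoing links.
        if shared >= min_common:
            marked.append((first, second))
    return marked


# Hypothetical link data for three pages.
pages = {
    "a.html": ["x.html", "y.html", "z.html"],
    "b.html": ["x.html", "y.html", "w.html"],
    "c.html": ["q.html"],
}
result = near_duplicates(pages, min_common=2)
```

Comparing every pair of pages takes time quadratic in the number of pages, so this form is only practical for small collections.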
