Method for identifying near duplicate pages in a hyperlinked database
First Claim
1. A method for identifying pages that are near duplicates in a linked database, the pages in the database having incoming links and outgoing links, comprising the steps of:
- selecting a first page and a second page;
- determining the outgoing links for the first page and the second page;
- determining the number of outgoing links that are common for the first page and the second page;
- marking the first page and the second page as near duplicate pages based on the number of common outgoing links.
Abstract
A method is described for identifying pages that are near duplicates in a linked database. In the linked database, pages can have incoming links and outgoing links. Two pages are selected, a first page and a second page. For each selected page, the number of outgoing links is determined. The two pages are marked as near duplicates based on the number of common outgoing links for the two pages.
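The comparison described above can be illustrated with a short sketch. This is not the patent's implementation: the dictionary representation of the linked database, the function names, and the `min_common` threshold are all assumptions made for illustration. The text says only that pages are marked "based on the number of common outgoing links" and does not state a specific cutoff.

```python
from typing import Dict, Set


def count_common_outgoing_links(
    outlinks: Dict[str, Set[str]], first: str, second: str
) -> int:
    """Return how many outgoing links the two pages share.

    `outlinks` is a hypothetical in-memory stand-in for the linked
    database: it maps each page identifier to the set of pages it links to.
    """
    links_first = outlinks.get(first, set())
    links_second = outlinks.get(second, set())
    # Set intersection keeps only the link targets present on both pages.
    return len(links_first & links_second)


def are_near_duplicates(
    outlinks: Dict[str, Set[str]], first: str, second: str, min_common: int = 3
) -> bool:
    """Mark two pages as near duplicates when they share enough outgoing links.

    `min_common` is an assumed threshold, not a value taken from the source.
    """
    return count_common_outgoing_links(outlinks, first, second) >= min_common


if __name__ == "__main__":
    # Toy database: pages "a" and "b" share three outgoing links.
    database = {
        "a": {"x", "y", "z", "w"},
        "b": {"x", "y", "z", "q"},
        "c": {"x", "p"},
    }
    print(are_near_duplicates(database, "a", "b"))
    print(are_near_duplicates(database, "a", "c"))
```

Sets are used so that a link repeated on a page is counted once. Whether repeated links should count separately is a design choice the text above does not settle.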
4 Claims
1. A method for identifying pages that are near duplicates in a linked database, the pages in the database having incoming links and outgoing links, comprising the steps of:
- selecting a first page and a second page;
- determining the outgoing links for the first page and the second page;
- determining the number of outgoing links that are common for the first page and the second page;
- marking the first page and the second page as near duplicate pages based on the number of common outgoing links.
Dependent claims: 2, 3, 4.
Specification