Method for identifying related pages in a hyperlinked database
Abstract
A method is described for identifying related pages among a plurality of pages in a linked database such as the World Wide Web. An initial page is selected from the plurality of pages. Pages linked to the initial page are represented as a graph in a memory. The pages represented in the graph are scored on content, and a set of pages is selected, the selected set of pages having scores greater than a first predetermined threshold. The selected set of pages is scored on connectivity, and a subset of the set of pages that have scores greater than a second predetermined threshold are selected as related pages.
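The two-stage selection in the abstract can be sketched as follows. This is a minimal illustration, not the patented implementation: the abstract does not say how content or connectivity scores are computed, so `content_scores` and the `score_connectivity` callable are hypothetical inputs supplied by the caller, and both threshold parameters are illustrative names.

```python
from typing import Callable, Dict, Set


def select_related(
    content_scores: Dict[str, float],
    score_connectivity: Callable[[Set[str]], Dict[str, float]],
    first_threshold: float,
    second_threshold: float,
) -> Set[str]:
    """Filter pages by content score, then by connectivity score.

    Both scoring inputs are assumptions; the abstract only fixes the order
    of the two stages and the use of two thresholds.
    """
    # Stage 1: keep pages whose content score exceeds the first threshold.
    candidates = {p for p, s in content_scores.items() if s > first_threshold}
    # Stage 2: score only the surviving pages on connectivity.
    connectivity = score_connectivity(candidates)
    # Keep pages whose connectivity score exceeds the second threshold.
    return {p for p in candidates if connectivity.get(p, 0.0) > second_threshold}
```

Passing the connectivity scorer as a callable keeps the sketch neutral about how the graph is stored and scored.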
54 Claims
1. A method for identifying pages related to an initial page, comprising:
identifying a plurality of pages linked to the initial page;
representing the plurality of pages as a graph of nodes;
scoring the plurality of pages on connectivity of said plurality of pages to the initial page to generate a connectivity score for each of said plurality of pages;
removing from the graph of nodes pages with an undue influence on the scoring of other pages in the plurality of pages, wherein a page has the undue influence on the scoring of other pages in the plurality of pages, when said page has a score greater than a predetermined fraction of a total connectivity score, said total connectivity score computed by summing connectivity scores of the plurality of pages;
re-scoring remaining pages represented in the graph of nodes; and
selecting a subset of the remaining pages represented in the graph of nodes that have connectivity scores greater than a first predetermined threshold as the pages related to the initial page.
Dependent claims: 2-14
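The steps of claim 1 can be sketched as below. The claim does not define how a connectivity score is computed, so this sketch assumes, purely for illustration, that a page's score is its in-degree within the graph. The names `connectivity_scores`, `remove_pages`, `related_pages`, `max_fraction`, and `threshold` are hypothetical.

```python
from typing import Dict, Set

Graph = Dict[str, Set[str]]  # page -> set of pages it links to


def connectivity_scores(graph: Graph) -> Dict[str, float]:
    """Assumed scoring: count in-links from other pages in the graph."""
    scores = {page: 0.0 for page in graph}
    for source, targets in graph.items():
        for target in targets:
            if target in scores and target != source:
                scores[target] += 1.0
    return scores


def remove_pages(graph: Graph, to_remove: Set[str]) -> Graph:
    """Drop the given nodes and every link that points at them."""
    return {
        page: {t for t in targets if t not in to_remove}
        for page, targets in graph.items()
        if page not in to_remove
    }


def related_pages(graph: Graph, max_fraction: float, threshold: float) -> Set[str]:
    scores = connectivity_scores(graph)
    # Total connectivity score: the sum over all pages in the graph.
    total = sum(scores.values())
    # A page has undue influence if its score exceeds a fraction of the total.
    undue = {p for p, s in scores.items() if total > 0 and s > max_fraction * total}
    # Re-score what remains, then keep pages above the selection threshold.
    rescored = connectivity_scores(remove_pages(graph, undue))
    return {p for p, s in rescored.items() if s > threshold}
```

Under the in-degree assumption, removing a heavily linked page also removes the links it contributed, which is why the remaining pages are re-scored before selection.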
15. A method for identifying pages related to an initial page, comprising:
identifying a plurality of pages linked to the initial page;
representing the plurality of pages as a graph of nodes;
scoring the plurality of pages on connectivity of said plurality of pages to the initial page to generate a connectivity score for each of said plurality of pages;
removing from the graph of nodes pages with an undue influence on the scoring of other pages in the plurality of pages, wherein a page has the undue influence on the scoring of other pages in the plurality of pages when said page has a score greater than each score of all other pages in the plurality of pages and said score is at least three times greater than a next highest score of another page in said plurality of pages;
re-scoring remaining pages represented in the graph of nodes; and
selecting a subset of the remaining pages represented in the graph of nodes that have connectivity scores greater than a first predetermined threshold as the pages related to the initial page.
Dependent claims: 16-27
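Claim 15 replaces the fraction-of-total test with a comparison against the next highest score. A minimal sketch of that test follows, assuming "at least three times greater" means the top score is at least three times the runner-up, and assuming a graph with fewer than two pages has no undue-influence page. The function name `dominant_page` and the `factor` parameter are hypothetical.

```python
from typing import Dict, Optional


def dominant_page(scores: Dict[str, float], factor: float = 3.0) -> Optional[str]:
    """Return the page with undue influence under the claim 15 test, if any."""
    if len(scores) < 2:
        return None  # assumption: there is no other page to compare against
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    (top_page, top_score), (_, next_score) = ranked[0], ranked[1]
    # The top score must strictly exceed every other score and be at least
    # `factor` times the next highest score.
    if top_score > next_score and top_score >= factor * next_score:
        return top_page
    return None
```

A page returned by this test would then be removed from the graph before the remaining pages are re-scored.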
28. A computer program product readable by a computing system and encoding a computer program of instructions for executing a computer process for identifying pages related to an initial page, said computer process comprising:
identifying a plurality of pages linked to the initial page;
representing the plurality of pages as a graph of nodes;
scoring the plurality of pages on connectivity of said plurality of pages to the initial page to generate a connectivity score for each of said plurality of pages;
removing from the graph of nodes pages with an undue influence on the scoring of other pages in the plurality of pages, wherein a page has the undue influence on the scoring of other pages in the plurality of pages, when said page has a score greater than a predetermined fraction of a total connectivity score, said total connectivity score computed by summing connectivity scores of the plurality of pages;
re-scoring remaining pages represented in the graph of nodes; and
selecting a subset of the remaining pages represented in the graph of nodes that have connectivity scores greater than a first predetermined threshold as the pages related to the initial page.
Dependent claims: 29-41
42. A computer program product readable by a computing system and encoding a computer program of instructions for executing a computer process for identifying pages related to an initial page, said computer process comprising:
identifying a plurality of pages linked to the initial page;
representing the plurality of pages as a graph of nodes;
scoring the plurality of pages on connectivity of said plurality of pages to the initial page to generate a connectivity score for each of said plurality of pages;
removing from the graph of nodes pages with an undue influence on the scoring of other pages in the plurality of pages, wherein a page has the undue influence on the scoring of other pages in the plurality of pages when said page has a score greater than each score of all other pages in the plurality of pages and said score is at least three times greater than a next highest score of another page in said plurality of pages;
re-scoring remaining pages represented in the graph of nodes; and
selecting a subset of the remaining pages represented in the graph of nodes that have connectivity scores greater than a first predetermined threshold as the pages related to the initial page.
Dependent claims: 43-54
Specification