Web page ranking with hierarchical considerations
First Claim
1. A method of evaluating content on the Web comprising:
identifying a plurality of pages on the Web;
identifying a plurality of nodes, each node associated with a hierarchical structure to which at least one of the pages corresponds;
grouping the plurality of pages to the corresponding nodes;
for each node, determining a first value based, at least in part, on the linking relationships between that node and the other nodes;
for each page, determining a unique second value based, at least in part, on characteristics of that page, wherein the characteristics of that page include a level value representing a hierarchical level of that page within the hierarchical structure of the corresponding node to which the page corresponds, wherein the unique second value is uniquely determined for each page based on the level value;
for each node, identifying inter-links of pages corresponding to the node, wherein the inter-links represent links that point to the pages corresponding to the node and that are included in other pages corresponding to other nodes;
aggregating the identified inter-links;
determining a third value based, at least in part, on the total number of aggregated inter-links and as a variable that represents a relative weight distribution between the identified inter-links and intra-links for the node, such that the relative weight distribution represents an inverse relationship between the weight of the identified inter-links and the weight of the intra-links for the node, a predetermined value of the variable would indicate that inter-links have more relative link weight than the intra-links;
determining the first value for the node based, at least in part, on the third value; and
determining, storing an importance value and outputting for each page based, at least in part, on the unique second value associated with that page and the first value associated with the node to which the page corresponds and a reputation value, wherein the reputation value is an indication of the influence of host's importance on the overall importance of the page.
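Claim 1 does not say how the level value becomes a second value, or how the reputation value combines the node's first value with the page's second value. The Python sketch below is one hypothetical reading, not the patented formula. It assumes the level is the number of non-empty URL path segments, that the second value decays geometrically with that level (`decay` is an invented parameter), and that the reputation value is a weight in [0, 1] blending the two scores. Unlike the claim, this simple version gives pages at the same level the same second value, so it does not enforce uniqueness.

```python
from urllib.parse import urlsplit


def level_value(url: str) -> int:
    """Hierarchical level of a page inside its host.

    Assumption: depth equals the number of non-empty URL path segments,
    so "http://a.com/" is level 0 and "http://a.com/docs/x.html" is level 2.
    """
    return len([seg for seg in urlsplit(url).path.split("/") if seg])


def second_value(url: str, decay: float = 0.5) -> float:
    """Hypothetical page-level score that shrinks geometrically with depth."""
    return decay ** level_value(url)


def page_importance(url: str, first_value: float, reputation: float,
                    decay: float = 0.5) -> float:
    """Blend the node's first value with the page's second value.

    `reputation` (assumed to lie in [0, 1]) controls how much of the host's
    importance carries over to the page: 1.0 means the page simply inherits
    the host score, 0.0 means only its own level-based score counts.
    """
    if not 0.0 <= reputation <= 1.0:
        raise ValueError("reputation must lie in [0, 1]")
    return reputation * first_value + (1.0 - reputation) * second_value(url, decay)


if __name__ == "__main__":
    host_first_value = 0.6  # e.g. a score from host-level link analysis
    for page in ("http://a.com/", "http://a.com/docs/", "http://a.com/docs/x.html"):
        print(page, level_value(page), round(page_importance(page, host_first_value, 0.5), 4))
```

With the same host score and reputation, a shallow page outranks a deeply nested one, which is the intuition behind using the hierarchical level at all.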
2 Assignments
0 Petitions
Abstract
The described systems, methods and data structures are directed to ranking Web pages with hierarchical considerations. The hierarchical structures and the linking relationships of the World Wide Web are used to provide a page importance ranking for Web searches. The linking relationships are aggregated to a high level node at each of the hierarchical structures. A link graph analysis is performed on the aggregated linking relationships to determine the importance of each node. The importance of each node may be propagated to pages associated with that node. For each page, the importance of that page and the importance of the node associated with the page are used to calculate the page importance ranking.
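The pipeline the abstract describes (aggregate page links to host-level nodes, run a link-graph analysis on the aggregated graph, propagate node importance back to pages) can be sketched in Python as below. This is an illustrative sketch, not the patent's implementation. It assumes the URL host (`netloc`) is the node, uses a standard PageRank-style power iteration as the link-graph analysis, and propagates by splitting a host's score evenly across its pages; `damping`, `iters`, and all function names are invented.

```python
from collections import defaultdict
from urllib.parse import urlsplit


def host_of(url: str) -> str:
    """Node for a page: here, simply the lower-cased host name (assumption)."""
    return urlsplit(url).netloc.lower()


def aggregate_host_links(page_links):
    """Collapse page-to-page links into weighted host-to-host edges.

    Links between two pages of the same host (intra-links) are skipped, so
    only inter-links contribute edges to the host graph.
    """
    hosts = set()
    edges = defaultdict(lambda: defaultdict(int))
    for src, dst in page_links:
        hs, hd = host_of(src), host_of(dst)
        hosts.update((hs, hd))
        if hs != hd:
            edges[hs][hd] += 1
    return hosts, edges


def host_rank(hosts, edges, damping=0.85, iters=50):
    """PageRank-style power iteration over the weighted host graph."""
    order = sorted(hosts)
    n = len(order)
    rank = {h: 1.0 / n for h in order}
    for _ in range(iters):
        new = {h: (1.0 - damping) / n for h in order}
        dangling = 0.0
        for h in order:
            out = edges.get(h, {})
            total = sum(out.values())
            if total == 0:
                dangling += rank[h]  # hosts without inter-links spread rank uniformly
                continue
            for dst, weight in out.items():
                new[dst] += damping * rank[h] * weight / total
        for h in order:
            new[h] += damping * dangling / n
        rank = new
    return rank


def propagate_to_pages(rank, pages):
    """Share each host's importance equally among its pages."""
    per_host = defaultdict(int)
    for page in pages:
        per_host[host_of(page)] += 1
    return {page: rank[host_of(page)] / per_host[host_of(page)] for page in pages}


if __name__ == "__main__":
    links = [
        ("http://a.com/1", "http://b.com/"),
        ("http://a.com/2", "http://b.com/x"),
        ("http://b.com/", "http://a.com/1"),
        ("http://a.com/1", "http://a.com/2"),  # intra-link, ignored at host level
        ("http://c.com/", "http://b.com/"),
    ]
    hosts, edges = aggregate_host_links(links)
    ranks = host_rank(hosts, edges)
    pages = {p for link in links for p in link}
    print(ranks)
    print(propagate_to_pages(ranks, pages))
```

Running the analysis on hosts rather than pages shrinks the graph and keeps a site's many internal links from inflating its own score, which is the motivation the abstract gives for aggregating first.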
23 Claims
10. A system comprising:
a data store containing data about pages on the Web, the data for each page indicating characteristics of the page and a host to which the page corresponds; and a computer comprising a ranking module configured to determine links pointing to each of the pages from the data in the data store, the ranking module also configured to aggregate the links associated with each host and to calculate a weight value for the host based on the aggregated links such that the weight value is a function, at least in part, of the total number of aggregated inter-links and of a variable that represents a relative weight distribution between the identified inter-links and intra-links for the node, such that the relative weight distribution represents an inverse relationship between the weight of the identified inter-links and the weight of the intra-links for the node, a predetermined value of the variable would indicate that inter-links have more relative link weight than the intra-links, the ranking module further configured to compute and store a unique importance value for each page based, at least in part, on the weight value of the host corresponding to the page and the characteristics of the page, wherein the unique importance value of each page is calculated at least partly based on a hierarchical level value of that page representing a hierarchical level within a hierarchical structure associated with the corresponding host, wherein the unique importance value is uniquely determined for each page and the unique importance value is also a function of a reputation of the host to which the page corresponds and outputting the importance value.
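Claim 10 ties each host's weight value to its aggregated inter-links and to a variable that trades inter-link weight off against intra-link weight, but it does not give the function. The Python sketch below assumes the simplest form with that inverse relationship: a convex combination with weights `alpha` and `1 - alpha`, keyed by URL host. `HostLinkCounts`, `aggregate_link_counts`, `host_weight`, and `alpha` are hypothetical names, not terms from the patent.

```python
from collections import Counter
from dataclasses import dataclass
from urllib.parse import urlsplit


@dataclass(frozen=True)
class HostLinkCounts:
    inter_links: int  # links into this host from pages on other hosts
    intra_links: int  # links between pages of this same host


def aggregate_link_counts(page_links):
    """Count, per target host, the inter-links and intra-links pointing at it."""
    inter, intra = Counter(), Counter()
    for src, dst in page_links:
        src_host, dst_host = urlsplit(src).netloc, urlsplit(dst).netloc
        if src_host == dst_host:
            intra[dst_host] += 1
        else:
            inter[dst_host] += 1
    hosts = set(inter) | set(intra)
    return {h: HostLinkCounts(inter[h], intra[h]) for h in hosts}


def host_weight(counts: HostLinkCounts, alpha: float = 0.8) -> float:
    """Hypothetical weight value: alpha scales inter-links, (1 - alpha) intra-links.

    Raising alpha necessarily lowers the intra-link weight (the inverse
    relationship); any alpha above 0.5 gives inter-links more weight.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return alpha * counts.inter_links + (1.0 - alpha) * counts.intra_links


if __name__ == "__main__":
    busy = HostLinkCounts(inter_links=10, intra_links=100)
    cited = HostLinkCounts(inter_links=40, intra_links=10)
    for a in (0.8, 0.2):
        print(a, host_weight(busy, a), host_weight(cited, a))
```

With `alpha = 0.8`, a host cited 40 times from elsewhere and linked 10 times internally (weight about 34) outranks a host with 10 inter-links and 100 intra-links (weight about 28); with `alpha = 0.2` the order flips, because self-linking then dominates.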
16. An apparatus comprising:
means for gathering data about pages on the Web; means for determining hosts to which the pages correspond; means for determining an importance of each host based, at least in part, on a hierarchical random walk analysis; means for determining an importance of each page; means for ranking each page based, at least in part, on the importance of the page and the importance of the corresponding host, a reputation of the page and a reputation of the host, wherein the reputation value is an indication of the influence of host's importance on the overall importance of the page, and means for storing a ranking of each page, wherein the means for determining the importance of each page uses a hierarchical level value representing the hierarchical level of that page within a hierarchical structure associated with the corresponding host to which that page corresponds, wherein the importance of each page is uniquely determined, means for aggregating linking relationships of the pages at each host; means for calculating a weight value for the corresponding host to which that page corresponds, based on the aggregated links such that the weight value is a function, at least in part, of the total number of aggregated inter-links and of a variable that represents a relative weight distribution between the identified inter-links and intra-links for the node, such that the relative weight distribution represents an inverse relationship between the weight of the identified inter-links and the weight of the intra-links for the node, a predetermined value of the variable would indicate that inter-links have more relative link weight than the intra-links, means for determining the importance of each host based, at least in part, on the aggregated linking relationships associated with the host; wherein the means for determining the importance of each page determines and outputs the importance of the page based on whether the page is an index page or a content page.
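Claim 16 adds two ingredients: a hierarchical random walk for host importance and a distinction between index pages and content pages. One hedged way to picture both is a two-stage surfer that first picks a host according to its importance and then picks a page inside that host, so a page's score factors as `P(host) * P(page | host)`. The Python sketch below assumes that factorization. The index-page test (directory URLs and `index.*` files), the weights `index_weight` and `content_weight`, and the depth penalty `decay` are invented for illustration and are not taken from the patent.

```python
import posixpath
from collections import defaultdict
from urllib.parse import urlsplit


def is_index_page(url: str) -> bool:
    """Heuristic (assumed): directory-style URLs and index.* files are index pages."""
    path = urlsplit(url).path
    if path == "" or path.endswith("/"):
        return True
    return posixpath.basename(path).startswith("index.")


def page_weight(url: str, index_weight: float = 1.0,
                content_weight: float = 0.5, decay: float = 0.5) -> float:
    """Unnormalized within-host weight: page type times a depth penalty."""
    level = len([seg for seg in urlsplit(url).path.split("/") if seg])
    kind = index_weight if is_index_page(url) else content_weight
    return kind * decay ** level


def hierarchical_scores(pages, host_importance):
    """P(page) = P(host) * P(page | host) for a two-stage host-then-page walk.

    `host_importance` maps each host to its (already computed) importance,
    e.g. the output of a host-level link analysis.
    """
    weights = {url: page_weight(url) for url in pages}
    host_totals = defaultdict(float)
    for url, w in weights.items():
        host_totals[urlsplit(url).netloc] += w
    return {
        url: host_importance[urlsplit(url).netloc] * w / host_totals[urlsplit(url).netloc]
        for url, w in weights.items()
    }


if __name__ == "__main__":
    pages = ["http://a.com/", "http://a.com/docs/", "http://a.com/docs/x.html", "http://b.com/"]
    for url, score in hierarchical_scores(pages, {"a.com": 0.7, "b.com": 0.3}).items():
        print(url, round(score, 4))
```

Because the within-host weights are normalized per host, the page scores of each host sum to that host's importance, so the host-level ranking is preserved while index and shallow pages take a larger share of it.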
Specification