Propagating useful information among related web pages, such as web pages of a website
2 Assignments
0 Petitions
Abstract
Web pages of a Website may be processed to improve search results. For example, information that is likely to pertain to more than just the Web page with which it is directly associated may be identified. One or more other, related Web pages to which that information is likely to pertain are also identified. The identified information is associated with the identified other Web page(s), and this association is saved so that it affects the search result score of those Web page(s).
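The abstract's last step, saving an association so that it changes a page's search result score, can be illustrated with a minimal sketch. It assumes a toy in-memory index in which each (term, page) pair carries a weight and a page's score for a query is simply the sum of the weights of the query's terms. `ToyIndex`, `associate`, and the `0.5` weight for a propagated term are hypothetical illustrations, not the scoring method described in the patent.

```python
from collections import defaultdict


class ToyIndex:
    """Hypothetical in-memory index: term -> {url: weight}."""

    def __init__(self):
        self.postings = defaultdict(dict)

    def add_page(self, url, text):
        # Each occurrence of a term on the page itself contributes weight 1.0.
        for term in text.lower().split():
            self.postings[term][url] = self.postings[term].get(url, 0.0) + 1.0

    def associate(self, term, url, weight=0.5):
        # Record a propagated association; `weight` is an assumed tuning knob.
        term = term.lower()
        self.postings[term][url] = self.postings[term].get(url, 0.0) + weight

    def score(self, query, url):
        # Sum the weights of the query terms for this page (.get avoids creating entries).
        return sum(self.postings.get(t, {}).get(url, 0.0) for t in query.lower().split())


if __name__ == "__main__":
    idx = ToyIndex()
    idx.add_page("http://acme.example/", "acme widgets home")
    idx.add_page("http://acme.example/products/gear", "gear catalog")
    before = idx.score("acme gear", "http://acme.example/products/gear")
    idx.associate("acme", "http://acme.example/products/gear")
    after = idx.score("acme gear", "http://acme.example/products/gear")
    print(before, after)  # 1.0 1.5
```

The propagated term gets a smaller weight than an on-page occurrence here only so that the association raises the page's score without outranking pages that actually contain the term; a real engine would tune this very differently.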
39 Claims
1. A computer-implemented method comprising:
maintaining an index of Web pages, the index storing data used by a search engine to score Web pages responsive to queries;
crawling Web pages accessible on a network to identify information pertaining to a Website having a plurality of Web pages arranged according to a URL hierarchy, wherein the Website has one or more strongly associated terms, wherein each Web page includes one or more terms, and wherein the plurality of Web pages includes a home or root page that is above the other Web pages in the URL hierarchy;
selecting a candidate term from the terms in a first Web page of the Website;
determining that the candidate term is one of the strongly associated terms for the Website by using data external to the Website;
identifying a second Web page of the Website that is below the first Web page in the URL hierarchy of the Website;
updating the index to include data associating the candidate term with the second Web page, wherein the association results in the second Web page having a higher search score for a search query including the candidate term than the second Web page would otherwise have.
(Dependent claims: 2-12)
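Claim 1 propagates a term downward: a term on a first page that is strongly associated with the Website is attached to pages below that page. The claim does not say what "data external to the Website" is, so the minimal sketch below assumes it is the anchor text of inbound links from other hosts, and calls a term strongly associated when at least `min_sites` distinct external hosts use it. It also assumes each non-empty URL path segment is one level of the hierarchy. All function names and the dict-of-sets `index` are hypothetical.

```python
from urllib.parse import urlsplit


def path_segments(url):
    """Split a URL path into non-empty segments: '/a/b/' -> ['a', 'b']."""
    return [s for s in urlsplit(url).path.split("/") if s]


def is_below(candidate, ancestor):
    """True if `candidate` is strictly below `ancestor` in the URL hierarchy of the same host."""
    c, a = urlsplit(candidate), urlsplit(ancestor)
    if c.netloc.lower() != a.netloc.lower():
        return False
    cs, anc = path_segments(candidate), path_segments(ancestor)
    return len(cs) > len(anc) and cs[: len(anc)] == anc


def strongly_associated_terms(external_anchors, min_sites=2):
    """Assumption: a term is strongly associated with the Website if it appears in the
    anchor text of inbound links from at least `min_sites` distinct external hosts."""
    hosts_per_term = {}
    for source_host, anchor_text in external_anchors:
        for term in set(anchor_text.lower().split()):
            hosts_per_term.setdefault(term, set()).add(source_host)
    return {t for t, hosts in hosts_per_term.items() if len(hosts) >= min_sites}


def propagate_down(first_url, page_terms, site_urls, external_anchors, index, min_sites=2):
    """Associate each strongly associated term on `first_url` with every page below it.

    `index` is a hypothetical dict mapping term -> set of URLs; returns the pairs added."""
    strong = strongly_associated_terms(external_anchors, min_sites)
    added = []
    for term in sorted({t.lower() for t in page_terms} & strong):
        for url in site_urls:
            if is_below(url, first_url):
                index.setdefault(term, set()).add(url)
                added.append((term, url))
    return added
```

For example, if news and blog sites both link to the home page with "Acme" in the anchor text, `propagate_down` run from the home page would attach "acme" to every deeper page of the site.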
13. A computer-implemented method comprising:
maintaining an index of Web pages, the index storing data used by a search engine to score Web pages responsive to queries;
crawling Web pages accessible on a network to identify information pertaining to a Website having a plurality of Web pages arranged according to a URL hierarchy, wherein each Web page includes one or more terms, and wherein the plurality of Web pages includes a home or root page that is above the other Web pages in the URL hierarchy;
selecting a candidate term from the terms in a first Web page of the Website;
determining that the candidate term is a highly descriptive term because it is uncommon;
identifying a second Web page of the Website that is above the first Web page in the URL hierarchy of the Website; and
updating the index to include data associating the candidate term with the second Web page in an index of Web pages, wherein the association results in the second Web page having a higher search score for a search query including the candidate term than the second Web page would otherwise have.
(Dependent claims: 14-21)
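Claim 13 runs in the opposite direction: a term that is highly descriptive because it is uncommon is attached to pages above the first page. The minimal sketch below assumes "uncommon" means the term's document frequency is at most `max_fraction` of a corpus of `total_docs` documents (the 1% default is an arbitrary assumption), and again treats each URL path segment as one hierarchy level. `is_above`, `uncommon_terms`, `propagate_up`, and the dict-of-sets `index` are hypothetical names.

```python
from urllib.parse import urlsplit


def _segments(url):
    """Non-empty path segments of a URL: '/a/b/' -> ['a', 'b']."""
    return [s for s in urlsplit(url).path.split("/") if s]


def is_above(candidate, descendant):
    """True if `candidate` is strictly above `descendant` in the URL hierarchy of the same host."""
    c, d = urlsplit(candidate), urlsplit(descendant)
    if c.netloc.lower() != d.netloc.lower():
        return False
    cs, ds = _segments(candidate), _segments(descendant)
    return len(cs) < len(ds) and ds[: len(cs)] == cs


def uncommon_terms(terms, doc_freq, total_docs, max_fraction=0.01):
    """Assumption: a term is uncommon if it occurs in at most `max_fraction` of all documents."""
    return {t for t in terms if doc_freq.get(t, 0) / total_docs <= max_fraction}


def propagate_up(first_url, page_terms, site_urls, doc_freq, total_docs, index, max_fraction=0.01):
    """Associate each uncommon term on `first_url` with every page above it; return pairs added."""
    descriptive = uncommon_terms(page_terms, doc_freq, total_docs, max_fraction)
    added = []
    for term in sorted(descriptive):
        for url in site_urls:
            if is_above(url, first_url):
                index.setdefault(term, set()).add(url)
                added.append((term, url))
    return added
```

The upward direction reflects the idea that a rare product name on a deep page also says something useful about the section and home pages that contain it.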
22. An apparatus comprising:
one or more processors;
at least one input device; and
one or more storage devices storing processor-executable instructions which, when executed by the one or more processors, cause the one or more processors to perform operations comprising:
maintaining an index of Web pages, the index storing data used by a search engine to score Web pages responsive to queries;
crawling Web pages accessible on a network to identify information pertaining to a Website having a plurality of Web pages arranged according to a URL hierarchy, wherein the Website has one or more strongly associated terms, wherein each Web page includes one or more terms, and wherein the plurality of Web pages includes a home or root page that is above the other Web pages in the URL hierarchy;
selecting a candidate term from the terms in a first Web page of the Website;
determining that the candidate term is one of the strongly associated terms for the Website by using data external to the Website;
identifying a second Web page of the Website that is below the first Web page in the URL hierarchy of the Website;
updating the index to include data associating the candidate term with the second Web page, wherein the association results in the second Web page having a higher search score for a search query including the candidate term than the second Web page would otherwise have.
(Dependent claims: 23-29)
30. A system comprising:
one or more processors;
at least one input device; and
one or more storage devices storing processor-executable instructions which, when executed by the one or more processors, perform operations comprising:
maintaining an index of Web pages, the index storing data used by a search engine to score Web pages responsive to queries;
crawling Web pages accessible on a network to identify information pertaining to a Website having a plurality of Web pages arranged according to a URL hierarchy, wherein each Web page includes one or more terms, and wherein the plurality of Web pages includes a home or root page that is above the other Web pages in the URL hierarchy;
selecting a candidate term from the terms in a first Web page of the Website;
determining that the candidate term is a highly descriptive term because it is uncommon;
identifying a second Web page of the Website that is above the first Web page in the URL hierarchy of the Website;
updating the index to include data associating the candidate term with the second Web page, wherein the association results in the second Web page having a higher search result score for a search query including the candidate term than the second Web page would otherwise have.
(Dependent claims: 31-33)
34. A computer-implemented method comprising:
maintaining an index of Web pages, the index storing data used by a search engine to score Web pages responsive to queries;
crawling Web pages accessible on a network to identify information pertaining to a Website having a plurality of Web pages arranged according to a URL hierarchy, wherein each Web page includes one or more terms, and wherein the plurality of Web pages includes a home or root page that is above the other Web pages in the URL hierarchy;
selecting a candidate term from the terms in a first Web page of the Website;
determining that the candidate term is either: (1) a strongly associated term for the Website, or (2) a highly descriptive term;
identifying a second Web page of the Website that is either: (1) below the first Web page in the URL hierarchy of the Website in the case that the candidate term is a strongly associated term, or (2) above the first Web page in the URL hierarchy of the Website in the case that the candidate term is a highly descriptive term; and
updating the index to include data associating the candidate term with the second Web page, wherein the association results in the second Web page having a higher search score for a search query including the candidate term than the second Web page would otherwise have.
(Dependent claims: 35-36)
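Claim 34 combines both directions: the kind of term decides whether it propagates down (strongly associated) or up (highly descriptive). The minimal sketch below assumes the two term sets have already been computed by some upstream step and only shows the direction dispatch over the URL hierarchy, treating each path segment as one level. `relation` and `propagation_targets` are hypothetical names. If a term falls into both sets, this sketch propagates it both ways; that is a design choice of the sketch, since the claim only requires one of the two cases.

```python
from urllib.parse import urlsplit


def _segments(url):
    """Non-empty path segments of a URL: '/a/b/' -> ['a', 'b']."""
    return [s for s in urlsplit(url).path.split("/") if s]


def relation(url, first_url):
    """Return 'below' or 'above' if `url` is strictly below/above `first_url`
    in the URL hierarchy of the same host, else None."""
    u, f = urlsplit(url), urlsplit(first_url)
    if u.netloc.lower() != f.netloc.lower():
        return None
    us, fs = _segments(url), _segments(first_url)
    if len(us) > len(fs) and us[: len(fs)] == fs:
        return "below"
    if len(us) < len(fs) and fs[: len(us)] == us:
        return "above"
    return None


def propagation_targets(term, first_url, site_urls, strong_terms, descriptive_terms):
    """Pages that `term` (found on `first_url`) should be associated with:
    pages below for strongly associated terms, pages above for highly descriptive ones."""
    wanted = set()
    if term in strong_terms:
        wanted.add("below")
    if term in descriptive_terms:
        wanted.add("above")
    # Sibling pages and pages on other hosts yield None and are never selected.
    return [u for u in site_urls if relation(u, first_url) in wanted]
```

Pages at the same depth in a different branch (for example `/about` relative to `/products/`) are neither above nor below, so neither kind of term reaches them.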
37. A system comprising:
one or more processors;
at least one input device; and
one or more storage devices storing processor-executable instructions which, when executed by the one or more processors, perform operations comprising:
maintaining an index of Web pages, the index storing data used by a search engine to score Web pages responsive to queries;
crawling Web pages accessible on a network to identify information pertaining to a Website having a plurality of Web pages arranged according to a URL hierarchy, wherein each Web page includes one or more terms, and wherein the plurality of Web pages includes a home or root page that is above the other Web pages in the URL hierarchy;
selecting a candidate term from the terms in a first Web page of the Website;
determining that the candidate term is either: (1) a strongly associated term for the Website, or (2) a highly descriptive term;
identifying a second Web page of the Website that is either: (1) below the first Web page in the URL hierarchy of the Website in the case that the candidate term is a strongly associated term, or (2) above the first Web page in the URL hierarchy of the Website in the case that the candidate term is a highly descriptive term; and
updating the index to include data associating the candidate term with the second Web page, wherein the association results in the second Web page having a higher search score for a search query including the candidate term than the second Web page would otherwise have.
(Dependent claims: 38-39)
Specification