Propagating useful information among related web pages, such as web pages of a website

US 7,933,890 B2
Filed: 03/31/2006
Issued: 04/26/2011
Est. Priority Date: 03/31/2006
Status: Active Grant

Abstract

Web pages of a Website may be processed to improve search results. For example, information likely to pertain to more than just the Web page with which it is directly associated may be identified. One or more other, related Web pages to which such information is likely to pertain are also identified. The identified information is associated with the identified other Web page(s), and this association is saved in a way that affects a search result score of the Web page(s).

39 Claims

1. A computer-implemented method comprising:
- maintaining an index of Web pages, the index storing data used by a search engine to score Web pages responsive to queries;
  
  crawling Web pages accessible on a network to identify information pertaining to a Website having a plurality of Web pages arranged according to a URL hierarchy, wherein the Website has one or more strongly associated terms, wherein each Web page includes one or more terms, and wherein the plurality of Web pages includes a home or root page that is above the other Web pages in the URL hierarchy;
  
  selecting a candidate term from the terms in a first Web page of the Website;
  
  determining that the candidate term is one of the strongly associated terms for the Website by using data external to the Website;
  
  identifying a second Web page of the Website that is below the first Web page in the URL hierarchy of the Website;
  
  updating the index to include data associating the candidate term with the second Web page, wherein the association results in the second Web page having a higher search score for a search query including the candidate term than the second Web page would otherwise have.
- Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12)
  - 2. The computer-implemented method of claim 1, further comprising determining that a term is a strongly associated term for the Website based on past user search queries using the term and past user selections of search results corresponding to the Website.
  - 3. The computer-implemented method of claim 1, further comprising determining that a term is a strongly associated term for the Website based on a use of anchor text including the term, wherein the anchor text is from one or more links referring to one or more Web pages of the Website.
  - 4. The computer-implemented method of claim 1, further comprising determining that a term is a strongly associated term for the Website based on a use of a yellow page entry that lists the term as a business name and a Web page of the Website as a home page for the business.
  - 5. The computer-implemented method of claim 1, further comprising determining that a term is a strongly associated term for the Website based on a use of trademark registration information that lists the term as a trademark and a Web page of the Website as a home page.
  - 6. The computer-implemented method of claim 1, further comprising determining that a term is a strongly associated term for the Website based on a use of a domain name registration information that lists the term in a domain name and the home page of the Website.
  - 7. The computer-implemented method of claim 1 wherein identifying a second Web page comprises identifying a second Web page that is not a press release Web page, a message board Web page, a forum Web page, or a foreign language Web page.
  - 8. The computer-implemented method of claim 1 wherein identifying a second Web page comprises:
    - identifying a second Web page that is within a predetermined number of links of the first Web page.
  - 9. The computer-implemented method of claim 1 wherein the data associating the candidate term with the second Web page causes an increase in an information retrieval component of the search score.
  - 10. The computer-implemented method of claim 1 wherein the data associating the candidate term with the second Web page causes an increase in a content ranking component of the search score that represents a ranking of the second Web page.
  - 11. The computer-implemented method of claim 1 wherein the candidate term is a phrase.
  - 12. The computer-implemented method of claim 1 further comprising:
    - receiving a search query including the candidate term;
      
      determining a non-increased search result score of the second Web page; and
      
      increasing the search result score of the second Web page having a saved association with the candidate term.
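
The index-update steps of claim 1 can be illustrated with a minimal sketch. It assumes the set of strongly associated terms has already been determined from external data, that "below in the URL hierarchy" can be approximated by a path-prefix test on the same host, and that the index is a plain dictionary from page URL to a set of propagated terms. The names `is_below`, `propagate_down`, and the example URLs are hypothetical and do not come from the patent.

```python
from urllib.parse import urlsplit


def is_below(page_url, first_url):
    """True if page_url is strictly below first_url in the URL hierarchy.

    Assumption: hierarchy is approximated by path prefixes on the same host.
    """
    page, first = urlsplit(page_url), urlsplit(first_url)
    if page.netloc != first.netloc:
        return False
    first_path = first.path.rstrip("/") + "/"
    page_path = page.path.rstrip("/") + "/"
    return page_path != first_path and page_path.startswith(first_path)


def propagate_down(index, site_pages, first_page, candidate_term, strong_terms):
    """Associate candidate_term with pages below first_page.

    Nothing is written unless the term is in strong_terms, which stands in
    for the externally determined strongly associated terms of the Website.
    Returns the list of pages whose index entry was updated.
    """
    if candidate_term not in strong_terms:
        return []
    updated = []
    for page in site_pages:
        if is_below(page, first_page):
            index.setdefault(page, set()).add(candidate_term)
            updated.append(page)
    return updated
```

For example, calling `propagate_down` with a home page as `first_page` would attach a brand name found on the home page to deeper product pages, while pages on other hosts and the home page itself are left untouched.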

13. A computer-implemented method comprising:
- maintaining an index of Web pages, the index storing data used by a search engine to score Web pages responsive to queries;
  
  crawling Web pages accessible on a network to identify information pertaining to a Website having a plurality of Web pages arranged according to a URL hierarchy, wherein each Web page includes one or more terms, and wherein the plurality of Web pages includes a home or root page that is above the other Web pages in the URL hierarchy;
  
  selecting a candidate term from the terms in a first Web page of the Website,
  
  determining that the candidate term is a highly descriptive term because it is uncommon;
  
  identifying a second Web page of the Website that is above the first Web page in the URL hierarchy of the Website;
  
  and updating the index to include data associating the candidate term with the second Web page in an index of Web pages, wherein the association results in the second Web page having a higher search score for a search query including the candidate term than the second Web page would otherwise have.
- Dependent Claims (14, 15, 16, 17, 18, 19, 20, 21)
  - 14. The computer-implemented method of claim 13, wherein determining that a term is uncommon comprises determining that the term occurs with less than a predetermined frequency among a collection of Web pages and Websites.
  - 15. The computer-implemented method of claim 13 further comprising determining that the candidate term is a product category and determining that the candidate term is a highly descriptive term because the candidate term is a product category.
  - 16. The computer-implemented method of claim 13 wherein the data associating the candidate term with the second Web page causes an increase in an information retrieval component of the search score for the second Web page and a search query including the candidate term.
  - 17. The computer-implemented method of claim 13 wherein the data associating the candidate term with the second Web page causes an increase in a page ranking component of the search score for the second Web page and a search query including the candidate term.
  - 18. The computer-implemented method of claim 13 wherein the candidate term is not found on a home Web page or root Web page of the Website, and wherein the second Web page is the home Web page or root Web page of the Website.
  - 19. The computer-implemented method of claim 13 wherein selecting the candidate term comprises:
    - determining a confidence in the candidate term, and determining that the confidence is greater than a predetermined threshold.
  - 20. The computer-implemented method of claim 19 wherein determining the confidence in the candidate term includes determining that the candidate term appears in a plurality of locations in the second Web page.
  - 21. The computer-implemented method of claim 13 further comprising:
    - receiving a search query including the candidate term;
      
      determining a non-increased search result score of the second Web page; and
      
      increasing the search result score of the second Web page having a stored association with the candidate term.
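
Claims 13 and 14 describe the opposite direction: an uncommon term found on a deeper page is associated with a page above it. The sketch below is illustrative only. It assumes document frequencies over some collection are available as a dictionary, treats "less than a predetermined frequency" as a fraction of the collection (the `max_fraction` default is arbitrary), and propagates to every ancestor up to the root, whereas the claim only requires one second Web page. The function names are hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit


def ancestors(url):
    """Return the URLs above `url` in the URL hierarchy, nearest first,
    ending at the root page. A root URL has no ancestors."""
    parts = urlsplit(url)
    segments = [s for s in parts.path.split("/") if s]
    result = []
    for depth in range(len(segments) - 1, -1, -1):
        path = "/" + "/".join(segments[:depth])
        if depth:
            path += "/"
        result.append(urlunsplit((parts.scheme, parts.netloc, path, "", "")))
    return result


def is_uncommon(term, doc_freq, total_docs, max_fraction=0.01):
    """True if the term occurs in less than max_fraction of the collection.

    Assumption: doc_freq maps a term to the number of documents containing it.
    """
    return doc_freq.get(term, 0) / total_docs < max_fraction


def propagate_up(index, first_page, candidate_term, doc_freq, total_docs):
    """Associate an uncommon candidate_term with the pages above first_page.

    Returns the list of pages whose index entry was updated.
    """
    if not is_uncommon(candidate_term, doc_freq, total_docs):
        return []
    targets = ancestors(first_page)
    for page in targets:
        index.setdefault(page, set()).add(candidate_term)
    return targets
```

Restricting the targets to the home or root page, as in claim 18, would amount to keeping only the last element of `ancestors(first_page)`.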

22. An apparatus comprising:
- one or more processors;
  
  at least one input device; and
  
  one or more storage devices storing processor executable instructions which, when executed by the one or more processors, cause the one or more processors to perform operations comprising:
  
  maintaining an index of Web pages, the index storing data used by a search engine to score Web pages responsive to queries;
  
  crawling Web pages accessible on a network to identify information pertaining to a Website having a plurality of Web pages arranged according to a URL hierarchy, wherein the Website has one or more strongly associated terms, wherein each Web page includes one or more terms, and wherein the plurality of Web pages includes a home or root page that is above the other Web pages in the URL hierarchy;
  
  selecting a candidate term from the terms in a first Web page of the Website;
  
  determining that the candidate term is one of the strongly associated terms for the Website by using data external to the Website;
  
  identifying a second Web page of the Website that is below the first Web page in the URL hierarchy of the Website;
  
  updating the index to include data associating the candidate term with the second Web page, wherein the association results in the second Web page having a higher search score for a search query including the candidate term than the second Web page would otherwise have.
- Dependent Claims (23, 24, 25, 26, 27, 28, 29)
  - 23. The apparatus of claim 22 wherein the operations further comprise:
    - receiving a search query including the candidate term;
      
      determining a non-increased search result score of the second Web page; and
      
      increasing the non-increased search result score of the second Web page having a saved association with the candidate term.
  - 24. The apparatus of claim 22, wherein the operations further comprise determining that a term is a strongly associated term for the Website using past user search queries using the term and past user selections of search results corresponding to the Website.
  - 25. The apparatus of claim 22, wherein the operations further comprise determining that a term is a strongly associated term for the Website using anchor text including the term, wherein the anchor text is from one or more links referring to one or more Web pages of the Website.
  - 26. The apparatus of claim 22, wherein the operations further comprise determining that a term is a strongly associated term for the Website using a yellow page entry that lists the term as a business name and a Web page of the Website as a home page for the business.
  - 27. The apparatus of claim 22, wherein the operations further comprise determining that a term is a strongly associated term for the Website using trademark registration information that lists the term as a trademark and a Web page of the Website as a home page.
  - 28. The apparatus of claim 22, wherein the operations further comprise determining that a term is a strongly associated term for the Website using domain name registration information that lists the term in a domain name and the home page of the Website.
  - 29. The apparatus of claim 22 wherein identifying a second Web page comprises:
    - identifying a second Web page that is within a predetermined number of links of the first Web page.
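
One of the external signals listed above (claims 2 and 24) is past user queries combined with past selections of search results. The following is a rough sketch of how such a signal might be turned into a set of strongly associated terms. The log format (pairs of query string and clicked host), the whitespace tokenization, and the `min_clicks` and `min_share` thresholds are all assumptions made for illustration; the patent text here does not specify them.

```python
from collections import Counter


def strongly_associated_terms(click_log, site_host, min_clicks=3, min_share=0.5):
    """Infer strongly associated terms for site_host from a click log.

    click_log: iterable of (query, clicked_host) pairs (hypothetical format).
    A term qualifies if queries containing it led to at least min_clicks
    selections of site_host, and those selections make up at least
    min_share of all selections for queries containing the term.
    """
    total = Counter()
    on_site = Counter()
    for query, clicked_host in click_log:
        # Count each term once per query, however often it repeats.
        for term in set(query.lower().split()):
            total[term] += 1
            if clicked_host == site_host:
                on_site[term] += 1
    return {
        term
        for term, clicks in on_site.items()
        if clicks >= min_clicks and clicks / total[term] >= min_share
    }
```

The other listed sources (anchor text, yellow page entries, trademark registrations, domain name registrations) could feed the same kind of set; this sketch covers only the query and selection signal.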

30. A system comprising:
- one or more processors;
  
  at least one input device; and
  
  one or more storage devices storing processor-executable instructions which, when executed by the one or more processors, perform operations comprising:
  
  maintaining an index of Web pages, the index storing data used by a search engine to score Web pages responsive to queries;
  
  crawling Web pages accessible on a network to identify information pertaining to a Website having a plurality of Web pages arranged according to a URL hierarchy, wherein each Web page includes one or more terms, and wherein the plurality of Web pages includes a home or root page that is above the other Web pages in the URL hierarchy;
  
  selecting a candidate term from the terms in a first Web page of the Website;
  
  determining that the candidate term is a highly descriptive term because it is uncommon;
  
  identifying a second Web page of the Website that is above the first Web page in the URL hierarchy of the Website;
  
  updating the index to include data associating the candidate term with the second Web page, wherein the association results in the second Web page having a higher search result score for a search query including the candidate term than the second Web page would otherwise have.
- Dependent Claims (31, 32, 33)
  - 31. The system of claim 30, wherein the operations further comprise:
    - receiving a search query including the candidate term;
      
      determining a non-increased search result score of the second Web page; and
      
      increasing the non-increased search result score of the second Web page having a saved association with the candidate term.
  - 32. The system of claim 30, wherein determining that a term is uncommon comprises determining that the term occurs with less than a predetermined frequency among a collection of Web pages and Websites.
  - 33. The system of claim 30 wherein the operations further comprise determining that the candidate term is a product category and determining that the candidate term is a highly descriptive term because the candidate term is a product category.
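
The query-time steps recited in claim 31 (receive a query, determine a non-increased score, then increase it for a page with a saved association) might look like the sketch below. The additive `boost` per matched term is purely an assumption; the claims only require that the resulting score be higher than it would otherwise be. `search_score` and the `associations` dictionary are hypothetical names.

```python
def search_score(base_score, page, query_terms, associations, boost=0.5):
    """Return (non_increased_score, final_score) for a page and a query.

    associations: dict mapping a page to the set of terms propagated to it.
    The final score adds `boost` for each query term that has a saved
    association with the page; with no match the two scores are equal.
    """
    propagated = associations.get(page, set())
    matched = propagated.intersection(query_terms)
    return base_score, base_score + boost * len(matched)
```

Returning both values keeps the non-increased score available, which makes it straightforward to cap or audit the effect of propagated terms separately from the underlying ranking.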

34. A computer-implemented method comprising:
- maintaining an index of Web pages, the index storing data used by a search engine to score Web pages responsive to queries;
  
  crawling Web pages accessible on a network to identify information pertaining to a Website having a plurality of Web pages arranged according to a URL hierarchy, wherein each Web page includes one or more terms, and wherein the plurality of Web pages includes a home or root page that is above the other Web pages in the URL hierarchy;
  
  selecting a candidate term from the terms in a first Web page of the Website;
  
  determining that the candidate term is either:
  
  1) a strongly associated term for the Website, or
  2) a highly descriptive term;
  
  identifying a second Web page of the Website that is either:
  
  1) below the first Web page in the URL hierarchy of the Website in the case that the candidate term is a strongly associated term, or
  2) above the first Web page in the URL hierarchy of the Website in the case that the candidate term is a highly descriptive term; and
  
  updating the index to include data associating the candidate term with the second Web page, wherein the association results in the second Web page having a higher search score for a search query including the candidate term than the second Web page would otherwise have.
- Dependent Claims (35, 36)
  - 35. The computer-implemented method of claim 34, further comprising:
    - determining that the candidate term is strongly associated by using data external to the Website.
  - 36. The computer-implemented method of claim 34, further comprising:
    - determining that the candidate term is highly descriptive because it is uncommon.
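
Claim 34 combines both directions: the kind of term decides whether the association is pushed down or up the URL hierarchy. A compact sketch of that dispatch follows. It assumes pages are given as site-relative paths, that the two term sets were computed beforehand, and that every qualifying page is updated rather than a single second Web page. `TermKind`, `classify`, `is_above`, and `propagate` are hypothetical names.

```python
from enum import Enum


class TermKind(Enum):
    STRONGLY_ASSOCIATED = "strongly_associated"
    HIGHLY_DESCRIPTIVE = "highly_descriptive"


def classify(term, strong_terms, descriptive_terms):
    """Return the TermKind of a candidate term, or None if it is neither."""
    if term in strong_terms:
        return TermKind.STRONGLY_ASSOCIATED
    if term in descriptive_terms:
        return TermKind.HIGHLY_DESCRIPTIVE
    return None


def is_above(upper, lower):
    """True if path `upper` is strictly above path `lower` (segment prefix)."""
    up = tuple(s for s in upper.split("/") if s)
    low = tuple(s for s in lower.split("/") if s)
    return len(up) < len(low) and low[: len(up)] == up


def propagate(index, term, first_page, site_pages, strong_terms, descriptive_terms):
    """Associate `term` with pages below first_page (strongly associated term)
    or above it (highly descriptive term). Returns the updated pages."""
    kind = classify(term, strong_terms, descriptive_terms)
    if kind is TermKind.STRONGLY_ASSOCIATED:
        targets = [p for p in site_pages if is_above(first_page, p)]
    elif kind is TermKind.HIGHLY_DESCRIPTIVE:
        targets = [p for p in site_pages if is_above(p, first_page)]
    else:
        targets = []
    for page in targets:
        index.setdefault(page, set()).add(term)
    return targets
```

In this sketch a term that appears in both sets is treated as strongly associated; the claim does not say how such a tie should be resolved.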

37. A system comprising:
- one or more processors;
  
  at least one input device; and
  
  one or more storage devices storing processor-executable instructions which, when executed by the one or more processors, perform operations comprising:
  
  maintaining an index of Web pages, the index storing data used by a search engine to score Web pages responsive to queries;
  
  crawling Web pages accessible on a network to identify information pertaining to a Website having a plurality of Web pages arranged according to a URL hierarchy, wherein each Web page includes one or more terms, and wherein the plurality of Web pages includes a home or root page that is above the other Web pages in the URL hierarchy;
  
  selecting a candidate term from the terms in a first Web page of the Website;
  
  determining that the candidate term is either:
  
  1) a strongly associated term for the Website, or
  2) a highly descriptive term;
  
  identifying a second Web page of the Website that is either:
  
  1) below the first Web page in the URL hierarchy of the Website in the case that the candidate term is a strongly associated term, or
  2) above the first Web page in the URL hierarchy of the Website in the case that the candidate term is a highly descriptive term; and
  
  updating the index to include data associating the candidate term with the second Web page, wherein the association results in the second Web page having a higher search score for a search query including the candidate term than the second Web page would otherwise have.
- Dependent Claims (38, 39)
  - 38. The computer-implemented method of claim 37, further comprising:
    - determining that the candidate term is strongly associated by using data external to the Website.
  - 39. The computer-implemented method of claim 38, further comprising:
    - determining that the candidate term is highly descriptive because it is uncommon.

Current Assignee
Google LLC (Alphabet Inc.)
Original Assignee
Google Inc. (Alphabet Inc.)
Inventors
Egnor, Daniel; Lamping, John; Singhal, Amitabh K.; Lacker, Kevin; Yang, Ke; Haahr, Paul
Primary Examiner(s)
Wassum; Luke S.
Assistant Examiner(s)
Allen; Nicholas E

Application Number
US11/396,301
Publication Number
US 20070233808A1
Time in Patent Office
1,852 Days
Field of Search
707/709, 707/999.005, 707/740, 707/5
US Class Current
707/709
CPC Class Codes
G06F 16/2228   Indexing structures
G06F 16/951   Indexing; Web crawling tech...
G06F 16/958   Organisation or management ...
