AVOIDING MASKED WEB PAGE CONTENT INDEXING ERRORS FOR SEARCH ENGINES
Abstract
Multiple non-host client sites provide cached user copies of web pages and/or web content, or summaries thereof, to a server. Obtaining data from non-host sources for indexing purposes avoids masked web page content indexing errors for search engines. The server aggregates, summarizes and indexes the web pages and/or web content in an index of cached content, in conjunction with updating, generating and storing a search index using an indexing agent such as a web crawler or spider. In response to receiving search requests from end users, the search engine uses comparisons between the index of cached content and the index of crawled content to identify potential page masking errors for specific search results and appropriately rank or omit results with a high risk of masking errors in a search result list.
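The comparison described in the abstract can be illustrated with a minimal sketch. It is not the patented implementation: the `Result` fields, the Jaccard similarity measure, the `omit_above` threshold of 0.8, and the rule of scaling relevance by (1 - risk) are all assumptions chosen for illustration. The idea is only that a page whose crawled index terms diverge sharply from the terms seen in client cached copies is treated as a likely masking case and is demoted or omitted.

```python
from dataclasses import dataclass


def jaccard(a: set[str], b: set[str]) -> float:
    """Similarity of two term sets: 1.0 means identical, 0.0 means disjoint."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


@dataclass
class Result:
    url: str
    relevance: float          # score from the crawled index (assumed to be given)
    crawled_terms: set[str]   # terms the crawler indexed for this URL
    cached_terms: set[str]    # terms aggregated from client cached copies


def masking_risk(result: Result) -> float:
    """Hypothetical risk score: the further apart the two indexes, the higher the risk."""
    return 1.0 - jaccard(result.crawled_terms, result.cached_terms)


def rank(results: list[Result], omit_above: float = 0.8) -> list[Result]:
    """Omit results above the risk threshold, then order the rest by
    relevance scaled down by masking risk."""
    kept = [r for r in results if masking_risk(r) <= omit_above]
    return sorted(
        kept,
        key=lambda r: r.relevance * (1.0 - masking_risk(r)),
        reverse=True,
    )


honest = Result("http://a.example/", 0.7,
                {"cheap", "flights", "paris"},
                {"cheap", "flights", "paris", "deals"})
partial = Result("http://b.example/", 0.9,
                 {"a", "b", "c", "d"},
                 {"a", "b"})
masked = Result("http://c.example/", 1.0,
                {"cheap", "flights", "paris"},
                {"casino", "bonus", "spins"})

ranked = rank([masked, partial, honest])
print([r.url for r in ranked])
```

In this example the masked page is omitted even though it has the highest crawled relevance, because none of its crawled terms appear in what users actually received.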
45 Citations
25 Claims
1. A method comprising:
accessing a web page using a server operating a web crawling application to create first index information for the web page;
receiving, at the server, second index information generated from a cached copy of the web page received by a client via a browser application operated at the client; and
ranking the web page in a search results list generated at the server, based on a comparison between the first and second indexes of the web page.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10
11. A system for indexing web pages comprising:
means for accessing a web page to create a first index information of the web page;
means for receiving a cached copy of the web page at a client;
means for generating a second index information of the web page using the cached copy; and
means for ranking the web page based on a comparison between the first index information and the second index information of the web page.
Dependent claims: 12, 13, 14, 15, 16, 18, 19
17. A system for indexing web pages comprising:
a crawler to access a web page to create a first index information of the web page;
a server to receive second index information generated from a cached copy of the web page obtained by a browser application operating on a remote client;
an index generator to generate a second index information of the web page using the cached copy; and
an analyzer to rank the web page based on a comparison between the first index information and the second index information of the web page.
20. A method for assisting a server to index web pages, the method comprising:
distributing an application to a client, the application configured to operate on the client and to cause the client to periodically transmit the cached copy of a web page to a server;
generating a first index information based on a cached copy of the web page received from the client;
updating a search index based on a comparison between the first index and a second index generated from a sample of the web page by a web crawling application.
Dependent claims: 21
22. A method for avoiding masked web page content indexing errors, the method comprising:
receiving cached user copies of web pages from client sources at a server, wherein the cached user copies are identified by respective URLs that designate network addresses other than the client sources;
updating a search index with the cached user copies; and
storing the updated search index, wherein the updated search index is used to generate search results for at least one client.
Dependent claims: 23, 24, 25
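As a rough illustration of the flow recited in claim 22, the sketch below accepts cached copies from clients, rejects any copy whose URL points back at the submitting client, updates a simple inverted index, and serializes it for storage. The class name `CachedCopyIndex`, its methods, the tokenizer, and the JSON storage format are hypothetical and are not taken from the patent.

```python
import json
import re
from collections import defaultdict
from urllib.parse import urlparse


class CachedCopyIndex:
    """Toy inverted index built from client-supplied cached copies."""

    def __init__(self) -> None:
        # term -> set of URLs whose cached copy contains the term
        self.postings: dict[str, set[str]] = defaultdict(set)

    def add_cached_copy(self, client_host: str, url: str, text: str) -> bool:
        """Index a cached copy unless its URL designates the client itself."""
        host = urlparse(url).hostname  # lowercased by urlparse, or None
        if host is None or host == client_host.lower():
            return False
        for term in re.findall(r"[a-z0-9]+", text.lower()):
            self.postings[term].add(url)
        return True

    def search(self, term: str) -> list[str]:
        """Return the URLs indexed under a single term, in sorted order."""
        return sorted(self.postings.get(term.lower(), set()))

    def dump(self) -> str:
        """Serialize the index so that it can be stored (format is an assumption)."""
        return json.dumps({t: sorted(urls) for t, urls in self.postings.items()})


index = CachedCopyIndex()
index.add_cached_copy("client-1.example.net",
                      "http://www.example.com/page",
                      "Cheap flights to Paris")
index.add_cached_copy("client-1.example.net",
                      "http://client-1.example.net/local",
                      "private notes")
print(index.search("paris"))
```

Only the first submission is indexed here; the second is rejected because its URL designates the client source rather than a separate host.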
Specification