Sitemap Generating Client for Web Crawler
Abstract
Methods and systems for a sitemap generating client for web crawlers are described. The client accesses one or more sources of document information about the documents available on a website, such as the file system, access logs, or pre-made URL lists. Document information is extracted from the sources and one or more sitemaps are generated based on the extracted document information. A notification is transmitted to a remote computer, informing that the sitemap(s) are available for access and likely have been updated. If the remote computer is associated with a web crawler, the remote computer may access the sitemap(s) and use the sitemaps to schedule a crawl of documents included or available on the website.
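The flow summarized in the abstract (access a source, extract document information, generate a sitemap, store it) can be illustrated with a minimal Python sketch. It assumes the file system is the only source and the last-modified date is the only metadata, and it uses the public Sitemaps XML format purely for illustration; the patent text above does not prescribe a file format. The function names `extract_document_info`, `generate_sitemap`, and `store_sitemap` are hypothetical.

```python
import os
import xml.etree.ElementTree as ET
from datetime import datetime, timezone

# Assumption: the public Sitemaps 0.9 XML namespace, used here only as an example format.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def extract_document_info(doc_root, base_url):
    """Walk a file-system source and return sorted (url, last_modified) pairs."""
    entries = []
    for dirpath, _dirnames, filenames in os.walk(doc_root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Map the on-disk path to a URL path under base_url.
            rel = os.path.relpath(path, doc_root).replace(os.sep, "/")
            mtime = datetime.fromtimestamp(os.path.getmtime(path), tz=timezone.utc)
            entries.append((base_url.rstrip("/") + "/" + rel, mtime.strftime("%Y-%m-%d")))
    return sorted(entries)


def generate_sitemap(entries):
    """Build a sitemap listing each document with its metadata."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url, lastmod in entries:
        node = ET.SubElement(urlset, "url")
        ET.SubElement(node, "loc").text = url
        ET.SubElement(node, "lastmod").text = lastmod
    return ET.tostring(urlset, encoding="unicode")


def store_sitemap(xml_text, location):
    """Store the sitemap at a location the crawler can later be told about."""
    with open(location, "w", encoding="utf-8") as fh:
        fh.write(xml_text)
```

Other sources named in the abstract, such as access logs or pre-made URL lists, would replace or supplement `extract_document_info` while leaving the generation and storage steps unchanged.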
244 Citations
30 Claims
1. A method of listing documents performed by a website server system having one or more processors and memory storing one or more programs for execution by the one or more processors, comprising:
accessing one or more sources of document information, wherein the one or more sources of document information are associated with a website server;
extracting the document information including metadata from the sources;
generating a sitemap of a website at the website server, the sitemap including a list of documents and corresponding metadata for each of a plurality of documents in the list of documents based on the document information;
storing the sitemap at a location; and
transmitting a notification from the website server to a remote computer associated with a web crawler system, the notification including information that identifies the location of the sitemap, the notification functioning as an indication that the sitemap is available for access.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11.
12. A system for listing documents, comprising:
one or more processors and memory, the memory comprising one or more sources of document information; and
one or more modules including instructions to:
access the sources of document information, wherein the sources are associated with a website server;
extract the document information including metadata from the sources;
generate a sitemap of a website at the website server, the sitemap including a list of documents and corresponding metadata for each of a plurality of documents in the list of documents based on the document information;
store the sitemap at a location; and
transmit a notification from the website server to a remote computer associated with a web crawler system, the notification including information that identifies the location of the sitemap, the notification functioning as an indication that the sitemap is available for access.
Dependent claims: 13, 14, 15, 16, 17, 18, 19, 20.
21. A non-transitory computer readable storage medium and one or more computer programs embedded therein, the computer programs comprising instructions, which when executed by a computer system, cause the computer system to:
access one or more sources of document information, wherein the sources are associated with a website server;
extract the document information including metadata from the sources;
generate a sitemap of a website at the website server, the sitemap including a list of documents and corresponding metadata for each of a plurality of documents in the list of documents based on the document information;
store the sitemap at a location; and
transmit a notification from the website server to a remote computer associated with a web crawler system, the notification including information that identifies the location of the sitemap, the notification functioning as an indication that the sitemap is available for access.
Dependent claims: 22, 23, 24, 25, 26, 27, 28, 29.
30. A system for listing documents, comprising:
one or more processors and memory, the memory comprising one or more sources of document information;
means for accessing the sources of document information, wherein the sources are associated with a website server;
means for extracting the document information including metadata from the sources;
means for generating a sitemap of a website at the website server, the sitemap including a list of documents and corresponding metadata for each of a plurality of documents in the list of documents based on the document information;
means for storing the sitemap at a location; and
means for transmitting a notification from the website server to a remote computer associated with a web crawler system, the notification including information that identifies the location of the sitemap, the notification functioning as an indication that the sitemap is available for access.
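The final element shared by the independent claims, a notification that identifies the sitemap's location, could be realized as a simple HTTP request. The sketch below is one possible reading, not the claimed implementation: the endpoint URL, the `sitemap` query parameter name, and the function names are all hypothetical.

```python
from urllib.parse import urlencode
from urllib.request import urlopen


def build_notification_url(ping_endpoint, sitemap_location):
    """Embed the sitemap's location in the notification as a query parameter."""
    # Assumption: the remote computer reads the location from a "sitemap" parameter.
    return ping_endpoint + "?" + urlencode({"sitemap": sitemap_location})


def send_notification(ping_endpoint, sitemap_location, timeout=10):
    """Send the notification to the remote computer and return the HTTP status code."""
    url = build_notification_url(ping_endpoint, sitemap_location)
    with urlopen(url, timeout=timeout) as response:
        return response.status
```

Here the request itself serves as the indication that the sitemap is available, and the remote computer decides when, or whether, to fetch it.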
Specification