AUTOMATED CLIENT SITEMAP GENERATION
First Claim
1. One or more computer storage media having computer-executable instructions embodied thereon for performing a method for automated sitemap generation, the method comprising:
- receiving a universal resource locator (URL) of a web site for which a sitemap is to be generated;
- analyzing one or more files that log user visits to web pages based on respective URLs to determine if the URL of the web site has been previously crawled;
- upon determining the URL of the web site has not been previously crawled, crawling at least one web page having a same domain as the web site URL in accordance with control permissions associated with the at least one web page to determine one or more data items relevant to generating the sitemap for the web site;
- determining a relational structure of a plurality of web pages having the same domain as the web site URL, including the at least one web page;
- analyzing one or more items of metadata respectively related to the plurality of web pages; and
- generating a current sitemap utilizing the web site URL, one or more data items, relational structure and one or more items of metadata.
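The log-analysis step of the claim can be illustrated with a minimal sketch. The claim does not specify a log format or how crawl status is inferred, so everything here is an assumption: the log is taken to be in Combined Log Format, and a path is treated as previously crawled when a request for it carries a user agent containing a crawler-like marker. The names `analyze_log`, `LOG_LINE`, and `crawler_markers` are hypothetical.

```python
import re
from collections import Counter

# Assumed Combined Log Format: ... "METHOD path PROTO" status size "referer" "agent"
LOG_LINE = re.compile(
    r'"(?:GET|HEAD) (?P<path>\S+) [^"]*" \d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"'
)


def analyze_log(lines, crawler_markers=("bot", "crawler", "spider")):
    """Return (user visit counts per path, set of paths requested by crawlers)."""
    visits = Counter()
    crawled = set()
    for line in lines:
        match = LOG_LINE.search(line)
        if not match:
            continue  # skip lines that do not fit the assumed format
        path = match["path"]
        agent = match["agent"].lower()
        if any(marker in agent for marker in crawler_markers):
            crawled.add(path)  # assumption: crawler-like agent means "crawled"
        else:
            visits[path] += 1
    return visits, crawled


# Hypothetical log entries for illustration only.
sample_log = [
    '10.0.0.1 - - [01/Jan/2024:10:00:00 +0000] "GET / HTTP/1.1" 200 512 "-" "Mozilla/5.0"',
    '10.0.0.2 - - [01/Jan/2024:10:01:00 +0000] "GET /about HTTP/1.1" 200 256 "-" "Mozilla/5.0"',
    '10.0.0.3 - - [01/Jan/2024:10:02:00 +0000] "GET / HTTP/1.1" 200 512 "-" "ExampleBot/1.0"',
    '10.0.0.1 - - [01/Jan/2024:10:03:00 +0000] "GET / HTTP/1.1" 200 512 "-" "Mozilla/5.0"',
]
visits, crawled = analyze_log(sample_log)
needs_crawl = "/about" not in crawled
```

Under these assumptions, a page such as `/about` that appears only in user requests would be queued for crawling, while `/` would be skipped.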
Abstract
Methods and computer-storage media for automated generation of domain sitemap files are provided. A universal resource locator (URL) for a web site having a plurality of web pages associated therewith is received. Log files and permission controls are analyzed to ascertain whether each web page has been previously crawled and which web pages may be crawled and/or indexed. The permitted, not-previously-crawled web pages are subsequently crawled and the relational structure of the web site is ascertained. Other items of metadata, such as web page modification frequency or priority values, also are determined. Once the structure and metadata are available, a current sitemap is generated that provides the hierarchy and related details in the form of metadata. The sitemap file is then written to a disk and may then be sent to search engines as generated or in a compressed format.
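The final step described in the abstract, writing the sitemap to disk either as generated or compressed, might look like the following sketch. It assumes the output follows the public sitemaps.org XML schema (`urlset`, `url`, `loc`, `changefreq`, `priority`), which the abstract does not state explicitly. The helper names `build_sitemap` and `write_sitemap` and the sample entries are hypothetical.

```python
import gzip
import os
import tempfile
import xml.etree.ElementTree as ET

# Namespace of the sitemaps.org protocol (assumed target format).
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(entries):
    """Serialize entries (dicts with 'loc' and optional metadata) to XML bytes."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for entry in entries:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = entry["loc"]
        if "changefreq" in entry:
            ET.SubElement(url, "changefreq").text = entry["changefreq"]
        if "priority" in entry:
            ET.SubElement(url, "priority").text = f"{entry['priority']:.1f}"
    return ET.tostring(urlset, encoding="utf-8", xml_declaration=True)


def write_sitemap(path, xml_bytes, compress=False):
    """Write the sitemap to disk, gzip-compressed if requested; return the path."""
    if compress:
        path += ".gz"
        with gzip.open(path, "wb") as fh:
            fh.write(xml_bytes)
    else:
        with open(path, "wb") as fh:
            fh.write(xml_bytes)
    return path


# Hypothetical entries for illustration only.
entries = [
    {"loc": "http://example.com/", "changefreq": "daily", "priority": 1.0},
    {"loc": "http://example.com/about", "changefreq": "monthly", "priority": 0.5},
]
xml_bytes = build_sitemap(entries)

with tempfile.TemporaryDirectory() as tmp:
    written = write_sitemap(os.path.join(tmp, "sitemap.xml"), xml_bytes, compress=True)
    with gzip.open(written, "rb") as fh:
        round_trip = fh.read()
```

Submitting the file to search engines is left out, since the abstract does not say how the file is sent.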
20 Claims
1. One or more computer storage media having computer-executable instructions embodied thereon for performing a method for automated sitemap generation, the method comprising:
- receiving a universal resource locator (URL) of a web site for which a sitemap is to be generated;
- analyzing one or more files that log user visits to web pages based on respective URLs to determine if the URL of the web site has been previously crawled;
- upon determining the URL of the web site has not been previously crawled, crawling at least one web page having a same domain as the web site URL in accordance with control permissions associated with the at least one web page to determine one or more data items relevant to generating the sitemap for the web site;
- determining a relational structure of a plurality of web pages having the same domain as the web site URL, including the at least one web page;
- analyzing one or more items of metadata respectively related to the plurality of web pages; and
- generating a current sitemap utilizing the web site URL, one or more data items, relational structure and one or more items of metadata.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14
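The "relational structure" step of claim 1 is not defined further in the claim text. One plausible reading, sketched below purely as an assumption, models it as a map from each page to the same-domain pages it links to. To keep the example self-contained, the pages are supplied as in-memory HTML strings rather than fetched over the network. `LinkCollector`, `same_domain_links`, and `relational_structure` are hypothetical names.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collects href values from anchor tags."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def same_domain_links(page_url, html):
    """Return absolute, de-duplicated links in html that share page_url's host."""
    collector = LinkCollector()
    collector.feed(html)
    host = urlparse(page_url).netloc
    links = []
    for href in collector.hrefs:
        absolute = urljoin(page_url, href).split("#")[0]  # drop fragments
        if urlparse(absolute).netloc == host and absolute not in links:
            links.append(absolute)
    return links


def relational_structure(pages):
    """Map each page URL to the same-domain pages it links to."""
    return {url: same_domain_links(url, html) for url, html in pages.items()}


# Hypothetical in-memory pages standing in for crawled content.
pages = {
    "http://example.com/": '<a href="/about">About</a> <a href="http://other.org/">X</a>',
    "http://example.com/about": '<a href="/">Home</a> <a href="team#staff">Team</a>',
}
structure = relational_structure(pages)
```

A link graph is only one choice. A hierarchy derived from URL path segments would also fit the claim language.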
15. A method for automatically generating a sitemap file at a client device, the method comprising:
- receiving one or more web server log files associated with a universal resource locator (URL);
- receiving one or more files controlling permission for programmed crawling of a plurality of web pages having a same domain as the URL;
- analyzing data in the one or more web server log files and one or more permission-control files;
- crawling one or more permitted web pages of the plurality of web pages;
- determining a sitemap file structure for the URL based upon a relational structure of the plurality of web pages;
- for each respective web page of the one or more permitted web pages, determining one or more values that are incorporated as elements of metadata;
- generating a sitemap file that includes the sitemap file structure and respective elements of metadata for each of the one or more permitted web pages; and
- notifying one or more search engines of the sitemap file.
Dependent claims: 16, 17, 18, 19
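The permission-control step of claim 15 can be sketched with Python's standard `urllib.robotparser` module, assuming the "files controlling permission for programmed crawling" are robots.txt files. The claim does not name a format, so that is an assumption. The function `permitted_urls` and the sample data are hypothetical.

```python
from urllib.robotparser import RobotFileParser


def permitted_urls(robots_txt, candidate_urls, user_agent="*"):
    """Return the candidate URLs that the robots.txt rules allow user_agent to fetch."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse rules from text, no network access
    return [url for url in candidate_urls if parser.can_fetch(user_agent, url)]


# Hypothetical permission-control file and candidate pages.
robots_txt = """User-agent: *
Disallow: /private/
"""
candidates = [
    "http://example.com/",
    "http://example.com/private/notes.html",
    "http://example.com/about",
]
allowed = permitted_urls(robots_txt, candidates)
```

Only the URLs in `allowed` would then be passed to the crawling step.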
20. A computer system embodied on one or more computer-storage media having computer-executable instructions embodied thereon for performing a method for automatically generating sitemap metadata, the system comprising:
- receiving a universal resource locator (URL) for a web site;
- based on the URL received, crawling each web page having a same domain as the web site URL that does not restrict programmed crawling;
- determining a relational structure of the web pages having the same domain as the web site URL;
- without intervention from a user, calculating a priority value for each web page having the same domain as the web site URL based on a number of links each crawled web page having the same domain as the web site URL has from other web pages having the same domain as the web site URL, the number of web pages having the same domain as the web site URL, a number of times recorded in a URL log file that each web page URL has been visited, and a total number of visits to any URL recorded in the URL log file;
- without intervention from the user, calculating a frequency of change for each crawled web page based on a time the calculation is performed, a time each respective crawled web page was last changed and one or more frequency threshold values; and
- generating a sitemap that includes the relational structure of the web pages having the same domain as the web site URL, the priority value for each crawled web page, any indication that the priority value has been manually modified for each crawled web page, the frequency of change of each crawled web page, and any indication that the frequency of change has been manually modified for each crawled web page.
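Claim 20 lists the inputs to the priority and change-frequency calculations but gives no formula. The sketch below is therefore an assumed illustration, not the patented computation: priority is taken as the equal-weighted mean of a page's share of possible inbound links and its share of logged visits, and change frequency is chosen by comparing the page's age against assumed thresholds. The function names, the weights, and the threshold values are all hypothetical.

```python
from datetime import datetime, timedelta


def priority_value(inbound_links, page_count, page_visits, total_visits):
    """Assumed formula: mean of inbound-link share and visit share, capped at 1.0."""
    link_share = inbound_links / (page_count - 1) if page_count > 1 else 0.0
    visit_share = page_visits / total_visits if total_visits > 0 else 0.0
    return round(min(1.0, 0.5 * link_share + 0.5 * visit_share), 2)


# Assumed threshold values, ordered from shortest to longest age.
FREQUENCY_THRESHOLDS = [
    (timedelta(days=1), "daily"),
    (timedelta(days=7), "weekly"),
    (timedelta(days=31), "monthly"),
]


def change_frequency(now, last_changed, thresholds=FREQUENCY_THRESHOLDS):
    """Pick a label from the time since last change and the threshold values."""
    age = now - last_changed
    for limit, label in thresholds:
        if age <= limit:
            return label
    return "yearly"  # assumed fallback for pages older than every threshold


# Hypothetical page: 2 inbound links on a 5-page site, 30 of 100 logged visits.
priority = priority_value(inbound_links=2, page_count=5, page_visits=30, total_visits=100)
frequency = change_frequency(datetime(2024, 1, 10), datetime(2024, 1, 5))
```

The labels used here are among the `changefreq` values defined by the sitemaps.org protocol. The claim itself does not say which labels or thresholds apply.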
Specification