SYSTEM AND METHOD FOR DOWNLOADING HYPERTEXT MARKUP LANGUAGE FORMATTED WEB PAGES
Abstract
A method for downloading HTML formatted Web pages is provided. The method includes the steps of: writing a URL of a Web page to be downloaded to an XQuery script; analyzing the XQuery script to obtain the URL of the HTML Web page and saving the downloaded Web page in a database as a local Web page; analyzing the contents of the local Web page to obtain target contents; converting the relative URLs of all image files to absolute URLs; downloading all the image files according to the absolute URLs; replacing the absolute URLs of the image files with a local image file path; converting the relative URLs of the embedded links to absolute URLs; saving all the converted absolute URLs in the database and creating an identifier for each; and replacing the converted absolute URLs of the embedded links with an embedded link local path. A related system is also disclosed.
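The image-handling steps of the abstract (converting relative image URLs to absolute URLs, then replacing them with a local image file path) can be illustrated with a minimal Python sketch. It is not the patented implementation: the function names, the regular expression, the example URLs, and the "/images" directory are assumptions made for illustration, and the actual download is left out; the returned mapping only records what a downloader would need to fetch and where it would be saved.

```python
import posixpath
import re
from urllib.parse import urljoin, urlparse

# Hypothetical pattern: matches the src attribute of <img> tags.
# A production system would more likely use a real HTML parser.
IMG_SRC = re.compile(r'(<img\b[^>]*?\bsrc\s*=\s*)(["\'])(.*?)\2',
                     re.IGNORECASE | re.DOTALL)


def absolutize_images(html, page_url):
    """Convert every relative image URL to an absolute URL."""
    def repl(m):
        absolute = urljoin(page_url, m.group(3))
        return f"{m.group(1)}{m.group(2)}{absolute}{m.group(2)}"
    return IMG_SRC.sub(repl, html)


def localize_images(html, local_image_dir):
    """Replace absolute image URLs with paths under local_image_dir.

    Returns the rewritten HTML and a dict mapping each absolute URL to
    its local path. Files sharing a basename would collide here; a real
    system would need a naming scheme to avoid that.
    """
    downloads = {}

    def repl(m):
        absolute = m.group(3)
        filename = posixpath.basename(urlparse(absolute).path) or "image"
        local = posixpath.join(local_image_dir, filename)
        downloads[absolute] = local
        return f"{m.group(1)}{m.group(2)}{local}{m.group(2)}"

    return IMG_SRC.sub(repl, html), downloads


page_url = "http://example.com/news/2007/page.html"  # assumed example URL
page = '<p><img src="../img/a.png" alt="x"></p>'
absolute_page = absolutize_images(page, page_url)
local_page, downloads = localize_images(absolute_page, "/images")
```

The two passes are kept separate to mirror the order of the steps in the abstract: the absolute URL is needed for the download, and only afterwards is it replaced in the saved page.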
6 Claims
1. A system for downloading Hypertext Markup Language (HTML) formatted Web pages, the system comprising an application server, a local server coupled to the application server, a Web server electronically connected with the application server via a network and a database, the application server comprising:
a writing module for extracting a Uniform Resource Locator (URL) of the HTML formatted Web page to be downloaded and writing the URL extracted to an XQuery script document;
an analyzing module for analyzing the XQuery script document to obtain the URL of the HTML Web page to be downloaded via the Web server, saving the downloaded HTML Web page in the database as the local Web page, and for analyzing the contents of the local Web page to identify target contents by invoking the XQuery script document;
a converting module for extracting the relative URLs of all image files of the target contents of the local Web page and converting the relative URLs of all image files to absolute URLs of the image files, and for extracting all relative URLs of source of embedded links of the target contents of the local Web page and converting the relative URLs of source of the embedded links to absolute URLs of the source of the embedded links;
a downloading module for downloading all the image files of the target contents via the Web server according to the absolute URLs, and saving the image files in a local image file path of the local server;
a saving module for saving all the converted absolute URLs of the source of the embedded links in the database, creating an identifier for each converted absolute URL, and saving all the identifiers in the database; and
a replacing module for replacing the absolute URLs of the image files of the local Web page with the local image file path, and for writing all the identifiers and the Java Server Pages (JSP) language into an embedded link local path in the local server of the source of the embedded links, and replacing the converted absolute URLs of the source of the embedded links with the embedded link local path.
Dependent claims: 2, 3.
4. A computer-based method for downloading Hypertext Markup Language (HTML) formatted Web pages, the method comprising the steps of:
writing a Uniform Resource Locator (URL) of the HTML formatted Web page to be downloaded to an XQuery script document;
analyzing the XQuery script document to obtain the URL of the HTML Web page to be downloaded via a Web server, and saving the downloaded HTML Web page in a database as the local Web page;
analyzing the contents of the local Web page to identify the target contents by invoking the XQuery script document;
extracting the relative URLs of all the image files of the target contents of the local Web page and converting the relative URLs of all the image files to the absolute URLs of the image files;
downloading all the image files of the target contents via the Web server according to the absolute URLs, and saving the image files in the local image file path of a local server;
replacing the absolute URLs of the image files of the local Web page with the local image file path;
extracting all the relative URLs of the source of the embedded links of the target contents of the local Web page and converting the relative URLs of the source of the embedded links to absolute URLs of the source of the embedded links;
saving the converted absolute URLs of the source of the embedded links in the database, creating an identifier for each converted absolute URL, and saving all the identifiers in the database; and
writing all the identifiers and the Java Server Pages (JSP) language into an embedded link local path in the local server of the source of the embedded links, and replacing the converted absolute URLs of the source of the embedded links with the embedded link local path.
Dependent claims: 5.
6. A software for downloading Hypertext Markup Language (HTML) formatted Web pages, the software comprising:
a writing module for extracting a Uniform Resource Locator (URL) of the HTML formatted Web page to be downloaded and writing the URL extracted to an XQuery script document;
an analyzing module for analyzing the XQuery script document to obtain the URL of the HTML Web page to be downloaded, saving the downloaded HTML Web page in a database as the local Web page, and for analyzing the contents of the local Web page to identify target contents by invoking the XQuery script document;
a converting module for extracting the relative URLs of all image files of the target contents of the local Web page and converting the relative URLs of all image files to absolute URLs of the image files, and for extracting all relative URLs of source of embedded links of the target contents of the local Web page and converting the relative URLs of source of the embedded links to absolute URLs of the source of the embedded links;
a downloading module for downloading all the image files of the target contents according to the absolute URLs, and saving the image files in a local server;
a saving module for saving all the converted absolute URLs of the source of the embedded links in the database, creating an identifier for each converted absolute URL, and saving all the identifiers in the database; and
a replacing module for replacing the absolute URLs of the image files of the local Web page with the local image file path, and for writing all the identifiers and the Java Server Pages (JSP) language into an embedded link local path in the local server of the source of the embedded links, and replacing the converted absolute URLs of the source of the embedded links with the embedded link local path.
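The embedded-link handling recited in the claims (convert relative link URLs to absolute URLs, save each absolute URL with an identifier, then replace the URL with an embedded link local path carrying that identifier) might be sketched as follows. This is an illustration under stated assumptions, not the claimed implementation: embedded links are taken to be the href attributes of anchor tags, an in-memory LinkStore class stands in for the database, identifiers are sequential integers, and "/link.jsp" is a hypothetical JSP path.

```python
import re
from urllib.parse import urljoin

# Hypothetical pattern: treats href attributes of <a> tags as the
# "embedded links" of the claims (an assumption).
HREF = re.compile(r'(<a\b[^>]*?\bhref\s*=\s*)(["\'])(.*?)\2',
                  re.IGNORECASE | re.DOTALL)


class LinkStore:
    """In-memory stand-in for the database of absolute URLs and identifiers."""

    def __init__(self):
        self._by_url = {}
        self._by_id = {}

    def save(self, absolute_url):
        """Store the URL if it is new and return its identifier."""
        if absolute_url not in self._by_url:
            ident = len(self._by_url) + 1  # sequential identifier (assumption)
            self._by_url[absolute_url] = ident
            self._by_id[ident] = absolute_url
        return self._by_url[absolute_url]

    def lookup(self, ident):
        """Return the absolute URL saved under an identifier."""
        return self._by_id[ident]


def rewrite_links(html, page_url, store, jsp_path="/link.jsp"):
    """Replace each link with a local JSP path that carries its identifier."""
    def repl(m):
        absolute = urljoin(page_url, m.group(3))
        ident = store.save(absolute)
        return f"{m.group(1)}{m.group(2)}{jsp_path}?id={ident}{m.group(2)}"
    return HREF.sub(repl, html)


store = LinkStore()
page = ('<a href="more.html">More</a> <a href="/top.html">Top</a> '
        '<a href="more.html">Again</a>')
rewritten = rewrite_links(page, "http://example.com/news/page.html", store)
```

In this sketch a page served at the hypothetical JSP path would call something like store.lookup(id) to recover the original absolute URL when the reader follows a rewritten link.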
Specification