
SYSTEM AND METHOD FOR DOWNLOADING HYPERTEXT MARKUP LANGUAGE FORMATTED WEB PAGES

  • US 20080046449A1
  • Filed: 05/31/2007
  • Published: 02/21/2008
  • Est. Priority Date: 08/18/2006
  • Status: Active Grant
First Claim

1. A system for downloading Hypertext Markup Language (HTML) formatted Web pages, the system comprising an application server, a local server coupled to the application server, a Web server electronically connected with the application server via a network and a database, the application server comprising:

  • a writing module for extracting a Uniform Resource Locator (URL) of the HTML formatted Web page to be downloaded and writing the URL extracted to an XQuery script document;

  • an analyzing module for analyzing the XQuery script document to obtain the URL of the HTML Web page to be downloaded via the Web server, saving the downloaded HTML Web page in the database as the local Web page, and for analyzing the contents of the local Web page to identify target contents by invoking the XQuery script document;

  • a converting module for extracting the relative URLs of all image files of the target contents of the local Web page and converting the relative URLs of all image files to absolute URLs of the image files, and for extracting all relative URLs of source of embedded links of the target contents of the local Web page and converting the relative URLs of source of the embedded links to absolute URLs of the source of the embedded links;

  • a downloading module for downloading all the image files of the target contents via the Web server according to the absolute URLs, and saving the image files in a local image file path of the local server;

  • a saving module for saving all the converted absolute URLs of the source of the embedded links in the database, creating an identifier for each converted absolute URL, and saving all the identifiers in the database; and

  • a replacing module for replacing the absolute URLs of the image files of the local Web page with the local image file path, and for writing all the identifiers and the Java Server Pages (JSP) language into an embedded link local path in the local server of the source of the embedded links, and replacing the converted absolute URLs of the source of the embedded links with the embedded link local path.
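
The converting, saving and replacing steps recited above can be illustrated with a minimal Python sketch. It is not the claimed implementation: the claim specifies XQuery and JSP, whereas this sketch swaps in Python's standard `html.parser` and `urllib.parse` modules. The names `SrcCollector`, `to_absolute`, `plan_replacements`, the `/images` directory, the `/link.jsp?id=N` path format, and the choice of `iframe`, `frame` and `embed` as the tags carrying "embedded links" are all assumptions made for illustration. Downloading and database storage are omitted.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit
import posixpath


class SrcCollector(HTMLParser):
    """Collects src attributes of images and of embedded-link tags."""

    # Assumption: these tags stand in for the claim's "embedded links".
    EMBED_TAGS = {"iframe", "frame", "embed"}

    def __init__(self):
        super().__init__()
        self.image_srcs = []
        self.embed_srcs = []

    def handle_starttag(self, tag, attrs):
        src = dict(attrs).get("src")
        if not src:
            return
        if tag == "img":
            self.image_srcs.append(src)
        elif tag in self.EMBED_TAGS:
            self.embed_srcs.append(src)


def to_absolute(page_url, relative_urls):
    """Converting step: resolve each relative URL against the page URL."""
    return [urljoin(page_url, url) for url in relative_urls]


def plan_replacements(page_url, html, image_dir="/images", jsp_path="/link.jsp"):
    """Return (mapping, link_ids).

    mapping  : original src value -> local path to substitute into the page
    link_ids : absolute embedded-link URL -> integer identifier
    Both path formats are hypothetical placeholders.
    """
    collector = SrcCollector()
    collector.feed(html)
    collector.close()

    mapping = {}
    for src in collector.image_srcs:
        absolute = urljoin(page_url, src)
        filename = posixpath.basename(urlsplit(absolute).path)
        mapping[src] = f"{image_dir}/{filename}"

    link_ids = {}
    for src in collector.embed_srcs:
        absolute = urljoin(page_url, src)
        # Saving step: one identifier per distinct absolute URL.
        identifier = link_ids.setdefault(absolute, len(link_ids) + 1)
        mapping[src] = f"{jsp_path}?id={identifier}"

    return mapping, link_ids
```

For a page at the hypothetical address `http://example.com/news/today/index.html`, `to_absolute` resolves `../img/a.png` to `http://example.com/news/img/a.png`. In this sketch the identifier in the local path lets a server-side page look up the stored absolute URL of the embedded link.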
