Filter definition for distribution mechanism for filtering, formatting and reuse of web based content
Abstract
An automated means for defining a filter used to extract web content from a web page is disclosed, wherein the extracted content is used in a recast web page. The recast web page may be produced by a hosting site, or may be part of an effort to revise a web site at a web content provider. First, a set of pages, possibly a single page, is retrieved from a content provider web server. Next, the web page is parsed to identify a set of selectable content elements. A representation of the original web page is then presented in a user interface in which the selectable content elements are demarcated. Through the user interface, the user selects the elements to include in the filter, and the tool marks those elements as selected. The tool then constructs the filter so that, when the filter is used, the selected content elements are extracted from a web page retrieved from the content provider web server and reused in the recast web page. As part of identifying the selectable content elements, a set of varied request headers can be used to retrieve multiple versions of the same web page. These versions are compared to determine which content elements are static and which are dynamic, and each element is marked accordingly.
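The static/dynamic comparison described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: each retrieved version is assumed to have already been parsed into a mapping from a hypothetical element path (such as "body/div[1]") to that element's text, and an element counts as static only if it appears with identical content in every version.

```python
from typing import Dict, List


def classify_elements(versions: List[Dict[str, str]]) -> Dict[str, str]:
    """Mark each element path as "static" or "dynamic" across page versions.

    Each version maps a (hypothetical) element path to the text found at
    that path in one retrieval of the page, e.g. one fetched with a
    different User-Agent or Accept-Language header.
    """
    all_paths = set()
    for version in versions:
        all_paths.update(version)

    marks = {}
    for path in sorted(all_paths):
        # Content seen at this path in each version; None means "absent".
        seen = {version.get(path) for version in versions}
        is_static = len(seen) == 1 and None not in seen
        marks[path] = "static" if is_static else "dynamic"
    return marks


v1 = {"body/h1": "Daily News", "body/div[1]": "Story A"}
v2 = {"body/h1": "Daily News", "body/div[1]": "Story B"}
print(classify_elements([v1, v2]))
# {'body/div[1]': 'dynamic', 'body/h1': 'static'}
```

Here an element missing from some versions is also treated as dynamic; a real tool might prefer a looser similarity measure over exact equality.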
27 Claims
1. A method for defining a filter used to extract web content for a web page wherein the extracted content is used in a recast web page produced by a hosting site, comprising the steps of:
retrieving multiple versions of at least one original web page from a content provider web server;
parsing the multiple versions of the original web page to identify a set of selectable content elements;
comparing the multiple versions of the web page to identify static and dynamic content elements;
presenting a representation of the original web page in a user interface, wherein the selectable content elements are demarcated and marked as either static or dynamic elements;
responsive to user input, selecting content elements for inclusion in the filter; and
constructing the filter so that the selected content elements are extracted from a retrieved web page from the content provider web server and reused in the recast web page.
associating a URL with the filter; and
using the filter to extract web content from web pages from the associated URL.
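Claim 1 and the two steps that follow it (associating a URL with the filter and using it on pages from that URL) can be sketched as a small data structure plus an extraction routine. All of the following are assumptions made for illustration: selectable content elements are identified by their HTML id attribute, the markup is well formed, and a filter is simply the associated URL plus the set of ids the user selected.

```python
from dataclasses import dataclass
from html.parser import HTMLParser
from typing import Dict, FrozenSet, Iterable, List, Optional

# Elements that never have a closing tag, so they are not pushed on the stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "param", "source", "track", "wbr"}


class _IdTextCollector(HTMLParser):
    """Collects the text inside every element that carries an id attribute."""

    def __init__(self) -> None:
        super().__init__()
        self.stack: List[Optional[str]] = []  # id (or None) of each open element
        self.texts: Dict[str, List[str]] = {}

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        element_id = dict(attrs).get("id")
        self.stack.append(element_id)
        if element_id is not None:
            self.texts.setdefault(element_id, [])

    def handle_endtag(self, tag):
        if tag not in VOID_TAGS and self.stack:
            self.stack.pop()

    def handle_data(self, data):
        # Text belongs to every id-bearing element that is currently open.
        for element_id in self.stack:
            if element_id is not None:
                self.texts[element_id].append(data)


@dataclass(frozen=True)
class ContentFilter:
    url: str                      # URL associated with the filter
    selected_ids: FrozenSet[str]  # elements the user chose in the UI

    def extract(self, html: str) -> Dict[str, str]:
        collector = _IdTextCollector()
        collector.feed(html)
        collector.close()
        return {i: "".join(collector.texts[i]).strip()
                for i in sorted(self.selected_ids) if i in collector.texts}


def construct_filter(url: str, selectable: Iterable[str],
                     user_selection: Iterable[str]) -> ContentFilter:
    # Only elements the parser actually offered can be selected.
    return ContentFilter(url, frozenset(user_selection) & frozenset(selectable))


page = ('<html><body><div id="nav">Home</div>'
        '<div id="story"><h1>Title</h1><p>Body text</p></div></body></html>')
f = construct_filter("http://provider.example/news", {"nav", "story"}, {"story"})
print(f.extract(page))  # {'story': 'TitleBody text'}
```

Nested text is concatenated without separators and unbalanced tags would confuse the stack; both are simplifications of this sketch.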
5. The method as recited in claim 1, further comprising the steps of:
associating a label with each respective selected content element;
using the filter to extract selected content elements from a web page from a web content provider web site;
using the associated labels to insert the selected content elements into a web page template containing a hosting web server format, thus creating the recast web page; and
serving the recast web page to the client browser;
wherein the appearance of the recast page when presented by the client browser is as though all elements originated at the hosting web server.
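The label-based insertion in claim 5 can be illustrated with Python's string.Template, treating each label as a placeholder in the hosting site's page template. The placeholder syntax, the label names, and the template below are assumptions made for illustration; with this approach, labels must be valid Python identifiers.

```python
from string import Template
from typing import Dict


def recast_page(extracted: Dict[str, str], labels: Dict[str, str],
                template: str) -> str:
    """Insert extracted elements into a hosting-site template by label.

    extracted maps element id -> content pulled out by the filter;
    labels maps element id -> the label chosen for that element.
    """
    by_label = {labels[element_id]: content
                for element_id, content in extracted.items()
                if element_id in labels}
    # safe_substitute leaves unknown placeholders in place instead of raising.
    return Template(template).safe_substitute(by_label)


hosting_template = ("<html><body><header>Hosting Site</header>"
                    "<main>$headline</main></body></html>")
print(recast_page({"story": "Big news"}, {"story": "headline"}, hosting_template))
# <html><body><header>Hosting Site</header><main>Big news</main></body></html>
```

Because the extracted content is inserted verbatim, it would need to be cleaned of harmful code beforehand, which is the concern of claim 7.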
6. The method as recited in claim 5, wherein one of the desired content elements is an advertisement element from the content provider web server, and the method further comprises the step of inserting a call back to the content provider web server for the advertising element.
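One possible reading of the call back in claim 6 is that the advertisement is not copied into the recast page. Instead, the page carries a reference that makes the client browser request the advertisement from the content provider web server itself, so the provider continues to serve it. The sketch below assumes an iframe is used for that reference and that the ad element's source URL is known; both are assumptions, not details stated in the claim.

```python
from html import escape
from urllib.parse import urljoin


def ad_callback_markup(provider_page_url: str, ad_src: str) -> str:
    """Return markup that makes the browser fetch the ad from the provider.

    ad_src may be relative to the provider page, so it is resolved
    against that page's URL before being embedded.
    """
    absolute_src = urljoin(provider_page_url, ad_src)
    return f'<iframe src="{escape(absolute_src, quote=True)}"></iframe>'


print(ad_callback_markup("http://provider.example/news/index.html",
                         "/ads/banner?slot=top"))
# <iframe src="http://provider.example/ads/banner?slot=top"></iframe>
```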
7. The method as recited in claim 5, further comprising the step of processing the desired content elements to eliminate harmful code, prior to insertion in the web page template.
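A minimal sketch of the cleaning step in claim 7, using the standard library's html.parser. It assumes that "harmful code" means script and style elements, inline event-handler attributes such as onclick, and javascript: URLs; everything else is re-emitted with text and attribute values escaped. This is only an illustration, and a production system would rely on a maintained, allow-list-based HTML sanitizer.

```python
from html import escape
from html.parser import HTMLParser

# Elements dropped together with everything inside them.
DROP_TAGS = {"script", "style"}


class _Sanitizer(HTMLParser):
    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)
        self.out = []
        self.skip_depth = 0  # > 0 while inside a dropped element

    def handle_starttag(self, tag, attrs):
        if tag in DROP_TAGS:
            self.skip_depth += 1
            return
        if self.skip_depth:
            return
        kept = []
        for name, value in attrs:
            if name.startswith("on"):
                continue  # inline event handler, e.g. onclick
            if value is None:
                kept.append(f" {name}")
                continue
            if value.strip().lower().startswith("javascript:"):
                continue  # script smuggled in through a URL
            kept.append(f' {name}="{escape(value, quote=True)}"')
        self.out.append(f"<{tag}{''.join(kept)}>")

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br/>: emit once, never open a skip region.
        if tag not in DROP_TAGS:
            self.handle_starttag(tag, attrs)

    def handle_endtag(self, tag):
        if tag in DROP_TAGS:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif not self.skip_depth:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if not self.skip_depth:
            self.out.append(escape(data, quote=False))


def sanitize(fragment: str) -> str:
    parser = _Sanitizer()
    parser.feed(fragment)
    parser.close()
    return "".join(parser.out)


print(sanitize('<p onclick="steal()">Hi<script>alert(1)</script></p>'
               '<a href="javascript:go()">link</a><br/>'))
# <p>Hi</p><a>link</a><br>
```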
8. A method for defining a filter used to extract web content from a web page for reuse in a recast web page, comprising the steps of:
parsing a web page to identify a set of selectable content elements;
parsing multiple versions of the web page to identify dynamic and static selectable content elements;
presenting a representation of the original web page in a user interface, wherein whether a given selectable content element is dynamic or static is indicated;
responsive to user input, selecting content elements for inclusion in the filter; and
constructing the filter so that the selected content elements are extracted from a retrieved web page from the web server and reused in the recast web page.
selecting at least one web page representative of a set of web pages on a web server; and
including link data in the filter so that when one of the set of pages is called, the filter is used to extract selected content elements from the called page.
10. The method as recited in claim 9, wherein a plurality of filters are constructed for a web site on the web server, each for a respective set of pages on the web site.
11. The method as recited in claim 9, wherein the link data included in the filter is a URL having a wildcarded ending.
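Claims 10 and 11 describe one filter per set of pages on a web site, with the link data being a URL that has a wildcarded ending. A minimal sketch, assuming a trailing "*" marks the wildcard and that when several patterns match, the longest (most specific) one should win; the filter names and URLs are hypothetical.

```python
from typing import Dict, Optional


def url_matches(pattern: str, url: str) -> bool:
    """A trailing '*' matches any ending; otherwise the URL must match exactly."""
    if pattern.endswith("*"):
        return url.startswith(pattern[:-1])
    return url == pattern


def select_filter(url: str, filters: Dict[str, str]) -> Optional[str]:
    """Return the filter whose URL pattern is the most specific match, if any."""
    matching = [pattern for pattern in filters if url_matches(pattern, url)]
    if not matching:
        return None
    return filters[max(matching, key=len)]


site_filters = {
    "http://news.example.com/sports/*": "sports-filter",
    "http://news.example.com/*": "default-filter",
}
print(select_filter("http://news.example.com/sports/game1.html", site_filters))
# sports-filter
```

A plain prefix test is used rather than fnmatch, because characters such as "?" and "[" that are common in URLs would otherwise be treated as wildcard syntax.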
12. The method as recited in claim 8, further comprising the steps of:
calling a set of web pages from a web server for a web site;
using the filter to extract selected content elements from each of the set of web pages; and
using the extracted content elements to construct a new set of web pages for the web site.
14. The method as recited in claim 12, further comprising the steps of:
parsing data associated with each selectable content element;
matching the parsed data to data in a table of available labels, each available label corresponding to respective web page data; and
responsive to a match of the parsed data to data in the table, highlighting the corresponding label in the pop-up of labels.
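The label suggestion in claim 14 can be sketched as a lookup table that pairs each available label with a pattern for the kind of page data it usually holds; labels whose pattern matches an element's data are the ones to highlight in the pop-up. The table entries and regular expressions below are hypothetical examples.

```python
import re
from typing import List

# Hypothetical table of available labels and the data each one corresponds to.
LABEL_TABLE = [
    ("date", re.compile(r"\b\d{4}-\d{2}-\d{2}\b")),
    ("price", re.compile(r"[$€£]\s?\d+(?:\.\d{2})?")),
    ("headline", re.compile(r"^<h[1-3]\b", re.IGNORECASE)),
]


def labels_to_highlight(element_markup: str) -> List[str]:
    """Return, in table order, the labels whose data pattern matches the element."""
    return [label for label, pattern in LABEL_TABLE
            if pattern.search(element_markup)]


print(labels_to_highlight("<h1>Sale ends 2024-05-01</h1>"))
# ['date', 'headline']
```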
15. The method as recited in claim 8, further comprising the step of presenting a demarcation of each selectable element in the web page representation.
16. The method as recited in claim 8, further comprising the steps of:
determining client specific information about a client browser from which a request originated;
selecting among a set of filters stored in a filter definition database on a hosting server based on the client specific information, wherein each of the filters extracts different selected content elements from a web page; and
using the selected filter for creating a recast web page to be sent to the client browser.
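Claim 16 selects among stored filters using information about the requesting client browser. A minimal sketch, assuming that the User-Agent request header is the client-specific information, that the filter definition database can be modeled as a dictionary keyed by a hypothetical device class, and that a few substring checks are enough to tell handheld browsers from desktop ones (real user-agent detection is considerably messier).

```python
from typing import Dict, Mapping

# Hypothetical substrings that suggest a handheld browser.
HANDHELD_TOKENS = ("mobile", "android", "iphone", "ipad")


def choose_filter(request_headers: Mapping[str, str],
                  filter_db: Dict[str, str]) -> str:
    """Pick a filter for the requesting browser from the filter database."""
    # HTTP header names are case-insensitive, so normalize before lookup.
    headers = {name.lower(): value for name, value in request_headers.items()}
    user_agent = headers.get("user-agent", "").lower()
    if any(token in user_agent for token in HANDHELD_TOKENS):
        return filter_db["handheld"]
    return filter_db["desktop"]


filters = {"handheld": "compact-filter", "desktop": "full-filter"}
print(choose_filter({"User-Agent": "Mozilla/5.0 (iPhone; CPU iPhone OS 17_0 like Mac OS X)"},
                    filters))
# compact-filter
```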
13. A method for defining a filter used to extract web content from a web page for reuse in a recast web page, comprising the steps of:
parsing a web page to identify a set of selectable content elements;
presenting a representation of the original web page in a user interface;
responsive to detecting selection of a content element, presenting a pop-up of labels available for the selected content element;
responsive to selection of one of the labels, associating the label with the selected content element;
responsive to user input, selecting content elements for inclusion in the filter; and
constructing the filter so that when the filter is used the selected content elements are extracted from a retrieved web page from the web server and reused in the recast web page.
17. A system including processor and memory for defining a filter used to extract web content from a web page for reuse in a recast web page, comprising:
means for parsing a web page to identify a set of selectable content elements;
means for parsing multiple versions of the web page to identify dynamic and static selectable content elements;
means for presenting a representation of the original web page in a user interface having user input sensitive areas corresponding to respective selectable content elements, wherein whether a given selectable content element is dynamic or static is indicated;
means responsive to user input for selecting content elements for inclusion in the filter; and
means for constructing the filter so that the selected content elements are extracted from a retrieved web page from the web server and reused in the recast web page.
means for receiving requests from client browsers;
means for retrieving web pages from web content provider servers;
means for using the filter to extract selected content elements in the retrieved pages;
means for recasting the extracted content elements in recast pages; and
means for sending the recast pages to the client browsers.
19. The system as recited in claim 18 further comprising:
means for storing constructed filters;
means for selecting a filter from the storing means; and
means for using the selected filter for extracting selected content elements from the received web pages for constructing recast web pages in a hosting server format.
20. The system as recited in claim 17, further comprising:
means for selecting at least one web page representative of a set of web pages on a web server;
means for including link data in the filter; and
means for using the included link data so that when one of the set of pages is retrieved responsive to a client request, the filter is used to extract selected content elements from the retrieved page.
21. The system as recited in claim 17, further comprising:
a store for a plurality of filters, wherein a set of the plurality of filters is constructed for a content provider web site on a web server, each filter for a respective set of pages on the content provider web site; and
means for presenting a representation of the original web page in a user interface having user input sensitive areas corresponding to respective selectable content elements.
22. A computer program product in a computer readable medium for defining a filter used to extract web content from a web page for reuse in a recast web page, comprising:
means for parsing a web page to identify a set of selectable content elements;
means for parsing multiple versions of the web page to identify dynamic and static selectable content elements;
means for presenting a representation of the original web page in a user interface, wherein whether a given selectable content element is dynamic or static is indicated;
means responsive to user input for selecting content elements for inclusion in the filter; and
means for constructing the filter so that the selected content elements are extracted from a retrieved web page from the web server and reused in the recast web page.
means for selecting at least one web page representative of a set of web pages on a web server; and
means for including link data in the filter so that when one of the set of pages is called, the filter is used to extract selected content elements from the called page.
25. The product as recited in claim 22, further comprising:
means for retrieving a set of web pages from a web server for a web site;
means for using the filter to extract selected content elements from each of the set of web pages; and
means for using the extracted content elements to construct a new set of web pages for the web site.
27. The product as recited in claim 22, further comprising means for presenting a demarcation of each selectable element in the web page representation.
26. A computer program product in a computer readable medium for defining a filter used to extract web content from a web page for reuse in a recast web page, comprising:
means for parsing a web page to identify a set of selectable content elements;
means for presenting a representation of the original web page in a user interface;
means for presenting a set of labels available for the selected content element;
means for associating selected labels with respective selected content elements;
means responsive to user input for selecting content elements for inclusion in the filter; and
means for constructing the filter so that the selected content elements are extracted from a retrieved web page from the web server and reused in the recast web page.
Specification