Scalable derivative services
Abstract
An efficient method for parsing HTML pages identifies pages containing a mix of static and dynamic content. The pages are parsed to form abstract syntax trees (ASTs), which are then cached along with the pages. When a later version of a page is retrieved, it is compared against the cached version, and only those portions of the AST that contain different content are reparsed.
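The comparison step described in the abstract can be sketched as follows. This is only an illustration, not the patented implementation: it assumes a character-level diff from Python's standard `difflib` module, and the function name `changed_spans` is hypothetical. It returns the offsets in the newly received page whose content does not match the cached version, which are the only regions a parser would need to revisit.

```python
from difflib import SequenceMatcher


def changed_spans(cached: str, received: str) -> list[tuple[int, int]]:
    """Return (start, end) offsets in `received` that differ from `cached`.

    Hypothetical helper: a character-level diff is an assumption made for
    this sketch; the source does not say how the comparison is performed.
    """
    matcher = SequenceMatcher(None, cached, received, autojunk=False)
    spans = []
    for tag, _i1, _i2, j1, j2 in matcher.get_opcodes():
        # "replace" and "insert" are the opcodes that contribute new content
        # to the received page; "delete" leaves nothing to parse there.
        if tag in ("replace", "insert"):
            spans.append((j1, j2))
    return spans


cached_page = "<p>Hello</p><p>10:00</p>"
received_page = "<p>Hello</p><p>10:05</p>"
for start, end in changed_spans(cached_page, received_page):
    print(start, end, repr(received_page[start:end]))
```

A practical system would likely diff at the level of tokens or markup elements rather than characters, so that each changed span lines up with a node of the cached syntax tree.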
9 Claims
1. A computer-implemented method for efficiently parsing received data files, comprising:
receiving, by a virtual browser executing on a server that is intermediary to a plurality of clients and a plurality of web servers, a data file from one of the plurality of web servers, responsive to a request by one of the plurality of clients;
determining that the received data file comprises an object that is not cached on the server;
determining whether the object is currently being tracked;
retrieving a previously stored version of the data file and a syntax tree comprising nodes and tokens representing data within the previously stored version of the data file, the syntax tree including at least one static node;
comparing the previously stored version of the data file with the received data file and identifying non-matching content in the received data file;
parsing only the non-matching content of the received data file;
updating the syntax tree by replacing the at least one static node of the syntax tree with a new token;
creating a mapping from the new token to a subtree in the syntax tree; and
storing the updated syntax tree.
Dependent claims: 2, 3
4. A method for efficiently parsing web pages, comprising:
receiving, by a virtual browser executing on a server that is intermediary to a plurality of clients and a plurality of web servers, an HTML page from one of the plurality of web servers, responsive to a request by one of the plurality of clients;
determining that the received HTML page comprises an object that is not cached on the device server;
determining whether the object is currently being tracked;
retrieving a previously cached version of the HTML page and a syntax tree comprising nodes and tokens representing data within the previously cached version of the HTML page, the syntax tree including at least one static node;
comparing the previously cached version of the HTML page with the received HTML page and identifying non-matching content in the received HTML page;
parsing only the non-matching content in the received HTML page;
updating the syntax tree by replacing the at least one static node of the syntax tree with a new token;
creating a mapping from the new token to a subtree in the syntax tree; and
storing the updated syntax tree and a most recent version of the HTML page.
5. A method for efficiently parsing HTML pages, comprising:
receiving, by a virtual browser executing on a server that is intermediary to a plurality of clients and a plurality of web servers, an HTML page from one of the plurality of web servers, responsive to a request by one of the plurality of clients;
responsive to a determination that a previously cached version of the HTML page exists:
  retrieving the previously cached version of the HTML page and a first syntax tree comprising nodes and tokens representing data within the previously cached version of the HTML page, the syntax tree including at least one static node;
  comparing the previously cached version of the HTML page with the received HTML page and identifying non-matching content in the received HTML page;
  parsing only the non-matching content in the received HTML page;
  updating the syntax tree by replacing the at least one static node of the syntax tree with a new token; and
  creating a mapping from the new token to a subtree in the syntax tree;
responsive to a determination that a previously cached version of the HTML page does not exist:
  parsing the received HTML page and building a syntax tree comprising nodes and tokens representing content of the received data file, the syntax tree containing only static nodes; and
storing the syntax tree and the received HTML page in a cache.
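The two branches of claim 5 (a cached version exists, or it does not) can be sketched as a single function. This is a simplified illustration under several assumptions: pages are split into lines, each line stands in for one syntax-tree node, the stand-in `parse_chunk` replaces a real HTML parser, and changed nodes are simply labeled "dynamic" rather than being replaced by tokens. All names here are hypothetical.

```python
from difflib import SequenceMatcher

# Cache type: URL -> (lines of the cached page, flat list of nodes).
Cache = dict[str, tuple[list[str], list[dict]]]


def parse_chunk(chunk: str) -> dict:
    """Stand-in for a real parser: wraps one line of markup in a node."""
    return {"kind": "static", "text": chunk}


def handle_page(cache: Cache, url: str, page: str) -> tuple[list[dict], int]:
    """Return the syntax tree for `page` and how many chunks were parsed."""
    lines = page.splitlines()
    if url not in cache:
        # No cached version: parse everything; all nodes start out static.
        tree = [parse_chunk(line) for line in lines]
        cache[url] = (lines, tree)
        return tree, len(lines)

    # Cached version exists: reuse nodes for matching content and parse
    # only the lines that differ from the cached page.
    old_lines, old_tree = cache[url]
    new_tree: list[dict] = []
    parsed = 0
    matcher = SequenceMatcher(None, old_lines, lines, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            new_tree.extend(old_tree[i1:i2])
        else:
            for line in lines[j1:j2]:
                node = parse_chunk(line)
                node["kind"] = "dynamic"  # assumed label for changed content
                new_tree.append(node)
                parsed += 1
    cache[url] = (lines, new_tree)
    return new_tree, parsed


cache: Cache = {}
_, first = handle_page(cache, "u", "<h1>News</h1>\n<p>10:00</p>\n<footer>x</footer>")
_, second = handle_page(cache, "u", "<h1>News</h1>\n<p>10:05</p>\n<footer>x</footer>")
print(first, second)
```

In this toy run the first request parses every line, while the second parses only the line whose content changed, which is the saving the claim is aiming at.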
6. A method for efficiently parsing received data files, comprising:
receiving, by a virtual browser executing on a server that is intermediary to a plurality of clients and a plurality of web servers, a data file from one of the plurality of web servers, responsive to a request by one of the plurality of clients;
determining that the received data file comprises an object that is not cached on the server;
determining whether the object is currently being tracked;
retrieving a previously stored version of the data file and a syntax tree comprising nodes and tokens representing data within the previously stored version of the data file, the syntax tree including at least one static node;
comparing the previously stored version of the data file with the received data file and identifying non-matching content present only in the received data file;
parsing only the non-matching content of the first data file;
updating the syntax tree by replacing the at least one static node of the syntax tree with a new token;
mapping the new token to a subtree in the syntax tree; and
storing the updated syntax tree.
Dependent claims: 7
8. A system for efficiently parsing input data from a plurality of content servers, comprising:
a server that is intermediary to a plurality of clients and a plurality of web servers, wherein the server performs the following functions:
receiving a data file from one of the plurality of web servers, responsive to a request by one of the plurality of clients;
determining that the received data file comprises an object that is not cached on the server;
determining whether the object is currently being tracked;
retrieving a previously stored version of the data file and a syntax tree comprising nodes and tokens representing data within the previously stored version of the data file, the syntax tree including at least one static node;
comparing the previously stored version of the data file with the received data file and identifying non-matching content in the received data file;
parsing only the non-matching content of the received data file;
updating the syntax tree by replacing the at least one static node of the syntax tree with a new token;
creating a mapping from the new token to a subtree in the syntax tree; and
storing the updated syntax tree.
9. A system for efficiently parsing received data files transmitted between a client and a server, the system comprising:
a server that is intermediary to a plurality of clients and a plurality of web servers, wherein the server performs the following functions:
receiving a data file from one of the plurality of web servers, responsive to a request by one of the plurality of clients;
determining that the received data file comprises an object that is not cached on the server;
determining whether the object is currently being tracked;
retrieving a previously stored version of the data file and a syntax tree comprising nodes and tokens representing data within the previously stored version of the data file, the syntax tree including at least one static node;
comparing the previously stored version of the data file with the received data file and identifying non-matching content in the received data file;
parsing only the non-matching content of the received data file;
updating the syntax tree by replacing the at least one static node of the syntax tree with a new token;
creating a mapping from the new token to a subtree in the syntax tree; and
storing the updated syntax tree.
Specification