
Predicting and using utility of script execution in functional web crawling and other crawling

  • US 10,649,740 B2
  • Filed: 01/15/2015
  • Issued: 05/12/2020
  • Est. Priority Date: 01/15/2015
  • Status: Active Grant
First Claim

1. A computer-implemented method, comprising:

  • performing, by a computer system serving webpages through a network to one or more client computers, analysis of a web application program, the analysis comprising:

    training the computer system to learn which script functions to execute, wherein training the computer system includes populating a database comprising of mappings between feature vectors and corresponding utilities, wherein each feature vector corresponds to a script function, wherein each feature vector is generated using a specification set of features, wherein each feature vector is based on syntactic and document object model (DOM) related characteristics of the script function, and wherein each utility corresponds to one or more descriptions of function behaviors observed during execution of the script function;

    executing, by the computer system, the web application program comprising markup language for a webpage, the markup language comprising a plurality of script functions, the executing comprising performing the following by the computer system during program execution:

    crawling the web application program code while the web application program is being executed;

    extracting and reducing the plurality of script functions encountered from the web application program code based on extraction rules, wherein the extraction and reduction process comprises of modeling the script function as an Abstract Syntax Tree (AST);

    for each script function of the plurality of script functions, reducing the script function to a feature vector using the specification set of features and basing the feature vector on syntactic and DOM related characteristics of the script function;

    predicting, for each selected script function of the plurality of script functions, whether the selected script function should or should not be executed by determining whether the feature vector corresponding to the selected script function is a known feature vector in the database;

    in response to determining the feature vector is not a known feature vector in the database:

    (a) executing the selected script function in order to determine the utility of the selected script function, wherein determining the utility of the selected script function is based on one or more descriptions of function behaviors observed during execution of the selected script function, and wherein the function behavior of the selected script function corresponds to at least one of:

    a characterization of relevant DOM effects, a flag or confidence level indicating whether Asynchronous JavaScript+XML (AJAX) calls may be executed, or a measure of execution cost; and

    (b) adding an entry comprising a mapping of the feature vector and the utility to the database;

    in response to determining the feature vector is a known feature vector in the database:

    (a) selecting an entry in the database matching the feature vector;

    (b) accessing the utility for the selected entry, wherein the utility was determined previously during training of the computer system;

    (c) using a prediction model, determining whether to execute the selected script function or skip executing the selected script function based on the utility being sufficiently high or not sufficiently high;

    (d) in response to determining the utility is not sufficiently high, skipping execution of the selected script function; and

    (e) in response to determining the utility is sufficiently high, (i) executing the selected script function; and

    (ii) updating the prediction model in response to a predicted behavior of the selected script function not being compatible with an observed behavior of the selected script function, wherein updating the prediction model comprises establishing a new mapping between the feature vector and the utility, the updating the prediction model further comprising instantiating the prediction model with a similarity algorithm incrementally or with a new similarity algorithm;

    rendering the webpage in accordance with determining, based on the prediction model, whether the selected script function should be executed or should not be executed, wherein determining the selected script function should not be executed results in skipping execution of the selected script function; and

    serving the rendered webpage through the network to the one or more client computers.
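
The execute-or-skip decision recited in the claim can be illustrated with a minimal Python sketch. It is not the patented implementation, and every name in it (`feature_vector`, `UtilityStore`, `decide`, `threshold`) is hypothetical. The sketch assumes a utility is a single number, that "sufficiently high" means at or above a fixed threshold, and that simple substring counts stand in for the AST-based and DOM-related features the claim describes. The similarity algorithm of step (e)(ii) is reduced here to overwriting the stored mapping.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Tuple

FeatureVector = Tuple[int, ...]


def feature_vector(source: str) -> FeatureVector:
    """Reduce a script function's source text to a toy feature vector.

    Assumption: substring counts replace real AST and DOM analysis.
    """
    return (
        source.count("document."),       # DOM-related references
        source.count("XMLHttpRequest"),  # hint that AJAX calls may occur
        source.count("{"),               # crude syntactic size proxy
    )


@dataclass
class UtilityStore:
    """Hypothetical database of feature-vector-to-utility mappings."""

    threshold: float = 0.5
    table: Dict[FeatureVector, float] = field(default_factory=dict)

    def decide(self, fv: FeatureVector, run: Callable[[], float]) -> Tuple[bool, float]:
        """Return (executed, utility) for one script function.

        `run` executes the function and returns its observed utility.
        """
        if fv not in self.table:
            # Unknown vector: execute, observe the utility, record the mapping.
            utility = run()
            self.table[fv] = utility
            return True, utility

        predicted = self.table[fv]
        if predicted < self.threshold:
            # Known vector with insufficient utility: skip execution.
            return False, predicted

        # Known vector with sufficient utility: execute, then correct the
        # stored mapping if the observation disagrees with the prediction.
        observed = run()
        if observed != predicted:
            self.table[fv] = observed
        return True, observed
```

In this sketch, the first function with a given feature vector is always executed, and later functions that reduce to the same vector are skipped whenever the recorded utility falls below the threshold.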
