Link discovery from web scripts
Abstract
A computer-implemented method, a computer system, and computer media for discovering links in scripts are provided. The computer system includes a crawler, a rules engine, and an index that are utilized to store links generated by scripts located in webpages in the index. The crawler traverses a network to locate webpages having scripts. The rules engine parses the located webpages and extracts the scripts based on rules that are satisfied by segments of the extracted scripts. The rules engine evaluates the segments of the extracted scripts to generate links. After the rules engine validates the links, the rules engine transmits the links to the index for storage.
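The flow described in the abstract (locate scripts in a page, apply rules to script segments, validate the resulting links, store them in an index) can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration: the two regex rules, the `ScriptExtractor` class, the `is_valid` check, and the dictionary used as the index are hypothetical and are not taken from the patent.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse

# Hypothetical rules: each regex captures a quoted URL in a script segment.
RULES = [
    re.compile(r"""window\.open\(\s*['"]([^'"]+)['"]"""),
    re.compile(r"""location\.href\s*=\s*['"]([^'"]+)['"]"""),
]


class ScriptExtractor(HTMLParser):
    """Collects the text of inline <script> elements from a page."""

    def __init__(self):
        super().__init__()
        self.in_script = False
        self.scripts = []

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self.in_script = True
            self.scripts.append("")

    def handle_endtag(self, tag):
        if tag == "script":
            self.in_script = False

    def handle_data(self, data):
        if self.in_script:
            self.scripts[-1] += data


def generate_links(page_url, html):
    """Apply every rule to every extracted script and resolve the matches."""
    extractor = ScriptExtractor()
    extractor.feed(html)
    links = []
    for script in extractor.scripts:
        for rule in RULES:
            for match in rule.finditer(script):
                links.append(urljoin(page_url, match.group(1)))
    return links


def is_valid(link):
    """Assumed validation step: keep only absolute http(s) links."""
    parts = urlparse(link)
    return parts.scheme in ("http", "https") and bool(parts.netloc)


PAGE_URL = "http://example.com/home"
PAGE_HTML = """<html><body><p>Welcome</p>
<script>
function help() { window.open('/popup.html'); }
location.href = "news/today.html";
</script></body></html>"""

# A plain dict stands in for the index.
index = {}
for link in generate_links(PAGE_URL, PAGE_HTML):
    if is_valid(link):
        index[link] = {"found_on": PAGE_URL}
```

A real crawler would fetch pages over the network and handle external script files; the sketch works on an in-memory string so that it stays self-contained.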
19 Claims
1. A computer-implemented method for discovering links in a script, the method comprising:
receiving webpages associated with one or more scripts, wherein the one or more scripts are Javascripts;
processing the webpages to locate the one or more Javascripts, wherein processing the webpages to locate the Javascripts further comprises:
extracting markup language tags from the webpages to locate function calls, variables, and constants; and
identifying script elements and non-script elements based on the markup language tags;
accessing rules corresponding to the one or more Javascripts;
parsing the one or more Javascripts based on the rules corresponding to the one or more Javascripts, wherein the rules include base rules that are applied to all web pages, site rules that are applied to all webpages from a specific site, and auto-discovered rules that are applied to specific webpages;
identifying segments of the one or more Javascripts that satisfy the rules;
evaluating the identified segments of the one or more Javascripts and applying the rules to the extracted function calls, variables, and constants to generate links;
storing the generated links in an index; and
retrieving content associated with the generated links;
optimizing the retrieved content; and
storing the optimized content and metadata with the generated links, wherein the metadata comprises types of content associated with the generated links, types of files associated with the generated links, dialog attributes associated with the generated links, pop-up attributes associated with the generated links, and display sizes of the content associated with the generated links.
Dependent claims: 2, 3, 16, 19
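Claim 1 distinguishes three tiers of rules: base rules for all webpages, site rules for all webpages of a specific site, and auto-discovered rules for specific webpages. One possible way to select the rules that apply to a given page is sketched below. The `Rule` type, the rule names, the patterns, and the lookup tables are all hypothetical; the claim does not say how rules are represented or stored.

```python
from dataclasses import dataclass
from urllib.parse import urlparse


@dataclass(frozen=True)
class Rule:
    name: str
    pattern: str  # hypothetical regex identifying a link-generating segment


# Base rules: applied to every webpage.
BASE_RULES = [Rule("window_open", r"window\.open\(")]

# Site rules: keyed by host, applied to every webpage from that site.
SITE_RULES = {
    "shop.example.com": [Rule("show_item", r"showItem\(")],
}

# Auto-discovered rules: keyed by full URL, applied to that webpage only.
PAGE_RULES = {
    "http://shop.example.com/catalog": [Rule("menu_go", r"menuGo\(")],
}


def rules_for(page_url):
    """Return base, then site, then page-specific rules for a URL."""
    host = urlparse(page_url).netloc
    return BASE_RULES + SITE_RULES.get(host, []) + PAGE_RULES.get(page_url, [])


catalog_rules = [r.name for r in rules_for("http://shop.example.com/catalog")]
about_rules = [r.name for r in rules_for("http://shop.example.com/about")]
other_rules = [r.name for r in rules_for("http://other.example.org/")]
```

Ordering the tiers from general to specific is a design choice of this sketch, not something the claim requires.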
4. One or more computer-readable storage devices having computer-executable instructions embodied thereon that perform a method for generating an index that stores links discovered in scripts, the method comprising:
crawling a network to locate webpages;
storing in an index metadata corresponding to each located webpage;
parsing the located webpages to identify scripts associated with the located webpages;
retrieving rules that check the identified scripts for variables, functions, or events, wherein a subset of the variables for the identified scripts are checked to confirm a change of value;
when a change of value for the subset of variables is confirmed, generating links based on the variables, functions, or events;
evaluating the variables, functions, or events to verify the validity of the generated links;
adding the generated links to the index when the generated links are verified;
storing the generated links in the index;
retrieving content associated with the generated links;
optimizing the retrieved content; and
storing the metadata with the generated links and the optimized content in the index, wherein the metadata comprises types of content associated with the generated links, types of files associated with the generated links, dialog attributes associated with the generated links, pop-up attributes associated with the generated links, and display sizes of the content associated with the generated links.
Dependent claims: 5, 6, 7, 8, 9, 10, 11, 17, 18
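Claim 4 generates links when a watched subset of script variables changes value and then verifies those links before indexing them. The sketch below shows one simplified reading of that step. It assumes the watched variables are navigation targets such as `location.href`, that assignments are plain string literals, and that validity means an absolute http(s) URL; none of these assumptions comes from the claim, and the helper names are hypothetical.

```python
import re
from urllib.parse import urljoin, urlparse

# Hypothetical pattern: a dotted name assigned a quoted string literal.
ASSIGNMENT = re.compile(
    r"""(?P<name>[\w.]+)\s*=\s*['"](?P<value>[^'"]*)['"]"""
)

# Assumed subset of variables whose value changes imply navigation.
WATCHED = {"location.href", "window.location", "document.location"}


def links_from_value_changes(page_url, script):
    """Emit a candidate link each time a watched variable changes value."""
    current = {name: page_url for name in WATCHED}  # assumed initial value
    links = []
    for match in ASSIGNMENT.finditer(script):
        name = match.group("name")
        if name not in WATCHED:
            continue
        new_value = urljoin(page_url, match.group("value"))
        if new_value != current[name]:  # change of value confirmed
            current[name] = new_value
            links.append(new_value)
    return links


def is_valid_link(link):
    """Assumed validity check: absolute http(s) URL with a host."""
    parts = urlparse(link)
    return parts.scheme in ("http", "https") and bool(parts.netloc)


SCRIPT = """
var title = 'Home';
location.href = '/a.html';
location.href = '/a.html';
window.location = 'mailto:someone@example.com';
"""

candidates = links_from_value_changes("http://example.com/home", SCRIPT)
verified = [link for link in candidates if is_valid_link(link)]
```

The repeated assignment to `location.href` produces no second candidate because the value does not change, and the `mailto:` target is generated but rejected by the validity check.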
12. A computing system having processor and hardware memories for discovering links in a script of a webpage, the system comprising:
a crawler accessing web pages to identify scripts associated with the webpages, wherein the webpages are HTML pages;
a rules engine that parses the identified scripts and evaluates portions of the identified scripts based on rules that detect link-generating segments of the identified scripts, wherein the segments of the identified scripts are evaluated based on variables and expressions specified in the identified scripts and matching function patterns located in the segment and the rules to detect links generated by the identified scripts;
an index to store the detected links and metadata for the detected links, wherein the metadata comprises types of content associated with detected links, types of files associated with the detected links, dialog attributes associated with the detected links, pop-up attributes associated with the detected links, and display sizes of the content associated with the detected links; and
the processor that retrieves the content associated with the detected links;
optimizes the retrieved content; and
stores the optimized content in the index with the detected links.
Dependent claims: 13, 14, 15
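The metadata listed in claim 12 (file type, pop-up attributes, display size) can often be read from the arguments of a `window.open(url, target, features)` call, where the third argument is a comma-separated list of `name=value` window features. The sketch below derives such a record from a script segment. The `LinkRecord` type, its field names, and the regex are hypothetical, and only calls with three quoted literal arguments are handled.

```python
import posixpath
import re
from dataclasses import dataclass, field
from typing import Optional, Tuple
from urllib.parse import urlparse

# Hypothetical function pattern: window.open with three quoted literals.
WINDOW_OPEN = re.compile(
    r"""window\.open\(\s*['"]([^'"]+)['"]\s*,\s*['"]([^'"]*)['"]"""
    r"""\s*,\s*['"]([^'"]*)['"]\s*\)"""
)


@dataclass
class LinkRecord:
    url: str
    file_type: str
    popup_attributes: dict = field(default_factory=dict)
    display_size: Optional[Tuple[int, int]] = None


def parse_features(features):
    """Split 'width=400,height=300' into a dict of attribute strings."""
    attrs = {}
    for part in features.split(","):
        if "=" in part:
            key, value = part.split("=", 1)
            attrs[key.strip()] = value.strip()
    return attrs


def records_from_script(script):
    """Build one metadata record per matching window.open call."""
    records = []
    for url, _target, features in WINDOW_OPEN.findall(script):
        attrs = parse_features(features)
        size = None
        if attrs.get("width", "").isdigit() and attrs.get("height", "").isdigit():
            size = (int(attrs["width"]), int(attrs["height"]))
        # File type is taken from the URL path extension, if any.
        ext = posixpath.splitext(urlparse(url).path)[1].lstrip(".")
        records.append(LinkRecord(url, ext, attrs, size))
    return records


SEGMENT = (
    "window.open('http://example.com/help.html', 'help', "
    "'width=400,height=300,scrollbars=yes');"
)
records = records_from_script(SEGMENT)
```

Content type and dialog attributes are omitted here because they would normally require fetching the linked content or inspecting other script calls.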
Specification