Interactive web crawler
Abstract
The claimed subject matter provides a system or method for web crawling hidden files. An exemplary method includes loading a web page with a browser agent, and executing any dynamic elements hosted on the web page using the browser agent to insert pre-determined values. A list of form controls may be retrieved from the web page using the browser agent, and the controls may be analyzed using a driver component. Form control values may be sent from the driver component to the browser agent, and an event may be submitted to the web page by the browser agent or scripted content may be run to trigger operations on the web page corresponding to the form control values. A URL may be generated for various form control values using a generalizer.
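The loop described in the abstract can be sketched in a few lines. This is a minimal illustration, not the patented implementation: `FakeBrowserAgent`, `crawl`, the `country`/`city` controls, and the example URL are all hypothetical, and the "browser" is an in-memory stand-in so the sketch runs without a real browser. It shows a driver sending values to a browser agent and re-fetching the control list after each event, so that a dependent control is picked up once its parent has been set.

```python
from dataclasses import dataclass, field


@dataclass
class FakeBrowserAgent:
    """Hypothetical stand-in for a browser agent; holds one page in memory."""
    values: dict = field(default_factory=dict)

    # Assumed page behaviour: choosing a country reveals a dependent city control.
    CITIES = {"US": ["Seattle", "Boston"], "FR": ["Paris"]}

    def load(self, url: str) -> None:
        # A real agent would fetch the page and run its scripts here.
        self.values = {}

    def list_controls(self) -> dict:
        """Return control name -> allowed values for the page's current state."""
        controls = {"country": list(self.CITIES)}
        country = self.values.get("country")
        if country is not None:
            controls["city"] = self.CITIES[country]
        return controls

    def set_value(self, name: str, value: str) -> None:
        """Apply a value; stands in for firing the page's change event."""
        self.values[name] = value


def crawl(agent, url):
    """Driver loop: fill one control at a time, re-fetching until none remain."""
    agent.load(url)
    executed = []
    while True:
        controls = agent.list_controls()  # re-fetch after every event
        pending = [name for name in controls if name not in executed]
        if not pending:
            return executed, dict(agent.values)
        name = pending[0]
        agent.set_value(name, controls[name][0])  # driver picks the first value
        executed.append(name)


order, values = crawl(FakeBrowserAgent(), "http://example.com/form")
print(order)
print(values)
```

The driver here always picks the first allowed value. A fuller crawler would iterate over every value of each control, which is where the generalizer's per-combination URLs come in.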
19 Claims
1. A method of web crawling hidden files, comprising:
loading a web page with a browser agent;
executing any dynamic elements hosted on the web page using the browser agent to insert pre-determined values;
retrieving a list of form controls from the web page using the browser agent;
analyzing the form controls using a driver component of a crawler;
sending form control values from the driver component to the browser agent;
submitting an event to the web page by the browser agent or running any scripted content to trigger operations on the web page corresponding to the form control values;
generating a URL for various form control values using a generalizer; and
re-fetching, using the browser agent, web page content, a new list of form controls, and corresponding values for a new control that is dependent upon one of the form controls, wherein the browser agent re-fetches until all form controls are executed.
Dependent claims: 2, 3, 4, 5, 6, 7
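Claim 1 recites generating a URL for various form control values but does not say how the generalizer builds it. One plausible reading, sketched below, is to emit one GET-style URL per combination of values. The function name `generalize_urls`, the query-string encoding, and the `make`/`year` controls are assumptions made for illustration only.

```python
from itertools import product
from urllib.parse import urlencode


def generalize_urls(action_url, control_values):
    """Yield one GET-style URL per combination of form control values.

    control_values maps a control name to the list of values to try.
    Assumes the form submits via GET to action_url (an illustrative choice).
    """
    names = sorted(control_values)  # fixed order keeps the URLs deterministic
    for combo in product(*(control_values[name] for name in names)):
        yield f"{action_url}?{urlencode(dict(zip(names, combo)))}"


urls = list(generalize_urls(
    "http://example.com/search",
    {"make": ["audi", "bmw"], "year": ["2009"]},
))
for url in urls:
    print(url)
```

A plain Cartesian product treats the controls as independent. For a dependent control, whose allowed values change with its parent, the value list would have to be looked up per parent value rather than fixed in advance.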
8. A system for web crawling hidden files, the system comprising:
a processing unit; and
a system memory, wherein the system memory comprises code configured to direct the processing unit to:
load a web page via a browser agent module;
execute any dynamic elements hosted on the web page using the browser agent module to insert pre-determined values;
retrieve a list of form controls from the web page using the browser agent module;
analyze the form controls using a driver component module of a crawler;
send form control values from the driver component module to the browser agent module;
generate a URL for various form control values using a generalize module; and
re-fetch, using the browser agent, web page content, a new list of form controls, and corresponding values for a new control that is dependent upon one of the form controls, wherein the browser agent re-fetches until all form controls are executed.
Dependent claims: 9, 10, 11, 12, 13, 14
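The "retrieve a list of form controls" step can be approximated with Python's standard `html.parser` module, as in the sketch below. The names `FormControlParser` and `list_form_controls` and the sample markup are hypothetical. The sketch parses static HTML only, whereas the claimed browser agent module would read the page after its dynamic elements have executed, so controls inserted by scripts would be missed here.

```python
from html.parser import HTMLParser


class FormControlParser(HTMLParser):
    """Collect named <select> and <input> controls with their candidate values."""

    def __init__(self):
        super().__init__()
        self.controls = {}    # control name -> list of candidate values
        self._select = None   # name of the <select> currently open, if any

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "select" and "name" in attrs:
            self._select = attrs["name"]
            self.controls[self._select] = []
        elif tag == "option" and self._select is not None and attrs.get("value"):
            self.controls[self._select].append(attrs["value"])
        elif tag == "input" and "name" in attrs:
            # Inputs carry at most one preset value; default to an empty string.
            self.controls[attrs["name"]] = [attrs.get("value") or ""]

    def handle_endtag(self, tag):
        if tag == "select":
            self._select = None


def list_form_controls(html):
    """Return a mapping of control name -> candidate values found in html."""
    parser = FormControlParser()
    parser.feed(html)
    return parser.controls


page = (
    '<form><select name="make">'
    '<option value="audi">Audi</option><option value="bmw">BMW</option>'
    '</select><input name="zip" value="98052"></form>'
)
print(list_form_controls(page))
```

The resulting mapping is the kind of structure a driver component could analyze to decide which values to send back to the browser agent.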
15. One or more computer-readable storage media, comprising code configured to direct a processing unit to:
load a web page with a browser agent;
execute any forms hosted on the web page using the browser agent to insert pre-determined values;
retrieve a list of form controls from the web page using the browser agent;
analyze the form controls using a driver component of a crawler;
send form control values from the driver component to the browser agent;
generate a URL for various form control values using a generalizer; and
re-fetch, using the browser agent, web page content, a new list of form controls, and corresponding values for a new control that is dependent upon one of the form controls, wherein the browser agent re-fetches until all form controls are executed.
Dependent claims: 16, 17, 18, 19
Specification