Method and apparatus for an application crawler
Abstract
A computer-implemented method is provided for searching for files on the Internet. In one embodiment, the method may provide an application crawler that assembles and dynamically instantiates all components of a web page. The instantiated web application may then be analyzed to locate desired components on the web page. This may involve finding and analyzing all clickable items in the application, driving the web application by injecting events, and extracting information from the application and writing it to a file or database.
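The extraction step mentioned in the abstract (pulling information out of the instantiated application and writing it to a file or database) can be illustrated with a small sketch. The source does not describe the format of an Application template, so everything here is an assumption made for illustration: the template is modeled as a mapping from output field names to a (component id, attribute) pair, the instantiated components are modeled as plain dictionaries, and the names `extract_record` and `save_record` are hypothetical.

```python
import io
import json


def extract_record(components, template):
    """Apply a template to instantiated components and return one flat record.

    components: dict of component id -> dict of attributes (assumed shape).
    template:   dict of output field -> (component id, attribute) (assumed shape).
    Fields whose component or attribute is absent are recorded as None.
    """
    record = {}
    for field_name, (component_id, attribute) in template.items():
        component = components.get(component_id, {})
        record[field_name] = component.get(attribute)
    return record


def save_record(record, stream):
    """Append the record to a text stream as one JSON line."""
    stream.write(json.dumps(record, sort_keys=True) + "\n")


# Usage with made-up component data and an in-memory stream instead of a file.
components = {
    "player": {"src": "http://example.com/clip.flv", "title": "Clip A"},
    "caption": {"text": "Evening news"},
}
template = {
    "media_url": ("player", "src"),
    "title": ("player", "title"),
    "description": ("caption", "text"),
}
out = io.StringIO()
save_record(extract_record(components, template), out)
```

Writing one JSON object per line is only one possible form of a "structured data information record"; a database row would serve the same purpose.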
6 Claims
1. A computer-implemented method for searching for files on the Internet, the method comprising:
finding a target URL;
downloading an HTML file for the target URL;
downloading supplementary data files used to build a web application, based on information in the HTML file;
assembling application components from the supplementary data files and the HTML file;
instantiating application components to create the web application;
applying data-query interfaces to all media objects in the web application that may contain useful data;
loading a pre-defined Application template or generating and automatically defining an Application template;
applying the Application template to extract all of the desired information from the web application;
saving the desired information to a file or database as a structured data information record;
examining all components in the web application to identify all possible components that could respond to a mouse event or form a clickable item;
determining which clickable items have appeared since a last simulated mouse event;
storing new clickable items in an appropriate data structure on a storage medium, wherein the appropriate data structure is a new branch of a clickable item tree containing all clickable items in the application at all possible application states;
simulating a mouse click on a first clickable item in a current branch of the clickable item tree; and
repeating this method until the entire clickable item tree has been traversed including, while continuing to instantiate the application components to create the web application, at a subsequent point in time, relating information gathered from examining subsequently loaded and instantiated components of the web application that are displayed at the subsequent point in time.
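The clickable-item steps of claim 1 (find the items that appeared since the last simulated click, store them as a new branch, click the first item of the current branch, repeat until the tree is traversed) can be sketched as a depth-first walk. This is a minimal illustration and not the patented implementation: `ToyApp` is a hypothetical stand-in for an instantiated web application, its `reveals` table is invented test data, and `ClickNode`, `crawl`, and `walk` are names introduced only for this example.

```python
from dataclasses import dataclass, field


@dataclass
class ClickNode:
    """A clickable item plus the items that first appeared after clicking it."""
    item: str
    children: list["ClickNode"] = field(default_factory=list)


class ToyApp:
    """Hypothetical application: clicking an item may reveal further items."""

    def __init__(self, reveals, initial):
        self._reveals = reveals          # item -> items revealed by clicking it
        self._visible = list(initial)    # items currently clickable

    def clickable_items(self):
        return list(self._visible)

    def click(self, item):
        # Simulated mouse event: reveal any items tied to this one.
        for new in self._reveals.get(item, []):
            if new not in self._visible:
                self._visible.append(new)


def crawl(app):
    """Build the clickable item tree by clicking every item depth-first."""
    root = ClickNode("<root>")
    seen = set()

    def expand(node):
        # Items not seen before are the ones that appeared since the last click;
        # they form a new branch under the item that was just clicked.
        new_items = [i for i in app.clickable_items() if i not in seen]
        seen.update(new_items)
        node.children = [ClickNode(i) for i in new_items]
        for child in node.children:
            app.click(child.item)
            expand(child)

    expand(root)
    return root


def walk(node, depth=0):
    """Yield (depth, item) pairs in traversal order."""
    yield depth, node.item
    for child in node.children:
        yield from walk(child, depth + 1)


app = ToyApp({"menu": ["news", "sports"], "news": ["story-1"]}, ["menu", "search"])
tree = crawl(app)
```

The toy application only ever adds items. A real application can also remove items or change state in ways that require returning to an earlier state before clicking a sibling, which this sketch does not model.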
3. A computer program product comprising:
a computer usable storage medium and computer readable code embodied on said computer usable storage medium, the computer readable code comprising computer executable instructions that, as executed by a processor, cause a computer implemented system to perform a method for;
finding a target URL;
downloading an HTML file for the target URL;
downloading supplementary data files used to build a web application, based on information in the HTML file;
assembling application components from the supplementary data files and the HTML file;
instantiating application components to create the web application;
applying data-query interfaces to all media objects in the web application that may contain useful data;
loading a pre-defined Application template or generating and automatically defining an Application template;
applying the Application template to extract all of the desired information from the web application;
saving the desired information to a file or database as a structured data information record;
examining all components in the web application to identify all possible components that could respond to a mouse event or form a clickable item;
determining which clickable items have appeared since a last simulated mouse event;
storing new clickable items in an appropriate data structure on a storage medium, wherein the appropriate data structure is a new branch of a clickable item tree containing all clickable items in the application at all possible application states;
simulating a mouse click on a first clickable item in a current branch of the clickable item tree; and
repeating this method until the entire clickable item tree has been traversed including, while continuing to instantiate the application components to create the web application, at a subsequent point in time, relating information gathered from examining subsequently loaded and instantiated components of the web application that are displayed at the subsequent point in time.
5. A computer system having a storage medium having computer-executable code stored thereon and a processor, the computer system comprising:
an application crawler having programming code configured to, as executed by a processor;
find a target URL;
download an HTML file for the target URL;
download supplementary data files used to build a web application, based on information in the HTML file;
assemble application components from the supplementary data files and the HTML file;
instantiate application components to create the web application;
apply data-query interfaces to all media objects in the web application that may contain useful data;
load a pre-defined Application template or generating and automatically defining an Application template;
apply the Application template to extract all of the desired information from the web application;
save the desired information to a file or database as a structured data information record;
examine all components in the web application to identify all possible components that could respond to a mouse event or form a clickable item;
determine which clickable items have appeared since a last simulated mouse event;
store new clickable items in an appropriate data structure on a storage medium, wherein the appropriate data structure is a new branch of a clickable item tree containing all clickable items in the application at all possible application states;
simulate a mouse click on a first clickable item in a current branch of the clickable item tree; and
repeat this method until the entire clickable item tree has been traversed including, while continuing to instantiate the application components to create the web application, at a subsequent point in time, relating information gathered from examining subsequently loaded and instantiated components of the web application that are displayed at the subsequent point in time.
Specification