Method and apparatus for an application crawler
First Claim
1. A computer-implemented method comprising:
loading multiple web page components;
assembling the multiple web page components;
identifying a crawling template based on the multiple web page components;
identifying a period of time specified by the crawling template;
crawling, using one or more processors, an object model of the web page components;
identifying and indexing one or more objects that are loaded during crawling;
simulating a user event;
in response to the simulated user event, pausing crawling of the object model for the identified period of time;
continuing to crawl the object model after the identified period of time has elapsed; and
identifying and indexing one or more objects that loaded during the identified period of time that the crawling of the object model was paused.
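
The sequence recited in this claim (crawl, simulate a user event, pause for the template's period, then crawl again and index what arrived in the meantime) can be sketched as below. This is a minimal illustration only, not the patented implementation: `Node`, `CrawlingTemplate`, `walk`, and `crawl` are hypothetical names, the object model is a plain in-memory tree rather than a browser DOM, and the wait function is injectable so the demo does not actually sleep.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterator, List


@dataclass
class Node:
    """Hypothetical object-model node (a stand-in for a DOM element)."""
    name: str
    children: List["Node"] = field(default_factory=list)


@dataclass
class CrawlingTemplate:
    """Hypothetical template; only the wait period is taken from the claim."""
    wait_seconds: float


def walk(node: Node) -> Iterator[Node]:
    """Yield every node of the object model, depth first."""
    yield node
    for child in node.children:
        yield from walk(child)


def crawl(
    root: Node,
    template: CrawlingTemplate,
    simulate_event: Callable[[Node], None],
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, str]:
    """Index objects before and after a paused, event-driven load."""
    index: Dict[str, str] = {}
    # First pass: index the objects that are already loaded.
    for node in walk(root):
        index.setdefault(node.name, "initial crawl")
    # Simulate a user event (for example a click), then pause crawling
    # for the period of time specified by the template.
    simulate_event(root)
    sleep(template.wait_seconds)
    # Resume: anything not seen before must have loaded during the pause.
    for node in walk(root):
        index.setdefault(node.name, "loaded during pause")
    return index


if __name__ == "__main__":
    page = Node("page", [Node("play_button")])
    pending: List[Node] = []

    def click_play(root: Node) -> None:
        # Assumed behaviour: the click schedules a player object to load.
        pending.append(Node("video_player"))

    def fake_sleep(seconds: float) -> None:
        # Stand-in for real waiting: scheduled objects finish loading.
        page.children.extend(pending)
        pending.clear()

    print(crawl(page, CrawlingTemplate(wait_seconds=2.0), click_play, fake_sleep))
```

Passing the wait function as a parameter is a design choice made for this sketch so the pause can be tested deterministically; a real crawler would wait on the live page instead.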
Abstract
A computer-implemented method is provided for searching for files on the Internet. In one embodiment, the method may provide an application crawler that assembles and dynamically instantiates all components of a web page. The instantiated web application may then be analyzed to locate desired components on the web page. This may involve finding and analyzing all clickable items in the application, driving the web application by injecting events, and extracting information from the application and writing it to a file or database.
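
Two of the steps named in the abstract, finding clickable items and writing extracted information to a file, can be illustrated with a short sketch. It rests on simplifying assumptions that are not in the source: it parses static HTML with Python's standard `html.parser` instead of inspecting an instantiated application, it treats links, buttons, and elements with an `onclick` attribute as "clickable", and it leaves out event injection entirely. `ClickableFinder`, `find_clickable`, and `write_records` are hypothetical names.

```python
import json
from html.parser import HTMLParser
from typing import Dict, List, Optional, Tuple


class ClickableFinder(HTMLParser):
    """Collect elements a user could plausibly click (an assumed heuristic)."""

    def __init__(self) -> None:
        super().__init__()
        self.clickable: List[Dict[str, object]] = []

    def handle_starttag(
        self, tag: str, attrs: List[Tuple[str, Optional[str]]]
    ) -> None:
        attr_map = dict(attrs)
        is_link = tag == "a" and "href" in attr_map
        if tag == "button" or is_link or "onclick" in attr_map:
            self.clickable.append({"tag": tag, "attrs": attr_map})


def find_clickable(html: str) -> List[Dict[str, object]]:
    """Return one record per clickable element, in document order."""
    finder = ClickableFinder()
    finder.feed(html)
    finder.close()
    return finder.clickable


def write_records(records: List[Dict[str, object]], path: str) -> None:
    """Write one JSON object per line to the given file."""
    with open(path, "w", encoding="utf-8") as fh:
        for record in records:
            fh.write(json.dumps(record, sort_keys=True) + "\n")
```

A caller might run `write_records(find_clickable(page_source), "clickable.jsonl")`, where `page_source` and the output file name are placeholders.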
28 Claims
1. A computer-implemented method comprising:
loading multiple web page components;
assembling the multiple web page components;
identifying a crawling template based on the multiple web page components;
identifying a period of time specified by the crawling template;
crawling, using one or more processors, an object model of the web page components;
identifying and indexing one or more objects that are loaded during crawling;
simulating a user event;
in response to the simulated user event, pausing crawling of the object model for the identified period of time;
continuing to crawl the object model after the identified period of time has elapsed; and
identifying and indexing one or more objects that loaded during the identified period of time that the crawling of the object model was paused.
Dependent claims: 2, 3, 4, 5, 27, 28
6. A computer-implemented method comprising:
loading multiple web page components;
assembling the multiple web page components;
executing the loaded and assembled multiple web page components to instantiate at least a portion of the web page components;
identifying a crawling template based on the multiple web page components;
identifying a period of time specified by the crawling template;
crawling, using one or more processors, an object model of the web page components;
indexing one or more objects during crawling;
relating information gathered from crawling and indexing with objects;
simulating a user event;
in response to the simulated user event, pausing crawling of the object model for the identified period of time;
continuing to crawl, using the one or more processors, an updated object model after the identified period of time has elapsed;
indexing at least one object that was loaded during the period of time that the crawling of the object model was paused; and
relating information gathered from crawling and indexing the updated object model with objects that are displayed during the period of time.
Dependent claims: 7, 8, 9, 10, 11
12. A computer-implemented method comprising:
executing multiple web page components to instantiate the multiple web page components;
identifying a crawling template based on the multiple web page components;
identifying a period of time specified by the crawling template;
crawling, using one or more processors, an object model of the web page components;
indexing an object model of the web page components including identifying objects;
simulating a user event;
in response to the simulated user event, pausing crawling of the object model for the identified period of time;
continuing to crawl the object model after the identified period of time has elapsed;
locating video files after the identified period of time has elapsed that loaded during the identified period of time that the crawling of the object model was paused;
indexing the located video files by saving pointers to the video files in a database;
extracting first data about the video files from the object model;
saving the first data in the database;
detecting when a video file has been initiated for playing;
extracting second data as the video file is played; and
relating the second data with objects that were displayed at the same time that the second data was extracted.
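
The database steps of claim 12 (saving pointers to video files, saving first data, and relating second data to the objects displayed when it was extracted) could be laid out roughly as follows. This is a sketch under stated assumptions, not the schema used by the patentee: the table and column names, the use of SQLite, and the helper functions `open_index`, `index_video`, and `record_playback` are all hypothetical.

```python
import sqlite3
from typing import Iterable

SCHEMA = """
CREATE TABLE videos (
    id    INTEGER PRIMARY KEY,
    url   TEXT UNIQUE NOT NULL,  -- pointer to the video file, not the file
    title TEXT                   -- "first data" taken from the object model
);
CREATE TABLE playback_data (
    video_id         INTEGER NOT NULL REFERENCES videos(id),
    offset_seconds   REAL NOT NULL,
    data             TEXT NOT NULL,  -- "second data" extracted during playback
    displayed_object TEXT NOT NULL   -- object on screen at extraction time
);
"""


def open_index(path: str = ":memory:") -> sqlite3.Connection:
    """Create the (assumed) index tables and return the connection."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def index_video(conn: sqlite3.Connection, url: str, title: str) -> int:
    """Save a pointer to a located video and return its row id."""
    conn.execute(
        "INSERT OR IGNORE INTO videos (url, title) VALUES (?, ?)", (url, title)
    )
    row = conn.execute("SELECT id FROM videos WHERE url = ?", (url,)).fetchone()
    return row[0]


def record_playback(
    conn: sqlite3.Connection,
    video_id: int,
    offset_seconds: float,
    data: str,
    displayed_objects: Iterable[str],
) -> None:
    """Relate data captured during playback to each object then displayed."""
    conn.executemany(
        "INSERT INTO playback_data VALUES (?, ?, ?, ?)",
        [(video_id, offset_seconds, data, obj) for obj in displayed_objects],
    )
```

Storing one row per displayed object keeps the relation queryable in both directions (all data for an object, or all objects for a moment of playback); other layouts would serve equally well.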
13. A computer-implemented method comprising:
identifying a video-rich website;
identifying a crawling template based on the video-rich website;
identifying a period of time specified by the crawling template;
crawling, using one or more processors, a web page of the identified video-rich website, wherein the crawling comprises:
dynamically instantiating and assembling components of the web page to create an instantiated web application;
identifying specific parts of the instantiated web application that contain useful information;
providing logic for extracting the information into a metadata record by applying data-query interfaces to media player objects in the instantiated web application;
using the data-query interfaces to query the media player objects for media player properties and for metadata about downloaded audio or video streams;
analyzing the instantiated web application to extract information from the web application;
writing the extracted information to a file or database;
simulating a user event;
in response to the simulated user event, pausing crawling of the web page; and
continuing to crawl the web page after the period of time has elapsed.
Dependent claims: 14, 15, 16
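
Claim 13 refers to data-query interfaces applied to media player objects in order to build a metadata record. One way to picture such an interface is sketched below. The claim does not define the interface, so everything here is an assumption made for illustration: the `MediaPlayerQuery` protocol, its two method names, the `MetadataRecord` fields, and the `FakePlayer` stand-in are all hypothetical.

```python
from dataclasses import asdict, dataclass
from typing import Any, Dict, Protocol


class MediaPlayerQuery(Protocol):
    """Hypothetical data-query interface exposed by a media player object."""

    def player_properties(self) -> Dict[str, Any]: ...

    def stream_metadata(self) -> Dict[str, Any]: ...


@dataclass
class MetadataRecord:
    """Hypothetical record assembled from the two queries."""
    page_url: str
    player: Dict[str, Any]
    stream: Dict[str, Any]


def extract_record(page_url: str, player: MediaPlayerQuery) -> MetadataRecord:
    """Query the player object and copy the answers into one record."""
    return MetadataRecord(
        page_url=page_url,
        player=dict(player.player_properties()),
        stream=dict(player.stream_metadata()),
    )


class FakePlayer:
    """Stand-in for an embedded player, with made-up values."""

    def player_properties(self) -> Dict[str, Any]:
        return {"state": "playing", "volume": 80}

    def stream_metadata(self) -> Dict[str, Any]:
        return {"title": "Sample clip", "duration_seconds": 95}


if __name__ == "__main__":
    record = extract_record("http://example.com/watch", FakePlayer())
    print(asdict(record))
```

Because the record is a plain dataclass, `asdict(record)` yields a dictionary that can be serialized to a file or inserted into a database, matching the "writing the extracted information" step.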
17. A non-transitory computer-readable storage medium including a set of instructions that, when executed, cause at least one processor to perform steps comprising:
loading multiple web page components;
assembling the multiple web page components;
identifying a crawling template based on the multiple web page components;
identifying a period of time specified by the crawling template;
executing the loaded and assembled multiple web page components to instantiate at least a portion of the application;
crawling an object model of the web page components;
indexing one or more objects loaded during crawling;
relating information gathered from crawling and indexing the object model with objects that are displayed;
simulating a user event;
in response to the simulated user event, pausing crawling of the object model for the identified period of time;
continuing to crawl and index an object model after the identified period of time has elapsed;
indexing at least one object that loaded during the period of time that the crawling of the object model was paused; and
relating information gathered from crawling and indexing the updated object model with objects that displayed during the period of time that the crawling of the object model was paused.
Dependent claims: 18, 19, 20, 21, 22
23. A computer system comprising:
at least one processor; and
at least one non-transitory computer readable storage medium storing instructions thereon that, when executed by the at least one processor, cause the system to:
identify a video-rich website;
identify a crawling template based on the video-rich website;
identify a period of time specified by the crawling template;
crawl a video-rich website, wherein the crawling comprises:
dynamically instantiate and assemble components of a web page at the video-rich website to create an instantiated web application;
identify specific parts of the instantiated web application that contain useful information in accordance with a template;
provide logic for extracting that information into a metadata record;
analyze the instantiated web application to extract information from the instantiated web application;
write the extracted information to a file or database;
simulate a user event;
in response to the simulated user event, pause crawling of the web page; and
continue to crawl the web page after the period of time has elapsed.
Dependent claims: 24, 25, 26
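
Each independent claim recites identifying a crawling template and reading a period of time from it. The claims do not say how a template is matched, so the sketch below simply assumes that templates are keyed by a hostname pattern and checked in order. `CrawlingTemplate`, `host_pattern`, and `identify_template` are hypothetical names, and the example hosts are placeholders.

```python
from dataclasses import dataclass
from fnmatch import fnmatch
from typing import Optional, Sequence
from urllib.parse import urlparse


@dataclass(frozen=True)
class CrawlingTemplate:
    """Hypothetical template: a host pattern plus the pause period."""
    host_pattern: str    # shell-style pattern, e.g. "*.example.com" (assumed)
    wait_seconds: float  # period of time the crawler pauses after an event


def identify_template(
    url: str,
    templates: Sequence[CrawlingTemplate],
    default: Optional[CrawlingTemplate] = None,
) -> Optional[CrawlingTemplate]:
    """Return the first template whose pattern matches the URL's host."""
    host = urlparse(url).hostname or ""
    for template in templates:
        if fnmatch(host, template.host_pattern):
            return template
    return default


if __name__ == "__main__":
    known = [
        CrawlingTemplate("*.example.com", wait_seconds=5.0),
        CrawlingTemplate("*", wait_seconds=1.0),  # catch-all fallback
    ]
    chosen = identify_template("http://video.example.com/watch", known)
    print(chosen.wait_seconds if chosen else None)
```

Ordering the list from most to least specific lets a catch-all entry act as the default wait period; matching on loaded page components rather than hostnames, as several claims recite, would need a different key.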
Specification