Operator-guided application crawling architecture
First Claim
1. A system for automated acquisition of content from an application, the system comprising:
a guide tracker module configured to monitor interaction of an operator with an executing instance of the application and record a set of guides, wherein each guide in the set of guides includes a recorded sequence of user interface interactions concluding at a respective ultimate state of the application;
a link extraction controller configured to, for each guide of the set of guides:
selectively identify additional states of the application that correspond to the respective ultimate state and
add the additional states corresponding to the respective ultimate state and the respective ultimate state to a state list,
wherein the additional states and the respective ultimate state are all directly reachable from a common penultimate state of the application,
wherein the common penultimate state of the application is immediately prior to the respective ultimate state in the guide, and
wherein each entry in the state list designates (i) a state and (ii) a path of user interface interactions to arrive at the state; and
a scraper module configured to, within an executing instance of the application, extract text and metadata from the states designated by each of the entries in the state list, wherein information based on the extracted text and metadata is stored in a data store.
Abstract
A system for automated acquisition of content from an application includes a guide tracker module, a link extraction controller, and a scraper. The guide tracker module monitors interaction of an operator with an executing instance of the application and records a set of guides. Each guide includes a recorded sequence of user interface interactions concluding at a respective ultimate state of the application. The link extraction controller, for each guide of the set of guides, selectively identifies additional states of the application that correspond to the respective ultimate state and adds the additional states corresponding to the respective ultimate state and the respective ultimate state to a state list. The additional states and the respective ultimate state are all directly reachable from a common penultimate state of the application. Each entry in the state list designates a state and a path of user interface interactions to arrive at the state.
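The state-list construction described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation. It assumes a toy model in which the application is a graph mapping each state to its available user interface interactions, and it assumes a caller-supplied `corresponds` predicate for deciding which sibling states "correspond" to the ultimate state. The names `AppGraph`, `StateEntry`, `replay`, `expand_guide`, and `same_kind` are hypothetical and do not appear in the source.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical toy model: state -> {UI interaction -> resulting state}.
AppGraph = Dict[str, Dict[str, str]]


@dataclass(frozen=True)
class StateEntry:
    """One entry of the state list: a state plus the path that reaches it."""
    state: str
    path: Tuple[str, ...]


def replay(app: AppGraph, start: str, interactions: Sequence[str]) -> str:
    """Follow a sequence of UI interactions from the start state."""
    state = start
    for action in interactions:
        state = app[state][action]
    return state


def expand_guide(
    app: AppGraph,
    start: str,
    guide: Sequence[str],
    corresponds: Callable[[str, str], bool],
) -> List[StateEntry]:
    """Return entries for the guide's ultimate state and its corresponding siblings."""
    *prefix, last = guide
    # The penultimate state is the one immediately before the ultimate state.
    penultimate = replay(app, start, prefix)
    ultimate = app[penultimate][last]
    entries = [StateEntry(ultimate, tuple(guide))]
    # Consider only states directly reachable from the penultimate state.
    for action, target in app[penultimate].items():
        if target != ultimate and corresponds(target, ultimate):
            entries.append(StateEntry(target, (*prefix, action)))
    return entries


# Assumed example application and a single operator-recorded guide.
APP: AppGraph = {
    "home": {"tap_search": "results", "tap_settings": "settings"},
    "results": {
        "tap_item_1": "detail_1",
        "tap_item_2": "detail_2",
        "tap_filter": "filter_dialog",
    },
    "detail_1": {}, "detail_2": {}, "filter_dialog": {}, "settings": {},
}


def same_kind(a: str, b: str) -> bool:
    # Assumed correspondence rule: states sharing a name prefix are the same kind.
    return a.split("_")[0] == b.split("_")[0]


state_list = expand_guide(APP, "home", ("tap_search", "tap_item_1"), same_kind)
for entry in state_list:
    print(entry.state, entry.path)
```

In this sketch the operator demonstrates only one detail screen, and the sibling detail screen reachable from the same results state is added automatically, while the filter dialog is skipped because it does not satisfy the assumed correspondence rule.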
23 Claims
1. A system for automated acquisition of content from an application, the system comprising:
a guide tracker module configured to monitor interaction of an operator with an executing instance of the application and record a set of guides, wherein each guide in the set of guides includes a recorded sequence of user interface interactions concluding at a respective ultimate state of the application;
a link extraction controller configured to, for each guide of the set of guides:
selectively identify additional states of the application that correspond to the respective ultimate state and
add the additional states corresponding to the respective ultimate state and the respective ultimate state to a state list,
wherein the additional states and the respective ultimate state are all directly reachable from a common penultimate state of the application,
wherein the common penultimate state of the application is immediately prior to the respective ultimate state in the guide, and
wherein each entry in the state list designates (i) a state and (ii) a path of user interface interactions to arrive at the state; and
a scraper module configured to, within an executing instance of the application, extract text and metadata from the states designated by each of the entries in the state list, wherein information based on the extracted text and metadata is stored in a data store.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11
12. A method for automated acquisition of content from an application, the method comprising:
monitoring interaction of an operator with an executing instance of the application;
recording a set of guides according to the monitoring, wherein each guide in the set of guides includes a recorded sequence of user interface interactions concluding at a respective ultimate state of the application;
for each guide of the set of guides:
selectively identifying additional states of the application that correspond to the respective ultimate state and
adding the additional states corresponding to the respective ultimate state and the respective ultimate state to a state list,
wherein the additional states and the respective ultimate state are all directly reachable from a common penultimate state of the application,
wherein the common penultimate state of the application is immediately prior to the respective ultimate state in the guide, and
wherein each entry in the state list designates (i) a state and (ii) a path of user interface interactions to arrive at the state; and
within an executing instance of the application, extracting text and metadata from the states designated by each of the entries in the state list, wherein information based on the extracted text and metadata is stored in a data store.
Dependent claims: 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23
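The final extracting step of the method can be sketched as follows. This is an illustrative example under stated assumptions, not the claimed implementation: `ToyApp` is a hypothetical stand-in for an executing instance of the application, the per-screen `title` field is an assumed form of metadata, and a plain dictionary stands in for the data store. Each state-list entry is replayed from a fresh instance so that every path starts from the same initial state.

```python
from typing import Dict, List, Tuple


class ToyApp:
    """Hypothetical stand-in for an executing instance of the application."""

    SCREENS = {
        "home": {"title": "Home", "text": "Welcome",
                 "links": {"tap_search": "results"}},
        "results": {"title": "Results", "text": "2 results",
                    "links": {"tap_item_1": "detail_1", "tap_item_2": "detail_2"}},
        "detail_1": {"title": "Thai Palace", "text": "Open until 10pm", "links": {}},
        "detail_2": {"title": "Pizza Corner", "text": "Open until 11pm", "links": {}},
    }

    def __init__(self) -> None:
        self.state = "home"

    def perform(self, action: str) -> None:
        """Apply one user interface interaction to the current screen."""
        self.state = self.SCREENS[self.state]["links"][action]

    def visible_text(self) -> str:
        return self.SCREENS[self.state]["text"]

    def metadata(self) -> Dict[str, str]:
        # Assumed metadata: only the screen title in this toy model.
        return {"title": self.SCREENS[self.state]["title"]}


def scrape(state_list: List[Tuple[str, Tuple[str, ...]]]) -> Dict[str, dict]:
    """Replay each entry's path and store the extracted text and metadata."""
    data_store: Dict[str, dict] = {}
    for state, path in state_list:
        app = ToyApp()  # fresh instance for each entry
        for action in path:
            app.perform(action)
        if app.state != state:
            raise RuntimeError(f"path {path} reached {app.state}, expected {state}")
        data_store[state] = {"text": app.visible_text(), "metadata": app.metadata()}
    return data_store


# Assumed state list: each entry designates (i) a state and (ii) a path.
STATE_LIST = [
    ("detail_1", ("tap_search", "tap_item_1")),
    ("detail_2", ("tap_search", "tap_item_2")),
]

data_store = scrape(STATE_LIST)
for state, record in data_store.items():
    print(state, record)
```

Storing the path alongside the state is what makes the scrape repeatable: the scraper needs no knowledge of how the entry was discovered, only the interactions required to arrive at it.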
Specification