Unguided application crawling architecture
Abstract
A system for automated acquisition of content from an application includes a link tracking module that controls an instance of the application executing within an emulator. For a selected state, the link tracking module controls the executing application instance to navigate to the selected state and identifies a first set of application states reachable by user interface interaction. A state storage module stores records based on the first set. A first state record includes content of a first state of the first set and a unique identifier that uniquely identifies the first state. The unique identifier indicates a path followed within the executing application instance from a default state to the first state, including corresponding user interface interaction. A scraper module, for each of the records in the state storage module, navigates to the state specified by the unique identifier using the indicated path and extracts text from the state.
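The link tracking described in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the toy application graph, the `ToyAppInstance` class standing in for an application running in an emulator, the "/"-joined string used as the path identifier, and the choice to skip states whose content has already been seen. None of these names or choices come from the patent itself.

```python
from collections import deque

# Hypothetical toy application: each state maps a UI interaction to the next state.
TOY_APP = {
    "home": {"tap:search": "search", "tap:settings": "settings"},
    "search": {"tap:result_1": "detail", "tap:back": "home"},
    "settings": {"tap:back": "home"},
    "detail": {},
}
CONTENT = {
    "home": "Home screen",
    "search": "Search screen",
    "settings": "Settings screen",
    "detail": "Detail for result 1",
}


class ToyAppInstance:
    """Stand-in for an executing application instance (assumed interface)."""

    def __init__(self, default_state="home"):
        self.default_state = default_state
        self.current = default_state

    def reset(self):
        self.current = self.default_state

    def interactions(self):
        return sorted(TOY_APP[self.current])

    def perform(self, interaction):
        self.current = TOY_APP[self.current][interaction]

    def content(self):
        return CONTENT[self.current]


def navigate(instance, path):
    """Replay a path of interactions starting from the default state."""
    instance.reset()
    for interaction in path:
        instance.perform(interaction)


def crawl(instance):
    """Breadth-first crawl; each record is keyed by the path that reaches it."""
    records = {}
    navigate(instance, ())
    seen = {instance.content()}  # assumption: deduplicate states by content
    queue = deque([()])
    while queue:
        selected = queue.popleft()
        navigate(instance, selected)
        for interaction in instance.interactions():
            navigate(instance, selected)  # return to the selected state
            instance.perform(interaction)
            content = instance.content()
            if content in seen:
                continue
            seen.add(content)
            path = selected + (interaction,)
            records["/".join(path)] = content  # identifier encodes the path
            queue.append(path)
    return records


records = crawl(ToyAppInstance())
```

Here each identifier doubles as replay instructions: splitting `"tap:search/tap:result_1"` on `/` gives the interactions needed to reach that state again from the default state.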
31 Claims
1. An apparatus for automated acquisition of content from an application and improved searching of the content in response to a query of a user device, the apparatus comprising:
at least one processor, the at least one processor configured to track links, the tracking of links including:
  controlling an executing instance of the application; and
  for a selected state of the application:
    controlling the executing application instance to navigate to the selected state, and
    identifying a first set of application states reachable from the selected state, each of the first set of application states being reachable via a respective user interface interaction with the selected state,
wherein the at least one processor is further configured to store records in a state storage based on the first set of application states, a first state record including:
  (i) a representation of content of a first state of the first set of application states, and
  (ii) a unique identifier that uniquely identifies the first state within the records of the state storage, the unique identifier of the first state indicating a path followed within the executing application instance from a default state of the application to the first state, and the path including the user interface interaction corresponding to the first state,
wherein the at least one processor is further configured to scrape records, including, for each of the stored records, extract text and metadata from the state, information based on the extracted text and metadata being stored in a data store, and
wherein the at least one processor is further configured to provide at least one record in response to the query from the user device.
Dependent claims: 2-15
16. A method for automated acquisition of content from an application and provision of content for searching by a user device, the method comprising:
executing, using at least one processor of an apparatus for automated content acquisition and provision, an instance of the application;
for a selected state of the application:
  (i) controlling, using the at least one processor, the executing application instance to navigate to the selected state, and
  (ii) identifying, using the at least one processor, a first set of application states reachable from the selected state, each of the first set of application states being reachable via a respective user interface interaction with the selected state;
storing records in a state storage based on the first set of application states, wherein a first state record includes:
  (i) a representation of content of a first state of the first set of application states, and
  (ii) a unique identifier that uniquely identifies the first state within the stored records, the unique identifier of the first state indicating a path followed within the executing application instance from a default state of the application to the first state, and the path including the user interface interaction corresponding to the first state;
for each of the stored records, extracting, using the at least one processor, text and metadata from the state; and
storing information based on the extracted text and metadata in a data store.
Dependent claims: 17-31
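The extraction and storage steps, together with the query response recited in claim 1, might look roughly like the sketch below. It is an illustration under stated assumptions, not the claimed implementation: the toy screens, the `replay`, `scrape`, and `search` helpers, the "/"-joined path identifiers, and the simple all-terms substring match are all hypothetical.

```python
# Hypothetical toy application: (state, interaction) -> next state.
TRANSITIONS = {
    ("home", "tap:menu"): "menu",
    ("menu", "tap:pizza"): "pizza",
}
SCREENS = {
    "home": {"title": "Home", "text": "Welcome to the diner"},
    "menu": {"title": "Menu", "text": "Pizza and pasta dishes"},
    "pizza": {"title": "Pizza", "text": "Margherita pizza with basil"},
}


def replay(path_id, default_state="home"):
    """Follow the path encoded in the identifier from the default state."""
    state = default_state
    for interaction in filter(None, path_id.split("/")):
        state = TRANSITIONS[(state, interaction)]
    return state


def scrape(stored_ids):
    """For each stored record, revisit the state and extract text and metadata."""
    data_store = {}
    for path_id in stored_ids:
        screen = SCREENS[replay(path_id)]
        data_store[path_id] = {
            "text": screen["text"],
            "metadata": {"title": screen["title"]},
        }
    return data_store


def search(data_store, query):
    """Return identifiers of records whose text contains every query term."""
    terms = query.lower().split()
    return [
        path_id
        for path_id, record in data_store.items()
        if all(term in record["text"].lower() for term in terms)
    ]


data_store = scrape(["tap:menu", "tap:menu/tap:pizza"])
pizza_hits = search(data_store, "pizza")
basil_hits = search(data_store, "basil")
```

Because each result is a path identifier, a client could in principle replay the same interactions to open the matching state in the application.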
Specification