Infrastructure enabling intelligent execution and crawling of a web application
First Claim
1. A method comprising:
accessing, by a web crawler executing on one or more computing systems associated with a social-networking system, a structured document of a network application, the structured document comprising structural information and content comprising one or more embedded scripts and one or more resources or identifiers for the resources;
executing, by the web crawler executing on the one or more computing systems, at least some of the content of the structured document;
processing, by the computing systems, the structured document to generate a model representation of the structured document;
tracking, by the computing systems, one or more interactions resulting from the web crawler's execution of at least some of the content, the interactions comprising one or more outgoing requests sent by one or more of the computing systems or incoming responses received by one or more of the computing systems from one or more third-party servers;
creating, by the computing systems, a behavior model of the network application based on one or more of the interactions resulting from the web crawler's execution of at least some of the content, the behavior model comprising a first log of outgoing HTTP requests generated by the network application when the content is executed;
creating, by the computing systems, a second log that comprises an identification of one or more network resources ascertained by filtering the first log;
comparing, by the computing systems, one or more of the network resources identified in the second log to a list comprising an identification of one or more rogue network resources;
by the computing systems, determining, based on the comparison, whether the network application meets one or more requirements of the social-networking system, wherein the one or more requirements comprise avoiding interaction with any of the rogue network resources.
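The last four steps of claim 1 (first log, filtered second log, comparison against a rogue list, compliance determination) can be illustrated with a minimal Python sketch. It is not the patented implementation: the function names, the sample URLs, and the choice to filter the first log down to distinct hostnames are all assumptions made for illustration.

```python
from urllib.parse import urlsplit

# Hypothetical first log: outgoing HTTP requests observed while the crawler
# executed the application's content (sample data, not from the patent).
first_log = [
    "http://cdn.example.com/app.js",
    "http://ads.rogue.example/serve?id=1",
    "http://cdn.example.com/style.css",
]

# Hypothetical list of rogue network resources, identified by hostname.
rogue_resources = {"ads.rogue.example"}


def build_second_log(requests):
    """Filter the first log down to the distinct hosts that were contacted."""
    hosts = []
    for url in requests:
        host = urlsplit(url).hostname
        if host and host not in hosts:
            hosts.append(host)
    return hosts


def meets_requirements(second_log, rogue):
    """Return (ok, offenders); ok is True when no rogue resource was contacted."""
    offenders = [host for host in second_log if host in rogue]
    return (not offenders, offenders)


second_log = build_second_log(first_log)
ok, offenders = meets_requirements(second_log, rogue_resources)
print(second_log, ok, offenders)
```

In this sketch the application fails the requirement because one of the hosts in the second log appears in the rogue list.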
Abstract
In one embodiment, a method includes accessing a structured document of a network application, processing the structured document to generate a model representation of the structured document, tracking one or more interactions occurring during the processing of the structured document, the one or more interactions including one or more outgoing requests transmitted by the one or more computing systems or incoming responses received by the one or more computing systems, and generating a behavior model of the web application based on one or more of the interactions.
21 Claims
2. The method of claim 1, further comprising enumerating one or more attributes of the structured document, wherein the behavior model comprises one or more of the enumerated attributes.
3. The method of claim 1, further comprising filtering, by the computing systems, the first log to ascertain the one or more network resources, which correspond to one or more advertisement developers or advertisement provider networks that the network application sent requests for advertisements to or that one or more incoming responses comprising advertisements were received from.
4. The method of claim 3, wherein the list of rogue network resources includes one or more of rogue ad networks, undesirable or unauthorized third-party systems, websites, or applications, or any combination thereof.
5. The method of claim 1, wherein:
the computing systems comprise a primary computing system and one or more secondary computing systems;
each of the secondary computing systems hosts the web crawler; and
the method further comprises receiving, by the web crawler, a request from the primary computer system to access the network application.
6. The method of claim 5, further comprising accessing, by the web crawler, one or more servers hosting a canvas web page.
7. The method of claim 6, further comprising logging into, by the web crawler, the servers using test user credentials.
8. The method of claim 5, wherein the web crawler is implemented, at least in part, with all or portions of a cross platform component model and a layout engine.
9. The method of claim 8, wherein:
the web crawler comprises an overlying programming layer overtop of the cross platform component model and layout engine layers;
the overlying programming layer comprises a JavaScript layer;
tracking the interactions occurring during the processing of the structured document comprises tracking the interactions by the overlying programming layer; and
the JavaScript layer of the overlying programming layer is configured to capture state in the DOM by capturing a snapshot of the state of the DOM after the structured document is rendered.
10. The method of claim 1, wherein the model representation is a Document Object Model (DOM) representation.
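As a rough illustration of generating a model representation of a structured document and enumerating its attributes (claims 2 and 10), the following Python sketch builds a simplified DOM-like tree with the standard library's `html.parser`. The `Node` and `TreeBuilder` classes, the short list of void elements, and the sample markup are assumptions for illustration; the claims do not prescribe a parser, and a real crawler would rely on a full layout engine.

```python
from html.parser import HTMLParser


class Node:
    """One element in a simplified DOM-like tree (hypothetical structure)."""

    def __init__(self, tag, attrs, parent=None):
        self.tag = tag
        self.attrs = dict(attrs)
        self.parent = parent
        self.children = []


class TreeBuilder(HTMLParser):
    """Builds a tree of Node objects from start and end tags."""

    # Elements that never have a closing tag (partial list, for the sketch).
    VOID = {"img", "br", "meta", "link", "input", "hr"}

    def __init__(self):
        super().__init__()
        self.root = Node("#document", [])
        self.current = self.root

    def handle_starttag(self, tag, attrs):
        node = Node(tag, attrs, self.current)
        self.current.children.append(node)
        if tag not in self.VOID:
            self.current = node

    def handle_endtag(self, tag):
        # Climb to the nearest open element with this tag, if there is one.
        node = self.current
        while node is not self.root and node.tag != tag:
            node = node.parent
        if node is not self.root:
            self.current = node.parent


def enumerate_attributes(node):
    """Yield (tag, attribute, value) for every element below node."""
    for child in node.children:
        for name, value in child.attrs.items():
            yield (child.tag, name, value)
        yield from enumerate_attributes(child)


builder = TreeBuilder()
builder.feed('<html><body><img src="a.png"><script src="app.js"></script></body></html>')
attributes = list(enumerate_attributes(builder.root))
print(attributes)
```

The enumerated `src` attributes are the kind of resource identifiers that a behavior model could record alongside the request log.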
11. One or more computer-readable non-transitory storage media embodying software that is operable when executed to:
access, by a web crawler executing on one or more computing systems associated with a social-networking system, a structured document of a network application, the structured document comprising structural information and content comprising one or more embedded scripts and one or more resources or identifiers for the resources;
execute, by the web crawler executing on the one or more computing systems, at least some of the content of the structured document;
process, by the computing systems, the structured document to generate a model representation of the structured document;
track, by the computing systems, one or more interactions resulting from the web crawler's execution of at least some of the content, the interactions comprising one or more outgoing requests sent by one or more of the computing systems or incoming responses received by one or more of the computing systems from one or more third-party servers;
create, by the computing systems, a behavior model of the network application based on one or more of the interactions resulting from the web crawler's execution of at least some of the content, the behavior model comprising a first log of outgoing HTTP requests generated by the network application when the content is executed;
create, by the computing systems, a second log that comprises an identification of one or more network resources ascertained by filtering the first log;
compare, by the computing systems, one or more of the network resources identified in the second log to a list comprising an identification of one or more rogue network resources;
by the computing systems, determine, based on the comparison, whether the network application meets one or more requirements of the social-networking system, wherein the one or more requirements comprise avoiding interaction with any of the rogue network resources.
12. The media of claim 11 wherein the software is further operable when executed to enumerate one or more attributes of the structured document, wherein the behavioral model comprises one or more of the enumerated attributes.
13. The media of claim 11, wherein the software is further operable when executed to filter the first log to ascertain the one or more network resources, which correspond to one or more advertisement developers or advertisement provider networks that the network application sent requests for advertisements to or that one or more incoming responses comprising advertisements were received from.
14. The media of claim 13, wherein the list of rogue network resources includes one or more of rogue ad networks, undesirable or unauthorized third-party systems, websites, or applications, or any combination thereof.
15. The media of claim 11, wherein:
the software is further operable when executed to process a request to access the network application; and
accessing the web crawler is operable to access and render the network application.
16. The media of claim 15, wherein the web crawler is further operable to access one or more servers hosting a canvas web page.
17. The media of claim 16, wherein the web crawler is further operable to log into the servers using test user credentials.
18. The media of claim 15, wherein the web crawler is implemented, at least in part, with all or portions of a cross platform component model and a layout engine.
19. The media of claim 18, wherein:
the web crawler further comprises an overlying programming layer overtop of the component model and layout engine layers;
the overlying programming layer comprises a JavaScript layer;
to track the interactions occurring during the processing of the structured document, the software is operable when executed to track the interactions by the overlying programming layer; and
the JavaScript layer of the overlying programming layer is operable to capture state in the DOM by capturing a snapshot of the state of the DOM after the structured document is rendered.
20. The method of claim 1, wherein at least one network resource comprises a Domain name or a URL.
21. The media of claim 11, wherein at least one network resource comprises a Domain name or a URL.
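Claims 20 and 21 allow a network resource to be identified by a domain name or by a URL. One possible way to check either form against a rogue list is sketched below in Python. The `is_rogue` function, the split into two rogue lists, the subdomain-matching rule, and the sample entries are assumptions for illustration and are not taken from the patent.

```python
from urllib.parse import urlsplit


def is_rogue(resource, rogue_domains, rogue_urls):
    """Check a logged resource (domain name or URL) against two rogue lists."""
    if "://" in resource:
        # Treat the resource as a URL: try an exact URL match first,
        # then fall back to matching its host.
        if resource in rogue_urls:
            return True
        host = urlsplit(resource).hostname or ""
    else:
        host = resource.lower()
    # A host matches a rogue domain when it equals the domain
    # or is a subdomain of it.
    return any(host == d or host.endswith("." + d) for d in rogue_domains)


# Hypothetical rogue lists (sample data).
rogue_domains = {"rogue.example"}
rogue_urls = {"http://ok.example/bad.js"}

results = [
    is_rogue("ads.rogue.example", rogue_domains, rogue_urls),
    is_rogue("http://ok.example/bad.js", rogue_domains, rogue_urls),
    is_rogue("http://ok.example/good.js", rogue_domains, rogue_urls),
    is_rogue("notrogue.example", rogue_domains, rogue_urls),
]
print(results)
```

Matching on a dot-prefixed suffix rather than a plain suffix keeps an unrelated host such as `notrogue.example` from being flagged.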
Specification