System and method of analyzing web content

  • US 8,020,206 B2
  • Filed: 07/10/2006
  • Issued: 09/13/2011
  • Est. Priority Date: 07/10/2006
  • Status: Active Grant
First Claim

1. A method of collecting data associated with a plurality of URLs, the method implemented on one or more computer processors, and comprising:

    receiving a configuration plug-in, the plug-in specifying a web-crawling mode;

    receiving URL data;

    determining a plurality of work units from the URL data, each work unit comprising a URL;

    determining whether one of a plurality of dispatchers is available for receiving one of the plurality of work units;

    sending the one of the plurality of work units to the one of the plurality of dispatchers; and

    retrieving content associated with the URL of the work unit, using the one of the plurality of dispatchers based on the web-crawling mode specified by the configuration plug-in,

    wherein at least one of the plurality of dispatchers is configured to retrieve content and store the content in a database,

    wherein at least one of the plurality of dispatchers is configured to download executable content and execute the executable content in a sandbox environment, and

    wherein at least one of the plurality of dispatchers is configured to replace the at least one of the plurality of dispatchers configured to download executable content if the at least one of the plurality of dispatchers configured to download executable content is damaged by execution of the executable content.
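
The steps recited in claim 1 can be loosely illustrated with a minimal Python sketch. It is not the patented implementation. Every name in it (WorkUnit, Dispatcher, crawl, fake_fetch, the "mode" key, the "EXE:" prefix, and the "malicious" marker) is a hypothetical stand-in chosen for illustration, the "database" is a plain dictionary, and sandboxed execution is simulated rather than performed. The replacement of a damaged dispatcher is done here by the scheduling loop for brevity, whereas the claim assigns that role to a dispatcher.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class WorkUnit:
    url: str  # each work unit comprises a URL


@dataclass
class Dispatcher:
    name: str
    sandboxed: bool = False  # hypothetical flag: executes downloads in a sandbox
    busy: bool = False
    damaged: bool = False

    def retrieve(self, unit: WorkUnit, mode: str,
                 fetch: Callable[[str, str], str], db: Dict[str, str]) -> None:
        content = fetch(unit.url, mode)  # retrieval depends on the crawl mode
        db[unit.url] = content           # store the content in the "database"
        if self.sandboxed and content.startswith("EXE:"):
            # Stand-in for sandboxed execution: the marker simulates damage.
            self.damaged = "malicious" in content


def fake_fetch(url: str, mode: str) -> str:
    """Hypothetical fetcher; a real one would perform an HTTP request."""
    return "EXE:malicious" if url.endswith(".exe") else f"page:{mode}:{url}"


def crawl(url_data: List[str], plugin: Dict[str, str],
          pool: List[Dispatcher],
          fetch: Callable[[str, str], str]) -> Dict[str, str]:
    db: Dict[str, str] = {}
    work_units = [WorkUnit(url) for url in url_data]  # work units from URL data
    for unit in work_units:
        # Determine whether a dispatcher is available for this work unit.
        slot = next((i for i, d in enumerate(pool)
                     if not d.busy and not d.damaged), None)
        if slot is None:
            raise RuntimeError("no dispatcher available")
        dispatcher = pool[slot]
        # Send the work unit and retrieve content using the plug-in's mode.
        dispatcher.retrieve(unit, plugin["mode"], fetch, db)
        if dispatcher.damaged:
            # Swap in a fresh dispatcher for the damaged one.
            pool[slot] = Dispatcher(dispatcher.name + "-replacement",
                                    sandboxed=dispatcher.sandboxed)
    return db
```

Calling crawl(["http://a.example/x.exe", "http://a.example/"], {"mode": "deep"}, [Dispatcher("d1", sandboxed=True)], fake_fetch) exercises each step: the first URL simulates damaging executable content, the dispatcher is replaced, and the replacement handles the second URL.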

  • 21 Assignments