Interactive web crawler

  • US 9,524,343 B2
  • Filed: 12/10/2015
  • Issued: 12/20/2016
  • Est. Priority Date: 06/17/2011
  • Status: Active Grant
First Claim

1. A method of web crawling hidden files, comprising:

    retrieving a list of form controls from a web page;

    retrieving one or more candidate values for one of the form controls;

    retrieving an additional form control candidate value for one of the form controls, the additional form control candidate value not being shown in a static HTML description of the web page;

    generating form control values for the form controls based on the candidate values, the additional form control candidate value, and a knowledge base comprising characterizations of possible form control values, and N-grams generated from search query logs;

    submitting an event to the web page using the form control values;

    generating a URL for the form control values to crawl for the hidden files; and

    generating a plurality of URLs, the plurality of URLs being generated based on one or more constraints, the form control values being generated such that an estimated utility of the form control values is increased, the estimated utility being based on compliance with the constraints.
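
The steps recited in claim 1 can be illustrated with a minimal sketch. It is not the patented implementation. It assumes that form controls are read from static HTML with Python's standard `html.parser`, that the "additional" candidate value and the query-log N-grams are supplied by the caller, and that estimated utility is simply the fraction of constraints an assignment satisfies. The names `FormControlParser`, `ngrams`, `utility`, and `generate_urls`, the sample page, and the `example.com` URL are all hypothetical. The event-submission step is omitted because it needs a live browser or script engine.

```python
from html.parser import HTMLParser
from itertools import product
from urllib.parse import urlencode


class FormControlParser(HTMLParser):
    """Collects named form controls and the candidate values visible in static HTML."""

    def __init__(self):
        super().__init__()
        self.controls = {}   # control name -> list of candidate values
        self._select = None  # name of the <select> currently being parsed

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "select" and attrs.get("name"):
            self._select = attrs["name"]
            self.controls.setdefault(self._select, [])
        elif tag == "option" and self._select and attrs.get("value"):
            self.controls[self._select].append(attrs["value"])
        elif tag == "input" and attrs.get("name"):
            self.controls.setdefault(attrs["name"], [])

    def handle_endtag(self, tag):
        if tag == "select":
            self._select = None


def ngrams(queries, n):
    """Returns the set of word n-grams found in a list of query strings."""
    grams = set()
    for query in queries:
        words = query.lower().split()
        grams.update(" ".join(words[i:i + n]) for i in range(len(words) - n + 1))
    return grams


def utility(assignment, constraints):
    """Fraction of constraints the assignment satisfies (1.0 when there are none)."""
    if not constraints:
        return 1.0
    return sum(1 for check in constraints if check(assignment)) / len(constraints)


def generate_urls(base_url, controls, constraints, limit):
    """Ranks every value combination by utility and returns the top `limit` URLs."""
    names = sorted(name for name, values in controls.items() if values)
    scored = []
    for combo in product(*(controls[name] for name in names)):
        assignment = dict(zip(names, combo))
        scored.append((utility(assignment, constraints), urlencode(assignment)))
    # Highest utility first; ties broken alphabetically for a stable order.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [f"{base_url}?{query}" for _, query in scored[:limit]]


# Hypothetical page: one <select> with static options and one free-text input.
PAGE = """
<form action="/search">
  <select name="category">
    <option value="books">Books</option>
    <option value="music">Music</option>
  </select>
  <input type="text" name="q">
</form>
"""

parser = FormControlParser()
parser.feed(PAGE)
controls = parser.controls

# Assumed: "film" was discovered dynamically and is absent from the static HTML.
controls["category"].append("film")
# Assumed: free-text candidates come from bigrams of a (made-up) search query log.
controls["q"].extend(sorted(ngrams(["cheap jazz records", "jazz records"], 2)))

# Assumed constraints: prefer the music category and queries mentioning jazz.
constraints = [
    lambda a: a["category"] == "music",
    lambda a: "jazz" in a["q"],
]
urls = generate_urls("http://example.com/search", controls, constraints, limit=3)
```

Ranking by satisfied constraints is only one way to read "estimated utility being based on compliance with the constraints". A real crawler would likely also cap the number of combinations it enumerates rather than taking the full Cartesian product.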
