Method and apparatus for improved web scraping

  • US 7,072,890 B2
  • Filed: 02/21/2003
  • Issued: 07/04/2006
  • Est. Priority Date: 02/21/2003
  • Status: Active Grant
First Claim

1. A method for improved web scraping, comprising the steps of:

  • obtaining a results page for a given web site/query;

    determining whether the source of said results was previously requested;

    IF said source was previously requested, THEN retrieving known links from database;

    comparing said known links to links on said results page;

    determining whether “N” good links have been found;

    IF said “N” good links have been found, THEN identifying said “N” good links;

    building a stack of potential “begin hits” HTML tags and strings for each of selections “1” through “N”;

    comparing entries of said stack to find “best” combination of said “begin hits” HTML tags and strings;

    writing to and updating configuration file so as to terminate process;

    OTHERWISE;

    returning to said step of parsing said results page to identify all links;

    OTHERWISE;

    parsing said results page to identify all links;

    presenting list of said links to user;

    manually selecting “N” good links; and

    returning to said step of identifying said “N” good links.
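
The flow of claim 1 can be illustrated with a minimal Python sketch. This is one possible reading, not the patented implementation: the claim does not define how a “begin hits” candidate is formed or how the “best” combination is scored. The sketch assumes that a candidate is the run of HTML tags and strings immediately preceding a good link, and that the best candidate is the one shared by the most selections, with longer runs preferred on ties. The names `LinkContextParser`, `find_good_links`, `build_begin_hits_stack`, `best_begin_hits`, and the `depth` parameter are all hypothetical.

```python
from collections import Counter
from html.parser import HTMLParser


class LinkContextParser(HTMLParser):
    """Parse a results page, recording every link and the tokens before it."""

    def __init__(self):
        super().__init__()
        self.tokens = []  # start tags and non-empty text strings, in page order
        self.links = []   # (href, index in self.tokens where the <a> tag occurs)

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append((href, len(self.tokens)))
        self.tokens.append(f"<{tag}>")

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.tokens.append(text)


def find_good_links(page_links, known_links, n):
    """Compare known links to links on the page; return N good links or None."""
    good = [href for href in page_links if href in known_links]
    # None corresponds to the OTHERWISE branch (fall back to manual selection).
    return good[:n] if len(good) >= n else None


def build_begin_hits_stack(parser, good_links, depth=3):
    """Collect candidate tag/string runs preceding each good link (assumed reading)."""
    stack = []
    for href, pos in parser.links:
        if href in good_links:
            preceding = parser.tokens[max(0, pos - depth):pos]
            # Every suffix of the preceding run is a potential "begin hits" marker.
            for k in range(1, len(preceding) + 1):
                stack.append(tuple(preceding[-k:]))
    return stack


def best_begin_hits(stack):
    """Pick the candidate seen most often; prefer the longer run on ties."""
    counts = Counter(stack)
    return max(counts, key=lambda c: (counts[c], len(c)))
```

In this sketch, a caller would feed the results page to `LinkContextParser`, pass the extracted hrefs and the stored known links to `find_good_links`, and, if N good links come back, write the result of `best_begin_hits` to the configuration file. A `None` result would instead trigger the manual path, in which the parsed links are shown to the user for selection.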
