
Method and apparatus for retrieving and indexing hidden pages

  • US 7,685,112 B2
  • Filed: 05/27/2005
  • Issued: 03/23/2010
  • Est. Priority Date: 06/17/2004
  • Status: Expired due to Fees
First Claim

1. A computer-implemented method of downloading Hidden Web pages comprising:

    a) selecting a query term;

    b) issuing a query to a site-specific search interface containing Hidden Web pages;

    c) acquiring a results index;

    d) downloading the Hidden Web pages from the results index;

    e) identifying a plurality of potential query terms from the downloaded Hidden Web pages;

    f) estimating the efficiency of each potential query term based on a ratio of the number of new pages returned for a particular query to the cost of issuing the particular query wherein the cost of issuing the particular query is equal to $c_q + c_r P(q_i) + c_d P_{\text{new}}(q_i)$ where $P(q_i)$ represents the fraction of pages returned for a particular query ($q_i$) and $P_{\text{new}}(q_i)$ represents the fraction of new pages returned for a particular query ($q_i$), and where $c_q$ represents the cost of submitting the particular query, $c_r$ represents the cost of retrieving a results index page, and $c_d$ represents the cost for downloading a matching document;

    g) selecting a next query term from the plurality of potential query terms, wherein the next selected query term has the greatest efficiency; and

    h) issuing a next query to the site-specific search interface using the next query term.
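
The cost and efficiency calculation in steps f) and g) can be illustrated with a minimal sketch. It is not taken from the patent. The names `Costs`, `query_cost`, `efficiency`, and `select_next_term` are hypothetical, the estimates of P(q_i) and P_new(q_i) are assumed to be supplied by some earlier estimation step that the claim does not detail, and P_new(q_i) stands in for the number of new pages in the ratio's numerator (the two are proportional when the site size is fixed).

```python
from dataclasses import dataclass


@dataclass
class Costs:
    """Per-operation costs named in step f); the units are an assumption."""
    c_q: float  # cost of submitting a query
    c_r: float  # cost of retrieving a results index page
    c_d: float  # cost of downloading a matching document


def query_cost(costs: Costs, p: float, p_new: float) -> float:
    """Cost of issuing a query: c_q + c_r * P(q_i) + c_d * P_new(q_i)."""
    return costs.c_q + costs.c_r * p + costs.c_d * p_new


def efficiency(costs: Costs, p: float, p_new: float) -> float:
    """Ratio of new pages returned to the cost of issuing the query.

    Assumes c_q > 0, so the cost is never zero.
    """
    return p_new / query_cost(costs, p, p_new)


def select_next_term(costs: Costs, estimates: dict[str, tuple[float, float]]) -> str:
    """Step g): pick the candidate term with the greatest estimated efficiency.

    `estimates` maps each candidate term to a hypothetical (P, P_new) pair.
    """
    return max(estimates, key=lambda term: efficiency(costs, *estimates[term]))


# Illustrative values only; none of these numbers come from the patent.
costs = Costs(c_q=1.0, c_r=1.0, c_d=1.0)
estimates = {"a": (0.5, 0.1), "b": (0.2, 0.2)}
next_term = select_next_term(costs, estimates)
```

With these made-up inputs, term "b" matches fewer pages overall but a larger share of new ones, so it has the higher ratio and would be used for the next query in step h).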
