
Web forum crawler

  • US 7,599,931 B2
  • Filed: 03/03/2006
  • Issued: 10/06/2009
  • Est. Priority Date: 03/03/2006
  • Status: Active Grant
First Claim

1. A system with a processor and memory for crawling a site having pages, each page having a reference that identifies the page, each reference having tokens, comprising:

  • a grouping component that identifies groups of pages with similar content;

  • a pattern component that identifies a reference pattern of a group based on the references of the pages of the group, the reference pattern being identified by analyzing the tokens of the references of the pages of the group to identify sequences of tokens indicating a pattern of tokens within the references; and

  • a decision component that, after encountering a reference that matches a reference pattern when crawling the site, decides whether to access the page of the encountered reference based on characteristics of the pages of the group of the matching reference pattern, wherein the components are implemented as computer-executable instructions stored in the memory for execution by the processor.
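The claim names three cooperating components (grouping, pattern, decision) but does not fix any particular algorithm for them. The Python sketch below is one illustrative reading under stated assumptions, not the patented implementation. It assumes that a reference is a URL tokenized at the delimiters / ? & = . - _ :, that pages are grouped greedily by word-set Jaccard similarity, that a pattern is inferred by aligning equal-length token sequences column by column (constant columns stay literal, varying numeric columns become \d+), and that a page is skipped when the pages of its matching group are near-duplicates of one another. All function names (tokenize, group_pages, infer_pattern, learn, should_crawl) and thresholds (0.6, 0.9) are hypothetical.

```python
from __future__ import annotations

import re
from itertools import combinations

# Assumed delimiter set for splitting a reference (URL) into tokens.
_DELIMS = r"([/?&=.\-_:])"


def tokenize(url: str) -> list[str]:
    # The capturing group makes re.split keep delimiters as tokens; drop empties.
    return [t for t in re.split(_DELIMS, url) if t]


def words(text: str) -> set[str]:
    return set(text.lower().split())


def jaccard(a: set[str], b: set[str]) -> float:
    return len(a & b) / len(a | b) if (a or b) else 1.0


def group_pages(pages: dict[str, str], threshold: float = 0.6) -> list[list[str]]:
    """Grouping component (sketch): put each page into the first group whose
    first member has word-set Jaccard similarity >= threshold."""
    groups: list[list[str]] = []
    for url, text in pages.items():
        for group in groups:
            if jaccard(words(text), words(pages[group[0]])) >= threshold:
                group.append(url)
                break
        else:
            groups.append([url])
    return groups


def infer_pattern(urls: list[str]) -> re.Pattern[str] | None:
    """Pattern component (sketch): align token sequences column by column,
    keep tokens shared by every URL, and generalize the columns that vary."""
    seqs = [tokenize(u) for u in urls]
    if len({len(s) for s in seqs}) != 1:
        return None  # this sketch only handles equal-length token sequences
    parts = []
    for column in zip(*seqs):
        if len(set(column)) == 1:
            parts.append(re.escape(column[0]))
        elif any(re.fullmatch(_DELIMS, tok) for tok in column):
            return None  # delimiters disagree, so the sequences are not aligned
        elif all(tok.isdigit() for tok in column):
            parts.append(r"\d+")
        else:
            parts.append(r"[^/?&=.\-_:]+")
    return re.compile("^" + "".join(parts) + "$")


def learn(pages: dict[str, str]) -> list[tuple[re.Pattern[str], list[str]]]:
    """Learn one reference pattern per group that has at least two pages."""
    learned = []
    for group in group_pages(pages):
        pattern = infer_pattern(group) if len(group) > 1 else None
        if pattern is not None:
            learned.append((pattern, group))
    return learned


def avg_similarity(pages: dict[str, str], group: list[str]) -> float:
    # Mean pairwise Jaccard similarity; learn() guarantees len(group) >= 2.
    pairs = list(combinations(group, 2))
    return sum(jaccard(words(pages[a]), words(pages[b])) for a, b in pairs) / len(pairs)


def should_crawl(url: str, learned, pages: dict[str, str], dup_cutoff: float = 0.9) -> bool:
    """Decision component (sketch): skip a URL whose matching group consists of
    near-duplicate pages; fetch any URL that matches no learned pattern."""
    for pattern, group in learned:
        if pattern.match(url):
            return avg_similarity(pages, group) < dup_cutoff
    return True


# Hypothetical sample crawl: two identical "print" views and two distinct threads.
pages = {
    "http://forum.example/print.php?t=101": "printable view of thread header footer",
    "http://forum.example/print.php?t=102": "printable view of thread header footer",
    "http://forum.example/thread.php?t=101": "how do i configure crawler seeds",
    "http://forum.example/thread.php?t=102": "release notes for version two",
}
learned = learn(pages)
print(should_crawl("http://forum.example/print.php?t=777", learned, pages))   # False
print(should_crawl("http://forum.example/thread.php?t=777", learned, pages))  # True
```

In this toy run only the two print views fall into one group, so the learned pattern generalizes print.php?t=<number> and the decision step skips further print views, while thread pages (which match no learned pattern) are still fetched. A real forum crawler would need sturdier content similarity and URL alignment than this equal-length column match.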

  • 2 Assignments