Anchor tag indexing in a web crawler system

  • US 8,484,548 B1
  • Filed: 11/07/2007
  • Issued: 07/09/2013
  • Est. Priority Date: 07/03/2003
  • Status: Active Grant
First Claim

1. A system for processing information about documents in a collection of linked documents, the system comprising:

    one or more processors;

    memory storing one or more programs for execution by the one or more processors;

    a link log, the link log comprising a plurality of link records, each link record identifying a source document and a list of one or more target documents pointed to by one or more outbound links in the source document;

    the link record including a source document identifier for the identified source document and one or more target document identifiers for the identified list of target documents, wherein the link records are based, at least in part, on information extracted from crawled documents in the collection of linked documents; and

    the one or more programs including a global state manager configured to access the link log and to output a sorted anchor map, the sorted anchor map comprising a plurality of anchor records, each anchor record comprising a respective target document identifier and a respective list of inbound links, the list of inbound links including source document identifiers;

    wherein the plurality of anchor records are ordered in the sorted anchor map based, at least in part, on their respective target document identifiers; and

    wherein, for at least one anchor record, a document located at a source document address corresponding to a source document identifier in the list of inbound links contains at least one outbound link, the at least one outbound link pointing to a corresponding target document address, the target document address corresponding to the respective target document identifier for the at least one anchor record.
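The data flow recited in the claim (a link log keyed by source document, inverted into an anchor map ordered by target document identifier) can be illustrated with a minimal in-memory sketch. It is not the patented implementation. The names `LinkRecord`, `AnchorRecord`, and `build_sorted_anchor_map` are hypothetical, as are the use of plain integers as document identifiers and the choice to deduplicate and sort each list of inbound links.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass
class LinkRecord:
    """One link log entry: a source document and the targets it links to."""
    source_id: int          # hypothetical: integer document identifier
    target_ids: List[int]   # targets of the outbound links in the source


@dataclass
class AnchorRecord:
    """One anchor map entry: a target document and its inbound sources."""
    target_id: int
    inbound_source_ids: List[int]


def build_sorted_anchor_map(link_log: Iterable[LinkRecord]) -> List[AnchorRecord]:
    """Invert source->targets records into target->sources records,
    ordered by target document identifier."""
    inbound: Dict[int, List[int]] = defaultdict(list)
    for record in link_log:
        for target_id in record.target_ids:
            inbound[target_id].append(record.source_id)
    # Assumption of this sketch: duplicate sources are collapsed and each
    # inbound list is sorted so the output is deterministic.
    return [
        AnchorRecord(target_id, sorted(set(sources)))
        for target_id, sources in sorted(inbound.items(), key=lambda kv: kv[0])
    ]


# Illustrative data only: three crawled documents and their outbound links.
link_log = [
    LinkRecord(source_id=10, target_ids=[30, 20]),
    LinkRecord(source_id=11, target_ids=[20]),
    LinkRecord(source_id=20, target_ids=[10]),
]
anchor_map = build_sorted_anchor_map(link_log)
for anchor in anchor_map:
    print(anchor.target_id, anchor.inbound_source_ids)
```

In this sketch every source identifier listed for a target comes from a link record whose target list contains that target, mirroring the final "wherein" clause. A production crawler would presumably process the link log in sorted batches on disk rather than in a single in-memory dictionary.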
