Document crawling systems and methods

  • US 8,285,703 B1
  • Filed: 05/25/2010
  • Issued: 10/09/2012
  • Est. Priority Date: 05/13/2009
  • Status: Active Grant
First Claim

1. A non-transitory computer-readable medium encoded with a data management application comprising modules executable by a processor to crawl documents, the data management application comprising:

  • a scheduling module to retrieve a plurality of job modules from a data store, the plurality of job modules each comprising corresponding crawling instructions and corresponding priority data for crawling documents in a data storage system;

    a priority queue to receive the plurality of job modules from the scheduling module and to store each job module in a sequence according to the corresponding priority data;

    an execution module to assign each job module to one of a plurality of processing modules according to the sequence for processing, wherein each assigned job module is configured to:

    identify a step for processing based on the corresponding crawling instructions, the step comprising crawling a group of the documents;

    process the step to crawl the group of the documents in the data storage system;

    determine if at least one additional step for processing is required based on the corresponding crawling instructions, the at least one additional step comprising crawling another group of the documents; and

    reschedule the job module to the scheduling module for insertion into the priority queue.
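
The loop recited in claim 1 (schedule, queue by priority, process one step, reschedule) can be illustrated with a minimal Python sketch. Every name here (`JobModule`, `Scheduler`, `run`) is hypothetical and not taken from the patent. The sketch also makes several simplifying assumptions: a lower priority value is crawled sooner, "crawling" a document just records its name, the plurality of processing modules is collapsed into a single loop, and a job is rescheduled only while it still has steps left.

```python
import heapq
from dataclasses import dataclass
from itertools import count


@dataclass
class JobModule:
    """Hypothetical job module: crawling instructions plus priority data."""
    name: str
    priority: int          # assumption: lower value is crawled sooner
    groups: list           # crawling instructions, one document group per step
    next_step: int = 0

    def process_step(self, crawled):
        """Crawl the current group; return True if another step is required."""
        group = self.groups[self.next_step]          # identify the step
        crawled.extend((self.name, doc) for doc in group)  # "crawl" the group
        self.next_step += 1
        return self.next_step < len(self.groups)


class Scheduler:
    """Hypothetical scheduling module that owns the priority queue."""

    def __init__(self):
        self._heap = []
        self._tie = count()  # preserves insertion order among equal priorities

    def schedule(self, job):
        heapq.heappush(self._heap, (job.priority, next(self._tie), job))

    def load(self, data_store):
        """Retrieve all job modules from the data store into the queue."""
        for job in data_store:
            self.schedule(job)

    def pop(self):
        return heapq.heappop(self._heap)[2] if self._heap else None


def run(data_store):
    """Stand-in for the execution module: process jobs in queue order."""
    scheduler = Scheduler()
    scheduler.load(data_store)
    crawled = []
    while (job := scheduler.pop()) is not None:
        if job.process_step(crawled):   # more steps required?
            scheduler.schedule(job)     # reschedule for re-insertion
    return crawled


if __name__ == "__main__":
    jobs = [
        JobModule("A", priority=1, groups=[["a1", "a2"], ["a3"]]),
        JobModule("B", priority=0, groups=[["b1"]]),
    ]
    for job_name, doc in run(jobs):
        print(job_name, doc)
```

Because a job handles only one group of documents before returning to the queue, a long-running job does not block others: jobs of equal priority take turns, and a higher-priority job that is already queued runs before the rescheduled one.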

  • 2 Assignments