Spatially directed crawling of documents
First Claim
1. A method for populating a document repository with documents that are relevant to a spatial domain that has a spatial metric, said method comprising:
retrieving a document address from a page queue;
loading into the document repository a document that is identified by the retrieved document address;
parsing the loaded document for links to new documents;
storing addresses of the new documents into the page queue, and for each address in the page queue, storing a spatial relevance level, the spatial relevance level being a measure of a document's relevance to a location in the spatial domain; and
iteratively repeating the steps of retrieving, loading, parsing and storing to populate the document repository, and wherein retrieving involves using the spatial relevance levels of the stored addresses in the page queue to determine which document addresses are retrieved from the page queue.
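The claim does not say how the page queue is ordered or how a level is assigned to a newly found address. The following is a minimal sketch of the retrieve, load, parse and store loop under stated assumptions. The page queue is a priority queue keyed on the spatial relevance level. The spatial domain is a plane with Euclidean distance as its metric. Each new address inherits the level of the page on which it was found. `WEB`, `spatial_relevance` and `crawl` are hypothetical names, and the in-memory `WEB` dictionary stands in for real fetching and HTML parsing.

```python
import heapq
import itertools
import math

# Hypothetical stand-in for the web: address -> (document location, outgoing links).
# A real crawler would fetch and parse pages; this stub keeps the sketch offline.
WEB = {
    "seed": ((0.0, 0.0), ["far", "near"]),
    "far": ((30.0, 40.0), ["far-child"]),
    "near": ((3.0, 4.0), ["near-child"]),
    "far-child": ((30.0, 41.0), []),
    "near-child": ((3.0, 5.0), []),
}


def spatial_relevance(location, target):
    """Hypothetical relevance level: 1.0 at the target, shrinking with Euclidean distance."""
    return 1.0 / (1.0 + math.dist(location, target))


def crawl(seeds, target, max_docs):
    repository = {}              # address -> loaded document
    page_queue = []              # min-heap of (-relevance level, tie-breaker, address)
    tie = itertools.count()      # equal levels are served in insertion order
    seen = set(seeds)
    for address in seeds:
        # Assumption: seeds start at the highest level so they are crawled first.
        heapq.heappush(page_queue, (-1.0, next(tie), address))

    while page_queue and len(repository) < max_docs:
        # Retrieve the stored address with the highest spatial relevance level.
        _, _, address = heapq.heappop(page_queue)
        if address not in WEB:
            continue  # dangling link: nothing to load
        # Load the document into the repository.
        location, links = WEB[address]
        repository[address] = (location, links)
        # Parse links; each new address inherits the level of the page it was found on
        # (an assumption; the claim leaves the estimation method open).
        level = spatial_relevance(location, target)
        for link in links:
            if link not in seen:
                seen.add(link)
                heapq.heappush(page_queue, (-level, next(tie), link))
    return list(repository)
```

Consider the call `crawl(["seed"], (0.0, 0.0), max_docs=4)`. The sketch loads `near-child` before `far-child`, because `near` (distance 5 from the target) passes a higher level to its link than `far` (distance 50) does. With only four slots, `far-child` is never loaded. Inheriting the parent's level is only one possible estimate. Anchor text or a location hint in the link could be scored instead.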
Abstract
A method for populating a document repository that involves: retrieving a document address from a page queue that stores document addresses; loading into the document repository a document that is identified by the retrieved document address; parsing the loaded document for links to new documents; storing addresses of the new documents into the page queue along with a spatial relevance level for each stored address; and iteratively repeating the steps of retrieving, loading, parsing and storing to populate the document repository, wherein retrieving involves using the spatial relevance levels of the stored addresses in the page queue to determine which document addresses are retrieved from the page queue.
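The abstract stores a spatial relevance level with each address but does not fix a scale for it. One hypothetical way to produce discrete levels assumes a geographic spatial domain whose metric is great-circle distance. Under that assumption, the haversine distance between a document's location and the target location is bucketed into bands. `haversine_km`, `BANDS` and `relevance_level` are illustrative names, and the band limits are arbitrary.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(p, q):
    """Great-circle distance in km between (lat, lon) points given in degrees."""
    lat1, lon1 = map(math.radians, p)
    lat2, lon2 = map(math.radians, q)
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    a = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


# Hypothetical bands: (upper distance bound in km, relevance level); higher = more relevant.
BANDS = [(10.0, 3), (100.0, 2), (1000.0, 1)]


def relevance_level(doc_location, target):
    """Map a document's distance from the target location to a discrete relevance level."""
    d = haversine_km(doc_location, target)
    for bound, level in BANDS:
        if d <= bound:
            return level
    return 0  # beyond every band: treated as not spatially relevant
```

For example, a document located one degree of latitude (about 111 km) from the target falls between 100 and 1000 km and receives level 1. Discrete levels like these could serve directly as the priority key in the page queue.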
14 Claims
Claims 2 to 14 are dependent claims of claim 1.
Specification