ADAPTING CONTENT REPOSITORIES FOR CRAWLING AND SERVING
Abstract
A system for searching files stored in a closed file source that is not accessible via a web crawler obtains file identifiers for files stored in the file source and creates a unique URL for each of the identifiers. Each URL may be based on a file identifier and a domain portion of a URL associated with the system. The system may provide the unique URLs to a search engine. The system may respond to a crawl request from the search engine for a particular URL by converting the URL back into a file identifier, obtaining the contents of the file, creating an HTTP response from the contents of the file, and returning the response to the search engine. The system may respond to a request for a seed URL with a plurality of URLs as links in a single HTTP response.
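The round trip described in the abstract (file identifier to URL, URL back to file identifier, file contents wrapped in an HTTP response, and a seed URL answered with links) can be sketched as follows. This is a minimal illustration, not the patented implementation: the domain `adaptor.example.com`, the `/doc/` and `/seed` paths, the in-memory `FILE_SOURCE` dictionary, and every function name are assumptions introduced for the example.

```python
import html
from urllib.parse import quote, unquote, urlsplit

# Hypothetical domain portion and paths for the adaptor (assumed, not from the patent).
ADAPTOR_BASE = "http://adaptor.example.com/doc/"
SEED_URL = "http://adaptor.example.com/seed"

# Hypothetical in-memory stand-in for a closed file source.
FILE_SOURCE = {
    "reports/q1 summary.txt": b"Q1 revenue summary",
    "specs/design.txt": b"Design notes",
}


def id_to_url(file_id: str) -> str:
    """Create an HTTP-compatible URL from a file identifier."""
    # Percent-encode everything, including "/", so the identifier
    # survives as a single path segment.
    return ADAPTOR_BASE + quote(file_id, safe="")


def url_to_id(url: str) -> str:
    """Convert a crawl URL back into the file identifier."""
    path = urlsplit(url).path
    prefix = urlsplit(ADAPTOR_BASE).path
    if not path.startswith(prefix):
        raise ValueError(f"not an adaptor URL: {url}")
    return unquote(path[len(prefix):])


def seed_page(file_ids) -> bytes:
    """Render every file URL as a link in one HTML document."""
    links = "".join(
        f'<a href="{html.escape(id_to_url(fid))}">{html.escape(fid)}</a>\n'
        for fid in file_ids
    )
    return f"<html><body>\n{links}</body></html>".encode("utf-8")


def handle_crawl_request(url: str):
    """Return (status, headers, body) for a crawl request."""
    if url == SEED_URL:
        body = seed_page(sorted(FILE_SOURCE))
        return 200, {"Content-Type": "text/html"}, body
    content = FILE_SOURCE.get(url_to_id(url))
    if content is None:
        return 404, {"Content-Type": "text/plain"}, b"not found"
    return 200, {"Content-Type": "application/octet-stream"}, content
```

Percent-encoding the whole identifier is one simple way to keep the mapping reversible; an adaptor could equally use a lookup table or another encoding, as long as the URL can be converted back into the identifier.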
20 Claims
1. A computer-implemented method comprising:
obtaining, using an adaptor being executed by at least one processor, file identifiers for files available in a file source, the file source being unavailable to a web crawler of a search engine that is remote from the file source and the adaptor;
creating a uniform resource locator (URL) for each of the file identifiers using the at least one processor, the URL being HTTP compatible;
providing each URL to the search engine;
receiving, by the adaptor, a request for contents associated with a particular URL of the provided URLs from the search engine;
obtaining file content using a file identifier determined based on the particular URL from the file source; and
providing an HTTP response to the search engine, the response comprising the content of the file identified by the file identifier corresponding to the particular URL.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12
13. A system comprising:
at least one processor; and
a memory storing modules comprising:
a lister module configured to cause the at least one processor to provide identifiers for files stored in a file source accessible by the system but unavailable to a web crawler of a search engine that is remote from the file source,
a retriever module configured to cause the at least one processor to provide content of the files stored in the file source using the identifiers, and
an adaptor module configured to cause the at least one processor to perform the following operations:
invoke the lister for a particular file source,
receive file identifiers for files in the file source,
create a uniform resource locator (URL) for each of the file identifiers, each URL including a domain portion of a URL for the system; and
provide each URL to the search engine.
Dependent claims: 14, 15, 16, 17
18. A computer-implemented method of crawling and indexing a closed file source comprising:
receiving, using a search engine being executed by at least one processor, uniform resource locators (URLs) from an adaptor associated with the file source, wherein the URLs are received without corresponding file contents;
adding the URLs to a crawl list; and
sending a request to the adaptor for the contents associated with a particular URL of the URLs, wherein the adaptor can identify a file based on the particular URL, and wherein a web crawler of the search engine cannot access the file without using the adaptor.
Dependent claims: 19, 20
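The division of work among the lister, retriever, and adaptor modules of claim 13, together with the crawl-list flow of claim 18, might be modeled roughly as below. This is an illustrative sketch under stated assumptions: all class and method names are hypothetical, the file source is a plain dictionary, and a direct method call stands in for the HTTP request that the search engine would send to the adaptor.

```python
from collections import deque
from urllib.parse import quote, unquote


class Lister:
    """Provides identifiers for files in the file source (hypothetical API)."""

    def __init__(self, files):
        self._files = files

    def list_ids(self):
        return sorted(self._files)


class Retriever:
    """Provides file content for a given identifier (hypothetical API)."""

    def __init__(self, files):
        self._files = files

    def get_content(self, file_id):
        return self._files[file_id]


class Adaptor:
    """Maps identifiers to URLs and resolves URLs back to content."""

    def __init__(self, lister, retriever, base_url):
        self._lister = lister
        self._retriever = retriever
        self._base = base_url  # includes the adaptor's domain portion

    def list_urls(self):
        # URLs only; no file contents accompany them.
        return [self._base + quote(fid, safe="")
                for fid in self._lister.list_ids()]

    def fetch(self, url):
        # Stand-in for handling an HTTP crawl request.
        if not url.startswith(self._base):
            raise ValueError(f"not an adaptor URL: {url}")
        return self._retriever.get_content(unquote(url[len(self._base):]))


class SearchEngine:
    """Toy search engine that keeps a crawl list and a content index."""

    def __init__(self, adaptor):
        self._adaptor = adaptor
        self.crawl_list = deque()
        self.index = {}

    def receive_urls(self, urls):
        self.crawl_list.extend(urls)

    def crawl(self):
        # Request contents for each queued URL through the adaptor.
        while self.crawl_list:
            url = self.crawl_list.popleft()
            self.index[url] = self._adaptor.fetch(url)
```

In this sketch the search engine never touches the file source directly; it sees only URLs and whatever the adaptor returns for them, which mirrors the claim language that the crawler cannot access the file without using the adaptor.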
Specification