Process for fast file system crawling to support incremental file system differencing
First Claim
1. A method of crawling a hierarchical storage structure of a data storage system, the method comprising:
- recursively dividing the hierarchical storage structure into a plurality of sections, wherein the hierarchical storage structure contains data entities stored by the storage system;
maintaining a queue in the data storage system, the queue containing leads for the plurality of sections to be crawled for metadata;
associating each of the leads with one of a plurality of threads associated with a parallel-processing architecture of the data storage system;
parallel-processing the plurality of sections utilizing the plurality of threads to generate a plurality of sorted lists of metadata, wherein each of the plurality of sorted lists corresponds to a different one of the plurality of sections of the hierarchical storage structure, said parallel-processing the plurality of sections including;
identifying, by each thread, metadata associated with data entities contained in a section associated with the thread;
appending, by the thread, the metadata of each of the data entities as a metadata entry to a metadata list corresponding to the associated section, the metadata list stored in a memory buffer associated with the thread; and
sorting, by the thread, the metadata entries within the metadata list of the associated section based on a unique identifier associated with the metadata entry;
merging, by the threads associated with each of the plurality of sections, corresponding sorted lists of metadata to form a baseline list, wherein the baseline list contains sorted metadata for the data entities of the hierarchical storage structure; and
outputting a representation of the baseline list, as a result of the crawling, to indicate a state of the data entities stored by the storage system.
1 Assignment
0 Petitions
Accused Products
Abstract
A network storage server implements a method to perform fast crawling of a hierarchical storage structure. The hierarchical storage structure contains data entities stored by a network storage server. The hierarchical storage structure can be recursively divided into a plurality of sections. A plurality of parallel-processing threads can be used to process the plurality of sections. Each thread selects and processes one of the plurality of sections at a time to generate a sorted list of metadata corresponding to the section of the hierarchical storage structure. The sorted lists generated by the plurality of threads are merged to a baseline list. The baseline list contains sorted metadata for entities managed by the hierarchical storage structure. The baseline list can then be outputted as a representation of the state of data stored by the network storage server.
-
Citations
22 Claims
-
1. A method of crawling a hierarchical storage structure of a data storage system, the method comprising:
-
recursively dividing the hierarchical storage structure into a plurality of sections, wherein the hierarchical storage structure contains data entities stored by the storage system; maintaining a queue in the data storage system, the queue containing leads for the plurality of sections to be crawled for metadata; associating each of the leads with one of a plurality of threads associated with a parallel-processing architecture of the data storage system; parallel-processing the plurality of sections utilizing the plurality of threads to generate a plurality of sorted lists of metadata, wherein each of the plurality of sorted lists corresponds to a different one of the plurality of sections of the hierarchical storage structure, said parallel-processing the plurality of sections including; identifying, by each thread, metadata associated with data entities contained in a section associated with the thread; appending, by the thread, the metadata of each of the data entities as a metadata entry to a metadata list corresponding to the associated section, the metadata list stored in a memory buffer associated with the thread; and sorting, by the thread, the metadata entries within the metadata list of the associated section based on a unique identifier associated with the metadata entry; merging, by the threads associated with each of the plurality of sections, corresponding sorted lists of metadata to form a baseline list, wherein the baseline list contains sorted metadata for the data entities of the hierarchical storage structure; and outputting a representation of the baseline list, as a result of the crawling, to indicate a state of the data entities stored by the storage system. - View Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11)
-
-
12. A method of crawling a network file system of a network storage server, the method comprising:
-
maintaining a queue in the network file system, wherein the network file system manages entities stored in the network storage server, and the queue contains leads for a plurality of sections of the network file system to be crawled for metadata; associating each lead with one of a plurality of threads associated with a parallel-processing architecture of the network storage server; processing the leads with a plurality of parallel-processing threads, wherein each of the plurality of threads selects one section from the plurality of sections at a time for processing; selecting each lead from the queue, and for each lead; assigning the lead to be processed by one of the threads of the plurality of the threads, scanning, by the assigned thread, a section identified by the lead for entities contained in the section; storing, by the assigned thread, metadata for each of the entities contained in the section to a memory buffer; upon the memory buffer being full, sorting, by the assigned thread, metadata stored in the memory buffer and saving the sorted metadata to a sorted list, wherein the sorting is based on a unique identifier associated with the metadata; and merging, by the threads associated with each of the leads, corresponding saved sorted lists to a baseline list, wherein the baseline list contains metadata for all entities of the network file system. - View Dependent Claims (13, 14, 15, 16, 17, 18, 19)
-
-
20. A system comprising:
-
a processor; a network interface through which to communicate with a network file system; and a memory coupled with the processor, the memory storing instructions which, when executed by the processor, cause the system to perform a process comprising; recursively dividing data in the network file system into a plurality of sections, wherein the network file system contains data stored in a network storage server; maintaining a queue in the data storage system, the queue containing leads for the plurality of sections to be crawled for metadata; processing the leads with a plurality of parallel-processing threads, wherein each of the plurality of threads selects one section from the plurality of sections at a time for processing, said processing the plurality of sections including; identifying, by a thread assigned to a particular section, metadata associated with data entities contained in the particular section; appending, by the thread, the metadata of each of the data entities as a metadata entry to a metadata list corresponding to the particular section, the metadata list stored in a memory buffer associated with the thread; and sorting, by the thread, the metadata entries within the metadata list of the particular section based on a unique identifier associated with the metadata entry; and merging, by the threads associated with each of the plurality of sections, corresponding sorted lists of metadata generated by the plurality of threads to a baseline list, wherein the baseline list contains sorted metadata for the data in the network file system. - View Dependent Claims (21, 22)
-
Specification