Reconstruction of dense tree volume metadata state across crash recovery
First Claim
1. A method comprising:
- storing a first upper level of a multi-level dense tree structure and a second upper level of the multi-level dense tree structure in a memory of a storage system coupled to one or more solid state devices (SSDs), wherein each level of the multi-level dense tree structure includes volume metadata entries for storing volume metadata, the volume metadata entries of the first upper level associated with a first generation identifier (ID), the volume metadata entries of the second upper level associated with a second generation ID;
recording the volume metadata entries in an append log stored on the SSDs, each recorded volume metadata entry tagged with one of the first generation ID associated with the first upper level and the second generation ID associated with the second upper level;
triggering a merge operation to merge the volume metadata entries of the first upper level of the multi-level dense tree structure with the volume metadata entries of a next lower level of the multi-level dense tree structure when the first upper level is full, the volume metadata entries of the merged levels recorded to the SSDs; and
recovering from a crash of the storage system during the merge operation by reading merged metadata entries from the SSDs into the memory and replaying entries of the append log to recreate the first upper level and the second upper level of the multi-level dense tree structure, such that the first upper level and second upper level are consistent with restarting the merge operation.
0 Assignments
0 Petitions
Accused Products
Abstract
Embodiments herein are directed to efficient crash recovery of persistent metadata managed by a volume layer of a storage input/output (I/O) stack executing on one or more nodes of a cluster. Volume metadata managed by the volume layer is organized as a multi-level dense tree, wherein each level of the dense tree includes volume metadata entries for storing the volume metadata. When a level of the dense tree is full, the volume metadata entries of the level are merged with the next lower level of the dense tree. During a merge operation, two sets of generation IDs may be used in accordance with a double buffer arrangement: a first generation ID for the append buffer that is full (i.e., a merge staging buffer) and a second, incremented generation ID for the append buffer that accepts new volume metadata entries. Upon completion of the merge operation, the lower level (e.g., level 1) to which the merge is directed is assigned the generation ID of the merge staging buffer.
-
Citations
20 Claims
-
1. A method comprising:
-
storing a first upper level of a multi-level dense tree structure and a second upper level of the multi-level dense tree structure in a memory of a storage system coupled to one or more solid state devices (SSDs), wherein each level of the multi-level dense tree structure includes volume metadata entries for storing volume metadata, the volume metadata entries of the first upper level associated with a first generation identifier (ID), the volume metadata entries of the second upper level associated with a second generation ID; recording the volume metadata entries in an append log stored on the SSDs, each recorded volume metadata entry tagged with one of the first generation ID associated with the first upper level and the second generation ID associated with the second upper level; triggering a merge operation to merge the volume metadata entries of the first upper level of the multi-level dense tree structure with the volume metadata entries of a next lower level of the multi-level dense tree structure when the first upper level is full, the volume metadata entries of the merged levels recorded to the SSDs; and recovering from a crash of the storage system during the merge operation by reading merged metadata entries from the SSDs into the memory and replaying entries of the append log to recreate the first upper level and the second upper level of the multi-level dense tree structure, such that the first upper level and second upper level are consistent with restarting the merge operation. - View Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9)
-
-
10. A system comprising:
-
a storage system having a memory connected to a processor; a storage array coupled to the storage system and having one or more solid state drives (SDDs); a storage input/output (I/O) stack adapted for execution on the processor of the storage system, the storage I/O stack configured to; store a first upper level of a multi-level dense tree structure and a second upper level of the multi-level dense tree structure in the memory, wherein each level of the multi-level dense tree structure includes volume metadata entries for storing volume metadata, the volume metadata entries of the first upper level associated with a first generation identifier (ID), the volume metadata entries of the second upper level associated with a second generation ID; record the volume metadata entries in an append log stored on the SSDs, each recorded volume metadata entry tagged with one of the first generation ID associated with the first upper level and the second generation ID associated with the second upper level; trigger a merge operation to merge the volume metadata entries of the first upper level of the multi-level dense tree structure with the volume metadata entries of a next lower level of the multi-level dense tree structure when the first upper level is full, the volume metadata entries of the merged levels recorded to the SSDs; and recover from a crash of the storage system during the merge operation by reading merged metadata entries from the SSDs into the memory and replaying entries of the append log to recreate the first upper level and the second upper level of the multi-level dense tree structure, such that the first upper level and second upper level are consistent with restarting the merge operation. - View Dependent Claims (11, 12, 13, 14, 15, 16, 17, 18)
-
-
19. A non-transitory computer readable medium including program instructions for execution on a processor of a storage system, the program instructions configured to:
-
store a first upper level of a multi-level dense tree structure and a second upper level of the multi-level dense tree structure in a memory of the storage system coupled to one or more solid state devices (SSDs), wherein each level of the multi-level dense tree structure includes volume metadata entries for storing volume metadata, the volume metadata entries of the first upper level associated with a first generation identifier (ID), the volume metadata entries of the second upper level associated with a second generation ID; record the volume metadata entries in an append log stored on the SSDs, each recorded volume metadata entry tagged with one of the first generation ID associated with the first upper level and the second generation ID associated with the second upper level; is trigger a merge operation to merge the volume metadata entries of the first upper level of the multi-level dense tree structure with the volume metadata entries of a next lower level of the multi-level dense tree structure when the first upper level is full, the volume metadata entries of the merged levels recorded to the SSDs; and recover from a crash of the storage system during the merge operation by reading merged metadata entries from the SSDs into the memory and replaying entries of the append log to recreate the first upper level and the second upper level of the multi-level dense tree structure, such that the first upper level and second upper level are consistent with restarting the merge operation. - View Dependent Claims (20)
-
Specification