Copying garbage collector for B+ trees under multi-version concurrency control
First Claim
Patent Images
1. A method for use with a distributed storage system comprising a plurality of storage devices, the method comprising:
- traversing a plurality of search trees to identify one or more elements stored in underpopulated storage chunks of the distributed storage system;
generating a plurality of copy requests corresponding to the identified elements, each of the copy requests corresponding to a different respective one of the identified elements;
merging the plurality of copy requests with one or more co-pending data update requests, the merging including discarding a copy request from the plurality of copy requests in response to detecting that the copy request corresponds to an element that is due to be modified by one of the co-pending data update requests;
executing any remaining, copy requests in the plurality of copy requests by copying respective ones of the identified elements that correspond to the remaining copy requests from the underpopulated storage chunks to different storage chunks; and
reclaiming storage capacity corresponding to the underpopulated storage chunks.
4 Assignments
0 Petitions
Accused Products
Abstract
Structures and processes for garbage collection of search trees under Multi-Version Concurrency Control (MVCC). Such search trees may be used to store data within a distributed storage system. A process detects live search tree elements using tracing and then identify storage chunks having no live elements as garbage to be reclaimed. The process can be paused and resumed to reduce impact on other system processing. To reduce disk fragmentation, a garbage collector may copy pages between chunks prior to reclaiming chunk capacity. Also described is a resource efficient scheduler for a garbage collection.
53 Citations
18 Claims
-
1. A method for use with a distributed storage system comprising a plurality of storage devices, the method comprising:
-
traversing a plurality of search trees to identify one or more elements stored in underpopulated storage chunks of the distributed storage system; generating a plurality of copy requests corresponding to the identified elements, each of the copy requests corresponding to a different respective one of the identified elements; merging the plurality of copy requests with one or more co-pending data update requests, the merging including discarding a copy request from the plurality of copy requests in response to detecting that the copy request corresponds to an element that is due to be modified by one of the co-pending data update requests; executing any remaining, copy requests in the plurality of copy requests by copying respective ones of the identified elements that correspond to the remaining copy requests from the underpopulated storage chunks to different storage chunks; and reclaiming storage capacity corresponding to the underpopulated storage chunks. - View Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9)
-
-
10. A distributed storage system comprising:
-
a plurality of storage devices; two or more storage nodes configured to; traverse a plurality of search trees to identify one or more elements stored in underpopulated storage chunks of the distributed storage system; generate a plurality of copy requests corresponding to the identified elements, each of the copy requests corresponding to a different respective one of the identified elements; merge the plurality of copy requests with one or more co-pending data update requests by discarding a copy request from the plurality of copy requests in response to detecting that the copy request corresponds to an element that is due to be modified by one of the co-pending data update requests; execute any remaining copy requests in the plurality of copy requests by copying respective ones of the identified elements that correspond to the remaining copy requests from the underpopulated storage chunks to different storage chunks; and reclaim storage capacity corresponding to the underpopulated storage chunks. - View Dependent Claims (11, 12, 13, 14, 15, 16, 17, 18)
-
Specification