GARBAGE COLLECTION AND HOTSPOTS RELIEF FOR A DATA DEDUPLICATION CHUNK STORE
Abstract
Techniques for garbage collecting unused data chunks in storage are provided. According to one implementation, data chunks stored in a chunk container that are unused are identified based on an analysis of one or more stream map chunks indicated as deleted. The identified data chunks are indicated as deleted. The storage space in the chunk container filled by the data chunks indicated as deleted may then be reclaimed. Techniques for selectively backing up data chunks are also provided. According to one implementation, a data chunk is received for storing in a chunk container. A backup copy of the received data chunk is stored in a backup container if the received data chunk is in a predetermined top percentage of most referenced data chunks in the chunk container and has a number of references greater than a predetermined reference threshold.
20 Claims
1. A method for garbage collecting a chunk store, the chunk store including data stored as a plurality of data chunks, the plurality of data chunks including stream map chunks, each stream map chunk corresponding to a stream map for a corresponding data stream and referencing data chunks stored in one or more chunk containers of the chunk store that are included in the corresponding data stream, the method comprising:
identifying data chunks stored in the one or more chunk containers that are unused based on being referenced only by stream map chunks indicated as deleted;
indicating the identified data chunks as deleted; and
reclaiming storage space in the one or more chunk containers containing the data chunks indicated as deleted.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9
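The three steps of claim 1 (identify, indicate, reclaim) can be illustrated with a minimal in-memory sketch. This is not the patented implementation: the `Chunk` and `StreamMap` classes, the dictionary standing in for a chunk container, and all function names are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    """Hypothetical data chunk held in a chunk container."""
    chunk_id: str
    size: int
    deleted: bool = False


@dataclass
class StreamMap:
    """Hypothetical stand-in for a stream map chunk."""
    map_id: str
    chunk_ids: list
    deleted: bool = False


def find_unused_chunks(stream_maps, chunks):
    """Return IDs of chunks referenced only by deleted stream maps."""
    live, dead = set(), set()
    for sm in stream_maps:
        (dead if sm.deleted else live).update(sm.chunk_ids)
    # A chunk is unused if a deleted map references it and no live map does.
    return {cid for cid in dead - live if cid in chunks}


def mark_deleted(chunks, unused_ids):
    """Indicate the identified chunks as deleted."""
    for cid in unused_ids:
        chunks[cid].deleted = True


def reclaim(chunks):
    """Remove chunks marked deleted and return the number of bytes freed."""
    doomed = [cid for cid, c in chunks.items() if c.deleted]
    freed = sum(chunks[cid].size for cid in doomed)
    for cid in doomed:
        del chunks[cid]
    return freed
```

In this sketch a chunk shared between a deleted stream map and a live one survives collection, because at least one live stream still references it.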
10. A method for efficient data backup in a chunk store that stores a plurality of data streams in the form of data chunks, each data stream stored as a stream map that references data chunks of the data stream in a chunk container of the chunk store, the method comprising:
receiving a data chunk for storing in a chunk container;
determining whether the received data chunk is in a predetermined top percentage of most referenced data chunks in the chunk container, has a number of references greater than a predetermined heuristic that includes a predetermined reference threshold, and is not replicated for backup; and
storing a backup copy of the received data chunk in a backup container if the received data chunk is determined to be in the predetermined top percentage and/or to have a number of references greater than the predetermined reference threshold, and to not be replicated for backup.
Dependent claims: 11, 12, 13, 14, 15, 16
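The backup decision of claim 10 can be sketched as follows. This is an illustrative sketch only: the function names, the dictionaries standing in for the chunk container and backup container, and the default values of `top_fraction` and `ref_threshold` are assumptions. The sketch requires both conditions (top percentage and reference threshold), as in the abstract; the claim's "and/or" wording would also permit either condition alone. Chunks tied with the cutoff reference count are treated as inside the top percentage.

```python
import math


def should_back_up(chunk_id, ref_counts, backed_up,
                   top_fraction=0.01, ref_threshold=100):
    """Decide whether a chunk is 'hot' enough to merit a backup copy.

    ref_counts maps chunk ID to reference count; backed_up is any
    collection of chunk IDs that already have a backup copy.
    """
    if not ref_counts or chunk_id in backed_up:
        return False
    refs = ref_counts.get(chunk_id, 0)
    if refs <= ref_threshold:
        return False
    # Reference count of the last chunk inside the top fraction.
    top_n = max(1, math.ceil(len(ref_counts) * top_fraction))
    cutoff = sorted(ref_counts.values(), reverse=True)[top_n - 1]
    return refs >= cutoff


def store_chunk(chunk_id, data, container, backup_container, ref_counts,
                **policy):
    """Store a chunk and, if it qualifies, a backup copy.

    Returns True when a backup copy was written.
    """
    container[chunk_id] = data
    if should_back_up(chunk_id, ref_counts, backup_container, **policy):
        backup_container[chunk_id] = data
        return True
    return False
```

Passing the backup container itself as `backed_up` keeps the "not replicated for backup" check in step with what has actually been copied.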
17. A garbage collection module for garbage collecting a chunk store, the chunk store including data stored as a plurality of data chunks, the plurality of data chunks including stream map chunks, each stream map chunk corresponding to a stream map for a corresponding data stream and referencing data chunks stored in one or more chunk containers of the chunk store that are included in the corresponding data stream, the garbage collection module comprising:
a stream map chunk scanner configured to identify data chunks stored in the one or more chunk containers that are unused based on being referenced only by stream map chunks indicated as deleted;
a deleted data chunk indicator configured to indicate the identified data chunks as deleted; and
a storage space reclaimer configured to reclaim storage space in the one or more chunk containers containing the data chunks indicated as deleted.
Dependent claims: 18, 19, 20
Specification