Methods and systems for space management in data de-duplication
Abstract
The present invention is directed toward methods and systems for data de-duplication. More particularly, in various embodiments, the present invention provides systems and methods for data de-duplication that may utilize storage reclamation. In various embodiments, storage may be reclaimed by reconciling a list of all active tags against a list of all tags present within the object store itself. Any tags found in the object store that have no corresponding active usage may then be deleted. In some embodiments, additional steps may be taken to avoid race conditions when deleting tags that are needed by incoming data. In some embodiments, the list of tags may be requested from the object store. In other embodiments, a runtime list may be maintained, in which each new tag is entered as it is returned from the object store. In another embodiment, the object store may maintain this list directly.
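The reconciliation described in the abstract amounts to a set difference: tags present in the object store minus tags in active use. The following Python sketch illustrates that idea under stated assumptions. `RuntimeTagList`, `active_tags_from_snapshots`, and `reconcile` are hypothetical names; tags are modeled as plain strings; each namespace snapshot is modeled as a mapping from file path to the tags that make up the file; and the runtime list is an in-memory set updated as the object store returns each new tag. It is a minimal illustration, not a description of any particular product.

```python
from collections.abc import Iterable, Mapping


class RuntimeTagList:
    """Hypothetical runtime list of tags present in the object store.

    Each tag is recorded as the object store returns it after a write,
    so the list can be read without scanning the store itself.
    """

    def __init__(self) -> None:
        self._tags: set[str] = set()

    def record(self, tag: str) -> None:
        self._tags.add(tag)

    def forget(self, tag: str) -> None:
        # Called once a tag has been physically deleted from the store.
        self._tags.discard(tag)

    def snapshot(self) -> frozenset[str]:
        return frozenset(self._tags)


def active_tags_from_snapshots(
    snapshots: Iterable[Mapping[str, list[str]]],
) -> set[str]:
    """Union of tags referenced by files in every active namespace snapshot."""
    active: set[str] = set()
    for snapshot in snapshots:
        for tags in snapshot.values():
            active.update(tags)
    return active


def reconcile(present_tags: Iterable[str], active_tags: Iterable[str]) -> set[str]:
    """Deletion candidates: tags present in the store with no active usage."""
    return set(present_tags) - set(active_tags)


if __name__ == "__main__":
    store_tags = RuntimeTagList()
    for tag in ("t1", "t2", "t3", "t4"):
        store_tags.record(tag)
    active = active_tags_from_snapshots([{"/a": ["t1", "t3"]}, {"/b": ["t3"]}])
    print(sorted(reconcile(store_tags.snapshot(), active)))  # ['t2', 't4']
```

Because `reconcile` accepts any iterable, the same function works whether the present-tag list comes from a runtime list like this one or from a listing requested from the object store.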
10 Claims
1. A method for controlling a data de-duplication system to perform garbage collection, the method comprising:
accessing a first list of tags, the first list of tags being a list of active tags associated with a data object stored in a data de-duplication system object store, where accessing the first list of tags comprises acquiring information about tags associated with active file namespaces associated with the de-duplication system by acquiring information from active namespace snapshot metadata acquired during a replication performed by the de-duplication system;
accessing a second list of tags, the second list of tags being a list of tags in the data de-duplication system object store, where accessing the second list of tags comprises acquiring information about tags actually present in the data de-duplication system object store by one or more of, accessing the object store via an object store Application Programming Interface (API) that is configured to provide the second list, and accessing a real-time up-to-date list of tags maintained by the data de-duplication system; and
performing garbage collection in the de-duplication system by reclaiming space in the data de-duplication object store by selectively deleting a tag present in the data de-duplication object store upon determining that the tag present in the data de-duplication object store is in the second list of tags but is not in the first list of tags, where deleting the tag comprises one or more of, physically deleting the tag and logically deleting the tag through reference count manipulation including delaying the deleting of the tag present in the data de-duplication system object store for a period of time sufficient to allow a race condition associated with deleting the tag to be resolved.
(Dependent claims: 2, 3, 4, 5)
6. A data storage system comprising:
a data storage device;
a memory configured to store instructions;
a processor configured to read the memory and execute the instructions, the instructions configured to cause the processor to:
generate a first list of tags, the first list of tags being a list of active tags associated with a data object stored in a data de-duplication system object store, where generating the first list of tags comprises acquiring information about tags associated with active file namespaces associated with the de-duplication system by acquiring information from active namespace snapshot metadata acquired during a replication performed by the de-duplication system;
generate a second list of tags, the second list of tags being a list of tags in the data de-duplication system object store, where generating the second list of tags comprises acquiring information about tags actually present in the data de-duplication system object store by one or more of, accessing the object store via an object store Application Programming Interface (API) that is configured to provide the second list, and accessing a real-time up-to-date list of tags maintained by the data de-duplication system; and
reclaim space in the data de-duplication object store by selectively deleting a tag present in the data de-duplication object store upon determining that the tag present in the data de-duplication object store is in the second list of tags but is not in the first list of tags, where deleting the tag comprises one or more of, physically deleting the tag, and logically deleting the tag through reference count manipulation, and delaying the deleting of the tag present in the data de-duplication object store for a period of time sufficient to allow a race condition associated with deleting the tag to be resolved.
7. A non-transitory computer-readable medium storing computer executable instructions that when executed by a computer cause the computer to perform a method, the method comprising:
accessing a list of storage tags, where the list of storage tags are tags stored in an object store of a data de-duplication system, where accessing the storage list of tags comprises acquiring information about tags actually present in the data de-duplication system object store by one or more of, accessing the object store via an object store Application Programming Interface (API) that is configured to provide the second list, and accessing a real-time up-to-date list of tags maintained by the data de-duplication system;
accessing a list of active tags, where the list of active tags are tags stored in the object store with at least one active reference, where accessing the active list of tags comprises acquiring information about tags associated with active file namespaces associated with the de-duplication system by acquiring information from active namespace snapshot metadata acquired during a replication performed by the de-duplication system; and
performing garbage collection by deleting a tag in the object store in response to determining that the tag is in the list of storage tags but is not in the list of active tags, where deleting the tag comprises one or more of, physically deleting the tag and logically deleting the tag through reference count manipulation and delaying the deleting of the tag in the object store for a period of time sufficient to allow a race condition associated with deleting the tag to be resolved.
(Dependent claims: 8, 9, 10)
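The deletion step recited in claims 1, 6, and 7 pairs logical deletion through reference count manipulation with a delay long enough for a race condition to resolve. The following Python sketch pictures one way that could work. Everything in it is an assumption for illustration: `DedupStore`, `mark_orphans`, and `collect` are hypothetical names, tags are SHA-256 digests of the stored bytes, the delay is a fixed `grace_period`, and the race is modeled as an incoming write that de-duplicates against a tag after garbage collection has marked it but before the delay expires.

```python
import hashlib
import time
from collections.abc import Iterable
from typing import Callable


class DedupStore:
    """Hypothetical de-duplicating object store with deferred deletion.

    Garbage collection logically deletes an orphaned tag by forcing its
    reference count to zero and queuing it with a timestamp. The tag is
    physically removed only after grace_period seconds, and only if no
    incoming write has re-referenced it in the meantime.
    """

    def __init__(self, grace_period: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.grace_period = grace_period
        self.clock = clock
        self.blobs: dict[str, bytes] = {}
        self.refcount: dict[str, int] = {}
        self.pending: dict[str, float] = {}  # tag -> time of logical deletion

    def put(self, data: bytes) -> str:
        """Store data (or de-duplicate against an existing copy) and return its tag."""
        tag = hashlib.sha256(data).hexdigest()
        self.blobs.setdefault(tag, data)
        self.refcount[tag] = self.refcount.get(tag, 0) + 1
        # A new reference resolves the race in favor of the incoming data.
        self.pending.pop(tag, None)
        return tag

    def mark_orphans(self, active_tags: Iterable[str]) -> set[str]:
        """Logically delete tags present in the store but absent from active_tags."""
        orphans = set(self.blobs) - set(active_tags)
        now = self.clock()
        for tag in orphans:
            self.refcount[tag] = 0
            self.pending.setdefault(tag, now)  # keep the earliest timestamp
        return orphans

    def collect(self) -> list[str]:
        """Physically delete logically deleted tags whose grace period has elapsed."""
        now = self.clock()
        removed = []
        for tag, marked_at in list(self.pending.items()):
            if now - marked_at >= self.grace_period and self.refcount.get(tag, 0) == 0:
                del self.blobs[tag]
                del self.refcount[tag]
                del self.pending[tag]
                removed.append(tag)
        return removed
```

With an injectable clock the delay can be exercised deterministically: mark orphans at time 0, write the same bytes again before the grace period ends, and `collect()` leaves that tag in place while removing orphans that stayed unreferenced. The length of the delay is a tuning choice here; the claims require only that it be sufficient for the race to resolve.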
Specification