Deduplication Seeding
First Claim
1. A non-transitory computer-readable medium storing computer-executable instructions that when executed by a computer cause the computer to perform a data de-duplication method, the method comprising:
- re-configuring a data de-duplication repository with a first blocklet taken from a source other than a data stream being ingested by a data de-duplication apparatus; and
- re-configuring a data de-duplication index associated with the data de-duplication repository with index information about the first blocklet, where reconfiguring the data de-duplication repository or the data de-duplication index increases the likelihood that a second blocklet will be treated as a duplicate blocklet when processed by the data de-duplication apparatus using the data de-duplication repository and the data-duplication index to support duplicate blocklet determinations.
Abstract
Apparatus, methods, and other embodiments associated with de-duplication seeding are described. One example method includes re-configuring a data de-duplication repository with a blocklet from a data de-duplication seed corpus. Reconfiguring the repository may include adding a blocklet from the seed corpus to the repository, activating a blocklet identified with the seed corpus in the repository, removing a blocklet from the repository, and de-activating a blocklet in the repository. The example method may also include re-configuring a data de-duplication index associated with the data de-duplication repository with information about the blocklet. Reconfiguring the repository and the index increases the likelihood that a blocklet ingested by a data de-duplication apparatus that relies on the repository and the index will be treated as a duplicate blocklet by the data de-duplication apparatus.
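The seeding idea in the abstract can be illustrated with a minimal sketch. It is not the patented implementation: the class name `DedupStore`, the fixed 8-byte blocklet size, and the use of SHA-256 fingerprints are all assumptions made for illustration. It shows only the "adding a blocklet from the seed corpus" case, with a repository of unique blocklets and an index that maps fingerprints to repository positions.

```python
import hashlib

BLOCKLET_SIZE = 8  # hypothetical fixed size, kept small for illustration


def blocklets(data: bytes, size: int = BLOCKLET_SIZE):
    """Split data into fixed-size blocklets (an assumption; chunking could also be variable-size)."""
    return [data[i:i + size] for i in range(0, len(data), size)]


class DedupStore:
    """Toy repository plus index; names are illustrative, not taken from the source."""

    def __init__(self):
        self.repository = []  # unique blocklets, in storage order
        self.index = {}       # fingerprint -> position in the repository

    @staticmethod
    def _fingerprint(blocklet: bytes) -> str:
        return hashlib.sha256(blocklet).hexdigest()

    def _store(self, blocklet: bytes, fp: str) -> None:
        self.index[fp] = len(self.repository)
        self.repository.append(blocklet)

    def seed(self, corpus: bytes) -> int:
        """Add blocklets from a seed corpus (not the ingest stream); return how many were added."""
        added = 0
        for b in blocklets(corpus):
            fp = self._fingerprint(b)
            if fp not in self.index:
                self._store(b, fp)
                added += 1
        return added

    def ingest(self, stream: bytes):
        """Ingest a stream; return (duplicate_count, unique_count) per blocklet."""
        dup = uniq = 0
        for b in blocklets(stream):
            fp = self._fingerprint(b)
            if fp in self.index:
                dup += 1
            else:
                self._store(b, fp)
                uniq += 1
        return dup, uniq
```

With the stream `b"HEADER__PAYLOAD1HEADER__PAYLOAD2"`, an unseeded store reports 1 duplicate and 3 unique blocklets, while a store first seeded with `b"HEADER__"` reports 2 duplicates and 2 unique blocklets: the first occurrence of the header is already treated as a duplicate.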
20 Claims
1. A non-transitory computer-readable medium storing computer-executable instructions that when executed by a computer cause the computer to perform a data de-duplication method, the method comprising:
- re-configuring a data de-duplication repository with a first blocklet taken from a source other than a data stream being ingested by a data de-duplication apparatus; and
- re-configuring a data de-duplication index associated with the data de-duplication repository with index information about the first blocklet, where reconfiguring the data de-duplication repository or the data de-duplication index increases the likelihood that a second blocklet will be treated as a duplicate blocklet when processed by the data de-duplication apparatus using the data de-duplication repository and the data-duplication index to support duplicate blocklet determinations.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9
10. A data de-duplication apparatus, comprising:
- a processor;
- a memory;
- a set of logics; and
- an interface to connect the processor, the memory, and the set of logics, the set of logics comprising:
- a first logic configured to manipulate a data de-duplication repository with a first blocklet associated with a seed corpus, where the data de-duplication apparatus uses the data de-duplication repository to make duplicate blocklet determinations; and
- a second logic configured to manipulate a data de-duplication index with information about the first blocklet, where the data de-duplication apparatus uses the data de-duplication index to make duplicate determinations, where manipulating the repository with the first blocklet and manipulating the index with the information about the first blocklet change the likelihood that a second blocklet processed by the data de-duplication apparatus will be treated as a duplicate blocklet.
Dependent claims: 11, 12, 13, 14, 15, 16, 17, 18, 19
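As one hedged reading of how manipulating the repository and the index can change the likelihood of a duplicate determination in either direction, the sketch below borrows the activate and de-activate operations named in the abstract. The names `Entry`, `SeededRepository`, `add_seed`, `activate`, and `deactivate` are hypothetical, and the split between a "first logic" (repository) and a "second logic" (index) is mapped onto two plain data members purely for illustration.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class Entry:
    blocklet: bytes
    active: bool = True  # inactive entries stay stored but are ignored for matching


@dataclass
class SeededRepository:
    entries: dict = field(default_factory=dict)  # repository: fingerprint -> Entry
    index: set = field(default_factory=set)      # index: fingerprints eligible for matching

    @staticmethod
    def fingerprint(blocklet: bytes) -> str:
        return hashlib.sha256(blocklet).hexdigest()

    def add_seed(self, blocklet: bytes) -> None:
        """Store a seed blocklet in the repository and record it in the index."""
        fp = self.fingerprint(blocklet)
        self.entries[fp] = Entry(blocklet)
        self.index.add(fp)

    def deactivate(self, blocklet: bytes) -> None:
        """Keep the blocklet stored but stop matching against it."""
        fp = self.fingerprint(blocklet)
        if fp in self.entries:
            self.entries[fp].active = False
            self.index.discard(fp)

    def activate(self, blocklet: bytes) -> None:
        """Make a stored blocklet eligible for matching again."""
        fp = self.fingerprint(blocklet)
        if fp in self.entries:
            self.entries[fp].active = True
            self.index.add(fp)

    def is_duplicate(self, blocklet: bytes) -> bool:
        """Duplicate determination consults only the index."""
        return self.fingerprint(blocklet) in self.index
```

In this sketch, de-activating a seed blocklet lowers the chance that a later blocklet is treated as a duplicate without discarding the stored data, and re-activating it restores the match.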
20. A system, comprising:
- means for identifying a property of a data stream being processed by a data de-duplication apparatus; and
- means for updating a data de-duplication repository of unique blocks in use by the data de-duplication apparatus with data from a data de-duplication seed corpus, where the data from the data de-duplication seed corpus is configured to increase a de-duplication rate for the data stream being processed.
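The two means recited in claim 20 can be sketched under stated assumptions. Here the "property" of the stream is taken to be a content type guessed from leading magic bytes, and the seed corpora are small hand-written byte strings; both choices, along with the names `SEED_CORPORA`, `identify_property`, and `update_repository`, are hypothetical and are not drawn from the source.

```python
import hashlib

# Hypothetical seed corpora keyed by a stream property (a detected content type).
SEED_CORPORA = {
    "pdf": [b"%PDF-1.4", b"endobj\n\n"],
    "png": [b"\x89PNG\r\n\x1a\n"],
}


def identify_property(stream: bytes) -> str:
    """Identify one possible stream property: content type from leading magic bytes."""
    if stream.startswith(b"%PDF-"):
        return "pdf"
    if stream.startswith(b"\x89PNG\r\n\x1a\n"):
        return "png"
    return "unknown"


def update_repository(repository: dict, stream: bytes) -> str:
    """Add seed blocks matching the stream's property to the repository of unique blocks.

    The repository is modeled as a dict of fingerprint -> block; blocks that are
    already present are left untouched. Returns the identified property.
    """
    prop = identify_property(stream)
    for block in SEED_CORPORA.get(prop, []):
        repository.setdefault(hashlib.sha256(block).hexdigest(), block)
    return prop
```

A stream with no matching corpus leaves the repository unchanged, so seeding in this sketch is applied only when the identified property suggests it may raise the de-duplication rate.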
Specification