Deduplication seeding
First Claim
1. A non-transitory computer-readable medium storing computer-executable instructions that when executed by a computer cause the computer to perform a data de-duplication method, the method comprising:
- re-configuring a data de-duplication repository with a first blocklet taken from a source other than a data stream being ingested by a data de-duplication apparatus, where the source other than the data stream being ingested is a seed corpus, and where re-configuring the repository comprises moving the first blocklet from the seed corpus into the repository;
re-configuring a data de-duplication index associated with the data de-duplication repository with index information about the first blocklet,where reconfiguring the data de-duplication repository or the data de-duplication index increases the likelihood that a second blocklet will be treated as a duplicate blocklet when processed by the data de-duplication apparatus using the data de-duplication repository and the data-duplication index to support duplicate blocklet determinations, andgenerating a new seed corpus, where generating the new seed corpus comprises selecting a seed blocklet from an existing repository based, at least in part, on one or more of, a reference count associated with the seed blocklet, an attribute describing the generalness of the seed blocklet, a trial and error approach, and a random approach.
10 Assignments
0 Petitions
Accused Products
Abstract
Apparatus, methods, and other embodiments associated with de-duplication seeding are described. One example method includes re-configuring a data de-duplication repository with a blocklet from a data de-duplication seed corpus. Reconfiguring the repository may include adding a blocklet from the seed corpus to the repository, activating a blocklet identified with the seed corpus in the repository, removing a blocklet from the repository, and de-activating a blocklet in the repository. The example method may also include re-configuring a data de-duplication index associated with the data de-duplication repository with information about the blocklet. Reconfiguring the repository and the index increases the likelihood that a blocklet ingested by a data de-duplication apparatus that relies on the repository and the index will be treated as a duplicate blocklet by the data de-duplication apparatus.
7 Citations
6 Claims
-
1. A non-transitory computer-readable medium storing computer-executable instructions that when executed by a computer cause the computer to perform a data de-duplication method, the method comprising:
-
re-configuring a data de-duplication repository with a first blocklet taken from a source other than a data stream being ingested by a data de-duplication apparatus, where the source other than the data stream being ingested is a seed corpus, and where re-configuring the repository comprises moving the first blocklet from the seed corpus into the repository; re-configuring a data de-duplication index associated with the data de-duplication repository with index information about the first blocklet, where reconfiguring the data de-duplication repository or the data de-duplication index increases the likelihood that a second blocklet will be treated as a duplicate blocklet when processed by the data de-duplication apparatus using the data de-duplication repository and the data-duplication index to support duplicate blocklet determinations, and generating a new seed corpus, where generating the new seed corpus comprises selecting a seed blocklet from an existing repository based, at least in part, on one or more of, a reference count associated with the seed blocklet, an attribute describing the generalness of the seed blocklet, a trial and error approach, and a random approach. - View Dependent Claims (2, 3, 4, 5, 6)
-
Specification