
Deduplication seeding

  • US 8,892,526 B2
  • Filed: 01/11/2012
  • Issued: 11/18/2014
  • Est. Priority Date: 01/11/2012
  • Status: Expired due to Fees
First Claim

1. A non-transitory computer-readable medium storing computer-executable instructions that when executed by a computer cause the computer to perform a data de-duplication method, the method comprising:

  • re-configuring a data de-duplication repository with a first blocklet taken from a source other than a data stream being ingested by a data de-duplication apparatus, where the source other than the data stream being ingested is a seed corpus, and where re-configuring the repository comprises moving the first blocklet from the seed corpus into the repository;

  • re-configuring a data de-duplication index associated with the data de-duplication repository with index information about the first blocklet, where reconfiguring the data de-duplication repository or the data de-duplication index increases the likelihood that a second blocklet will be treated as a duplicate blocklet when processed by the data de-duplication apparatus using the data de-duplication repository and the data-duplication index to support duplicate blocklet determinations, and

  • generating a new seed corpus, where generating the new seed corpus comprises selecting a seed blocklet from an existing repository based, at least in part, on one or more of, a reference count associated with the seed blocklet, an attribute describing the generalness of the seed blocklet, a trial and error approach, and a random approach.
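
The three steps of the claim can be illustrated with a minimal Python sketch. It is not the patented implementation: the class name `SeededRepository`, the use of SHA-256 as the blocklet fingerprint, and the `min_refs` threshold are all assumptions introduced here for illustration. Of the selection criteria the claim lists, only the reference-count approach is shown.

```python
import hashlib


def fingerprint(blocklet: bytes) -> str:
    """Hypothetical fingerprint function; SHA-256 is an assumption."""
    return hashlib.sha256(blocklet).hexdigest()


class SeededRepository:
    """Toy repository plus index, assumed names, for illustration only."""

    def __init__(self) -> None:
        self.store: list[bytes] = []          # the repository: stored blocklets
        self.index: dict[str, int] = {}       # the index: fingerprint -> position in store
        self.ref_counts: dict[str, int] = {}  # fingerprint -> references from ingested data

    def seed(self, seed_corpus: list[bytes]) -> None:
        """Move blocklets from a seed corpus into the repository and index them."""
        for blocklet in seed_corpus:
            fp = fingerprint(blocklet)
            if fp not in self.index:
                self.index[fp] = len(self.store)
                self.store.append(blocklet)
                self.ref_counts[fp] = 0  # seeded, not yet referenced by any stream

    def ingest(self, blocklet: bytes) -> bool:
        """Process one blocklet from a data stream; return True if it was a duplicate."""
        fp = fingerprint(blocklet)
        is_duplicate = fp in self.index
        if not is_duplicate:
            self.index[fp] = len(self.store)
            self.store.append(blocklet)
        self.ref_counts[fp] = self.ref_counts.get(fp, 0) + 1
        return is_duplicate

    def new_seed_corpus(self, min_refs: int) -> list[bytes]:
        """Select seed blocklets whose reference count is at least min_refs."""
        return [
            self.store[self.index[fp]]
            for fp, count in self.ref_counts.items()
            if count >= min_refs
        ]
```

In this sketch, a blocklet that was seeded is reported as a duplicate the first time it appears in an ingested stream, which is the effect the claim describes as increasing the likelihood of a duplicate determination. Calling `new_seed_corpus` on a populated repository then yields candidates for seeding another repository.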
