Data de-duplication in a dispersed storage network utilizing data characterization

  • US 8,762,346 B2
  • Filed: 06/03/2013
  • Issued: 06/24/2014
  • Est. Priority Date: 11/25/2009
  • Status: Active Grant
First Claim

1. A method for execution by a processing module of a computing device, the method comprises:

  • receiving, from a requesting device, a data storage request that includes data for storage;

    determining, by the processing module, whether substantially identical data is currently stored in a dispersed storage network (DSN) memory as a plurality of sets of encoded data slices, wherein the identical data was encoded in accordance with a dispersed storage error encoding function to produce the plurality of sets of encoded data slices, and wherein the substantially identical data is recoverable based on a unique retrieval matrix of the plurality of sets of encoded data slices; and

    when the substantially identical data is stored in the DSN memory:

    generating, for the requesting device, a second unique retrieval matrix of the plurality of sets of encoded data slices, wherein the requesting device can recover at least a portion of the data based on the second unique retrieval matrix of the plurality of sets of encoded data slices, wherein:

    the unique retrieval matrix including:

    for a first set of encoded data slices of the plurality of sets of encoded data slices, identity of a first sub-set of encoded data slices of the first set of encoded data slices; and

    for a second set of encoded data slices of the plurality of sets of encoded data slices, identity of a first sub-set of encoded data slices of the second set of encoded data slices; and

    the second unique retrieval matrix including:

    for the first set of encoded data slices of the plurality of sets of encoded data slices, identity of a second sub-set of encoded data slices of the first set of encoded data slices; and

    for the second set of encoded data slices of the plurality of sets of encoded data slices, identity of a second sub-set of encoded data slices of the second set of encoded data slices, wherein each sub-set of encoded data slices includes at least a decode threshold number of encoded data slices.
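
The flow recited in the claim can be illustrated with a minimal Python sketch. It is not the patented implementation: the SHA-256 fingerprint used to detect identical data, the `DsnIndex` class, the `nth_subset` helper, and the width, threshold, and set-count values are all assumptions made for illustration, and the dispersed storage error encoding step itself is omitted. The sketch only shows how a duplicate storage request might be answered with a new retrieval matrix that names a different sub-set of slices for every set, each sub-set holding at least a decode threshold number of slices.

```python
import hashlib
from itertools import combinations, islice
from math import comb


def fingerprint(data: bytes) -> str:
    # Assumption: exact-match hashing stands in for "substantially identical".
    return hashlib.sha256(data).hexdigest()


def nth_subset(width: int, threshold: int, n: int) -> tuple:
    """Return the n-th threshold-sized sub-set of slice indices 0..width-1."""
    if n >= comb(width, threshold):
        raise ValueError("no unused sub-set of slices remains")
    return next(islice(combinations(range(width), threshold), n, None))


class DsnIndex:
    """Hypothetical index mapping stored data to issued retrieval matrices."""

    def __init__(self, width: int = 5, threshold: int = 3, num_sets: int = 2):
        self.width = width          # encoded slices per set
        self.threshold = threshold  # decode threshold per set
        self.num_sets = num_sets    # sets of encoded slices per data object
        self.matrices = {}          # fingerprint -> {requester: matrix}

    def store(self, requester: str, data: bytes):
        """Return (was_duplicate, retrieval_matrix) for a storage request."""
        issued = self.matrices.setdefault(fingerprint(data), {})
        was_duplicate = bool(issued)
        # A real system would encode and write the slices here when the
        # data is not a duplicate; this sketch only tracks the matrices.
        if requester not in issued:
            n = len(issued)  # each requester gets the next unused sub-set
            issued[requester] = [
                nth_subset(self.width, self.threshold, n)
                for _ in range(self.num_sets)
            ]
        return was_duplicate, issued[requester]
```

In this sketch, a first requester storing some data receives a matrix naming one sub-set of slice indices per set, and a second requester storing the same data is flagged as a duplicate and receives a matrix that names a different sub-set for each set.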
