Data de-duplication in a dispersed storage network utilizing data characterization
First Claim
1. A method for execution by a processing module of a computing device, the method comprises:
- receiving, from a requesting device, a data storage request that includes data for storage;
- determining, by the processing module, whether substantially identical data is currently stored in a dispersed storage network (DSN) memory as a plurality of sets of encoded data slices, wherein the identical data was encoded in accordance with a dispersed storage error encoding function to produce the plurality of sets of encoded data slices, and wherein the substantially identical data is recoverable based on a unique retrieval matrix of the plurality of sets of encoded data slices; and
- when the substantially identical data is stored in the DSN memory:
  - generating, for the requesting device, a second unique retrieval matrix of the plurality of sets of encoded data slices, wherein the requesting device can recover at least a portion of the data based on the second unique retrieval matrix of the plurality of sets of encoded data slices, wherein:
    - the unique retrieval matrix includes:
      - for a first set of encoded data slices of the plurality of sets of encoded data slices, identity of a first sub-set of encoded data slices of the first set of encoded data slices; and
      - for a second set of encoded data slices of the plurality of sets of encoded data slices, identity of a first sub-set of encoded data slices of the second set of encoded data slices; and
    - the second unique retrieval matrix includes:
      - for the first set of encoded data slices of the plurality of sets of encoded data slices, identity of a second sub-set of encoded data slices of the first set of encoded data slices; and
      - for the second set of encoded data slices of the plurality of sets of encoded data slices, identity of a second sub-set of encoded data slices of the second set of encoded data slices, wherein each sub-set of encoded data slices includes at least a decode threshold number of encoded data slices.
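The structure of the two retrieval matrices in claim 1 can be illustrated with a minimal sketch. It assumes that each set of encoded data slices holds `width` slices identified by index, that a retrieval matrix is a mapping from set index to a sub-set of slice indices, and that a hypothetical `variant` number picks which sub-sets are used. None of these names or the selection rule come from the claim, and the sketch uses exactly the decode threshold per sub-set, whereas the claim allows more.

```python
from itertools import combinations

def retrieval_matrix(num_sets, width, decode_threshold, variant):
    """Return {set_index: tuple of slice indices}, one sub-set per set.

    `variant` is a hypothetical selector: different variants pick
    different threshold-sized sub-sets for every set.
    """
    if not 0 < decode_threshold <= width:
        raise ValueError("decode threshold must be in 1..width")
    # All ways to choose `decode_threshold` slices out of `width`.
    subsets = list(combinations(range(width), decode_threshold))
    matrix = {}
    for set_index in range(num_sets):
        # Shift the choice per set so rows are not all identical.
        matrix[set_index] = subsets[(variant + set_index) % len(subsets)]
    return matrix

# Two sets of 5 slices each, any 3 of which suffice to decode (assumed values).
first = retrieval_matrix(num_sets=2, width=5, decode_threshold=3, variant=0)
second = retrieval_matrix(num_sets=2, width=5, decode_threshold=3, variant=1)
```

Here `first` and `second` name different sub-sets of the same stored slices for each set, so either one is enough to recover the data without storing a second copy.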
Abstract
A computing device includes a processing module and an interface. The processing module is operable to receive, from a requesting device via the interface, a data storage request that includes data for storage. The processing module then determines whether substantially identical data is currently stored in a dispersed storage network (DSN) memory. When the substantially identical data is stored in the DSN memory, the processing module generates, for the requesting device, a second unique retrieval matrix of a plurality of sets of encoded data slices corresponding to the already stored substantially identical data, wherein the requesting device can recover at least a portion of the data based on the second unique retrieval matrix of the plurality of sets of encoded data slices.
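The decision flow in the abstract can be sketched as follows. This is an illustration under stated assumptions, not the patented method: an exact SHA-256 match stands in for the "substantially identical" determination, the DSN memory is not modeled, and the class name `DedupIndex`, the method `handle_store_request`, and the integer matrix identifiers are all hypothetical.

```python
import hashlib

class DedupIndex:
    """Tracks which data is already stored and which matrix each requester holds."""

    def __init__(self):
        # fingerprint -> {requester: matrix id issued to that requester}
        self._by_fingerprint = {}

    def handle_store_request(self, requester, data):
        # Assumption: an exact content hash approximates "substantially identical".
        fingerprint = hashlib.sha256(data).hexdigest()
        issued = self._by_fingerprint.get(fingerprint)
        if issued is None:
            # Not yet stored: the data would be encoded and written here
            # (not modeled), and the first retrieval matrix is issued.
            self._by_fingerprint[fingerprint] = {requester: 0}
            return ("stored", 0)
        # Already stored: write no new slices, issue a further matrix id.
        if requester not in issued:
            issued[requester] = len(issued)
        return ("deduplicated", issued[requester])

index = DedupIndex()
r1 = index.handle_store_request("device-a", b"same payload")
r2 = index.handle_store_request("device-b", b"same payload")
r3 = index.handle_store_request("device-c", b"other payload")
```

In this sketch the second request for the same payload triggers no new storage and receives its own matrix identifier, which mirrors the "second unique retrieval matrix" described above.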
14 Claims
8. A computing device comprises:
- an interface; and
- a processing module operable to:
  - receive, from a requesting device via the interface, a data storage request that includes data for storage;
  - determine whether substantially identical data is currently stored in a dispersed storage network (DSN) memory as a plurality of sets of encoded data slices, wherein the identical data was encoded in accordance with a dispersed storage error encoding function to produce the plurality of sets of encoded data slices, and wherein the substantially identical data is recoverable based on a unique retrieval matrix of the plurality of sets of encoded data slices; and
  - when the substantially identical data is stored in the DSN memory, generate, for the requesting device, a second unique retrieval matrix of the plurality of sets of encoded data slices, wherein the requesting device can recover at least a portion of the data based on the second unique retrieval matrix of the plurality of sets of encoded data slices, wherein:
    - the unique retrieval matrix includes:
      - for a first set of encoded data slices of the plurality of sets of encoded data slices, identity of a first sub-set of encoded data slices of the first set of encoded data slices; and
      - for a second set of encoded data slices of the plurality of sets of encoded data slices, identity of a first sub-set of encoded data slices of the second set of encoded data slices; and
    - the second unique retrieval matrix includes:
      - for the first set of encoded data slices of the plurality of sets of encoded data slices, identity of a second sub-set of encoded data slices of the first set of encoded data slices; and
      - for the second set of encoded data slices of the plurality of sets of encoded data slices, identity of a second sub-set of encoded data slices of the second set of encoded data slices, wherein each sub-set of encoded data slices includes at least a decode threshold number of encoded data slices.
Specification