Representing de-duplicated file data
First Claim
1. A method comprising:
receiving an indication of an input identifying a subset of data to be provided as output, wherein the subset of data has been stored in de-duplicated form;
identifying a plurality of containers that includes one or more data segments comprising the subset of data, wherein each of the plurality of containers may include one or more segments not included in the subset of data;
determining, for each of the plurality of containers, whether corresponding container data has already been included in an output stream based at least in part on a data structure comprising for each of at least a subset of containers stored on the de-duplicated storage system an indication of whether or not the corresponding container data has been included in the output stream;
retrieving a corresponding de-duplicated container data for each container that is determined to comprise container data that has not been previously included in the output stream;
providing to the output stream the retrieved container data for each container that includes container data that has not been previously included in the output stream; and
for each container from which corresponding container data is included in the output, updating the corresponding value in the data structure.
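The steps recited above can be sketched in code. The Python below is an illustrative reading only, not the patented implementation: the function name `output_subset`, the segment-to-container mapping, the in-memory `container_store`, and the use of a plain dictionary of booleans as the tracking data structure are all assumptions made for the example.

```python
from typing import Dict, Hashable, Iterable, List, Tuple


def output_subset(
    segment_ids: Iterable[Hashable],
    segment_to_container: Dict[Hashable, int],
    container_store: Dict[int, bytes],
    included: Dict[int, bool],
) -> List[Tuple[int, bytes]]:
    """Return the container data that still has to be written to the output.

    All parameter names are hypothetical:
      segment_ids          - segments making up the requested subset of data
      segment_to_container - which container holds each segment
      container_store      - stand-in for the de-duplicated storage system
      included             - tracking structure: container id -> already output?
    """
    # Identify the containers holding the requested segments, each listed once.
    # A container may also hold segments outside the requested subset.
    container_ids = list(
        dict.fromkeys(segment_to_container[s] for s in segment_ids)
    )

    stream: List[Tuple[int, bytes]] = []
    for cid in container_ids:
        # Determine whether this container's data is already in the output.
        if included.get(cid, False):
            continue
        # Retrieve the container data and provide it to the output stream.
        data = container_store[cid]
        stream.append((cid, data))
        # Update the tracking structure for the container just output.
        included[cid] = True
    return stream
```

Under these assumptions, calling the function twice with the same `included` dictionary sends each container at most once: a second request whose segments live in an already-output container contributes nothing new for that container.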
Abstract
A subset of de-duplicated data is provided as output. In some embodiments, the output includes a subset of data stored in de-duplicated form in a plurality of containers, each including a plurality of data segments comprising the data. For each container that includes one or more data segments comprising the subset, corresponding container data is included in the output. Each container may include one or more segments not included in the subset. A data structure holds, for each container stored on the de-duplicated storage system, a data value indicating whether or not the corresponding container data has been included in the output; for each container whose container data is included in the output, the corresponding value in that data structure is updated.
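One compact way to hold a per-container "already output" value is a bitmap indexed by container number. The abstract does not say the data structure is a bitmap; the sketch below assumes one, and also assumes that containers are numbered with small consecutive integers. The class name `ContainerBitmap` and its methods are hypothetical.

```python
class ContainerBitmap:
    """One bit per container: 1 means the container data was already output.

    Assumption for illustration: container ids are integers in
    the range 0 .. num_containers - 1.
    """

    def __init__(self, num_containers: int) -> None:
        # Round up to whole bytes; every bit starts at 0 (not yet output).
        self._bits = bytearray((num_containers + 7) // 8)

    def is_included(self, container_id: int) -> bool:
        # Byte index is container_id // 8, bit position is container_id % 8.
        return bool(self._bits[container_id >> 3] & (1 << (container_id & 7)))

    def mark_included(self, container_id: int) -> None:
        # Set the bit once the container data has gone to the output.
        self._bits[container_id >> 3] |= 1 << (container_id & 7)
```

With this layout the tracking state for a million containers takes about 125 KB, which is one reason a bitmap is a plausible choice here.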
24 Claims
1. A method comprising:
receiving an indication of an input identifying a subset of data to be provided as output, wherein the subset of data has been stored in de-duplicated form;
identifying a plurality of containers that includes one or more data segments comprising the subset of data, wherein each of the plurality of containers may include one or more segments not included in the subset of data;
determining, for each of the plurality of containers, whether corresponding container data has already been included in an output stream based at least in part on a data structure comprising for each of at least a subset of containers stored on the de-duplicated storage system an indication of whether or not the corresponding container data has been included in the output stream;
retrieving a corresponding de-duplicated container data for each container that is determined to comprise container data that has not been previously included in the output stream;
providing to the output stream the retrieved container data for each container that includes container data that has not been previously included in the output stream; and
for each container from which corresponding container data is included in the output, updating the corresponding value in the data structure.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18
19. A data storage system, comprising:
a processor configured to provide as output a subset of de-duplicated data stored in de-duplicated form in a plurality of containers each comprising a plurality of data segments comprising the de-duplicated data, including by:
receiving an indication of an input identifying a subset of data to be provided as output, wherein the subset of data has been stored in de-duplicated form;
identifying a plurality of containers that includes one or more data segments comprising the subset of data, wherein each of the plurality of containers may include one or more segments not included in the subset of data;
determining, for each of the plurality of containers, whether corresponding container data has already been included in an output stream based at least in part on a data structure comprising for each of at least a subset of containers stored on the de-duplicated storage system an indication of whether or not the corresponding container data has been included in the output stream;
retrieving a corresponding de-duplicated container data for each container that is determined to comprise container data that has not been previously included in the output stream;
providing to the output stream the retrieved container data for each container that includes container data that has not been previously included in the output stream;
for each container from which corresponding container data is included in the output, updating the corresponding value in the data structure; and
a memory coupled to the processor and configured to store the data structure.
Dependent claims: 20, 21, 22, 23
24. A computer program product for providing as output a subset of data stored in de-duplicated form in a plurality of containers each comprising a plurality of data segments comprising the data, the computer program product being embodied in a non-transitory computer readable storage medium and comprising computer instructions for:
receiving an indication of an input identifying a subset of data to be provided as output, wherein the subset of data has been stored in de-duplicated form;
identifying a plurality of containers that includes one or more data segments comprising the subset of data, wherein each of the plurality of containers may include one or more segments not included in the subset of data;
determining, for each of the plurality of containers, whether corresponding container data has already been included in an output stream based at least in part on a data structure comprising for each of at least a subset of containers stored on the de-duplicated storage system an indication of whether or not the corresponding container data has been included in the output stream;
retrieving a corresponding de-duplicated container data for each container that is determined to comprise container data that has not been previously included in the output stream;
providing to the output stream the retrieved container data for each container that includes container data that has not been previously included in the output stream; and
for each container from which corresponding container data is included in the output, updating the corresponding value in the data structure.
Specification