REPRESENTING DE-DUPLICATED FILE DATA
Abstract
Providing a subset of de-duplicated data as output is disclosed. In some embodiments, the output comprises a subset of data stored in de-duplicated form in a plurality of containers, each comprising a plurality of data segments that make up the data. For each container that includes one or more data segments of the subset, the corresponding container data is included in the output. Each container may include one or more segments not included in the subset. For each container whose container data is included in the output, a corresponding value is updated in a data structure that holds, for each container stored on the de-duplicated storage system, a data value indicating whether or not that container's data has been included in the output.
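The abstract describes a data structure holding one value per container stored on the de-duplicated storage system. A bitmap with one bit per container is one natural fit. The sketch below is a minimal illustration, not the patented implementation; it assumes (the source does not say so) that container IDs are dense integers from 0 to N-1, and the class `ContainerBitmap` and its methods are hypothetical names.

```python
class ContainerBitmap:
    """Hypothetical tracker: one bit per container, set once its data is in the output.

    Assumes container IDs are dense integers 0 .. num_containers - 1.
    """

    def __init__(self, num_containers: int) -> None:
        self.num_containers = num_containers
        # ceil(num_containers / 8) bytes; every bit starts at 0 ("not yet included").
        self._bits = bytearray((num_containers + 7) // 8)

    def _check(self, container_id: int) -> None:
        if not 0 <= container_id < self.num_containers:
            raise IndexError(f"container id {container_id} out of range")

    def is_included(self, container_id: int) -> bool:
        """Return True if this container's data is already in the output stream."""
        self._check(container_id)
        return bool(self._bits[container_id >> 3] & (1 << (container_id & 7)))

    def mark_included(self, container_id: int) -> None:
        """Record that this container's data has been added to the output stream."""
        self._check(container_id)
        self._bits[container_id >> 3] |= 1 << (container_id & 7)

    def size_in_bytes(self) -> int:
        return len(self._bits)


bitmap = ContainerBitmap(1_000_000)
print(bitmap.size_in_bytes())  # 125000
bitmap.mark_included(42)
print(bitmap.is_included(42), bitmap.is_included(43))  # True False
```

At one bit per container, a system with one million containers needs 125,000 bytes for the whole table, which is small enough to keep in memory for the duration of an output operation.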
17 Claims
1. A method comprising:
receiving an indication identifying a subset of data to be provided as output, wherein the subset of data includes a first data segment;
identifying a first container that includes the first data segment;
determining whether corresponding container data associated with the first container has already been included in an output stream based at least in part on a data structure comprising, for each of at least a subset of containers, an indication of whether or not the corresponding container data has been included in the output stream; and
in the event that it is determined, based at least in part on the data structure, that the first container comprises container data that has not already been included in the output stream:
retrieving the corresponding container data associated with the first container and including the corresponding container data in the output stream, wherein the corresponding container data includes one or more other segments that are not included in the subset of data to be provided as output; and
updating a corresponding value in the data structure indicating that the container data associated with the first container has been included in the output stream.
Dependent claims: 2, 3, 4, 5, 6.
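The steps of claim 1 can be sketched as a single loop over the requested segments. This is a minimal, hypothetical illustration rather than the patented implementation: the `Container` type, the `segment_to_container` index, the function name `build_output_stream`, and the use of a Python list of booleans as the per-container data structure are all assumptions. It shows the key property that a container referenced by several requested segments is emitted once, together with any segments in it that were not requested.

```python
from dataclasses import dataclass


@dataclass
class Container:
    """Hypothetical container: an ID plus the segments it stores (fingerprint -> bytes)."""
    container_id: int
    segments: dict[str, bytes]


def build_output_stream(
    requested_segments: list[str],
    segment_to_container: dict[str, int],
    containers: dict[int, Container],
    num_containers: int,
) -> list[Container]:
    """Return the containers holding the requested segments, each included once.

    Assumes container IDs are dense integers 0 .. num_containers - 1 and that
    segment_to_container is an (assumed) index from fingerprint to container ID.
    """
    # The data structure: one "already included?" flag per container.
    included = [False] * num_containers
    output_stream: list[Container] = []
    for fingerprint in requested_segments:
        # Identify the container that holds this segment.
        cid = segment_to_container[fingerprint]
        # Consult the data structure before retrieving anything.
        if not included[cid]:
            # Include the whole container, including segments outside the subset.
            output_stream.append(containers[cid])
            # Update the flag so later references to this container are skipped.
            included[cid] = True
    return output_stream


containers = {
    0: Container(0, {"a": b"A", "b": b"B"}),
    1: Container(1, {"c": b"C", "d": b"D"}),
    2: Container(2, {"e": b"E"}),
}
index = {fp: c.container_id for c in containers.values() for fp in c.segments}
stream = build_output_stream(["a", "c", "a", "b"], index, containers, 3)
print([c.container_id for c in stream])  # [0, 1]
```

In the usage example, container 0 is emitted once even though segments `a` and `b` (both stored in container 0) are requested, and container 1 brings along segment `d`, which was not part of the request.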
7. A data storage system, comprising:
a processor configured to:
receive an indication identifying a subset of data to be provided as output, wherein the subset of data includes a first data segment;
identify a first container that includes the first data segment;
determine whether corresponding container data associated with the first container has already been included in an output stream based at least in part on a data structure comprising, for each of at least a subset of containers, an indication of whether or not the corresponding container data has been included in the output stream; and
in the event that it is determined, based at least in part on the data structure, that the first container comprises container data that has not already been included in the output stream:
retrieve the corresponding container data associated with the first container and include the corresponding container data in the output stream, wherein the corresponding container data includes one or more other segments that are not included in the subset of data to be provided as output; and
update a corresponding value in the data structure indicating that the container data associated with the first container has been included in the output stream; and
a memory coupled to the processor and configured to store the data structure.
Dependent claims: 8, 9, 10, 11, 12.
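Claim 7 places the data structure in memory coupled to the processor, and the claims only require an indication for "at least a subset of containers". One way to use that latitude, sketched here under stated assumptions, is to track only the containers a given request actually references and to retrieve container data lazily on first reference. `stream_containers` and the `fetch_container` callback are hypothetical names; the callback stands in for whatever reads a container from storage.

```python
from typing import Callable, Iterator


def stream_containers(
    requested_segments: list[str],
    segment_to_container: dict[str, int],
    fetch_container: Callable[[int], bytes],
) -> Iterator[tuple[int, bytes]]:
    """Yield (container_id, container_data) once per container the request touches.

    The tracking dict covers only containers referenced by this request, so its
    size is bounded by the request rather than by all containers on the system.
    """
    # Sparse data structure: one False/True entry per touched container.
    included = dict.fromkeys(
        (segment_to_container[fp] for fp in requested_segments), False
    )
    for fp in requested_segments:
        cid = segment_to_container[fp]
        if not included[cid]:
            # Retrieve the container data only on its first reference.
            yield cid, fetch_container(cid)
            # Mark it so later segments in the same container are skipped.
            included[cid] = True


store = {7: b"container-7", 9: b"container-9"}
fetched = []


def fetch(cid: int) -> bytes:
    fetched.append(cid)
    return store[cid]


index = {"x": 7, "y": 7, "z": 9}
out = list(stream_containers(["x", "z", "y"], index, fetch))
print(out)      # [(7, b'container-7'), (9, b'container-9')]
print(fetched)  # [7, 9]
```

Compared with a dense per-container bitmap, the sparse dictionary grows with the number of containers the request touches rather than with the total number of containers on the system, at the cost of a hash lookup instead of a bit test.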
13. A computer program product embodied in a non-transitory computer readable storage medium and comprising computer instructions for:
receiving an indication identifying a subset of data to be provided as output, wherein the subset of data includes a first data segment;
identifying a first container that includes the first data segment;
determining whether corresponding container data associated with the first container has already been included in an output stream based at least in part on a data structure comprising, for each of at least a subset of containers, an indication of whether or not the corresponding container data has been included in the output stream; and
in the event that it is determined, based at least in part on the data structure, that the first container comprises container data that has not already been included in the output stream:
retrieving the corresponding container data associated with the first container and including the corresponding container data in the output stream, wherein the corresponding container data includes one or more other segments that are not included in the subset of data to be provided as output; and
updating a corresponding value in the data structure indicating that the container data associated with the first container has been included in the output stream.
Dependent claims: 14, 15, 16, 17.
Specification