Optimizing the de-duplication rate for a backup stream
Abstract
A method and apparatus for optimizing a de-duplication rate for backup streams are described. In one embodiment, the method for optimizing data de-duplication using an extent mapping of a backup stream includes processing a backup stream to access an extent mapping associated with a plurality of data files, wherein the plurality of the data files are arranged within the backup stream, and examining the extent mapping to identify at least one extent group within the backup stream, wherein the plurality of the data files are de-duplicated using at least one location of the at least one extent group.
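To make the abstract's notion of examining an extent mapping to identify extent groups concrete, here is a minimal sketch. It assumes an extent is a (file, offset, length) record and that an extent group is a run of one file's extents that sit back to back in the stream; neither assumption comes from the patent text, and the names `Extent` and `build_extent_groups` are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Extent:
    file_id: str
    offset: int  # byte offset relative to the start of the backup stream
    length: int  # extent length in bytes


def build_extent_groups(extent_mapping):
    """Group each file's extents into runs that are contiguous in the stream.

    Assumption for illustration only: "extent group" means a contiguous run.
    """
    by_file = defaultdict(list)
    for ext in extent_mapping:
        by_file[ext.file_id].append(ext)

    groups = {}
    for file_id, extents in by_file.items():
        extents.sort(key=lambda e: e.offset)
        runs = [[extents[0]]]
        for ext in extents[1:]:
            last = runs[-1][-1]
            if last.offset + last.length == ext.offset:
                runs[-1].append(ext)  # contiguous with the previous extent
            else:
                runs.append([ext])    # gap in the stream: start a new group
        groups[file_id] = runs
    return groups


mapping = [
    Extent("a.txt", 0, 4096),
    Extent("a.txt", 4096, 4096),
    Extent("b.txt", 8192, 4096),
    Extent("a.txt", 12288, 4096),
]
groups = build_extent_groups(mapping)

# Location of each group as (starting offset, total length).
group_locations = {
    file_id: [(run[0].offset, sum(e.length for e in run)) for run in runs]
    for file_id, runs in groups.items()
}
```

In this example `a.txt` ends up with two groups because `b.txt` sits between its second and third extents. The group locations are what a de-duplication pass would use to find each file without scanning the whole stream.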
18 Claims
1. A method comprising:
- generating extent group information, wherein
  the generating uses metadata extracted from a partition within a backup stream,
  the extent group information comprises information identifying one or more extent groups associated with a single data file of a plurality of data files,
  the data files are arranged within the backup stream, and
  each of the one or more extent groups comprises a plurality of extents associated with the single data file;
- prepending the extent group information to the backup stream;
- modifying an extent mapping by translating at least one offset relative to the partition into at least one offset relative to the backup stream, wherein the extent mapping comprises information identifying the data files;
- processing the partition within the backup stream, wherein the processing comprises
  sorting the data files according to a starting extent location of each data file of the data files,
  identifying a first data file within the backup stream, wherein the identifying the first data file locates the starting extent of the first data file, using the extent group information,
  identifying a second data file within the backup stream, wherein the identifying the second data file locates the starting extent of the second data file, using the extent group information, and
  determining whether the second data file is redundant, wherein
    the determining comprises determining whether the second data file is a redundant data file of the first data file,
    the starting extent of the each data file of the data files is identified by a first extent group within the extent group information, and
    the first extent group comprises information identifying one or more extents of the each data file of the data files; and
- de-duplicating the backup stream, wherein
  the de-duplicating is performed in response to a determination that the second data file is redundant, and
  the de-duplicating comprises removing at least one extent group associated with the second data file from the backup stream.
Dependent claims: 2, 3, 4, 5, 6
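The processing steps recited in claim 1 (translate offsets, sort by starting extent, compare files, remove redundant extent groups) can be sketched as follows. This is an illustration under stated assumptions, not the patented implementation: the claim does not say how redundancy is determined, so the sketch compares SHA-256 digests of file contents, and the names `FileEntry`, `translate_offsets`, `deduplicate`, and `strip_ranges` are hypothetical.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class FileEntry:
    name: str
    extents: list  # (offset, length) pairs, in file order


def translate_offsets(files, partition_start):
    """Rewrite partition-relative offsets as stream-relative offsets."""
    return [
        FileEntry(f.name, [(off + partition_start, length) for off, length in f.extents])
        for f in files
    ]


def read_file(stream, entry):
    """Reassemble a file's bytes from its extents."""
    return b"".join(stream[off:off + length] for off, length in entry.extents)


def deduplicate(stream, files, partition_start):
    """Return (kept file entries, extents of redundant files)."""
    translated = translate_offsets(files, partition_start)
    # Sort the files by the location of their starting extent.
    translated.sort(key=lambda f: f.extents[0][0])
    seen = set()
    kept, removed = [], []
    for entry in translated:
        # Assumption: a file is redundant if its content digest was seen before.
        digest = hashlib.sha256(read_file(stream, entry)).hexdigest()
        if digest in seen:
            removed.extend(entry.extents)
        else:
            seen.add(digest)
            kept.append(entry)
    return kept, removed


def strip_ranges(stream, ranges):
    """Drop the given non-overlapping (offset, length) ranges from the stream."""
    out, pos = bytearray(), 0
    for off, length in sorted(ranges):
        out += stream[pos:off]
        pos = off + length
    out += stream[pos:]
    return bytes(out)


header = b"HDR!"  # stands in for the prepended extent group information
partition = b"AAAA" + b"BBBB" + b"AAAA"
stream = header + partition
files = [
    FileEntry("one", [(0, 4)]),
    FileEntry("two", [(4, 4)]),
    FileEntry("three", [(8, 4)]),
]
kept, removed = deduplicate(stream, files, partition_start=len(header))
deduped_stream = strip_ranges(stream, removed)
```

File `three` has the same content as `one`, so its extent is dropped and the stream shrinks by four bytes. The sketch does not re-adjust the surviving files' offsets after removal, which a real implementation would have to handle.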
7. An apparatus comprising:
- a central processing unit (CPU); and
- a manager configured to be coupled to the CPU and further configured to
  generate extent group information, wherein
    the generating uses metadata extracted from a partition within a backup stream,
    the extent group information comprises information identifying one or more extent groups associated with a single data file of a plurality of data files,
    the data files are arranged within the backup stream, and
    each of the one or more extent groups comprises a plurality of extents associated with the single data file;
  prepend the extent group information to the backup stream;
  modify an extent mapping by translating at least one offset relative to the partition into at least one offset relative to the backup stream, wherein the extent mapping comprises information identifying the data files;
  process the partition within the backup stream, wherein the manager is configured to process the partition within the backup stream by virtue of being configured to
    sort the data files according to a starting extent location of each data file of the data files,
    identify a first data file within the backup stream by locating the starting extent of the first data file, using the extent group information,
    identify a second data file within the backup stream by locating the starting extent of the second data file, using the extent group information, and
    determine whether the second data file is redundant by determining whether the second data file is a redundant data file of the first data file, wherein
      the starting extent of the each data file of the data files is identified by a first extent group within the extent group information, and
      the first extent group comprises information identifying one or more extents of the each data file of the data files; and
  de-duplicate the backup stream, in response to a determination that the second data file is redundant, wherein the manager is configured to de-duplicate the backup stream by virtue of being configured to remove at least one extent group associated with the second data file from the backup stream.
Dependent claims: 8, 9, 10, 11, 12
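The "prepend the extent group information to the backup stream" step can be illustrated with a simple framing scheme. The claims do not specify any on-stream format, so the layout below (an 8-byte big-endian length followed by a JSON header) is purely an assumption, and the function names are hypothetical.

```python
import json
import struct


def prepend_extent_group_info(info, stream):
    """Frame the stream as: 8-byte header length, JSON header, original bytes."""
    header = json.dumps(info, sort_keys=True).encode("utf-8")
    return struct.pack(">Q", len(header)) + header + stream


def split_extent_group_info(framed):
    """Recover the extent group information and the original stream bytes."""
    (size,) = struct.unpack(">Q", framed[:8])
    info = json.loads(framed[8:8 + size].decode("utf-8"))
    return info, framed[8 + size:]


# Hypothetical extent group information: file name -> [offset, length] pairs.
info = {"a.txt": [[0, 8192], [12288, 4096]], "b.txt": [[8192, 4096]]}
framed = prepend_extent_group_info(info, b"partition bytes")
decoded, payload = split_extent_group_info(framed)

# Everything after the header is shifted by this many bytes.
shift = len(framed) - len(payload)
```

Prepending a header moves every byte of the partition further into the stream by `shift` bytes. This is one plausible reading of why the claims also recite translating partition-relative offsets into stream-relative ones.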
13. A system comprising:
- a plurality of clients, wherein each client comprises a plurality of data files;
- a server, coupled to the plurality of clients and comprising a manager, configured to
  generate extent group information, wherein
    the generating uses metadata extracted from a partition within a backup stream,
    the extent group information comprises information identifying one or more extent groups associated with a single data file of a plurality of data files,
    the data files are arranged within the backup stream, and
    each of the one or more extent groups comprises a plurality of extents associated with the single data file;
  prepend the extent group information to the backup stream;
  modify an extent mapping by translating at least one offset relative to the partition into at least one offset relative to the backup stream, wherein the extent mapping comprises information identifying the data files;
  process the partition within the backup stream, wherein the manager is configured to process the partition within the backup stream by virtue of being configured to
    sort the data files according to a starting extent location of each data file of the data files,
    identify a first data file within the backup stream by locating the starting extent of the first data file, using the extent group information,
    identify a second data file within the backup stream by locating the starting extent of the second data file, using the extent group information, and
    determine whether the second data file is redundant by determining whether the second data file is a redundant data file of the first data file, wherein
      the starting extent of the each data file of the data files is identified by a first extent group within the extent group information, and
      the first extent group comprises information identifying one or more extents of the each data file of the data files; and
  de-duplicate the backup stream, in response to a determination that the second data file is redundant, wherein the manager is configured to de-duplicate the backup stream by virtue of being configured to remove at least one extent group associated with the second data file from the backup stream; and
- a storage pool, coupled to the server and configured to store a de-duplicated backup image.
Dependent claims: 14, 15, 16, 17, 18
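Claim 13 adds clients and a storage pool that holds the de-duplicated backup image. The claim does not describe how the pool is organized, so the sketch below swaps in a plain content-addressed store as a stand-in: each distinct file body is kept once, and each client's image is a manifest of digests. The `StoragePool` class and its methods are hypothetical.

```python
import hashlib


class StoragePool:
    """Hypothetical content-addressed store for de-duplicated backup images."""

    def __init__(self):
        self.blobs = {}   # SHA-256 digest -> file bytes, stored once
        self.images = {}  # client name -> {file name: digest}

    def store_image(self, client, files):
        """Store one client's files, keeping a single copy of repeated content."""
        manifest = {}
        for name, data in files.items():
            digest = hashlib.sha256(data).hexdigest()
            self.blobs.setdefault(digest, data)  # existing content is not stored again
            manifest[name] = digest
        self.images[client] = manifest

    def restore(self, client):
        """Rebuild a client's files from its manifest."""
        return {name: self.blobs[digest] for name, digest in self.images[client].items()}


pool = StoragePool()
pool.store_image(
    "client-1",
    {"report.doc": b"quarterly numbers", "copy.doc": b"quarterly numbers"},
)
pool.store_image(
    "client-2",
    {"notes.txt": b"meeting notes", "report.doc": b"quarterly numbers"},
)
stored_blobs = len(pool.blobs)
restored = pool.restore("client-2")
```

Four files across two clients reduce to two stored bodies, and each client can still be restored in full. In the claimed system the redundancy decision is made by the server's manager on the backup stream; this sketch only shows what storing the result without duplicates might look like.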
Specification