
Optimizing the de-duplication rate for a backup stream

  • US 8,315,985 B1
  • Filed: 12/18/2008
  • Issued: 11/20/2012
  • Est. Priority Date: 12/18/2008
  • Status: Active Grant
First Claim

1. A method comprising:

  • generating extent group information, wherein
      the generating uses metadata extracted from a partition within a backup stream,
      the extent group information comprises information identifying one or more extent groups associated with a single data file of a plurality of data files,
      the data files are arranged within the backup stream, and
      each of the one or more extent groups comprises a plurality of extents associated with the single data file;

  • prepending the extent group information to the backup stream;

  • modifying an extent mapping by translating at least one offset relative to the partition into at least one offset relative to the backup stream, wherein
      the extent mapping comprises information identifying the data files;

  • processing the partition within the backup stream, wherein
      the processing comprises
        sorting the data files according to a starting extent location of each data file of the data files,
        identifying a first data file within the backup stream, wherein the identifying the first data file locates the starting extent of the first data file, using the extent group information,
        identifying a second data file within the backup stream, wherein the identifying the second data file locates the starting extent of the second data file, using the extent group information, and
        determining whether the second data file is redundant, wherein the determining comprises determining whether the second data file is a redundant data file of the first data file,
      the starting extent of the each data file of the data files is identified by a first extent group within the extent group information, and
      the first extent group comprises information identifying one or more extents of the each data file of the data files; and

  • de-duplicating the backup stream, wherein
      the de-duplicating is performed in response to a determination that the second data file is redundant, and
      the de-duplicating comprises removing at least one extent group associated with the second data file from the backup stream.
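
The steps recited in the claim can be illustrated with a small Python sketch. It is a toy model written under stated assumptions, not the patented implementation: the names `Extent`, `FileEntry`, `to_stream_offsets`, `find_redundant`, and `deduplicate` are hypothetical, the 4-byte `HEADER` merely stands in for the prepended extent group information, and comparing SHA-256 digests of file content is one assumed way of deciding redundancy (the claim does not say how that determination is made).

```python
import hashlib
from dataclasses import dataclass

@dataclass(frozen=True)
class Extent:
    offset: int  # byte offset of the extent
    length: int  # extent length in bytes

@dataclass
class FileEntry:
    name: str
    extents: list  # the file's extent group; extents[0] is its starting extent

def to_stream_offsets(files, partition_start):
    """Translate partition-relative offsets into stream-relative offsets."""
    return [
        FileEntry(f.name, [Extent(e.offset + partition_start, e.length) for e in f.extents])
        for f in files
    ]

def read_file(stream, entry):
    """Reassemble a file's bytes by walking its extents in order."""
    return b"".join(stream[e.offset:e.offset + e.length] for e in entry.extents)

def find_redundant(stream, files):
    """Visit files in order of starting extent; report later duplicates by name."""
    seen = set()
    redundant = []
    for f in sorted(files, key=lambda f: f.extents[0].offset):
        digest = hashlib.sha256(read_file(stream, f)).hexdigest()  # assumed test
        if digest in seen:
            redundant.append(f.name)
        else:
            seen.add(digest)
    return redundant

def deduplicate(stream, files, redundant):
    """Remove every extent belonging to a redundant file from the stream."""
    drop = sorted(
        (e.offset, e.length) for f in files if f.name in redundant for e in f.extents
    )
    out, pos = bytearray(), 0
    for offset, length in drop:
        out += stream[pos:offset]  # keep bytes before the dropped extent
        pos = offset + length      # skip the dropped extent
    out += stream[pos:]
    return bytes(out)

# Hypothetical partition: "copy" holds the same bytes as "first", in two fragments.
partition = b"ABCD" + b"1234" + b"AB" + b"zz" + b"CD"
partition_files = [
    FileEntry("first", [Extent(0, 4)]),
    FileEntry("other", [Extent(4, 4)]),
    FileEntry("copy", [Extent(8, 2), Extent(12, 2)]),
    FileEntry("pad", [Extent(10, 2)]),
]

HEADER = b"HDR!"  # placeholder for the prepended extent group information
stream = HEADER + partition
stream_files = to_stream_offsets(partition_files, partition_start=len(HEADER))

redundant = find_redundant(stream, stream_files)
deduped = deduplicate(stream, stream_files, redundant)
```

In this example `redundant` is `["copy"]` and `deduped` is `b"HDR!ABCD1234zz"`: both fragments of the duplicate file are removed while the unrelated `pad` file lying between them is kept. A real stream would also need its extent mapping updated after removal, which the sketch leaves out.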

  • 7 Assignments