Method and apparatus for block level data de-duplication

  • US 8,200,923 B1
  • Filed: 12/31/2008
  • Issued: 06/12/2012
  • Est. Priority Date: 12/31/2008
  • Status: Active Grant
First Claim

1. A computer storage environment comprising:

  • at least one chunking/hashing unit that receives input data from at least one source, wherein the at least one chunking/hashing unit processes at least some of the input data to output a plurality of data blocks from the at least some of the input data and a content address for each of the plurality of data blocks, wherein a content address for a corresponding data block is generated based, at least in part, on the content of the corresponding data block; and

    a plurality of object addressable storage devices to store at least some of the plurality of data blocks output from the at least one chunking/hashing unit;

    wherein the computer storage environment comprises at least one processor programmed to, for each one of the plurality of data blocks output from the at least one chunking/hashing unit, make a determination as to which of the plurality of object addressable storage devices is to control storage of the one of the plurality of data blocks output from the at least one chunking/hashing unit; and

    wherein each of the plurality of object addressable storage devices comprises at least one processor programmed to, in response to receipt from the at least one chunking/hashing unit of a received one of the plurality of data blocks:

    for received data blocks having content addresses within a particular range, determine whether the received one of the plurality of data blocks is a duplicate of another data block previously stored on the computer storage environment by comparing a content address for the received one of the plurality of data blocks with a data structure including content addresses for data blocks previously stored on the computer storage environment, wherein the size of the particular range is selected to ensure that the data structure including content addresses within the particular range can fit within a memory of the object addressable storage device;

    control storage of the received one of the plurality of data blocks on the computer storage environment when it is determined that the received one of the plurality of data blocks is not a duplicate of another data block previously stored on the computer storage environment; and

    control storage of information indicating that the received one of the plurality of data blocks is represented by data previously stored on the computer storage environment when it is determined that the received one of the plurality of data blocks is a duplicate of another data block previously stored on the computer storage environment.
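The claim leaves the chunk size, the hash function, and the routing rule open. The Python sketch below is a minimal, hypothetical illustration of the claimed flow under stated assumptions: fixed 4 KiB chunks, SHA-256 digests as content addresses, and a content-address space split evenly into one contiguous range per device. The names `BLOCK_SIZE`, `chunk_and_hash`, `StorageDevice`, and `StorageEnvironment` are illustrative and do not come from the patent. Each device keeps an in-memory dict of the addresses in its own range, so narrowing each device's range (for example, by adding devices) is what keeps that index small enough to fit in memory, which is the sizing condition the claim recites.

```python
import bisect
import hashlib
from dataclasses import dataclass, field

# Assumption: fixed-size chunking; the claim does not specify a chunking scheme.
BLOCK_SIZE = 4096


def chunk_and_hash(data: bytes, block_size: int = BLOCK_SIZE):
    """Hypothetical chunking/hashing unit: yield (content_address, block).

    The address is a SHA-256 digest of the block bytes (an assumed hash),
    so it is derived from the block's content.
    """
    for offset in range(0, len(data), block_size):
        block = data[offset:offset + block_size]
        yield hashlib.sha256(block).digest(), block


@dataclass
class StorageDevice:
    """Hypothetical object addressable storage device owning one address range."""
    low: int   # inclusive lower bound of the owned content-address range
    high: int  # exclusive upper bound
    index: dict = field(default_factory=dict)  # in-memory: address -> stored block
    refs: list = field(default_factory=list)   # addresses recorded as duplicates

    def receive(self, address: bytes, block: bytes) -> bool:
        """Store the block if new, else record a reference. Returns True if stored."""
        key = int.from_bytes(address, "big")
        if not (self.low <= key < self.high):
            raise ValueError("content address outside this device's range")
        if address in self.index:
            # Duplicate: keep only information pointing at the existing copy.
            self.refs.append(address)
            return False
        self.index[address] = block
        return True


class StorageEnvironment:
    """Hypothetical environment that routes each block to the device owning its address."""

    def __init__(self, num_devices: int):
        space = 1 << 256  # size of the SHA-256 address space
        bounds = [i * space // num_devices for i in range(num_devices + 1)]
        self.devices = [StorageDevice(bounds[i], bounds[i + 1])
                        for i in range(num_devices)]
        self._lows = bounds[:-1]  # sorted lower bounds used for routing

    def route(self, address: bytes) -> StorageDevice:
        """Pick the device whose range contains the address (last low <= key)."""
        key = int.from_bytes(address, "big")
        return self.devices[bisect.bisect_right(self._lows, key) - 1]

    def write(self, data: bytes) -> list:
        """Chunk, hash, route, and deduplicate; return the ordered address list."""
        recipe = []
        for address, block in chunk_and_hash(data):
            self.route(address).receive(address, block)
            recipe.append(address)
        return recipe

    def read(self, recipe: list) -> bytes:
        """Reassemble the original data from its ordered content addresses."""
        return b"".join(self.route(a).index[a] for a in recipe)
```

For example, writing `b"A" * 8192 + b"B" * 4096` to `StorageEnvironment(4)` yields three blocks, only two of which are unique, so the device that owns the repeated block's address records a reference instead of a second copy. Writing the same data again stores no new blocks and only adds three references.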

  • 9 Assignments