Deduplication in an extent-based architecture
Abstract
It is determined that a first data block contains the same data as a second data block. The first data block is associated with a first extent and the second data block is associated with a second extent. In response to determining that the first data block contains the same data as the second data block, the second data block is associated with the first extent and the first data block is disassociated from the first extent.
19 Claims
1. A method comprising:
- accessing a plurality of extent mapping entries, wherein each of the plurality of extent mapping entries maps an extent identifier to one or more data blocks;
- while accessing the plurality of extent mapping entries, determining that a first data block comprises the same data as a second data block, wherein the first data block is associated with a first extent that is associated with a first of the plurality of extent mapping entries, wherein the second data block is associated with a second extent that is associated with a second of the plurality of extent mapping entries; and
- in response to said determining that the first data block comprises the same data as the second data block, creating a third extent mapping entry that identifies the first extent and that comprises a reference to the second extent mapping entry, an offset identifying the location of the second data block within the second extent, an external reference count associated with the first extent, and an internal reference count associated with the first extent; and
- disassociating the first data block from the first extent.
Dependent claims: 2, 3, 4, 5, 6, 7, 8.
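The steps of claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation: the class `ExtentMapEntry`, the functions `share_block` and `dedupe_scan`, the in-memory `storage` dictionary, and the choice to compare raw block contents are all assumptions made for illustration, as is copying the reference counts from the first entry into the new one.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class ExtentMapEntry:
    """Hypothetical extent mapping entry: extent identifier -> data blocks."""
    extent_id: int
    blocks: List[Optional[int]]                   # physical block numbers; None = disassociated
    ref_entry: Optional["ExtentMapEntry"] = None  # entry whose extent now supplies the data
    ref_offset: Optional[int] = None              # block offset within ref_entry's extent
    external_refs: int = 1                        # assumed meaning: references from outside the map
    internal_refs: int = 0                        # assumed meaning: references from other entries


def share_block(first, first_idx, second, second_idx, storage, extent_map):
    """If the two blocks hold the same data, create the third entry and
    disassociate the first block from the first extent."""
    a, b = first.blocks[first_idx], second.blocks[second_idx]
    if a is None or b is None or storage[a] != storage[b]:
        return None
    third = ExtentMapEntry(
        extent_id=first.extent_id,   # identifies the first extent
        blocks=[],
        ref_entry=second,            # reference to the second extent mapping entry
        ref_offset=second_idx,       # location of the second block within the second extent
        external_refs=first.external_refs,
        internal_refs=first.internal_refs,
    )
    first.blocks[first_idx] = None   # disassociate the first data block
    extent_map.append(third)
    return third


def dedupe_scan(extent_map: List[ExtentMapEntry], storage: Dict[int, bytes]):
    """Walk the entries once and deduplicate blocks already seen in another extent."""
    seen = {}      # block contents -> (entry, offset) of the first holder
    created = []
    for entry in list(extent_map):   # snapshot, because share_block appends to extent_map
        for idx, pbn in enumerate(entry.blocks):
            if pbn is None:
                continue
            data = storage[pbn]
            if data in seen and seen[data][0] is not entry:
                keeper, keeper_idx = seen[data]
                created.append(share_block(entry, idx, keeper, keeper_idx, storage, extent_map))
            elif data not in seen:
                seen[data] = (entry, idx)
    return created


storage = {10: b"A", 11: b"B", 20: b"B"}
e1 = ExtentMapEntry(extent_id=1, blocks=[10, 11])
e2 = ExtentMapEntry(extent_id=2, blocks=[20])
extent_map = [e1, e2]
created = dedupe_scan(extent_map, storage)
```

In this run, block 20 of extent 2 duplicates block 11 of extent 1, so extent 2 plays the role of the claim's "first extent": the new entry identifies extent 2 and points at offset 1 of the entry for extent 1. A production system would more likely compare fingerprints first and verify byte-for-byte only on a hit; that refinement is omitted here.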
9. A non-transitory machine readable medium having stored thereon instructions for data deduplication, the instructions to:
- determine that a data block referenced by a first entry of a plurality of entries matches a data block referenced by a second entry of the plurality of entries, wherein each entry identifies an extent and each entry is associated with at least one volume of a storage system;
- in response to a determination that a data block referenced by the first entry matches a data block referenced by the second entry of the plurality of entries, select one of the first entry and the second entry as a donor extent and select the other one of the first entry and the second entry as a recipient extent;
- determine that a first reference count equals a predetermined value, wherein the first reference count is associated with the recipient extent; and
- in response to a determination that the first reference count equals the first predetermined value, indicate that a set of one or more data blocks are shared between the donor extent and the recipient extent.
Dependent claims: 10, 11, 12, 13, 14.
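The donor/recipient flow recited in claim 9 (and mirrored in apparatus claim 15) can be sketched in Python as follows. The claim does not say how the donor is chosen or what the predetermined value is, so the policy in `choose_roles`, the constant `PREDETERMINED_VALUE`, and the `Entry` fields are all hypothetical and serve only to make the control flow concrete.

```python
from dataclasses import dataclass, field
from typing import List, Set, Tuple

PREDETERMINED_VALUE = 1  # hypothetical: the claim does not give the value


@dataclass
class Entry:
    """Hypothetical entry: identifies an extent and belongs to a volume."""
    extent_id: int
    volume: str
    blocks: List[bytes]                       # block contents, kept inline for brevity
    ref_count: int = 1                        # the "first reference count" when recipient
    shared_with: Set[int] = field(default_factory=set)


def choose_roles(a: Entry, b: Entry) -> Tuple[Entry, Entry]:
    """Assumed policy: the more heavily referenced entry donates.
    Returns (donor, recipient)."""
    return (a, b) if a.ref_count >= b.ref_count else (b, a)


def try_share(a: Entry, b: Entry) -> bool:
    """Follow the claimed steps; return True if sharing was indicated."""
    # Step 1: determine that a block referenced by one entry matches a block of the other.
    if not set(a.blocks) & set(b.blocks):
        return False
    # Step 2: select a donor extent and a recipient extent.
    donor, recipient = choose_roles(a, b)
    # Step 3: check the recipient's reference count against the predetermined value.
    if recipient.ref_count != PREDETERMINED_VALUE:
        return False
    # Step 4: indicate that blocks are shared between donor and recipient.
    donor.shared_with.add(recipient.extent_id)
    recipient.shared_with.add(donor.extent_id)
    return True


a = Entry(extent_id=1, volume="volA", blocks=[b"x", b"y"], ref_count=3)
b = Entry(extent_id=2, volume="volB", blocks=[b"y"], ref_count=1)
shared = try_share(a, b)
```

Here `a` becomes the donor because it has the higher reference count, and sharing is indicated because the recipient `b` has a count equal to the assumed predetermined value. With two entries whose counts are both 2, the same call would return `False` and leave both entries unchanged.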
15. An apparatus comprising:
- a processor; and
- a machine readable storage medium having program code stored therein that is executable by the processor to cause the apparatus to,
  - determine that a data block referenced by a first entry of a plurality of entries matches a data block referenced by a second entry of the plurality of entries, wherein each entry identifies an extent and each entry is associated with at least one volume of a storage system;
  - in response to a determination that a data block referenced by the first entry matches a data block referenced by the second entry of the plurality of entries, select one of the first entry and the second entry as a donor extent and select the other one of the first entry and the second entry as a recipient extent;
  - determine that a first reference count equals a predetermined value, wherein the first reference count is associated with the recipient extent; and
  - in response to a determination that the first reference count equals the first predetermined value, indicate that a set of one or more data blocks are shared between the donor extent and the recipient extent.
Dependent claims: 16, 17, 18, 19.
Specification