APPARATUS, SYSTEM, AND METHOD FOR IMPROVED DATA DEDUPLICATION
Abstract
An apparatus, system, and method are disclosed for improved deduplication. The apparatus includes an input module, a hash module, and a transmission module that are implemented in a nonvolatile storage device. The input module receives hash requests from requesting entities that may be internal or external to the nonvolatile storage device; the hash requests include a data unit identifier that identifies the data unit for which the hash is requested. The hash module generates a hash for the data unit using a hash function. The hash is generated using the computing resources of the nonvolatile storage device. The transmission module sends the hash to a receiving entity when the input module receives the hash request. A deduplication agent uses the hash to determine whether or not the data unit is a duplicate of a data unit already stored in the storage system that includes the nonvolatile storage device.
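The abstract splits the work between two parties. The storage device computes the hash with its own resources, and a deduplication agent decides duplication from that hash alone. The Python sketch below illustrates this split. It is a minimal illustration under stated assumptions, not the patented implementation:

- The class names `StorageDevice` and `DeduplicationAgent` are hypothetical.
- SHA-256 is assumed as the hash function.
- In-memory dictionaries stand in for nonvolatile storage.

```python
import hashlib


class StorageDevice:
    """Hypothetical nonvolatile storage device that hashes data units on-device."""

    def __init__(self):
        # Stands in for the nonvolatile storage: data unit identifier -> bytes.
        self._units = {}

    def write(self, unit_id, data):
        self._units[unit_id] = data

    def handle_hash_request(self, unit_id):
        # Input module: the request carries only the data unit identifier.
        data = self._units[unit_id]
        # Hash module: computed with the device's own resources (SHA-256 assumed).
        digest = hashlib.sha256(data).hexdigest()
        # Transmission module: only the hash leaves the device.
        return digest


class DeduplicationAgent:
    """Hypothetical agent that decides duplication from hashes alone."""

    def __init__(self):
        self._seen = {}  # hash -> identifier of the first data unit with that hash

    def check(self, unit_id, digest):
        # Returns the identifier of an existing duplicate, or None if the unit is new.
        if digest in self._seen:
            return self._seen[digest]
        self._seen[digest] = unit_id
        return None


device = StorageDevice()
agent = DeduplicationAgent()
device.write("a", b"hello world")
device.write("b", b"hello world")
device.write("c", b"something else")
for uid in ("a", "b", "c"):
    print(uid, agent.check(uid, device.handle_hash_request(uid)))
# a None
# b a
# c None
```

Treating equal hashes as equal data assumes a collision-resistant hash function. The sketch does not guard against collisions.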
29 Claims
1. An apparatus for generating, in a nonvolatile storage device, a hash of a data unit that is stored in the nonvolatile storage device, the apparatus comprising:
an input module that is implemented on a nonvolatile storage device and that receives a hash request from a requesting entity, the hash request comprising a data unit identifier that identifies the data unit for which the hash is requested;
wherein the data unit identified by the data unit identifier is stored in the nonvolatile storage device, the nonvolatile storage device comprising a storage controller and a nonvolatile storage connected by a first communications connection, the nonvolatile storage device configured to connect to one or more external devices through a second communications connection that is separate from the first communications connection;
a hash module that is implemented on the nonvolatile storage device and that generates, within the nonvolatile storage device, a hash for the data unit identified by the data unit identifier;
wherein the hash identifies the data unit for which the hash is generated; and
a transmission module that is implemented on the nonvolatile storage device and that sends the hash to a receiving entity over the second communications connection.
(Dependent claims: 2-16)
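Claim 1 turns on where the hash is computed. The storage controller reads the data unit over the first (internal) connection, and only the hash leaves over the second. The Python sketch below makes that split visible by counting the bytes that cross each link. The class names, the integer identifier, and SHA-256 are illustrative assumptions; the claim fixes none of them.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class HashRequest:
    # Claim 1: the request carries a data unit identifier (an int is assumed here).
    data_unit_id: int


class Connection:
    """Hypothetical link that records how many bytes cross it."""

    def __init__(self, name):
        self.name = name
        self.bytes_moved = 0

    def carry(self, payload: bytes) -> bytes:
        self.bytes_moved += len(payload)
        return payload


class NonvolatileStorageDevice:
    def __init__(self, blocks):
        self._nvm = blocks                    # nonvolatile storage: id -> bytes
        self.internal = Connection("first")   # storage controller <-> nonvolatile storage
        self.external = Connection("second")  # device <-> external devices

    def serve(self, request: HashRequest) -> bytes:
        # The controller reads the data unit over the internal (first) connection.
        data = self.internal.carry(self._nvm[request.data_unit_id])
        # Hash module runs inside the device; SHA-256 is an assumption.
        digest = hashlib.sha256(data).digest()
        # Transmission module: only the 32-byte digest uses the second connection.
        # (Only the reply is metered; the incoming request is not counted.)
        return self.external.carry(digest)


device = NonvolatileStorageDevice({7: b"x" * 4096})
digest = device.serve(HashRequest(data_unit_id=7))
print(device.internal.bytes_moved, device.external.bytes_moved)  # 4096 32
```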
17. An apparatus to perform operations for improved deduplication on a computing device, the apparatus comprising:
a receipt module for receiving a hash of a data unit from one or more remote computing devices that are connected to the computing device by a network, wherein the one or more remote computing devices generate the hash of the data unit and transmit the hash over the network;
a duplicate module for determining, without touching the data unit, whether the data unit is a duplicate of an existing data unit in a storage system, wherein the determination is made using the hash provided by the one or more remote computing devices;
a delete module for causing one or more nonvolatile storage devices in the storage system to maintain a single logical copy that is one of the data unit and the existing data unit in the storage system; and
an update module for associating the data unit with the existing data unit in response to determining that the data unit is a duplicate of the existing data unit such that requests for the data unit and the existing data unit are directed to the logical copy of the data unit stored in the storage system.
(Dependent claims: 18-22)
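Claim 17's four modules can be read as operations on an index kept by the computing device:

- Duplicates are detected from the hash alone.
- Only one copy is kept.
- Lookups for either identifier resolve to the retained copy.

The Python sketch below is one hypothetical arrangement. The `DedupIndex` name, its dictionaries, and the choice to retain the existing copy rather than the new one are all assumptions. Digests are computed locally with SHA-256 only to simulate what the remote devices would send.

```python
import hashlib
from typing import Dict, Optional, Set


class DedupIndex:
    """Hypothetical agent combining claim 17's receipt, duplicate, delete, and update modules."""

    def __init__(self) -> None:
        self.by_hash: Dict[str, str] = {}   # hash -> id of the retained logical copy
        self.redirect: Dict[str, str] = {}  # data unit id -> id of the copy that serves it
        self.stored: Set[str] = set()       # ids whose bytes are physically kept

    def receive(self, unit_id: str, digest: str) -> Optional[str]:
        """Receipt module: accept a hash computed elsewhere; the data never arrives."""
        self.stored.add(unit_id)  # assume the unit was written before it was hashed
        # Duplicate module: the decision uses the hash alone.
        existing = self.by_hash.setdefault(digest, unit_id)
        if existing == unit_id:
            self.redirect[unit_id] = unit_id
            return None
        # Delete module: keep one logical copy (here, the existing one).
        self.stored.discard(unit_id)
        # Update module: requests for unit_id are directed to the retained copy.
        self.redirect[unit_id] = existing
        return existing

    def resolve(self, unit_id: str) -> str:
        return self.redirect[unit_id]


index = DedupIndex()
for unit_id, content in [("file1", b"same bytes"), ("file2", b"same bytes"), ("file3", b"other")]:
    index.receive(unit_id, hashlib.sha256(content).hexdigest())
print(sorted(index.stored), index.resolve("file2"))  # ['file1', 'file3'] file1
```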
23. A system for improved deduplication, the system comprising:
a deduplication agent that determines whether a data unit is a duplicate of an existing data unit in a storage system comprising one or more nonvolatile storage devices by using a hash of the data unit, the deduplication agent operating on a first computing device;
a hash generation apparatus for generating the hash of the data unit, the hash generation apparatus operating on a second computing device remote from the first computing device and connected to the first computing device by a communications connection, the hash generation apparatus comprising:
an input module that receives a hash request from a requesting entity, the hash request comprising a data unit identifier that identifies the data unit for which the hash is requested;
a hash module that generates a hash for the data unit identified by the data unit identifier, wherein the hash identifies the data unit for which the hash is generated; and
a transmission module that sends the hash to a receiving entity in response to the input module receiving the hash request.
(Dependent claims: 24-29)
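In claim 23 the deduplication agent and the hash generation apparatus run on different machines, so the hash request and its reply must cross a communications connection. The Python sketch below models that link under these assumptions:

- `socket.socketpair()` stands in for the connection.
- Messages use an assumed newline-delimited JSON format.
- The message fields, the `HashGenerationApparatus` name, and SHA-256 are hypothetical.
- The remote side is called inline rather than on a separate machine, which keeps the example deterministic.

```python
import hashlib
import json
import socket


class HashGenerationApparatus:
    """Hypothetical hash generation apparatus on the second computing device."""

    def __init__(self, sock, units):
        # Separate reader and writer avoid discarding buffered input on write.
        self._reader = sock.makefile("r", encoding="utf-8")
        self._writer = sock.makefile("w", encoding="utf-8")
        self._units = units  # data unit id -> bytes held by this device

    def serve_one(self):
        # Input module: read one newline-delimited JSON hash request.
        request = json.loads(self._reader.readline())
        data = self._units[request["data_unit_id"]]
        # Hash module: SHA-256 is assumed; claim 23 does not name a hash function.
        reply = {"data_unit_id": request["data_unit_id"],
                 "hash": hashlib.sha256(data).hexdigest()}
        # Transmission module: return the hash over the same connection.
        self._writer.write(json.dumps(reply) + "\n")
        self._writer.flush()


# A local socket pair stands in for the link between the two computing devices.
agent_sock, device_sock = socket.socketpair()
apparatus = HashGenerationApparatus(
    device_sock, {"u1": b"alpha", "u2": b"alpha", "u3": b"beta"})
to_device = agent_sock.makefile("w", encoding="utf-8")
from_device = agent_sock.makefile("r", encoding="utf-8")

seen = {}     # deduplication agent's index: hash -> first data unit id
results = {}  # data unit id -> id it duplicates, or None
for unit_id in ("u1", "u2", "u3"):
    to_device.write(json.dumps({"data_unit_id": unit_id}) + "\n")
    to_device.flush()
    apparatus.serve_one()  # would run on the remote device; called inline here
    reply = json.loads(from_device.readline())
    first = seen.setdefault(reply["hash"], unit_id)
    results[unit_id] = first if first != unit_id else None

print(results)  # {'u1': None, 'u2': 'u1', 'u3': None}
```

Each request is answered before the next one is sent, so a single thread suffices. On a real remote device, `serve_one` would run in a loop.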
Specification