Managing data storage in a set of storage systems using usage counters
First Claim
1. A computer implemented method for data access in a storage infrastructure, the storage infrastructure comprising a host system connected to at least a first storage system and a second storage system, the storage infrastructure further comprising a de-duplication module maintaining a data structure comprising one or more entries, each entry of the one or more entries comprising a hash value, a data location, an identifier, a first usage count and a second usage count for a data chunk, wherein the first usage count and the second usage count are associated with the first storage system and the second storage system, respectively, the first storage system and the second storage system comprising a first reference table and a second reference table, respectively, the method comprising:
receiving, by the first storage system from the host system, a write request for storing the data chunk, wherein the write request is indicative of a first identifier of the data chunk;
calculating, by the first storage system, a hash value of the data chunk using a hash function;
determining, by the first storage system, a first storage location for the data chunk in the first storage system;
sending, by the first storage system, a write message including the hash value, the first identifier and the first storage location to the de-duplication module;
determining, by the de-duplication module, whether the hash value exists in the data structure;
responsive to the hash value existing in the data structure, incrementing, by the de-duplication module, the first usage count of the data chunk;
responsive to the hash value failing to exist in the data structure, adding, by the de-duplication module, an entry to the data structure comprising the hash value, the first storage location, the first identifier, the first usage count set to one and the second usage count set to zero;
receiving, by the first storage system, a response message from the de-duplication module, wherein:
responsive to the hash value existing in the data structure, the response message comprising a second storage location, a second identifier, the first usage count and the second usage count associated with the hash value and wherein:
responsive to a determination that the first usage count is higher than a predetermined maximum usage value, storing, by the first storage system, the data chunk in the first storage location, thereby duplicating the data chunk and adding an entry in the first reference table including the first identifier and the first storage location, and
responsive to a determination that the first usage count fails to be higher than the predetermined maximum usage value, adding, by the first storage system, an entry in the first reference table with the first identifier, the second storage location and the second identifier; and
responsive to the hash value failing to exist in the data structure, the response message comprises instructions for storing the data chunk in the first storage location and storing, by the first storage system, the data chunk in the first storage location and adding an entry in the first reference table including the first identifier and the first storage location.
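The de-duplication module's part of the claimed method (look up the hash value, then either increment the usage count or add a new entry) can be illustrated with a minimal Python sketch. The names `Entry`, `DedupModule` and `handle_write`, the system labels `"first"` and `"second"`, and the dictionary shape of the response are assumptions made for illustration only; the claim does not prescribe any of them.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    """One entry of the de-duplication data structure (hypothetical layout)."""
    hash_value: str
    location: str    # storage location of the stored copy
    identifier: str  # identifier under which that copy was written
    usage: dict      # usage count per storage system


class DedupModule:
    """Hypothetical de-duplication module keyed by hash value."""

    def __init__(self, systems=("first", "second")):
        self.systems = systems
        self.entries = {}  # hash value -> Entry

    def handle_write(self, system, hash_value, identifier, location):
        entry = self.entries.get(hash_value)
        if entry is None:
            # Hash not known: add an entry with the sender's count set to one
            # and every other system's count set to zero.
            usage = {s: 0 for s in self.systems}
            usage[system] = 1
            self.entries[hash_value] = Entry(hash_value, location, identifier, usage)
            return {"exists": False, "action": "store"}
        # Hash known: increment the sender's usage count and report the
        # existing copy together with the current counts.
        entry.usage[system] += 1
        return {
            "exists": True,
            "location": entry.location,
            "identifier": entry.identifier,
            "usage": dict(entry.usage),
        }
```

In this sketch a second write of the same hash from the first storage system returns the location and identifier recorded by the first write, with the first usage count raised to two.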
1 Assignment
0 Petitions
Abstract
The present invention relates to a method for data access in a storage infrastructure. The storage infrastructure comprises a host system connected to at least a first storage system and a second storage system. The first storage system receives, from the host system, a write request for storing a data chunk, the write request being indicative of a first identifier of the data chunk. The first storage system calculates a hash value of the received data chunk using a hash function. The first storage system determines a first storage location for the data chunk in the first storage system and sends a write message including the hash value, the first identifier and the first storage location to a de-duplication module. The de-duplication module determines whether the hash value exists in a data structure that it maintains.
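The storage-system side of the steps summarized in the abstract (hash the chunk, then build the write message) might look like the following Python sketch. The function name `build_write_message`, the dictionary keys, and the choice of SHA-256 are assumptions; the abstract only says "a hash function".

```python
import hashlib


def build_write_message(chunk: bytes, identifier: str, location: str) -> dict:
    """Build the write message sent to the de-duplication module.

    SHA-256 is an assumed choice of hash function, not one named in the source.
    """
    hash_value = hashlib.sha256(chunk).hexdigest()
    return {"hash": hash_value, "identifier": identifier, "location": location}
```

Two chunks with identical content yield the same `hash` field regardless of identifier or location, which is what lets the de-duplication module recognize a repeated chunk.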
21 Citations
20 Claims
1. A computer implemented method for data access in a storage infrastructure, the storage infrastructure comprising a host system connected to at least a first storage system and a second storage system, the storage infrastructure further comprising a de-duplication module maintaining a data structure comprising one or more entries, each entry of the one or more entries comprising a hash value, a data location, an identifier, a first usage count and a second usage count for a data chunk, wherein the first usage count and the second usage count are associated with the first storage system and the second storage system, respectively, the first storage system and the second storage system comprising a first reference table and a second reference table, respectively, the method comprising:
receiving, by the first storage system from the host system, a write request for storing the data chunk, wherein the write request is indicative of a first identifier of the data chunk;
calculating, by the first storage system, a hash value of the data chunk using a hash function;
determining, by the first storage system, a first storage location for the data chunk in the first storage system;
sending, by the first storage system, a write message including the hash value, the first identifier and the first storage location to the de-duplication module;
determining, by the de-duplication module, whether the hash value exists in the data structure;
responsive to the hash value existing in the data structure, incrementing, by the de-duplication module, the first usage count of the data chunk;
responsive to the hash value failing to exist in the data structure, adding, by the de-duplication module, an entry to the data structure comprising the hash value, the first storage location, the first identifier, the first usage count set to one and the second usage count set to zero;
receiving, by the first storage system, a response message from the de-duplication module, wherein:
responsive to the hash value existing in the data structure, the response message comprising a second storage location, a second identifier, the first usage count and the second usage count associated with the hash value and wherein:
responsive to a determination that the first usage count is higher than a predetermined maximum usage value, storing, by the first storage system, the data chunk in the first storage location, thereby duplicating the data chunk and adding an entry in the first reference table including the first identifier and the first storage location, and
responsive to a determination that the first usage count fails to be higher than the predetermined maximum usage value, adding, by the first storage system, an entry in the first reference table with the first identifier, the second storage location and the second identifier; and
responsive to the hash value failing to exist in the data structure, the response message comprises instructions for storing the data chunk in the first storage location and storing, by the first storage system, the data chunk in the first storage location and adding an entry in the first reference table including the first identifier and the first storage location.
Dependent claims: 2, 3, 4, 5, 6, 7, 8
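How the first storage system might act on the response message can be sketched in Python as follows. The function name `apply_response`, the dictionary shape of `response`, the use of plain dictionaries for the storage and the reference table, and the returned labels are all assumptions for illustration; the claim specifies only the three outcomes.

```python
def apply_response(response, chunk, identifier, location,
                   storage, reference_table, max_usage, system="first"):
    """Apply a de-duplication response on the storage system (hypothetical API).

    storage:          dict mapping storage location -> chunk bytes
    reference_table:  dict mapping identifier -> (location, referenced identifier)
    """
    if not response["exists"]:
        # Hash was unknown: store the chunk locally and reference it.
        storage[location] = chunk
        reference_table[identifier] = (location, None)
        return "stored"
    if response["usage"][system] > max_usage:
        # Usage count exceeds the maximum: deliberately keep a second copy.
        storage[location] = chunk
        reference_table[identifier] = (location, None)
        return "duplicated"
    # Otherwise store no data; point at the existing copy instead.
    reference_table[identifier] = (response["location"], response["identifier"])
    return "referenced"
```

The comparison is strictly "higher than", matching the claim wording: a usage count equal to the maximum still results in a reference rather than a duplicate.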
9. A computer system comprising:
a first storage system comprising a physical processor and a physical memory coupled to the physical processor, and a de-duplication module further comprising a physical processor and a physical memory coupled to the physical processor, wherein the memories comprise instructions which, when executed by the physical processors, cause the physical processors to:
receive, by the first storage system, a write request for storing a data chunk, wherein the write request is indicative of a first identifier of the data chunk;
calculate, by the first storage system, a hash value of the data chunk using a hash function;
determine, by the first storage system, a first storage location for the data chunk in the first storage system;
send, by the first storage system, a write message including the hash value, the first identifier and the first storage location to the de-duplication module;
determine, by the de-duplication module, whether the hash value exists in a data structure;
responsive to the hash value existing in the data structure, increment, by the de-duplication module, a first usage count of the data chunk;
responsive to the hash value failing to exist in the data structure, add, by the de-duplication module, an entry to the data structure comprising the hash value, the first identifier, the first storage location, the first usage count set to one and a second usage count set to zero;
receive, by the first storage system, a response message from the de-duplication module, wherein:
responsive to the hash value existing in the data structure, the response message comprising a second storage location, a second identifier, the first usage count and the second usage count associated with the hash value and wherein:
responsive to a determination that the first usage count is higher than a predetermined maximum usage value, store, by the first storage system, the data chunk in the first storage location, thereby duplicating the data chunk and adding an entry in the first reference table including the first identifier and the first storage location, and
responsive to a determination that the first usage count fails to be higher than the predetermined maximum usage value, add, by the first storage system, an entry in the first reference table with the first identifier, the second storage location and the second identifier; and
responsive to the hash value failing to exist in the data structure, the response message comprises instructions for storing the data chunk in the first storage location and store, by the first storage system, the data chunk and adding an entry in the first reference table including the first identifier and the first storage location.
Dependent claims: 10, 11, 12, 13, 14
15. A computer program product comprising a non-transitory computer readable storage medium having a computer readable program stored therein, wherein the computer readable program, when executed in a first storage system or a de-duplication module, causes the first storage system or the de-duplication device to:
receive, by the first storage system, a write request for storing a data chunk, wherein the write request is indicative of a first identifier of the data chunk;
calculate, by the first storage system, a hash value of the data chunk using a hash function;
determine, by the first storage system, a first storage location for the data chunk in the first storage system;
send, by the first storage system, a write message including the hash value, a first identifier and the first storage location to the de-duplication module;
determine, by the de-duplication module, whether the hash value exists in a data structure;
responsive to the hash value existing in the data structure, increment, by the de-duplication module, a first usage count of the data chunk;
responsive to the hash value failing to exist in the data structure, add, by the de-duplication module, an entry to the data structure comprising the hash value, the first identifier, the first storage location, the first usage count set to one and a second usage count set to zero;
receive, by the first storage system, a response message from the de-duplication module, wherein:
responsive to the hash value existing in the data structure, the response message comprising a second storage location, a second identifier, the first usage count and the second usage count associated with the hash value and wherein:
responsive to a determination that the first usage count is higher than a predetermined maximum usage value, store, by the first storage system, the data chunk in the first storage location, thereby duplicating the data chunk and adding an entry in the first reference table including the first identifier and the first storage location, and
responsive to a determination that the first usage count fails to be higher than the predetermined maximum usage value, add, by the first storage system, an entry in the first reference table with the first identifier, the second storage location and the second identifier; and
responsive to the hash value failing to exist in the data structure, the response message comprises instructions for storing the data chunk in the first storage location and store, by the first storage system, the data chunk and adding an entry in the first reference table including the first identifier and the first storage location.
Dependent claims: 16, 17, 18, 19, 20
Specification