Managing data storage in a set of storage systems using usage counters

US 9,830,101 B2
Filed: 08/07/2014
Issued: 11/28/2017
Est. Priority Date: 09/11/2013
Status: Active Grant

Abstract

The present invention relates to a method for data access in a storage infrastructure. The storage infrastructure comprises a host system connected to at least a first storage system and a second storage system, together with a de-duplication module that maintains a data structure of data chunk entries. The first storage system receives, from the host system, a write request for storing a data chunk; the write request is indicative of a first identifier of the data chunk. The first storage system calculates a hash value of the received data chunk using a hash function. The first storage system determines a first storage location for the data chunk in the first storage system and sends a write message including the hash value, the first identifier and the first storage location to the de-duplication module. The de-duplication module determines whether the hash value exists in the data structure.
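
The de-duplication module's side of this write exchange can be illustrated with a minimal Python sketch. It is not taken from the patent: the names `Entry`, `DedupModule`, `handle_write` and `chunk_hash` are hypothetical, SHA-256 stands in for the unspecified "hash function", and keying the usage counts by the labels "first" and "second" is an assumed generalization of the first and second usage counts.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class Entry:
    """One entry of the de-duplication module's data structure (hypothetical layout)."""
    hash_value: str
    location: str    # data location of the stored copy
    identifier: str  # identifier under which that copy was written
    usage: dict      # usage count per storage system, e.g. {"first": 1, "second": 0}


class DedupModule:
    def __init__(self):
        self.entries = {}  # hash value -> Entry

    def handle_write(self, system, hash_value, identifier, location):
        """Process a write message and return (existed, entry)."""
        entry = self.entries.get(hash_value)
        if entry is not None:
            # Hash already present: only increment the sender's usage count.
            entry.usage[system] += 1
            return True, entry
        # Hash not present: add an entry with count one for the sender, zero for the peer.
        usage = {"first": 0, "second": 0}
        usage[system] = 1
        entry = Entry(hash_value, location, identifier, usage)
        self.entries[hash_value] = entry
        return False, entry


def chunk_hash(chunk: bytes) -> str:
    # Assumption: SHA-256; the source only says "a hash function".
    return hashlib.sha256(chunk).hexdigest()
```

In this sketch the returned flag plays the role of the response message: it tells the storage system whether the hash was already known, and the returned entry carries the location, identifier and usage counts associated with it.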

20 Claims

1. A computer implemented method for data access in a storage infrastructure, the storage infrastructure comprising a host system connected to at least a first storage system and a second storage system, the storage infrastructure further comprising a de-duplication module maintaining a data structure comprising one or more entries, each entry of the one or more entries comprising a hash value, a data location, an identifier, a first usage count and a second usage count for a data chunk, wherein the first usage count and the second usage count are associated with the first storage system and the second storage system, respectively, the first storage system and the second storage system comprising a first reference table and a second reference table, respectively, the method comprising:
- receiving, by the first storage system from the host system, a write request for storing the data chunk, wherein the write request is indicative of a first identifier of the data chunk;
  
  calculating, by the first storage system, a hash value of the data chunk using a hash function;
  
  determining, by the first storage system, a first storage location for the data chunk in the first storage system;
  
  sending, by the first storage system, a write message including the hash value, the first identifier and the first storage location to the de-duplication module;
  
  determining, by the de-duplication module, whether the hash value exists in the data structure;
  
  responsive to the hash value existing in the data structure, incrementing, by the de-duplication module, the first usage count of the data chunk;
  
  responsive to the hash value failing to exist in the data structure, adding, by the de-duplication module, an entry to the data structure comprising the hash value, the first storage location, the first identifier, the first usage count set to one and the second usage count set to zero;
  
  receiving, by the first storage system, a response message from the de-duplication module, wherein:
  
  responsive to the hash value existing in the data structure, the response message comprising a second storage location, a second identifier, the first usage count and the second usage count associated with the hash value and wherein:
  
  responsive to a determination that the first usage count is higher than a predetermined maximum usage value, storing, by the first storage system, the data chunk in the first storage location, thereby duplicating the data chunk and adding an entry in the first reference table including the first identifier and the first storage location, and
  
  responsive to a determination that the first usage count fails to be higher than the predetermined maximum usage value, adding, by the first storage system, an entry in the first reference table with the first identifier, the second storage location and the second identifier; and
  
  responsive to the hash value failing to exist in the data structure, the response message comprises instructions for storing the data chunk in the first storage location and storing, by the first storage system, the data chunk in the first storage location and adding an entry in the first reference table including the first identifier and the first storage location.
- - 2. The method of claim 1, further comprising:
    - after determining the first storage location and before sending the write message, storing, by the first storage system, the data chunk in the determined first storage location and adding an entry to the first reference table including the first identifier and the first storage location, wherein the sending of the write message is performed after a predefined time period; and
      
      responsive to the hash value existing in the data structure, the response message further indicating instructions to delete the data chunk stored in the first storage location, deleting, by the first storage system, the data chunk from the first storage system.
  - 3. The method of claim 1, wherein, responsive to the hash value existing in the data structure, the method further comprising:
    - responsive to a determination that the first usage count is higher than the second usage count and responsive to the first usage count being lower than or equal to the predetermined maximum usage value:
      
      sending, by the first storage system, a request to the de-duplication module for moving the data chunk from the second storage location in the second storage system to the first storage location,
      
      adding, by the de-duplication module, an entry in the data structure including the first identifier and the first storage location,
      
      adding, by the first storage system, an entry in the first reference table including the first identifier and the first storage location, and
      
      adding, by the second storage system, an entry in the second reference table indicating the first storage location, the first identifier and the second identifier.
  - 4. The method of claim 1, further comprising:
    - receiving, by the first storage system, a read request for reading the data chunk having the first identifier;
      
      determining, by the first storage system, whether the first identifier exists in association with the second storage location in the first reference table and responsive to determining that the first identifier exists in association with the second storage location in the first reference table, sending to the second storage system that includes the second storage location, a request including the second storage location and the second identifier for obtaining the data chunk from the second storage system;
      
      determining, by the first storage system, using the first reference table whether the first identifier exists in association with the first storage location and whether the first storage location is in the first storage system and responsive to determining that the first identifier exists in association with the first storage location and that the first storage location is in the first storage system, reading the data chunk from the first storage location of the first storage system; and
      
      sending, by the first storage system, the data chunk to the host system.
  - 5. The method of claim 1, further comprising:
    - receiving, by the first storage system, from the host system a delete request for deleting the data chunk having the first identifier, wherein the data chunk is stored in the first storage location;
      
      sending, by the first storage system, a delete message to the de-duplication module including the first storage location and the first identifier of the data chunk;
      
      decrementing, by the de-duplication module, the first usage count;
      
      receiving, by the first storage system, a response message from the de-duplication module indicating the decremented first usage count and the second usage count of the data chunk;
      
      determining, by the first storage system, whether the first usage count is lower than or equal to a preset minimum usage count and the second usage count is lower than or equal to the preset minimum usage count;
      
      responsive to the first usage count being lower than or equal to the preset minimum usage count and the second usage count being lower than or equal to the preset minimum usage count, deleting, by the first storage system, the data chunk from the first storage system;
      
      sending, by the first storage system, a request to the second storage system for deleting an entry in the second reference table associated with the data chunk; and
      
      sending, by the first storage system, a request to the de-duplication module for deleting an entry in the data structure associated with the data chunk.
  - 6. The method of claim 5, wherein the minimum usage count is equal to zero.
  - 7. The method of claim 1, further comprising:
    - receiving, by the first storage system, from the host system a delete request for deleting the data chunk, wherein the data chunk is stored in the second storage location;
      
      determining, by the first storage system, whether the first usage count is equal to zero; and
      
      responsive to the first usage count being equal to zero, deleting, by the first storage system, an entry of the first reference table comprising the first identifier.
  - 8. The method of claim 7, further comprising:
    - determining, by the first storage system, whether the first usage count is greater than zero and the second usage count is equal to zero; and
      
      responsive to the first usage count being greater than zero and the second usage count being equal to zero:
      
      sending, by the first storage system, a request to the de-duplication module for moving the data chunk from the second storage location in the second storage system to the first storage location,
      
      updating, by the de-duplication module, an entry in the data structure associated with the data chunk to indicate that the data chunk is stored in the first storage system, and
      
      adding, by the first storage system, an entry in the first reference table including the first identifier and the first storage location.
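
The storage-system side of claims 1 and 3 comes down to two comparisons against the usage counts returned in the response message. The following Python sketch shows one way to express them. The function names, the default `max_usage=3`, and the reference-table layout (a dict mapping the first identifier to a `(location, remote_identifier)` tuple, with `None` marking a locally stored chunk) are all assumptions made for illustration, not details from the patent.

```python
def on_write_response(ref_table, first_id, first_loc, existed,
                      first_count=0, second_loc=None, second_id=None,
                      max_usage=3):
    """Update the first reference table; return True if the chunk is stored locally."""
    if not existed or first_count > max_usage:
        # Unknown hash, or usage above the maximum: store (or duplicate) the
        # chunk at the first storage location and reference it there.
        ref_table[first_id] = (first_loc, None)
        return True
    # Usage not above the maximum: reference the existing copy instead of storing.
    ref_table[first_id] = (second_loc, second_id)
    return False


def should_move_to_first(first_count, second_count, max_usage=3):
    """Claim 3 condition for requesting a move of the chunk to the first system."""
    return first_count > second_count and first_count <= max_usage
```

With the assumed maximum of 3, a first usage count of 2 results in a reference to the second storage location, while a count of 4 results in a local duplicate.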

9. A computer system comprising:
- a first storage system comprising a physical processor and a physical memory coupled to the physical processor, and a de-duplication module further comprising a physical processor and a physical memory coupled to the physical processor, wherein the memories comprise instructions which, when executed by the physical processors, cause the physical processors to:
  
  receive, by the first storage system, a write request for storing a data chunk, wherein the write request is indicative of a first identifier of the data chunk;
  
  calculate, by the first storage system, a hash value of the data chunk using a hash function;
  
  determine, by the first storage system, a first storage location for the data chunk in the first storage system;
  
  send, by the first storage system, a write message including the hash value, the first identifier and the first storage location to the de-duplication module;
  
  determine, by the de-duplication module, whether the hash value exists in a data structure;
  
  responsive to the hash value existing in the data structure, increment, by the de-duplication module, a first usage count of the data chunk;
  
  responsive to the hash value failing to exist in the data structure, add, by the de-duplication module, an entry to the data structure comprising the hash value, the first identifier, the first storage location, the first usage count set to one and a second usage count set to zero;
  
  receive, by the first storage system, a response message from the de-duplication module, wherein:
  
  responsive to the hash value existing in the data structure, the response message comprising a second storage location, a second identifier, the first usage count and the second usage count associated with the hash value and wherein:
  
  responsive to a determination that the first usage count is higher than a predetermined maximum usage value, store, by the first storage system, the data chunk in the first storage location, thereby duplicating the data chunk and adding an entry in the first reference table including the first identifier and the first storage location, and
  
  responsive to a determination that the first usage count fails to be higher than the predetermined maximum usage value, add, by the first storage system, an entry in the first reference table with the first identifier, the second storage location and the second identifier; and
  
  responsive to the hash value failing to exist in the data structure, the response message comprises instructions for storing the data chunk in the first storage location and store, by the first storage system, the data chunk and adding an entry in the first reference table including the first identifier and the first storage location.
- - 10. The computer system of claim 9, wherein the instructions further cause the physical processors to:
    - after determining the first storage location and before sending the write message, store, by the first storage system, the data chunk in the determined first storage location and adding an entry to the first reference table including the first identifier and the first storage location, wherein the sending of the write message is performed after a predefined time period; and
      
      responsive to the hash value existing in the data structure, the response message further indicating instructions to delete the data chunk stored in the first storage location, delete, by the first storage system, the data chunk from the first storage system.
  - 11. The computer system of claim 9, wherein, responsive to the hash value existing in the data structure, the instructions further cause the physical processors to:
    - responsive to a determination that the first usage count is higher than the second usage count and responsive to the first usage count being lower than or equal to the predetermined maximum usage value:
      
      send, by the first storage system, a request to the de-duplication module for moving the data chunk from the second storage location in the second storage system to the first storage location,
      
      add, by the de-duplication module, an entry in the data structure including the first identifier and the first storage location,
      
      add, by the first storage system, an entry in the first reference table including the first identifier and the first storage location, and
      
      add, by the second storage system, an entry in the second reference table indicating the first storage location, the first identifier and the second identifier.
  - 12. The computer system of claim 9, wherein the instructions further cause the physical processors to:
    - receive, by the first storage system, a read request for reading the data chunk having the first identifier;
      
      determine, by the first storage system, that the first identifier exists in association with the second storage location in the first reference table and, responsive to determining that the first identifier exists in association with the second storage location in the first reference table, sending to the second storage system that includes the second storage location, a request including the second storage location and the second identifier for obtaining the data chunk from the second storage system;
      
      determine, by the first storage system, using the first reference table whether the first identifier exists in association with the first storage location and whether the first storage location is in the first storage system and responsive to determining that the first identifier exists in association with the first storage location and that the first storage location is in the first storage system, reading the data chunk from the first storage location of the first storage system; and
      
      send, by the first storage system, the data chunk to the host system.
  - 13. The computer system of claim 9, wherein the instructions further cause the physical processors to:
    - receive, by the first storage system, from the host system a delete request for deleting the data chunk having the first identifier, wherein the data chunk is stored in the first storage location;
      
      send, by the first storage system, a delete message to the de-duplication module including the first storage location and the first identifier of the data chunk;
      
      decrement, by the de-duplication module, the first usage count;
      
      receive, by the first storage system, a response message from the de-duplication module indicating the decremented first usage count and the second usage count of the data chunk;
      
      determine, by the first storage system, whether the first usage count is lower than or equal to a preset minimum usage count and the second usage count is lower than or equal to the preset minimum usage count;
      
      responsive to the first usage count being lower than or equal to the preset minimum usage count and the second usage count being lower than or equal to the preset minimum usage count, delete, by the first storage system, the data chunk from the first storage system;
      
      send, by the first storage system, a request to the second storage system for deleting an entry in the second reference table associated with the data chunk; and
      
      send, by the first storage system, a request to the de-duplication module for deleting an entry in the data structure associated with the data chunk.
  - 14. The computer system of claim 9, wherein the instructions further cause the physical processors to:
    - receive, by the first storage system, from the host system a delete request for deleting the data chunk, wherein the data chunk is stored in the second storage location;
      
      determine, by the first storage system, whether the first usage count is equal to zero; and
      
      responsive to the first usage count being equal to zero, delete, by the first storage system, an entry of the first reference table comprising the first identifier.
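
The read path of claims 4 and 12 is a lookup in the first reference table followed by either a local read or a request to the second storage system. A minimal Python sketch follows. The name `read_chunk`, the callback `fetch_remote`, and the table layout (first identifier mapped to a `(location, remote_identifier)` tuple, with `None` meaning the location is in the first storage system) are hypothetical choices for illustration.

```python
def read_chunk(first_id, ref_table, local_store, fetch_remote):
    """Return the data chunk for first_id, reading locally or from the peer system."""
    location, remote_id = ref_table[first_id]
    if remote_id is None:
        # The first identifier is associated with a location in the first
        # storage system: read the chunk directly.
        return local_store[location]
    # The first identifier is associated with the second storage location:
    # request the chunk using that location and the second identifier.
    return fetch_remote(location, remote_id)
```

The caller would then send the returned chunk to the host system, as in the final step of both claims.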

15. A computer program product comprising a non-transitory computer readable storage medium having a computer readable program stored therein, wherein the computer readable program, when executed in a first storage system or a de-duplication module, causes the first storage system or the de-duplication device to:
- receive, by the first storage system, a write request for storing a data chunk, wherein the write request is indicative of a first identifier of the data chunk;
  
  calculate, by the first storage system, a hash value of the data chunk using a hash function;
  
  determine, by the first storage system, a first storage location for the data chunk in the first storage system;
  
  send, by the first storage system, a write message including the hash value, a first identifier and the first storage location to the de-duplication module;
  
  determine, by the de-duplication module, whether the hash value exists in a data structure;
  
  responsive to the hash value existing in the data structure, increment, by the de-duplication module, a first usage count of the data chunk;
  
  responsive to the hash value failing to exist in the data structure, add, by the de-duplication module, an entry to the data structure comprising the hash value, the first identifier, the first storage location, the first usage count set to one and a second usage count set to zero;
  
  receive, by the first storage system, a response message from the de-duplication module, wherein:
  
  responsive to the hash value existing in the data structure, the response message comprising a second storage location, a second identifier, the first usage count and the second usage count associated with the hash value and wherein:
  
  responsive to a determination that the first usage count is higher than a predetermined maximum usage value, store, by the first storage system, the data chunk in the first storage location, thereby duplicating the data chunk and adding an entry in the first reference table including the first identifier and the first storage location, and
  
  responsive to a determination that the first usage count fails to be higher than the predetermined maximum usage value, add, by the first storage system, an entry in the first reference table with the first identifier, the second storage location and the second identifier; and
  
  responsive to the hash value failing to exist in the data structure, the response message comprises instructions for storing the data chunk in the first storage location and store, by the first storage system, the data chunk and adding an entry in the first reference table including the first identifier and the first storage location.
- - 16. The computer program product of claim 15, wherein the computer readable program further causes the first storage system or the de-duplication device to:
    - after determining the first storage location and before sending the write message, store, by the first storage system, the data chunk in the determined first storage location and adding an entry to the first reference table including the first identifier and the first storage location, wherein the sending of the write message is performed after a predefined time period; and
      
      responsive to the hash value existing in the data structure, the response message further indicating instructions to delete the data chunk stored in the first storage location, delete, by the first storage system, the data chunk from the first storage system.
  - 17. The computer program product of claim 15, wherein, responsive to the hash value existing in the data structure, the computer readable program further causes the first storage system or the de-duplication device to:
    - responsive to a determination that the first usage count is higher than the second usage count and responsive to the first usage count being lower than or equal to the predetermined maximum usage value:
      
      send, by the first storage system, a request to the de-duplication module for moving the data chunk from the second storage location in the second storage system to the first storage location,
      
      add, by the de-duplication module, an entry in the data structure including the first identifier and the first storage location,
      
      add, by the first storage system, an entry in the first reference table including the first identifier and the first storage location, and
      
      add, by the second storage system, an entry in the second reference table indicating the first storage location, the first identifier and the second identifier.
  - 18. The computer program product of claim 15, wherein the computer readable program further causes the first storage system or the de-duplication device to:
    - receive, by the first storage system, a read request for reading the data chunk having the first identifier;
      
      determine, by the first storage system, that the first identifier exists in association with the second storage location in the first reference table and, responsive to determining that the first identifier exists in association with the second storage location in the first reference table, sending to the second storage system that includes the second storage location, a request including the second storage location and the second identifier for obtaining the data chunk from the second storage system;
      
      determine, by the first storage system, using the first reference table whether the first identifier exists in association with the first storage location and whether the first storage location is in the first storage system and responsive to determining that the first identifier exists in association with the first storage location and that the first storage location is in the first storage system, reading the data chunk from the first storage location of the first storage system; and
      
      send, by the first storage system, the data chunk to the host system.
  - 19. The computer program product of claim 15, wherein the computer readable program further causes the first storage system or the de-duplication device to:
    - receive, by the first storage system, from the host system a delete request for deleting the data chunk having the first identifier, wherein the data chunk is stored in the first storage location;
      
      send, by the first storage system, a delete message to the de-duplication module including the first storage location and the first identifier of the data chunk;
      
      decrement, by the de-duplication module, the first usage count;
      
      receive, by the first storage system, a response message from the de-duplication module indicating the decremented first usage count and the second usage count of the data chunk;
      
      determine, by the first storage system, whether the first usage count is lower than or equal to a preset minimum usage count and the second usage count is lower than or equal to the preset minimum usage count;
      
      responsive to the first usage count being lower than or equal to the preset minimum usage count and the second usage count being lower than or equal to the preset minimum usage count, delete, by the first storage system, the data chunk from the first storage system;
      
      send, by the first storage system, a request to the second storage system for deleting an entry in the second reference table associated with the data chunk; and
      
      send, by the first storage system, a request to the de-duplication module for deleting an entry in the data structure associated with the data chunk.
  - 20. The computer program product of claim 15, wherein the computer readable program further causes the first storage system or the de-duplication device to:
    - receive, by the first storage system, from the host system a delete request for deleting the data chunk, wherein the data chunk is stored in the second storage location;
      
      determine, by the first storage system, whether the first usage count is equal to zero; and
      
      responsive to the first usage count being equal to zero, delete, by the first storage system, an entry of the first reference table comprising the first identifier.
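
The delete handling of claims 5, 13 and 19 can be summarized as a decrement followed by a threshold test on both usage counts. The Python sketch below illustrates that logic only. The function name `delete_chunk`, the dict of counts, and the returned list of follow-up actions are hypothetical, and the default `min_usage=0` follows claim 6.

```python
def delete_chunk(counts, min_usage=0):
    """Decrement the first usage count and list the follow-up actions, if any.

    counts is the pair of usage counts held by the de-duplication module,
    e.g. {"first": 1, "second": 0} (assumed layout).
    """
    counts["first"] -= 1  # performed by the de-duplication module
    if counts["first"] <= min_usage and counts["second"] <= min_usage:
        # No remaining users on either system: the chunk and its metadata can go.
        return [
            "delete the data chunk from the first storage system",
            "request the second storage system to delete its reference table entry",
            "request the de-duplication module to delete its data structure entry",
        ]
    # The claims do not state what happens otherwise; this sketch does nothing.
    return []
```

An empty result means at least one storage system still references the chunk, so the stored copy stays in place.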

Current Assignee: International Business Machines Corporation
Original Assignee: International Business Machines Corporation
Inventors: Christ, Achim; Haustein, Nils; Mueller-Wicke, Dominic; Winarski, Daniel J.
Primary Examiner(s): Mahmoudi, Tony
Assistant Examiner(s): Le, Michael
Application Number: US14/453,756
Publication Number: US 20150074065A1
Time in Patent Office: 1,209 Days
Field of Search: 707692
US Class Current
CPC Class Codes

G06F 3/0608   Saving storage space on sto...

G06F 3/0641   De-duplication techniques

G06F 3/067   Distributed or networked st...
