SYSTEM AND METHOD FOR CACHING HASHES FOR CO-LOCATED DATA IN A DEDUPLICATION DATA STORE

US 20130339319A1
Filed: 06/18/2013
Published: 12/19/2013
Est. Priority Date: 06/18/2012
Status: Active Grant


8 Assignments

0 Petitions

Abstract

Systems and methods are provided for caching hashes for deduplicated data in a deduplication data store. A computing device receives a request to read data from the deduplication data store and identifies, in a first hash structure that is not stored in its memory, a persist header stored in the deduplication data store. The persist header comprises a set of hashes that includes a hash indicative of the requested data. Each hash in the set represents data stored in the deduplication data store after the persist header, co-located with other data represented by the remaining hashes in the set. The set of hashes is cached in a second hash structure stored in the memory, so that if the computing device requests to read additional data represented by the same persist header, it can identify that data using the second hash structure.
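The read path described in the abstract and in claim 1 can be sketched as a two-tier lookup. This is a minimal illustration under stated assumptions, not the patented implementation. `PersistHeader`, `HashCachingReader`, `locate`, and `first_lookups` are hypothetical names. A plain dict stands in for the first hash structure, which is not held in memory. A second dict plays the in-memory second hash structure. The sketch requires Python 3.9+.

```python
from dataclasses import dataclass


@dataclass
class PersistHeader:
    """Hypothetical persist header: hashes of the co-located data stored after it."""
    offset: int          # assumed location of the header in the data store
    hashes: list[bytes]  # one hash per co-located chunk, in storage order


class HashCachingReader:
    """Two-tier hash lookup: a slow first structure and an in-memory second structure."""

    def __init__(self, first_structure: dict[bytes, PersistHeader]):
        # First hash structure: stands in for the index that is NOT held in memory.
        self._first = first_structure
        # Second hash structure: in-memory map of hash -> (persist header, position).
        self._second: dict[bytes, tuple[PersistHeader, int]] = {}
        self.first_lookups = 0  # counts trips to the first structure

    def locate(self, h: bytes) -> tuple[int, int]:
        """Return (persist header offset, position of the data after the header)."""
        hit = self._second.get(h)
        if hit is None:
            # Miss in memory: find the persist header through the first structure.
            self.first_lookups += 1
            header = self._first[h]  # raises KeyError if the data is unknown
            # Cache the header's whole set of hashes, not just the requested one,
            # because co-located data is likely to be read next.
            for pos, other in enumerate(header.hashes):
                self._second[other] = (header, pos)
            hit = self._second[h]
        header, pos = hit
        return header.offset, pos
```

Suppose a header holds three hashes. The first `locate` call on any of them costs one lookup in the first structure, and the other two are then answered from memory, which is the behavior of claim 8. A hash that belongs to a different persist header costs one more lookup, after which that header's hashes are cached too, as in claim 9.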

11 Claims

1. A computerized method for caching hashes for deduplicated data in a deduplication data store, in which data is stored using a persist header comprising a set of hashes, wherein each hash in the set of hashes represents data stored in the deduplication data store after the persist header that is co-located with other data represented by the remaining hashes in the set of hashes, the computerized method comprising:
- receiving, by a computing device, a request to read data from the deduplication data store;
- identifying, by the computing device, in a first hash structure that is not stored in memory of the computing device, a persist header stored in a deduplication data store, wherein:
  - the persist header comprises a set of hashes that includes a hash that is indicative of the data the computing device requested to read; and
  - wherein each hash in the set of hashes represents data stored in the deduplication data store after the persist header that is co-located with other data represented by the remaining hashes in the set of hashes; and
- caching, by the computing device, the set of hashes in a second hash structure stored in the memory of the computing device, whereby if the computing device requests to read additional data, the computing device can identify the additional data using the second hash structure if the additional data is represented by the persist header.

2. The method of claim 1, wherein caching the set of hashes in the second hash structure stored in the memory of the computing device comprises:
- storing a hash table structure in a hash table array based on the persist header, the hash table structure comprising:
  - a hash fragment comprising a portion of the hash of the data;
  - an index into a persist header reference array; and
  - a hash index into the set of hashes for the persist header that identifies the hash for the data.

3. The method of claim 2, further comprising:
- identifying the hash table structure in the hash table array based on the hash fragment.

4. The method of claim 2, further comprising:
- storing a persist header reference structure in a persist header reference array based on the persist header, the persist header reference structure comprising:
  - a cache page index into a cache page array that identifies the persist header in memory; and
  - a hash code to verify an identity of a cache page array entry identified by the cache page index.

5. The method of claim 4, further comprising reading data associated with the persist header, comprising:
- identifying the hash table structure in the hash table array based on the hash fragment; and
- identifying the persist header reference structure in the persist header reference array based on the index.

6. The method of claim 5, further comprising:
- identifying the cache page array entry in the cache page index based on the cache page index;
- verifying an identity of the cache page array entry based on the hash code.

7. The method of claim 6, further comprising:
- identifying the persist header in memory based on the cache page array; and
- identifying the hash in the set of hashes based on the index.

8. The method of claim 1, further comprising:
- receiving a second request to read second data from the deduplication data store; and
- identifying the second data using the second hash structure and not the first hash structure, wherein the second data comprises a second hash in the set of hashes.

9. The method of claim 1, comprising:
- receiving a second request to read second data from the deduplication data store;
- determining a second hash for the second data is not in the second hash structure;
- identifying a second persist header in the first hash structure, wherein:
  - the second persist header comprises a second hash in a second set of hashes stored in the second persist header; and
  - the second hash is indicative of the second data the computing device requested to read; and
- caching the second set of hashes in the second hash structure stored in the memory of the computing device.
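The in-memory layout recited in claims 2 through 7 can be sketched as three arrays: a hash table array, a persist header reference array, and a cache page array. Several details go beyond the claims and are assumptions. The hash table array uses chained buckets keyed by a 4-byte fragment. Cache pages are reused round-robin. The "hash code" is taken to be a CRC32 of the header's offset. A final full-hash comparison is added because a fragment alone can collide. `HeaderCache`, `HashTableEntry`, `PersistHeaderRef`, `identity_code`, `FRAGMENT_BYTES`, and `NUM_BUCKETS` are hypothetical names. Requires Python 3.9+.

```python
import zlib
from dataclasses import dataclass
from typing import Optional

FRAGMENT_BYTES = 4  # hypothetical: bytes of each hash kept in the hash table array
NUM_BUCKETS = 1024  # hypothetical: number of buckets in the hash table array


@dataclass
class PersistHeader:
    offset: int          # assumed on-disk location, used as the header's identity
    hashes: list[bytes]  # full hashes of the co-located data after the header


@dataclass
class HashTableEntry:  # claim 2
    fragment: bytes    # portion of the data's hash
    ph_ref_index: int  # index into the persist header reference array
    hash_index: int    # position of the hash in the header's set of hashes


@dataclass
class PersistHeaderRef:    # claim 4
    cache_page_index: int  # index into the cache page array
    hash_code: int         # verifies which header the cache page holds


def identity_code(header: PersistHeader) -> int:
    # Hypothetical hash code: CRC32 of the header's offset.
    return zlib.crc32(header.offset.to_bytes(8, "little"))


class HeaderCache:
    def __init__(self, num_pages: int = 8):
        self.hash_table: list[list[HashTableEntry]] = [[] for _ in range(NUM_BUCKETS)]
        self.ph_refs: list[PersistHeaderRef] = []
        self.cache_pages: list[Optional[PersistHeader]] = [None] * num_pages
        self._next_page = 0  # round-robin page reuse (an assumption)

    @staticmethod
    def _bucket(fragment: bytes) -> int:
        return int.from_bytes(fragment, "little") % NUM_BUCKETS

    def cache_header(self, header: PersistHeader) -> None:
        page = self._next_page
        self._next_page = (page + 1) % len(self.cache_pages)
        self.cache_pages[page] = header  # may evict an older header
        self.ph_refs.append(PersistHeaderRef(page, identity_code(header)))
        ref_index = len(self.ph_refs) - 1
        for i, h in enumerate(header.hashes):
            frag = h[:FRAGMENT_BYTES]
            self.hash_table[self._bucket(frag)].append(HashTableEntry(frag, ref_index, i))

    def lookup(self, h: bytes) -> Optional[tuple[int, int]]:
        """Return (header offset, hash index) if h is cached, else None."""
        frag = h[:FRAGMENT_BYTES]
        for entry in self.hash_table[self._bucket(frag)]:  # claims 3 and 5
            if entry.fragment != frag:
                continue
            ref = self.ph_refs[entry.ph_ref_index]           # claim 5
            header = self.cache_pages[ref.cache_page_index]  # claim 6
            if header is None or identity_code(header) != ref.hash_code:
                continue  # page now holds a different header: stale entry
            if entry.hash_index < len(header.hashes) and header.hashes[entry.hash_index] == h:
                return header.offset, entry.hash_index       # claim 7
        return None
```

In this sketch, the hash code check is what keeps stale hash table entries harmless. When a cache page is recycled for a different persist header, lookups through the old entries fail verification and return `None`. The caller would then fall back to the first hash structure, as in claim 9.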

10. A computing device for caching hashes for deduplicated data in a deduplication data store, in which data is stored using a persist header comprising a set of hashes, wherein each hash in the set of hashes represents data stored in the deduplication data store after the persist header that is co-located with other data represented by the remaining hashes in the set of hashes, the computing device comprising:
- a deduplication data store; and
- a processor in communication with the deduplication data store, and configured to run a module stored in memory that is configured to cause the processor to:
  - receive a request to read data from the deduplication data store;
  - identify in a first hash structure that is not stored in memory of the computing device, a persist header stored in a deduplication data store, wherein:
    - the persist header comprises a set of hashes that includes a hash that is indicative of the data the computing device requested to read; and
    - wherein each hash in the set of hashes represents data stored in the deduplication data store after the persist header that is co-located with other data represented by the remaining hashes in the set of hashes; and
  - cache the set of hashes in a second hash structure stored in the memory of the computing device, whereby if the computing device requests to read additional data, the computing device can identify the additional data using the second hash structure if the additional data is represented by the persist header.

11. A non-transitory computer readable medium having executable instructions operable to cause an apparatus to:
- receive a request to read data from a deduplication data store;
- identify in a first hash structure that is not stored in memory of the computing device, a persist header stored in a deduplication data store, wherein:
  - the persist header comprises a set of hashes that includes a hash that is indicative of the data the computing device requested to read; and
  - wherein each hash in the set of hashes represents data stored in the deduplication data store after the persist header that is co-located with other data represented by the remaining hashes in the set of hashes; and
- cache the set of hashes in a second hash structure stored in the memory of the computing device, whereby if the computing device requests to read additional data, the computing device can identify the additional data using the second hash structure if the additional data is represented by the persist header.

Current Assignee
Google LLC (Alphabet Inc.)
Original Assignee
Actifio, Inc. (Alphabet Inc.)
Inventors
PROVENZANO, Christopher A.; WOODWARD, Mark L.

Granted Patent

US 9,501,545 B2
US Class Current

707/692
CPC Class Codes

G06F 11/1451   by selection of backup cont...

G06F 11/1453   using de-duplication of the...

G06F 11/1456   Hardware arrangements for b...

G06F 11/1461   Backup scheduling policy

G06F 16/128   Details of file system snap...

G06F 16/215   Improving data quality; Dat...

G06F 16/273   Asynchronous replication or...

G06F 16/275   Synchronous replication

G06F 2201/80   Database-specific techniques

G06F 2201/84   Using snapshots, i.e. a log...

G06F 3/0617   in relation to availability

G06F 3/065   Replication mechanisms

G06F 3/0683   Plurality of storage devices

H04L 67/10   in which an application is ...

H04L 67/1095   Replication or mirroring of...
