Maintaining deduplication data in native file formats

US 9,442,951 B2
Filed: 09/23/2011
Issued: 09/13/2016
Est. Priority Date: 09/23/2011
Status: Active Grant

Abstract

Mechanisms are provided to maintain deduplication data in native file formats. Files, including entities such as volumes and databases, are analyzed to identify components suitable for deduplication. These components suitable for deduplication are delineated into chunks and identifiers are generated for each of the chunks. The identifiers are used to reference the chunks in deduplication dictionaries that provide locations indicating where deduplicated chunks are stored. The components in the files are replaced with file handles or stubs that applications can use to access deduplicated data. Applications can continue to perform operations on the files as though no deduplication has occurred.
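
The flow described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: the fixed-size chunking, the SHA-256 identifier, the list-based chunk store, and the names `delineate`, `chunk_id`, `deduplicate`, and `restore` are all assumptions made for the example, and the stub is modeled as a plain dictionary rather than any real file-format structure.

```python
import hashlib

CHUNK_SIZE = 4  # assumed tiny fixed chunk size, purely for illustration


def delineate(component: bytes, size: int = CHUNK_SIZE) -> list:
    """Split a component into fixed-size chunks (the last one may be shorter)."""
    return [component[i:i + size] for i in range(0, len(component), size)]


def chunk_id(chunk: bytes) -> str:
    """Identifier for a chunk; a SHA-256 hex digest is assumed here."""
    return hashlib.sha256(chunk).hexdigest()


def deduplicate(components: list, dictionary: dict, store: list) -> list:
    """Replace each component with a stub that lists its chunk identifiers.

    `dictionary` maps a chunk identifier to the chunk's location in `store`.
    """
    stubs = []
    for component in components:
        ids = []
        for chunk in delineate(component):
            cid = chunk_id(chunk)
            if cid not in dictionary:         # chunk not stored yet
                dictionary[cid] = len(store)  # record where it will live
                store.append(chunk)
            ids.append(cid)
        stubs.append({"stub": ids})           # hypothetical stub layout
    return stubs


def restore(stub: dict, dictionary: dict, store: list) -> bytes:
    """Resolve a stub back to the original component bytes."""
    return b"".join(store[dictionary[cid]] for cid in stub["stub"])
```

Calling `deduplicate` on two components that share a chunk stores the shared chunk once, and `restore` returns the original bytes for each stub, which is how an application could keep reading the file as though no deduplication had occurred.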

20 Claims

1. A method, comprising:
- parsing a file to identify a first plurality of components including a first component and a second component of the file, wherein the first and second components are data segments of the file;
  
  replacing the first component in the file with a first stub and the second component in the file with a second stub such that an application can still both access the file and perform an operation on the file as if the file;
  
  delineating the first component into a first plurality of chunks;
  
  generating a first chunk identifier corresponding to a first chunk, the first chunk identifier used to access a deduplication dictionary;
  
  determining whether the first chunk is already stored in datastore suitcases in a deduplication system using the first chunk identifier and the deduplication dictionary.
  - 2. The method of claim 1, wherein the first component is further separated into a first plurality of subcomponents.
  - 3. The method of claim 1, wherein the file is maintained by an application in an application native file format.
  - 4. The method of claim 3, wherein the first stub is accessed by the application without adversely affecting application operation.
  - 5. The method of claim 4, wherein the first chunk identifier is a hash of the first chunk.
  - 6. The method of claim 3, wherein the deduplication dictionary maintains a plurality of chunk identifiers and corresponding chunk locations.
  - 7. The method of claim 6, wherein the first chunk is maintained in a datastore suitcase.
  - 8. The method of claim 1, wherein if the first chunk is already stored in the deduplication system, reference counts associated with the datastore suitcase are updated.
  - 9. The method of claim 1, wherein if the first chunk is not already stored in the deduplication system, a new entry is created in the deduplication dictionary and the first chunk is stored in a datastore suitcase.
  - 10. The method of claim 1, wherein the second component is further separated into a second plurality of subcomponents.
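
Claims 5 through 9 together describe a dictionary lookup that either updates reference counts or creates a new entry. One way this could look is sketched below. The `Suitcase` and `DedupSystem` classes, the per-chunk reference counts, and the use of SHA-256 as the hash are assumptions chosen for illustration; the claims do not fix any of these details.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class Suitcase:
    """Hypothetical datastore suitcase: stored chunks plus reference counts."""
    chunks: list = field(default_factory=list)
    refcounts: list = field(default_factory=list)


@dataclass
class DedupSystem:
    suitcase: Suitcase = field(default_factory=Suitcase)
    # Deduplication dictionary: chunk identifier -> chunk index in the suitcase
    dictionary: dict = field(default_factory=dict)

    def add_chunk(self, chunk: bytes) -> str:
        """Store a chunk if it is new, otherwise bump its reference count."""
        cid = hashlib.sha256(chunk).hexdigest()  # identifier is a hash (claim 5)
        index = self.dictionary.get(cid)
        if index is not None:
            # Already stored: only the reference count changes (claim 8).
            self.suitcase.refcounts[index] += 1
        else:
            # Not stored: new dictionary entry and new stored chunk (claim 9).
            self.dictionary[cid] = len(self.suitcase.chunks)
            self.suitcase.chunks.append(chunk)
            self.suitcase.refcounts.append(1)
        return cid
```

Adding the same chunk twice leaves a single stored copy with a reference count of 2, while a different chunk gets its own dictionary entry.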

11. A non-transitory computer readable medium, comprising:
- computer code for parsing a file to identify a first plurality of components including a first component and a second component of the file, wherein the first and second components are data segments of the file;
  
  computer code for replacing the first component in the file with a first stub and the second component in the file with a second stub such that an application can still both access the file and perform an operation on the file as if the file;
  
  computer code for delineating the first component into a first plurality of chunks;
  
  computer code for generating a first chunk identifier corresponding to a first chunk, the first chunk identifier used to access a deduplication dictionary;
  
  computer code for determining whether the first chunk is already stored in datastore suitcases in a deduplication system using the first chunk identifier and the deduplication dictionary.
  - 12. The computer readable medium of claim 11, wherein the first component is further separated into a first plurality of subcomponents.
  - 13. The computer readable medium of claim 11, wherein the file is maintained by an application in an application native file format.
  - 14. The computer readable medium of claim 13, wherein the first stub is accessed by the application without adversely affecting application operation.
  - 15. The computer readable medium of claim 14, wherein the first chunk identifier is a hash of the first chunk.
  - 16. The computer readable medium of claim 13, wherein the deduplication dictionary maintains a plurality of chunk identifiers and corresponding chunk locations.
  - 17. The computer readable medium of claim 16, wherein the first chunk is maintained in a datastore suitcase.
  - 18. The computer readable medium of claim 11, wherein if the first chunk is already stored in the deduplication system, reference counts associated with the datastore suitcase are updated.
  - 19. The computer readable medium of claim 11, wherein if the first chunk is not already stored in the deduplication system, a new entry is created in the deduplication dictionary and the first chunk is stored in a datastore suitcase.

20. A system, comprising:
- a processor; and
  
  a memory, the memory containing instructions for execution a method by the processor, the method comprising;
  
  parsing a file to identify a first plurality of components including a first component and a second component of the file, wherein the first and second components are data segments of the file;
  
  replacing the first component in the file with a first stub and the second component in the file with a second stub such that an application can still both access the file and perform an operation on the file as if the file;
  
  delineating the first component into a first plurality of chunks;
  
  generating a first chunk identifier corresponding to a first chunk, the first chunk identifier used to access a deduplication dictionary;
  
  determining whether the first chunk is already stored in datastore suitcases in a deduplication system using the first chunk identifier and the deduplication dictionary.

Current Assignee: Quest Software, Inc.
Original Assignee: Dell Products LP (Dell Technologies Inc.)
Inventors: Smith, Brian; Ramsdell, Tom; Rodriquez, Rob; Shintre, Shweta
Primary Examiner(s): Jacob, Ajith
Application Number: US13/243,965
Publication Number: US 20130080404A1
Time in Patent Office: 1,817 Days
Field of Search: 707/626
US Class Current: 1/1
CPC Class Codes:
G06F 16/1748 De-duplication implemented ...
G06F 16/1752 based on file chunks
