File aware block level deduplication
Abstract
A system provides file aware block level deduplication in a system having multiple clients connected to a storage subsystem over a network such as an Internet Protocol (IP) network. The system includes client components and storage subsystem components. Client components include a walker that traverses the namespace looking for files that meet the criteria for optimization, a file system daemon that rehydrates the files, and a filter driver that watches all operations going to the file system. Storage subsystem components include an optimizer resident on the nodes of the storage subsystem. The optimizer can use idle processor cycles to perform optimization. Sub-file compression can be performed at the storage subsystem.
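The walker described above can be illustrated with a short sketch. The abstract does not state the optimization criteria, so the two used here (a minimum file size and a minimum time since last modification) are assumptions chosen for illustration, as are the function name `walk_candidates` and its parameters.

```python
import os
import stat
import time
from typing import Iterator, Optional


def walk_candidates(
    root: str,
    min_size: int = 4096,        # assumed criterion: skip very small files
    min_age_s: float = 3600.0,   # assumed criterion: skip recently modified files
    now: Optional[float] = None,
) -> Iterator[str]:
    """Traverse the namespace under `root` and yield candidate file paths."""
    now = time.time() if now is None else now
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            try:
                st = os.stat(path, follow_symlinks=False)
            except OSError:
                continue  # file vanished or is unreadable; skip it
            if not stat.S_ISREG(st.st_mode):
                continue  # only regular files are considered
            if st.st_size >= min_size and now - st.st_mtime >= min_age_s:
                yield path
```

Passing `now` explicitly keeps the age check deterministic, which makes the selection easy to test against files with known modification times.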
17 Claims
1. A method comprising:
selecting candidate files for optimization by traversing a file system;
identifying a plurality of blocks in the plurality of candidate files;
sending a block list and a plurality of tokens to a storage sub-system for optimization, the blocks list corresponding to the plurality of blocks in the plurality of candidate files and the plurality of tokens corresponding to the plurality of candidate files, wherein optimization includes deduplication and compression;
receiving a notification that a file in the plurality of candidate files has been optimized, wherein the notification includes information on optimized data location, wherein upon receipt of the notification, a token in the plurality of tokens is used to determine if the file in the plurality of candidate files has changed during optimization;
wherein if the file has not changed, a stub file including a filemap needed to find the optimized data is generated.
Dependent claims: 2, 3, 4, 5, 6, 7
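The client-side flow recited in claim 1 can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the claim does not define what a token contains, so the sketch assumes a snapshot of modification time and size; it also assumes fixed-size 4096-byte blocks, and it represents the stub file and filemap as plain in-memory values. All names (`Token`, `Block`, `build_request`, `handle_notification`) are hypothetical.

```python
import os
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

BLOCK_SIZE = 4096  # assumed block size; the claim does not specify one


@dataclass(frozen=True)
class Token:
    """Assumed token contents: enough metadata to detect a later change."""
    mtime_ns: int
    size: int


@dataclass(frozen=True)
class Block:
    path: str
    offset: int
    length: int


def make_token(path: str) -> Token:
    st = os.stat(path)
    return Token(st.st_mtime_ns, st.st_size)


def identify_blocks(path: str) -> List[Block]:
    """Split a file into fixed-size blocks (the last one may be shorter)."""
    size = os.path.getsize(path)
    return [
        Block(path, off, min(BLOCK_SIZE, size - off))
        for off in range(0, size, BLOCK_SIZE)
    ]


def build_request(paths: List[str]) -> Tuple[List[Block], Dict[str, Token]]:
    """Build the block list and the per-file tokens sent for optimization."""
    blocks = [b for p in paths for b in identify_blocks(p)]
    tokens = {p: make_token(p) for p in paths}
    return blocks, tokens


def handle_notification(
    path: str, filemap: List[dict], tokens: Dict[str, Token]
) -> Optional[dict]:
    """On an 'optimized' notification, return a stub only if the file is unchanged.

    `filemap` stands in for the optimized data location carried by the
    notification. If the current token differs from the one sent earlier,
    the file changed during optimization and no stub is produced.
    """
    if make_token(path) != tokens[path]:
        return None
    return {"stub_for": path, "filemap": filemap}
```

Comparing a fresh token against the one captured before optimization is what lets the client discard a stale result: a file that was written to in the meantime keeps its original contents and is simply not stubbed.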
8. A system comprising:
a processor operable to select candidate files for optimization by traversing a file system, wherein a plurality of blocks in the plurality of candidate files are identified;
an interface operable to send a block list and a plurality of tokens to a storage sub-system for optimization, the blocks list corresponding to the plurality of blocks in the plurality of candidate files and the plurality of tokens corresponding to the plurality of candidate files, wherein optimization includes deduplication and compression, the interface further operable to receive a notification that a file in the plurality of candidate files has been optimized;
wherein the notification includes information on optimized data location, wherein upon receipt of the notification, a token in the plurality of tokens is used to determine if the file in the plurality of candidate files has changed during optimization;
wherein if the file has changed, a stub file is not generated.
Dependent claims: 9, 10, 11, 12, 13, 14
15. A non-transitory computer readable medium comprising:
computer code for selecting candidate files for optimization by traversing a file system;
computer code for identifying a plurality of blocks in the plurality of candidate files;
computer code for sending a block list and a plurality of tokens to a storage sub-system for optimization, the blocks list corresponding to the plurality of blocks in the plurality of candidate files and the plurality of tokens corresponding to the plurality of candidate files, wherein optimization includes deduplication and compression;
computer code for receiving a notification that a file in the plurality of candidate files has been optimized, wherein the notification includes information on optimized data location, wherein upon receipt of the notification, a token in the plurality of tokens is used to determine if the file in the plurality of candidate files has changed during optimization;
wherein if the file has changed, a stub file is not generated.
Dependent claims: 16, 17
Specification