Granular partial recall of deduplicated files
Abstract
The subject disclosure is directed towards partially recalling file ranges of deduplicated files by tracking dirty (write-modified) ranges, that is, ranges changed by user writes, in a way that eliminates or minimizes reading and writing already-optimized adjacent data. The granularity of the tracked ranges does not depend on any file-system granularity for tracking ranges. In one aspect, lazy flushing of the tracking data is provided that preserves data integrity and crash consistency. In another aspect, granular partial recall is supported on an open file while a data deduplication system is optimizing that file.
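To make the idea of filesystem-independent tracking granularity concrete, here is a minimal sketch, assuming fixed-size tracked ranges whose size (`RANGE_SIZE`, a hypothetical value) is picked by the deduplication layer rather than by the file system. It maps a write to the ranges it touches and, under a further assumption not stated in the abstract, treats only partially overwritten ranges as needing their remaining bytes recalled from the chunk store. Fully overwritten ranges and untouched neighbors are never read. The function names are illustrative only.

```python
from __future__ import annotations

# Hypothetical tracking granularity, chosen independently of the file system's
# own cluster or allocation size.
RANGE_SIZE = 64 * 1024


def ranges_touched(offset: int, length: int, range_size: int = RANGE_SIZE) -> range:
    """Indices of the tracked ranges that a write of [offset, offset + length) overlaps."""
    if length <= 0:
        return range(0)
    first = offset // range_size
    last = (offset + length - 1) // range_size
    return range(first, last + 1)


def partially_covered(offset: int, length: int, range_size: int = RANGE_SIZE) -> list[int]:
    """Touched ranges that the write overwrites only in part.

    Assumption: only these ranges need their remaining bytes recalled from the
    chunk store before being marked dirty; fully overwritten ranges need no
    read, and untouched neighboring ranges stay optimized.
    """
    end = offset + length
    partial = []
    for i in ranges_touched(offset, length, range_size):
        range_start, range_end = i * range_size, (i + 1) * range_size
        if offset > range_start or end < range_end:
            partial.append(i)
    return partial
```

For example, with a 64-byte range size, a 200-byte write at offset 100 touches ranges `[1, 2, 3, 4]` (`list(ranges_touched(100, 200, 64))`), but only ranges `[1, 4]` are partially covered (`partially_covered(100, 200, 64)`), so ranges 2 and 3 can be marked dirty without reading any chunk data.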
20 Claims
1. A computing device comprising:
one or more processing units; and
one or more computer-readable media comprising computer-executable instructions, which, when executed by the one or more processing units, cause the computing device to:
detect a writing of data into a deduplicated file that comprises references to chunks of data in a chunk store; and
separately modify, in response to the detecting, each of at least two different data structures that are hierarchically arranged, wherein the computer-executable instructions that cause the computing device to perform the separate modifications comprise computer-executable instructions that cause the computing device to:
modify one or more entries in a main recall table to identify as dirty one or more ranges of data of the deduplicated file that comprise the written data, wherein the main recall table is a hierarchically lower one of the at least two different data structures such that each of the one or more entries in the main recall table identifies whether a corresponding single one of the one or more ranges of data of the deduplicated file is either clean or dirty; and
modify one or more entries in a recall index table to identify one or more blocks of multiple entries in the main recall table as comprising at least one entry identifying that a corresponding range of data of the deduplicated file is dirty, wherein the recall index table is a hierarchically higher one of the at least two different data structures such that a single entry of the recall index table identifies whether a corresponding block of multiple entries in the main recall table either comprises only entries that identify corresponding ranges of data of the deduplicated file as clean, or includes at least one entry that identifies a corresponding range of data of the deduplicated file as dirty;
wherein a deduplicated file metadata that is stored as part of a file structure of the deduplicated file comprises a root recall index table; and
wherein further the main recall table is stored externally to the deduplicated file metadata that is stored as part of the file structure of the deduplicated file.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 17
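The write path in claim 1 can be sketched as a two-level bitmap. This is a minimal in-memory illustration, assuming fixed-size ranges and fixed-size blocks of main-table entries; `RecallTables`, `on_write`, and the default sizes are hypothetical names and values, not taken from the patent. It does not model where each table is persisted (the claim places the root recall index table in the file's metadata and the main recall table outside it), nor the lazy flushing mentioned in the abstract.

```python
class RecallTables:
    """Sketch of the two-level clean/dirty tracking described in claim 1.

    main[i]  is True when range i of the file is dirty (hierarchically lower table).
    index[b] is True when block b, i.e. main entries
             [b * block_size, (b + 1) * block_size), holds at least one dirty
             entry (hierarchically higher table).
    """

    def __init__(self, num_ranges: int, range_size: int = 64 * 1024,
                 block_size: int = 512) -> None:
        # range_size and block_size are hypothetical defaults, not values from the patent.
        self.range_size = range_size
        self.block_size = block_size
        self.main = [False] * num_ranges
        num_blocks = -(-num_ranges // block_size)  # ceiling division
        self.index = [False] * num_blocks

    def on_write(self, offset: int, length: int) -> None:
        """Called when a write into the deduplicated file is detected."""
        if length <= 0:
            return
        first = offset // self.range_size
        last = (offset + length - 1) // self.range_size
        if last >= len(self.main):
            raise ValueError("write extends past the tracked file size")
        for i in range(first, last + 1):
            self.main[i] = True                      # lower level: one entry per range
            self.index[i // self.block_size] = True  # higher level: one entry per block
```

Both levels are updated separately on every write, mirroring the claim's "separately modify" language: the main entry records which range is dirty, while the index entry only records that its block is no longer entirely clean.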
11. A method of partially deduplicating data files at a finer granularity to increase data access performance, the method comprising:
detecting a writing of data into a deduplicated file that comprises references to chunks of data in a chunk store;
separately modifying, in response to the detecting, each of at least two different data structures that are hierarchically arranged, wherein the separately modifying comprises:
modifying one or more entries in a main recall table to identify as dirty one or more ranges of data of the deduplicated file that comprise the written data, wherein the main recall table is a hierarchically lower one of the at least two different data structures such that each of the one or more entries in the main recall table identifies whether a corresponding single one of the one or more ranges of data of the deduplicated file is either clean or dirty; and
modifying one or more entries in a recall index table to identify one or more blocks of multiple entries in the main recall table as comprising at least one entry identifying that a corresponding range of data of the deduplicated file is dirty, wherein the recall index table is a hierarchically higher one of the at least two different data structures such that a single entry of the recall index table identifies whether a corresponding block of multiple entries in the main recall table either comprises only entries that identify corresponding ranges of data of the deduplicated file as clean, or includes at least one entry that identifies a corresponding range of data of the deduplicated file as dirty;
wherein a deduplicated file metadata that is stored as part of a file structure of the deduplicated file comprises a root recall index table; and
wherein further the main recall table is stored externally to the deduplicated file metadata that is stored as part of the file structure of the deduplicated file.
Dependent claims: 12, 13, 14, 15, 16, 18
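The performance benefit of the hierarchy comes from being able to skip whole blocks of the main recall table. The sketch below, operating on plain Python lists with hypothetical function names, shows a dirty-range lookup that reads main entries only inside blocks the index marks dirty. It also shows the invariant that follows from the claimed definition of an index entry: when a range is marked clean (the trigger, such as re-optimization, is an assumption here), the index entry may be cleared only if every entry in its block is clean.

```python
from __future__ import annotations


def dirty_ranges(main: list[bool], index: list[bool], block_size: int) -> list[int]:
    """List dirty range indices, reading main entries only inside dirty blocks."""
    dirty = []
    for b, block_is_dirty in enumerate(index):
        if not block_is_dirty:
            continue  # every main entry in this block is clean; skip the whole block
        start = b * block_size
        for i in range(start, min(start + block_size, len(main))):
            if main[i]:
                dirty.append(i)
    return dirty


def mark_clean(main: list[bool], index: list[bool], block_size: int, i: int) -> None:
    """Mark range i clean (assumed trigger: the range was re-optimized).

    The index entry is cleared only if no other entry in its block is dirty;
    otherwise it would wrongly claim the block is entirely clean.
    """
    main[i] = False
    b = i // block_size
    start = b * block_size
    index[b] = any(main[start:start + block_size])
```

With a block size of 4 and only ranges 9 and 10 dirty, `dirty_ranges` inspects the four main entries of block 2 and skips the other twelve; after `mark_clean` is applied to both ranges, the block's index entry returns to clean.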
19. A computing device comprising:
one or more processing units; and
one or more computer-readable media comprising computer-executable instructions, which, when executed by the one or more processing units, cause the computing device to:
receive a request to read a first set of data from a file that is only partially deduplicated, the file comprising: (1) pointers to chunks of data stored externally to the file in a chunk store and (2) dirtied file data comprising data that was changed after the file was last deduplicated into the chunks of data;
determine from which portion of a file system to obtain the first set of data, in response to the request, by referencing a set of recall tables that are hierarchically arranged, the set of recall tables comprising:
a main recall table that is a hierarchically lower table of the set of recall tables, wherein each entry of the main recall table identifies whether a corresponding range of data of the file is either clean or dirty; and
a recall index table that is a hierarchically higher table of the set of recall tables, wherein each entry of the recall index table identifies whether a corresponding block of multiple entries in the main recall table either comprises only entries that identify corresponding ranges of data of the file as clean, or includes at least one entry that identifies a corresponding range of data of the file as dirty;
source, in response to the read request, a first subset of the first set of data from the dirtied file data stored with the file if the set of recall tables indicate that the first subset is dirty; and
source, in response to the read request, a second subset of the first set of data from one or more of the chunks of data stored externally to the file if the set of recall tables indicate that the second subset is clean.
Dependent claims: 20
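The read path of claim 19 amounts to splitting a request into contiguous segments, each served from one source. The following is a minimal sketch under the same fixed-size range and block assumptions as before; `plan_read` and the `"file"`/`"chunks"` labels are hypothetical. It checks the recall index table first and consults the main recall table only when the enclosing block is marked dirty, then merges adjacent pieces that come from the same source.

```python
from __future__ import annotations


def plan_read(offset: int, length: int, main: list[bool], index: list[bool],
              range_size: int, block_size: int) -> list[tuple[str, int, int]]:
    """Split a read of [offset, offset + length) into (source, start, end) segments.

    source is "file" for dirtied data stored with the file and "chunks" for
    clean data that must be fetched from the external chunk store.
    """
    segments: list[tuple[str, int, int]] = []
    end = offset + length
    pos = offset
    while pos < end:
        i = pos // range_size
        seg_end = min(end, (i + 1) * range_size)
        # Consult the higher-level index first; only a dirty block requires
        # looking at the per-range entry in the main recall table.
        dirty = index[i // block_size] and main[i]
        source = "file" if dirty else "chunks"
        if segments and segments[-1][0] == source:
            # Segments are contiguous, so extend the previous one in place.
            segments[-1] = (source, segments[-1][1], seg_end)
        else:
            segments.append((source, pos, seg_end))
        pos = seg_end
    return segments
```

With a 10-byte range size and ranges 1 and 2 dirty, a 30-byte read at offset 5 yields `[("chunks", 5, 10), ("file", 10, 30), ("chunks", 30, 35)]`: the first subset is sourced from the dirtied file data and the rest from the chunk store, as the claim describes.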
Specification