Efficient content meta-data collection and trace generation from deduplicated storage
First Claim
1. A computer-implemented method for generating a meta-data trace for a file in a deduplication data storage system, the method comprising:
- selecting a file recipe for the file in the deduplication data storage system, the file recipe selected from a data collection storage unit that includes one or more file recipes;
- retrieving data chunk meta-data for research and analysis, the data chunk meta-data corresponding to a unique data chunk identified by a fingerprint in the selected file recipe;
- determining a number of file recipe bins corresponding to a memory unit for the selected file recipe based on a number of fingerprints in the selected file recipe;
- mapping the selected file recipe to correspond to the determined number of file recipe bins;
- reading the selected file recipe into the corresponding bins of the memory unit; and
- merging retrieved data chunk meta-data into the meta-data trace corresponding to the file recipe, wherein the meta-data trace is a data structure used for research and analysis of the deduplication data storage system.
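The steps of this claim can be illustrated with a minimal sketch. The claim does not state how the number of bins is computed or how fingerprints are assigned to bins, so the sketch assumes a ceiling division of the recipe length by a hypothetical per-bin budget (`bin_capacity`) and assignment by a stable hash of the fingerprint. All names (`ChunkMeta`, `num_bins`, `bin_of`, `generate_trace`) and the meta-data fields are illustrative assumptions, not taken from the patent.

```python
import math
import zlib
from dataclasses import dataclass


@dataclass(frozen=True)
class ChunkMeta:
    """Hypothetical per-chunk meta-data record; the fields are assumptions."""
    fingerprint: str
    size: int
    compressed_size: int


def num_bins(recipe_len: int, bin_capacity: int) -> int:
    """Assumed rule: bins hold about bin_capacity recipe entries on average."""
    return max(1, math.ceil(recipe_len / bin_capacity))


def bin_of(fingerprint: str, bins: int) -> int:
    """Assumed mapping: a stable hash of the fingerprint selects the bin."""
    return zlib.crc32(fingerprint.encode()) % bins


def generate_trace(recipe, chunk_meta, bin_capacity):
    """Merge unique-chunk meta-data into a per-file trace, one bin at a time."""
    bins = num_bins(len(recipe), bin_capacity)

    # Read the recipe into bins, remembering each entry's position in the file.
    recipe_bins = [[] for _ in range(bins)]
    for pos, fp in enumerate(recipe):
        recipe_bins[bin_of(fp, bins)].append((pos, fp))

    # Partition the unique-chunk meta-data with the same mapping, so a bin's
    # recipe entries and the meta-data they need always land together.
    meta_bins = [{} for _ in range(bins)]
    for meta in chunk_meta:
        meta_bins[bin_of(meta.fingerprint, bins)][meta.fingerprint] = meta

    # Merge bin by bin; the trace keeps the original chunk order of the file.
    trace = [None] * len(recipe)
    for entries, metas in zip(recipe_bins, meta_bins):
        for pos, fp in entries:
            trace[pos] = metas[fp]
    return trace


recipe = ["a", "b", "a", "c"]  # chunk "a" occurs twice in the file
chunk_meta = [ChunkMeta("a", 8, 4), ChunkMeta("b", 4, 2), ChunkMeta("c", 6, 3)]
trace = generate_trace(recipe, chunk_meta, bin_capacity=2)
print([m.size for m in trace])  # [8, 4, 8, 6]
```

In this sketch everything is held in memory at once for brevity; the point of binning is that a real implementation would only need one bin's recipe entries and meta-data resident while that bin is merged.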
Abstract
The method and apparatus collect file recipes from deduplicated data storage systems; each file recipe consists of a list of fingerprints of the data chunks of a file. Detailed meta-data for each unique data chunk is also collected. In an off-line process, research and analysis can be performed either on the meta-data itself or on a full meta-data trace reconstructed by matching recipe fingerprints to the corresponding meta-data. The method and apparatus can generate the full meta-data trace efficiently in an on-line or off-line process. Because typical deduplicated storage systems achieve deduplication rates of 10× or higher, collecting meta-data this way is faster than processing all of the original files and produces compact meta-data that requires less storage.
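To illustrate why the collected meta-data is compact, the following minimal sketch builds file recipes and a table of unique-chunk meta-data, then reconstructs a full trace by matching recipe fingerprints to that table. It assumes fixed-size chunking and SHA-1 fingerprints purely for brevity; the abstract specifies neither, and the function names, file names, and the `size` field are hypothetical.

```python
import hashlib


def collect(files, chunk_size=4):
    """Return (recipes, unique_meta) for a dict of file name -> bytes."""
    recipes = {}      # file name -> ordered list of chunk fingerprints
    unique_meta = {}  # fingerprint -> meta-data, stored once per unique chunk
    for name, data in files.items():
        recipe = []
        for off in range(0, len(data), chunk_size):
            chunk = data[off:off + chunk_size]
            fp = hashlib.sha1(chunk).hexdigest()
            recipe.append(fp)
            # Meta-data is recorded only the first time a chunk is seen.
            unique_meta.setdefault(fp, {"size": len(chunk)})
        recipes[name] = recipe
    return recipes, unique_meta


def full_trace(recipe, unique_meta):
    """Rebuild the per-file trace by looking up each recipe fingerprint."""
    return [(fp, unique_meta[fp]) for fp in recipe]


files = {"a.bak": b"AAAABBBBAAAA", "b.bak": b"AAAABBBBCCCC"}
recipes, unique_meta = collect(files)
total_refs = sum(len(r) for r in recipes.values())
print(total_refs, len(unique_meta))  # 6 3
```

Here six chunk references are described by three meta-data records; at a 10× deduplication rate the same idea means roughly one detailed record for every ten chunk references.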
18 Claims
1. A computer-implemented method for generating a meta-data trace for a file in a deduplication data storage system, the method comprising:
- selecting a file recipe for the file in the deduplication data storage system, the file recipe selected from a data collection storage unit that includes one or more file recipes;
- retrieving data chunk meta-data for research and analysis, the data chunk meta-data corresponding to a unique data chunk identified by a fingerprint in the selected file recipe;
- determining a number of file recipe bins corresponding to a memory unit for the selected file recipe based on a number of fingerprints in the selected file recipe;
- mapping the selected file recipe to correspond to the determined number of file recipe bins;
- reading the selected file recipe into the corresponding bins of the memory unit; and
- merging retrieved data chunk meta-data into the meta-data trace corresponding to the file recipe, wherein the meta-data trace is a data structure used for research and analysis of the deduplication data storage system.
Dependent claims: 2, 3, 4, 5, 6, 7
8. A non-transitory computer-readable storage medium having instructions stored therein, which when executed by a computer, cause the computer to perform a method, the method for generating a meta-data trace for a file in a deduplication data storage system, the method comprising:
- selecting a file recipe for the file in the deduplication data storage system, the file recipe selected from a data collection storage unit that includes one or more file recipes;
- retrieving data chunk meta-data for research and analysis, the data chunk meta-data corresponding to a unique data chunk identified by a fingerprint in the selected file recipe;
- determining a number of file recipe bins corresponding to a memory unit for the selected file recipe based on a number of fingerprints in the selected file recipe;
- mapping the selected file recipe to correspond to the determined number of file recipe bins;
- reading the selected file recipe into the corresponding bins of the memory unit; and
- merging retrieved data chunk meta-data into the meta-data trace corresponding to the file recipe, wherein the meta-data trace is a data structure used for research and analysis of the deduplication data storage system.
Dependent claims: 9, 10, 11, 12, 13, 14
15. A deduplication data storage system, comprising:
- a meta-data trace generation engine to select a file recipe for a file in the deduplication data storage system, the file recipe selected from a data collection storage unit that includes one or more file recipes, the meta-data trace generation engine to retrieve data chunk meta-data for research and analysis, the data chunk meta-data of a unique data chunk identified by a fingerprint in the selected file recipe, determine a number of file recipe bins corresponding to a memory unit for the selected file recipe based on a number of fingerprints in the selected file recipe, map the selected file recipe to correspond to the determined number of file recipe bins, read the selected file recipe into the corresponding bins of the memory unit, and merge retrieved data chunk meta-data into a meta-data trace corresponding to the file recipe, wherein the meta-data trace is a data structure used for research and analysis of the deduplication data storage system; and
- a data collection storage unit communicatively coupled to the meta-data trace generation engine, the data collection storage unit including the memory unit for storing the file recipe and meta-data for content data set research and analysis.
Dependent claims: 16, 17, 18
Specification