File deduplication in a file system
Abstract
Each file is assigned in advance a WWUID, which is newly assigned to a file upon its creation or update and is inherited by the copy when the file is copied. In a backup apparatus, a file name reception unit receives the file name of a backup target file. A WWUID reception unit receives the WWUID corresponding to the file name. A WWUID search unit searches for the same WWUID in the previous day's backup management information stored in a backup destination. Only if the search fails does a file operation instruction unit instruct the storing of the backup target file into the backup destination. Then, an Rcnt update instruction unit instructs the updating of the number of references made to the WWUID within the backup destination. A second management information update instruction unit then instructs the updating of the backup management information of the current day.
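The backup flow in the abstract can be sketched as follows. This is a minimal illustration, not the patented implementation: the `BackupStore` class, the `backup_file` function, and the use of in-memory dictionaries for the backup destination, the Rcnt values, and the per-day management information are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class BackupStore:
    """Hypothetical backup destination (all names here are illustrative)."""
    bodies: dict = field(default_factory=dict)    # WWUID -> stored file content
    rcnt: dict = field(default_factory=dict)      # WWUID -> number of references (Rcnt)
    catalogs: dict = field(default_factory=dict)  # day -> {file name: WWUID}


def backup_file(store, day, prev_day, name, wwuid, content):
    """Back up one file; return True if its content was actually stored."""
    previous = store.catalogs.get(prev_day, {})
    # Search the previous day's management information for the same WWUID.
    found = wwuid in previous.values()
    if not found:
        # The search failed, so the backup target file is stored.
        store.bodies[wwuid] = content
    # Update the number of references made to this WWUID.
    store.rcnt[wwuid] = store.rcnt.get(wwuid, 0) + 1
    # Update the current day's management information.
    store.catalogs.setdefault(day, {})[name] = wwuid
    return not found
```

Backing up an unchanged file on two consecutive days stores its content once and leaves its reference count at two.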
11 Claims
1. An apparatus for file deduplication in a file system, comprising:
an acquisition unit configured to acquire identification information which is newly assigned to a file upon creation or update of the file and is inherited by the file from a different file if the file is a copy of the different file, to thereby make a content of the file identifiable;
a determination unit configured to determine whether or not first identification information and second identification information match each other, the first identification information being the identification information acquired by the acquisition unit and assigned to a first file, the second identification information being the identification information acquired by the acquisition unit and assigned to a second file;
a control unit configured to perform such control that the first file and the second file are prevented from being stored as duplicate files in the file system, if the determination unit determines that the first identification information and the second identification information match each other;
a first registration unit configured to register, in count information, an increase in the number of pieces of identification information associated with the first file, when the second identification information becomes associated with the first file, the count information indicating the number of pieces of identification information associated with the first file;
a second registration unit configured to register, in the count information, a decrease in the number of pieces of identification information associated with the first file, in response to an instruction to delete the first management information for managing each file backed up to the file system at the first time point; and
a deletion unit configured to delete the first management information in response to the instruction to delete the first management information, and to also delete the first file if the count information after the registration by the second registration unit indicates that no identification information is associated with the first file.
Dependent claims: 2, 3, 4
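The interplay of the second registration unit and the deletion unit can be sketched as follows. This is a simplified illustration under stated assumptions: the function name `delete_management_info` is hypothetical, and the count information is modeled as a dictionary keyed by identification information, which is one possible reading of the claim rather than the claimed structure itself.

```python
def delete_management_info(catalog, count, files):
    """Delete one backup's management information and any file left unreferenced.

    catalog: {file name: identification information} for the backup being deleted
    count:   {identification information: number of references} (count information)
    files:   {identification information: stored file content}
    Returns the identifiers whose stored file was deleted.
    """
    deleted = []
    for ident in catalog.values():
        count[ident] -= 1          # register the decrease in the count information
        if count[ident] == 0:      # no identification information refers to the file
            del count[ident]
            del files[ident]       # so the stored file itself is deleted
            deleted.append(ident)
    catalog.clear()                # the management information is deleted
    return deleted
```

A file still referenced by a later backup survives the deletion; only a file whose count drops to zero is removed.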
5. An apparatus for backing up a file to a file system, comprising:
a first acquisition unit configured to acquire first management information for managing each file backed up to the file system at a first time point;
a second acquisition unit configured to acquire backup target file identification information as identification information which is newly assigned to a file upon creation or update of the file and is inherited by the file from a different file if the file is a copy of the different file, to thereby make a content of the file identifiable, and which is assigned to a backup target file to be backed up to the file system at a second time point subsequent to the first time point;
a determination unit configured to determine whether or not the first management information acquired by the first acquisition unit includes the backup target file identification information acquired by the second acquisition unit as already-backed-up file identification information that is the identification information assigned to an already-backed-up file backed up to the file system at the first time point;
a copy unit configured to prevent the backup target file from being copied to the file system at the second time point if the determination unit determines that the first management information includes the backup target file identification information, and configured to copy the backup target file to the file system at the second time point if the determination unit determines that the first management information does not include the backup target file identification information;
a storage unit configured to store the backup target file identification information in association with the already-backed-up file into second management information for managing each file backed up at the second time point;
a first registration unit configured to register, in count information, an increase in the number of pieces of identification information associated with the already-backed-up file, when the backup target file identification information becomes associated with the already-backed-up file, the count information indicating the number of pieces of identification information associated with the already-backed-up file;
a second registration unit configured to register, in the count information, a decrease in the number of pieces of identification information associated with the already-backed-up file, in response to an instruction to delete the first management information; and
a deletion unit configured to delete the backup target file identification information in response to the instruction to delete the first management information, and to also delete the already-backed-up file if the count information after the registration by the second registration unit indicates that no identification information is associated with the already-backed-up file.
6. An apparatus for managing a file in a file system, comprising:
a first assignment unit configured to assign identification information to a new file when the new file is created in the file system;
a second assignment unit configured to assign identification information same as the identification information to a copied file when the new file is copied to generate the copied file in the file system;
a third assignment unit configured to assign identification information different from the identification information to an updated file when any one of the new file and the copied file is updated to generate the updated file;
a first registration unit configured to register, in count information, an increase in the number of pieces of identification information associated with the first file, when the second identification information becomes associated with the first file, the count information indicating the number of pieces of identification information associated with the first file;
a second registration unit configured to register, in the count information, a decrease in the number of pieces of identification information associated with the first file, in response to an instruction to delete the first management information; and
a deletion unit configured to delete the first management information in response to the instruction to delete the first management information, and to also delete the first file if the count information after the registration by the second registration unit indicates that no identification information is associated with the first file.
Dependent claims: 7
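The three assignment rules of this claim (new identifier on creation, same identifier on copy, different identifier on update) can be sketched as follows. The `ManagedFile` class and the helper functions are hypothetical names, and generating identifiers with `uuid.uuid4()` is an assumption made for the example; the claim does not say how identification information is produced.

```python
import uuid
from dataclasses import dataclass


@dataclass(frozen=True)
class ManagedFile:
    """Hypothetical file record carrying its identification information."""
    name: str
    content: bytes
    ident: str


def new_ident():
    # Assumption: a random UUID stands in for the identification information.
    return str(uuid.uuid4())


def create_file(name, content):
    # First assignment unit: a new file receives fresh identification information.
    return ManagedFile(name, content, new_ident())


def copy_file(src, new_name):
    # Second assignment unit: the copy inherits the source's identification information.
    return ManagedFile(new_name, src.content, src.ident)


def update_file(f, new_content):
    # Third assignment unit: an updated file receives different identification information.
    return ManagedFile(f.name, new_content, new_ident())
```

Because a copy keeps the identifier and an update replaces it, matching identifiers indicate matching content without comparing file bodies.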
8. A computer program product for file deduplication in a file system by a processor, the computer program product comprising a non-transitory computer-readable storage medium having computer-readable program code portions stored therein, the computer-readable program code portions comprising:
a first executable portion for acquiring identification information which is newly assigned to a file upon creation or update of the file and is inherited by the file from a different file if the file is a copy of the different file, to thereby make a content of the file identifiable;
a second executable portion for determining whether or not first identification information and second identification information match each other, the first identification information being the identification information acquired by the acquisition unit and assigned to a first file, the second identification information being the identification information acquired by the acquisition unit and assigned to a second file;
a third executable portion for, if the first identification information is determined to match the second identification information, preventing the first file and the second file from being stored as duplicate files in the file system;
a third executable portion for registering, in count information, an increase in the number of pieces of identification information associated with the first file, when the second identification information becomes associated with the first file, the count information indicating the number of pieces of identification information associated with the first file;
a fourth executable portion for registering, in count information, a decrease in the number of pieces of identification information associated with the first file, in response to an instruction to delete first management information for managing each file backed up to the file system at the first time point; and
a fifth executable portion for deleting the first management information in response to the instruction to delete the first management information, and to also delete the first file if the count information after the registration by the second registration unit indicates that no identification information is associated with the first file.
Dependent claims: 9, 10, 11
Specification