COMPUTER PROGRAM, APPARATUS, AND METHOD FOR MANAGING DATA
First Claim
1. A computer-readable, non-transitory medium storing a data management program for use in a multi-node storage system formed from a plurality of disk nodes each managing a storage device to store data in a distributed manner, the data management program, when executed by a computer serving as one of the disk nodes, causing the computer to perform a procedure comprising:
allocating one of constituent storage spaces in a storage device coupled to the computer, to one of data units constituting a logical volume that provides a virtual storage space, in response to a write request specifying the one of data units as a destination of write data, and writing the write data to the allocated constituent storage space;
recording, upon the writing of the write data, a current time in a data unit record memory as a record of last write time of the data unit to which the write data has been written;
detecting, by consulting the data unit information memory, a data unit whose deduplication grace period after the last write time has expired;
obtaining, from an index server, one of deduplication addresses that is associated with a first unique value obtained by applying a predetermined computation to data stored in the constituent storage space allocated to the detected data unit, wherein the index server manages the deduplication addresses each including an identifier of a disk node managing a deduplicate unit and a second unique value obtained by applying the predetermined computation to deduplication target data stored in the deduplicate unit, and wherein the deduplicate unit is provided in a plurality to constitute a deduplicate volume that provides another virtual storage space; and
storing the obtained deduplication address in the data unit record memory, together with the detected data unit, while canceling the allocation of the constituent storage spaces to the detected data unit.
Abstract
A computer in a disk node executes a data management program. A deduplication-eligible data unit detection module detects a data unit whose deduplication grace period after its last write time has expired. A deduplication address fetch module interacts with an index server to obtain a deduplication address associated with a unique value of the data stored in the constituent storage space allocated to the detected data unit. A constituent storage space deallocation module stores the obtained deduplication address in a data unit record memory, together with information indicating the detected data unit. At the same time, the constituent storage space deallocation module releases the allocated constituent storage space from the detected data unit.
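The disk-node side of this procedure can be illustrated with a minimal Python sketch. It is not the patented implementation: the names `DiskNode`, `DataUnitRecord`, `fetch_address`, and `dedup_pass` are hypothetical, SHA-256 merely stands in for the unspecified "predetermined computation", and the index server is reduced to a callable. Moving the data itself into a deduplicate unit is not described in this section and is left out.

```python
import hashlib
import time
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical address shape: (disk node identifier, unique value).
DedupAddress = Tuple[str, str]


@dataclass
class DataUnitRecord:
    slice_id: Optional[int] = None              # allocated constituent storage space
    last_write: float = 0.0                     # last write time
    dedup_address: Optional[DedupAddress] = None


class DiskNode:
    def __init__(self, grace_period: float,
                 fetch_address: Callable[[str], DedupAddress]) -> None:
        self.grace_period = grace_period
        self.fetch_address = fetch_address      # stands in for the index server
        self.records: Dict[int, DataUnitRecord] = {}
        self.slices: Dict[int, bytes] = {}
        self._next_slice = 0

    def write(self, unit: int, data: bytes, now: Optional[float] = None) -> None:
        """Allocate a storage space on demand, write, and record the time."""
        rec = self.records.setdefault(unit, DataUnitRecord())
        if rec.slice_id is None:
            rec.slice_id = self._next_slice
            self._next_slice += 1
            rec.dedup_address = None
        self.slices[rec.slice_id] = data
        rec.last_write = time.time() if now is None else now

    def dedup_pass(self, now: Optional[float] = None) -> List[int]:
        """Deduplicate every data unit whose grace period has expired."""
        now = time.time() if now is None else now
        done = []
        for unit, rec in self.records.items():
            if rec.slice_id is None:
                continue                        # nothing allocated
            if now - rec.last_write < self.grace_period:
                continue                        # still within the grace period
            # Unique value of the stored data (SHA-256 is an assumption).
            digest = hashlib.sha256(self.slices[rec.slice_id]).hexdigest()
            rec.dedup_address = self.fetch_address(digest)
            # Cancel the allocation of the constituent storage space.
            del self.slices[rec.slice_id]
            rec.slice_id = None
            done.append(unit)
        return done
```

With a grace period of 10 seconds, a unit written at time 100 is deduplicated by a pass at time 111, while a unit written at time 108 keeps its allocated storage space.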
16 Claims
11. A computer-readable, non-transitory medium storing a data management program for managing storage spaces in a multi-node storage system formed from a plurality of disk nodes each managing a storage device to store data in a distributed manner, the data management program, when executed by a computer, causing the computer to perform a procedure comprising:
storing, in a deduplicate unit record memory, deduplicate unit records each including information indicating use of a deduplicate unit in a disk node, a first unique value obtained by applying a predetermined computation to deduplication target data stored in a constituent storage space allocated to the deduplicate unit being used, and an identifier of the deduplicate unit, wherein the deduplicate unit records are stored together with an identifier of a disk node that manages the deduplicate units, and wherein the deduplicate units constitute a deduplicate volume that provides another virtual storage space;
receiving from one of the disk nodes a deduplication address request specifying a second unique value obtained by applying the predetermined computation to data in a constituent storage space allocated to a deduplication-eligible data unit, and searching the deduplicate unit record memory to find a deduplicate unit record that contains the second unique value specified in the deduplication address request;
returning a first deduplication address to the disk node that has issued the deduplication address request when a relevant deduplicate unit record is found as a result of the searching, wherein the first deduplication address includes an identifier of a deduplicate unit which is contained in the found deduplicate unit record, and an identifier of a disk node that manages the deduplicate unit corresponding to the found deduplicate unit record; and
consulting the deduplicate unit record memory to select one of the deduplicate units that is not used when no relevant record is found as a result of the searching, sending an allocation request to a disk node that manages the selected deduplicate unit for allocation of a constituent storage space to the selected deduplicate unit, storing an updated deduplicate unit record of the selected deduplicate unit in the deduplicate unit record memory to record the constituent storage space allocated to the selected deduplicate unit, and returning a second deduplication address to the disk node that has issued the deduplication address request, wherein the second deduplication address includes an identifier of the selected deduplicate unit and an identifier of the disk node managing the selected deduplicate unit.
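The find-or-allocate behavior recited in claim 11 might be sketched as follows. This is an illustration under assumptions, not the claimed program: `IndexServer`, `DedupUnitRecord`, `get_address`, and the `request_allocation` callback are hypothetical names, the record memory is a plain in-memory list, and the error raised when no unused unit remains is an addition the claim does not address.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

# Address as recited in claim 11: (disk node identifier, deduplicate unit identifier).
DedupAddress = Tuple[str, int]


@dataclass
class DedupUnitRecord:
    node_id: str                          # disk node managing the deduplicate unit
    unit_id: int                          # identifier of the deduplicate unit
    in_use: bool = False                  # whether the unit is being used
    unique_value: Optional[str] = None    # unique value of the stored data


class IndexServer:
    def __init__(self, records: List[DedupUnitRecord],
                 request_allocation: Callable[[str, int], None]) -> None:
        self.records = records                        # deduplicate unit record memory
        self.request_allocation = request_allocation  # message to a disk node

    def get_address(self, unique_value: str) -> DedupAddress:
        # Search for a used unit that already holds data with this unique value.
        for rec in self.records:
            if rec.in_use and rec.unique_value == unique_value:
                return (rec.node_id, rec.unit_id)
        # No match: select an unused unit and ask its disk node to allocate space.
        for rec in self.records:
            if not rec.in_use:
                self.request_allocation(rec.node_id, rec.unit_id)
                rec.in_use = True
                rec.unique_value = unique_value
                return (rec.node_id, rec.unit_id)
        # Behavior when every unit is used is not specified in the claim.
        raise RuntimeError("no unused deduplicate unit available")
```

Two requests carrying the same unique value return the same address, and only the first of them triggers an allocation request, which is how identical data ends up sharing one deduplicate unit.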
14. An apparatus for managing data in a multi-node storage system formed from a plurality of disk nodes each managing a storage device to store data in a distributed manner, the apparatus comprising:
write access means that allocates one of constituent storage spaces in a storage device coupled to the computer, to one of data units constituting a logical volume that provides a virtual storage space, in response to a write request specifying the one of data units as a destination of write data, and writes the write data to the allocated constituent storage space;
last write time update means that records a current time in the memory as a record of last write time of the data unit to which the write data has been written;
deduplication-pending data unit detection means that detects, by consulting the memory, a data unit whose deduplication grace period after the last write time has expired;
deduplication address fetch means that obtains, from an index server, one of deduplication addresses that is associated with a first unique value obtained by applying a predetermined computation to data stored in the constituent storage space allocated to the detected data unit, wherein the index server manages the deduplication addresses each including an identifier of a disk node managing a deduplicate unit and a second unique value obtained by applying the predetermined computation to deduplication target data stored in the deduplicate unit, and wherein the deduplicate unit is provided in a plurality to constitute a deduplicate volume that provides another virtual storage space;
constituent storage space deallocation means that stores the obtained deduplication address in the data unit record memory, together with the detected data unit, while canceling the allocation of the constituent storage spaces to the detected data unit.
15. An apparatus for managing data in a multi-node storage system formed from a plurality of disk nodes each managing a storage device to store data in a distributed manner, the apparatus comprising:
a processor configured to execute a procedure, the procedure comprising:
allocating one of constituent storage spaces in a storage device coupled to the computer, to one of data units constituting a logical volume that provides a virtual storage space, in response to a write request specifying the one of data units as a destination of write data, and writing the write data to the allocated constituent storage space;
recording, upon the writing of the write data, a current time in a data unit record memory as a record of last write time of the data unit to which the write data has been written;
detecting, by consulting the data unit information memory, a data unit whose deduplication grace period after the last write time has expired;
obtaining, from an index server, one of deduplication addresses that is associated with a first unique value obtained by applying a predetermined computation to data stored in the constituent storage space allocated to the detected data unit, wherein the index server manages the deduplication addresses each including an identifier of a disk node managing a deduplicate unit and a second unique value obtained by applying the predetermined computation to deduplication target data stored in the deduplicate unit, and wherein the deduplicate unit is provided in a plurality to constitute a deduplicate volume that provides another virtual storage space.
16. A method executed by a computer in one of a plurality of disk nodes constituting a multi-node storage system, each disk node managing a storage device to store data in a distributed manner, the method comprising:
allocating one of constituent storage spaces in a storage device coupled to the computer, to one of data units constituting a logical volume that provides a virtual storage space, in response to a write request specifying the one of data units as a destination of write data, and writing the write data to the allocated constituent storage space;
recording, upon the writing of the write data, a current time in a data unit record memory as a record of last write time of the data unit to which the write data has been written;
detecting, by consulting the data unit information memory, a data unit whose deduplication grace period after the last write time has expired;
obtaining, from an index server, one of deduplication addresses that is associated with a first unique value obtained by applying a predetermined computation to data stored in the constituent storage space allocated to the detected data unit, wherein the index server manages the deduplication addresses each including an identifier of a disk node managing a deduplicate unit and a second unique value obtained by applying the predetermined computation to deduplication target data stored in the deduplicate unit, and wherein the deduplicate unit is provided in a plurality to constitute a deduplicate volume that provides another virtual storage space;
storing the obtained deduplication address in the data unit record memory, together with the detected data unit, while canceling the allocation of the constituent storage spaces to the detected data unit.
Specification