METHOD, DEVICE, NODE AND SYSTEM FOR MANAGING FILE IN DISTRIBUTED DATA WAREHOUSE
Abstract
A method, a device, a node and a system for managing file in distributed data warehouse are provided. The method includes: acquiring, by a data node, a deleting instruction carrying a data block identifier, wherein the deleting instruction is sent by a management node; suspending, by the data node, the deleting instruction; and deleting, by the data node, a data block corresponding to the data block identifier after a condition is met. This resolves the technical issue that, in some cases, an accidentally deleted file cannot be recovered by means of a trash set in the management node, and ensures the data security of the Hadoop system.
38 Claims
1. A method for managing file in distributed data warehouse, comprising:
acquiring, by a data node, a deleting instruction carrying a data block identifier, wherein the deleting instruction is sent by a management node; suspending, by the data node, the deleting instruction; and deleting, by the data node, a data block corresponding to the data block identifier after a condition is met; wherein the process of suspending, by the data node, the deleting instruction comprises storing the data block identifier into a delay queue.
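By way of non-limiting illustration only, the suspending step of claim 1 might be sketched as follows. The class name `DataNode`, the method names, and the use of an in-memory ordered mapping as the delay queue are all assumptions made for this sketch and are not taken from the claims.

```python
import time
from collections import OrderedDict


class DataNode:
    """Hypothetical data node that defers block deletion (illustrative only)."""

    def __init__(self):
        self.blocks = {}                  # block identifier -> block contents
        self.delay_queue = OrderedDict()  # block identifier -> time it was queued

    def on_delete_instruction(self, block_id, now=None):
        # Suspend the instruction: remember the identifier, leave the data alone.
        if block_id in self.blocks and block_id not in self.delay_queue:
            self.delay_queue[block_id] = time.time() if now is None else now

    def delete_when(self, condition):
        # Delete every queued block for which condition(block_id, queued_at) holds.
        due = [b for b, t in self.delay_queue.items() if condition(b, t)]
        for b in due:
            del self.delay_queue[b]
            self.blocks.pop(b, None)
        return due


node = DataNode()
node.blocks["blk_1"] = b"payload"
node.on_delete_instruction("blk_1", now=0.0)
# The block is still stored while the instruction is suspended.
print("blk_1" in node.blocks)
```

In this sketch the condition is passed in as a callable, so that the same queue can serve whichever condition a given embodiment uses.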
2. (canceled)
3. The method according to claim 1, wherein the process of deleting, by the data node, a data block corresponding to the data block identifier after a condition is met comprises:
deleting, by the data node, the data block corresponding to the data block identifier in a case that a period since the data block identifier is stored into the delay queue reaches a predetermined time threshold;
or deleting, by the data node, the data blocks corresponding to all the data block identifiers in the delay queue in response to an emptying instruction sent by a client for emptying the data blocks corresponding to all the data block identifiers in the delay queue.
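The two alternative conditions of claim 3 may be illustrated, purely as a sketch, by the two functions below. The function names, the representation of the delay queue as a mapping from block identifier to queuing time, and the use of seconds as the time unit are assumptions of this example.

```python
def purge_expired(delay_queue, blocks, now, threshold_s):
    """Delete blocks whose identifiers have been queued for at least threshold_s."""
    expired = [b for b, queued_at in delay_queue.items()
               if now - queued_at >= threshold_s]
    for b in expired:
        del delay_queue[b]
        blocks.pop(b, None)
    return expired


def empty_queue(delay_queue, blocks):
    """Delete the blocks of all queued identifiers (response to an emptying instruction)."""
    emptied = list(delay_queue)
    for b in emptied:
        blocks.pop(b, None)
    delay_queue.clear()
    return emptied


blocks = {"a": b"1", "b": b"2", "c": b"3"}
queue = {"a": 0.0, "b": 50.0}            # "c" was never marked for deletion
print(purge_expired(queue, blocks, now=60.0, threshold_s=60.0))
print(empty_queue(queue, blocks))
```

Only block "a" has reached the threshold at the first call; the second call removes whatever remains in the queue regardless of age, and block "c" is never touched.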
4. The method according to claim 1, wherein after storing the data block identifier into the delay queue, the method further comprises:
receiving a recovering instruction sent by the management node for recovering the data block corresponding to the data block identifier stored in the delay queue; and sending to the management node a report carrying data block identifiers of all the data blocks stored in the data node, so that the management node creates a mapping from the data block identifier to the data node based on the data block identifiers in the received report.
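As a hedged illustration of the data node side of claim 4, the sketch below handles a recovering instruction and produces the report. Removing the identifier from the delay queue is an assumption of this example (the claim itself recites only receiving the instruction and sending the report), and the function name `handle_recover` is hypothetical.

```python
def handle_recover(delay_queue, blocks, block_id):
    """Handle a recovering instruction and return a report of all stored blocks."""
    # Assumption: recovery cancels the pending deletion of this block.
    delay_queue.pop(block_id, None)
    # The report carries the identifiers of all data blocks stored in the node.
    return sorted(blocks)


blocks = {"blk_2": b"", "blk_1": b""}
queue = {"blk_1": 0.0}
report = handle_recover(queue, blocks, "blk_1")
print(report)
```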
5. The method according to claim 3, wherein before deleting the data blocks corresponding to all the data block identifiers in the delay queue in response to an emptying instruction sent by a client for emptying the data blocks corresponding to all the data block identifiers in the delay queue, the method further comprises:
determining data blocks in the data node corresponding to all the data block identifiers in the delay queue; calculating a parameter of occupation of the determined data blocks in the data node; and sending the parameter of occupation to the management node, wherein the client determines after checking the parameter of occupation whether to send to the data node the emptying instruction for emptying the data blocks corresponding to all the data block identifiers in the delay queue.
6. The method according to claim 5, wherein the parameter of occupation comprises a delay deleting storage space and a delay deleting percentage, the process of calculating a parameter of occupation of the determined data blocks in the data node comprises:
calculating a storage space occupied by the determined data blocks in the data node as the delay deleting storage space; and calculating a percentage of an entire storage space of the data node occupied by the delay deleting storage space as the delay deleting percentage.
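The calculation recited in claim 6 can be illustrated with a short sketch. The function name `occupation`, the use of bytes as the unit, and the expression of the percentage on a 0 to 100 scale are assumptions of this example.

```python
def occupation(queued_ids, block_sizes, capacity_bytes):
    """Return (delay deleting storage space, delay deleting percentage)."""
    # Space occupied by the blocks whose identifiers are in the delay queue.
    space = sum(block_sizes[b] for b in queued_ids if b in block_sizes)
    # Share of the node's entire storage space taken up by that space.
    percentage = 100.0 * space / capacity_bytes
    return space, percentage


sizes = {"a": 128, "b": 128, "c": 256}
print(occupation(["a", "b"], sizes, capacity_bytes=1024))
```

With two queued blocks of 128 bytes each on a node of 1024 bytes, the sketch yields a delay deleting storage space of 256 bytes and a delay deleting percentage of 25.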
7. The method according to claim 3, wherein the method further comprises:
receiving a time configuration instruction carrying a specified time length sent by the client, wherein the time configuration instruction is utilized in dynamic configuration of the predetermined time threshold; and updating the predetermined time threshold to the specified time length based on the time configuration instruction.
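The effect of the dynamic configuration in claim 7 may be sketched as below. The class `DelayPolicy`, its method names, and the rejection of negative lengths are assumptions of this illustration rather than features recited in the claim.

```python
class DelayPolicy:
    """Hypothetical holder of the predetermined time threshold, in seconds."""

    def __init__(self, threshold_s):
        self.threshold_s = threshold_s

    def on_time_config(self, specified_length_s):
        # Update the threshold to the specified time length at run time.
        if specified_length_s < 0:
            raise ValueError("time length must be non-negative")
        self.threshold_s = specified_length_s

    def is_due(self, queued_at, now):
        # A block is due once it has been queued for at least the threshold.
        return now - queued_at >= self.threshold_s


policy = DelayPolicy(threshold_s=3600.0)
print(policy.is_due(queued_at=0.0, now=100.0))   # not yet due under one hour
policy.on_time_config(60.0)
print(policy.is_due(queued_at=0.0, now=100.0))   # due under the shortened threshold
```

The example shows that an identifier already in the delay queue is evaluated against the updated threshold, which is one possible reading of the dynamic configuration.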
8. A method for managing file in distributed data warehouse, comprising:
receiving from a client an instruction for deleting a specified file; determining, by a management node, a data block which belongs to the specified file and is stored in a data node; sending to the data node, by the management node, a deleting instruction carrying a data block identifier of the data block, wherein the deleting instruction is suspended by the data node, until the data node deletes the data block corresponding to the data block identifier after a condition is met; receiving a file recovering instruction sent by the client for recovering the specified file; recovering an eligible first correspondence relation, wherein the eligible first correspondence relation is a first correspondence relation which is backed-up before the deleting instruction is sent and a time point for backup is closest to a time point of sending the deleting instruction, and the first correspondence relation comprises a relation between the specified file and a data block identifier of a data block in the specified file; and recovering a second correspondence relation, wherein the second correspondence relation is a mapping from the data block identifier of the data block to the data node storing the data block.
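The selection of the eligible first correspondence relation in claim 8 may be illustrated, as a sketch only, by choosing the most recent backup taken before the deleting instruction was sent. The representation of a backup as a `(backup_time, file_to_block_ids)` pair and the function name `eligible_backup` are assumptions of this example.

```python
def eligible_backup(backups, delete_sent_at):
    """Pick the backup taken before, and closest to, the time the delete was sent."""
    earlier = [b for b in backups if b[0] < delete_sent_at]
    if not earlier:
        return None  # no backup precedes the deleting instruction
    return max(earlier, key=lambda b: b[0])


backups = [
    (10, {"/data/f": ["blk_1"]}),
    (20, {"/data/f": ["blk_1", "blk_2"]}),
    (30, {}),                      # taken after the file was deleted
]
print(eligible_backup(backups, delete_sent_at=25))
```

Here the backup taken at time 20 is selected, since the backup at time 30 postdates the deleting instruction and no longer contains the file.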
9. (canceled)
10. The method according to claim 8, wherein the data block identifier is stored into a delay queue by the data node, wherein the process of recovering the second correspondence relation comprises:
sending to the data node a recovering instruction for recovering the data block corresponding to the data block identifier stored in the delay queue, wherein the data node sends to the management node a report carrying data block identifiers of all the data blocks stored in the data node after receiving the recovering instruction; receiving the report sent by the data node; and creating a mapping from the data block identifier to the data node based on the data block identifiers in the received report.
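On the management node side, the creation of the mapping in claim 10 might look like the following sketch. The function name `build_block_map` and the representation of reports as a mapping from a data node name to the identifiers it reported are assumptions of this example.

```python
from collections import defaultdict


def build_block_map(reports):
    """Create a mapping from block identifier to the data nodes that reported it."""
    block_map = defaultdict(set)
    for node_name, block_ids in reports.items():
        for block_id in block_ids:
            block_map[block_id].add(node_name)
    return dict(block_map)


reports = {"dn1": ["blk_1", "blk_2"], "dn2": ["blk_2"]}
print(build_block_map(reports))
```

A set of nodes is kept per identifier in this sketch so that a block reported by several data nodes maps to each of them.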
11. The method according to claim 8, wherein the method further comprises:
receiving a parameter of occupation sent by the data node, wherein the parameter of occupation comprises a delay deleting storage space and a delay deleting percentage, wherein the delay deleting storage space is a storage space occupied by the data blocks corresponding to all the data block identifiers in the delay queue of the data node, and the delay deleting percentage is a percentage of the entire storage space of the data node occupied by the delay deleting storage space, such that the client determines after checking the parameter of occupation whether to send to the data node an emptying instruction for emptying the data blocks corresponding to all the data block identifiers in the delay queue.
12. A method for managing file in distributed data warehouse, wherein the method comprises:
sending to a management node an instruction for deleting a specified file, wherein the instruction for deleting the specified file is utilized by the management node to determine a data block which belongs to the specified file and is stored in a data node, and the management node sends to the data node a deleting instruction carrying a data block identifier of the data block, wherein the deleting instruction is suspended by the data node, until the data node deletes the data block corresponding to the data block identifier after a condition is met; wherein the data block identifier is stored into a delay queue by the data node, and the method further comprises: checking a parameter of occupation sent to the management node by each data node, wherein the parameter of occupation comprises a delay deleting storage space and a delay deleting percentage, wherein the delay deleting storage space is a storage space occupied by the data blocks corresponding to all the data block identifiers in the delay queue of the data node, and the delay deleting percentage is a percentage of the entire storage space of the data node occupied by the delay deleting storage space; determining whether to send to the data node an emptying instruction for emptying the data blocks corresponding to all the data block identifiers in the delay queue of the data node; and sending the emptying instruction to the data node in a case that it is determined to send the emptying instruction to the data node.
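The claim does not specify how the client determines whether to send the emptying instruction. One possible rule, assumed here solely for illustration, is to empty the delay queue of any data node whose delay deleting percentage exceeds a limit chosen by the operator. The function name `nodes_to_empty` and the parameter `max_percentage` are hypothetical.

```python
def nodes_to_empty(occupation_by_node, max_percentage):
    """Return the data nodes whose delay deleting percentage exceeds the limit."""
    # occupation_by_node maps a node name to (storage space in bytes, percentage).
    return sorted(
        node
        for node, (space, percentage) in occupation_by_node.items()
        if percentage > max_percentage
    )


reported = {"dn1": (256, 25.0), "dn2": (10, 1.0)}
for node in nodes_to_empty(reported, max_percentage=20.0):
    print("send emptying instruction to", node)
```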
13. (canceled)
14. The method according to claim 12, wherein the method further comprises:
sending to the data node a time configuration instruction carrying a specified time length, wherein the time configuration instruction is utilized in dynamic configuration of the predetermined time threshold, such that the data node updates the predetermined time threshold to the specified time length based on the time configuration instruction and deletes the data block corresponding to the data block identifier after the predetermined time threshold is reached.
15. The method according to claim 12, wherein the method further comprises:
sending to the management node a file recovering instruction for recovering the specified file, such that the management node recovers an eligible first correspondence relation and recovers a second correspondence relation after receiving the file recovering instruction, wherein the eligible first correspondence relation is a first correspondence relation which is backed-up before the deleting instruction is sent and a time point for backup is closest to a time point of sending the deleting instruction, the first correspondence relation comprises a relation between the specified file and a data block identifier of a data block in the specified file, and the second correspondence relation is a mapping from the data block identifier of the data block in the specified file to the data node.
16-34. (canceled)
35. The method according to claim 4, wherein before deleting the data blocks corresponding to all the data block identifiers in the delay queue in response to an emptying instruction sent by a client for emptying the data blocks corresponding to all the data block identifiers in the delay queue, the method further comprises:
determining data blocks in the data node corresponding to all the data block identifiers in the delay queue; calculating a parameter of occupation of the determined data blocks in the data node; and sending the parameter of occupation to the management node, wherein the client determines after checking the parameter of occupation whether to send to the data node the emptying instruction for emptying the data blocks corresponding to all the data block identifiers in the delay queue.
36. The method according to claim 4, wherein the method further comprises:
receiving a time configuration instruction carrying a specified time length sent by the client, wherein the time configuration instruction is utilized in dynamic configuration of the predetermined time threshold; and updating the predetermined time threshold to the specified time length based on the time configuration instruction.
37. The method according to claim 8, wherein the method further comprises:
receiving a parameter of occupation sent by the data node, wherein the parameter of occupation comprises a delay deleting storage space and a delay deleting percentage, wherein the delay deleting storage space is a storage space occupied by the data blocks corresponding to all the data block identifiers in the delay queue of the data node, and the delay deleting percentage is a percentage of the entire storage space of the data node occupied by the delay deleting storage space, such that the client determines after checking the parameter of occupation whether to send to the data node an emptying instruction for emptying the data blocks corresponding to all the data block identifiers in the delay queue.
38. The method according to claim 10, wherein the method further comprises:
receiving a parameter of occupation sent by the data node, wherein the parameter of occupation comprises a delay deleting storage space and a delay deleting percentage, wherein the delay deleting storage space is a storage space occupied by the data blocks corresponding to all the data block identifiers in the delay queue of the data node, and the delay deleting percentage is a percentage of the entire storage space of the data node occupied by the delay deleting storage space, such that the client determines after checking the parameter of occupation whether to send to the data node an emptying instruction for emptying the data blocks corresponding to all the data block identifiers in the delay queue.
Specification