Method, device, node and system for managing file in distributed data warehouse
First Claim
1. A method for managing file in distributed data warehouse, comprising:
acquiring, by a data node, a deleting instruction carrying a data block identifier, wherein the deleting instruction is sent by a management node;
suspending, by the data node, the deleting instruction; and
deleting, by the data node, a data block corresponding to the data block identifier after a condition is met;
wherein the process of suspending, by the data node, the deleting instruction comprises storing the data block identifier into a delay queue;
wherein the process of deleting, by the data node, the data block corresponding to the data block identifier after the condition is met comprises:
deleting, by the data node, data blocks corresponding to all the data block identifiers in the delay queue in response to an emptying instruction sent by a client for emptying the data blocks corresponding to all the data block identifiers in the delay queue;
wherein before deleting the data blocks corresponding to all the data block identifiers in the delay queue in response to an emptying instruction sent by a client for emptying the data blocks corresponding to all the data block identifiers in the delay queue, the method further comprises:
determining data blocks in the data node corresponding to all the data block identifiers in the delay queue;
calculating a parameter of occupation of the determined data blocks in the data node; and
sending the parameter of occupation to the management node, wherein the client determines after checking the parameter of occupation whether to send to the data node the emptying instruction for emptying the data blocks corresponding to all the data block identifiers in the delay queue;
wherein the parameter of occupation comprises a delay deleting storage space and a delay deleting percentage, and the process of calculating a parameter of occupation of the determined data blocks in the data node comprises:
calculating a storage space occupied by the determined data blocks in the data node as the delay deleting storage space; and
calculating a percentage of an entire storage space of the data node occupied by the delay deleting storage space as the delay deleting percentage.
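The data node behaviour recited in claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation: the class `DataNode`, its method names, the in-memory `blocks` dictionary and the byte-based sizes are all hypothetical, chosen only to show the delay queue and the two occupation figures.

```python
from dataclasses import dataclass, field


@dataclass
class DataNode:
    """Hypothetical data node that defers block deletion via a delay queue."""
    capacity_bytes: int                               # entire storage space of the node
    blocks: dict = field(default_factory=dict)        # block identifier -> size in bytes
    delay_queue: list = field(default_factory=list)   # identifiers of suspended deletions

    def on_delete_instruction(self, block_id):
        # Suspend the deleting instruction: record the identifier, keep the data.
        if block_id in self.blocks and block_id not in self.delay_queue:
            self.delay_queue.append(block_id)

    def occupation(self):
        # Delay deleting storage space: bytes held by the queued blocks.
        space = sum(self.blocks[b] for b in self.delay_queue)
        # Delay deleting percentage: share of the node's entire storage space.
        percentage = 100.0 * space / self.capacity_bytes
        return space, percentage

    def on_emptying_instruction(self):
        # Delete every block whose identifier is in the delay queue.
        for block_id in self.delay_queue:
            del self.blocks[block_id]
        self.delay_queue.clear()


node = DataNode(capacity_bytes=1000, blocks={"blk_1": 100, "blk_2": 150, "blk_3": 50})
node.on_delete_instruction("blk_1")
node.on_delete_instruction("blk_2")
space, percentage = node.occupation()   # the values reported to the management node
node.on_emptying_instruction()          # only now are blk_1 and blk_2 removed
```

Until `on_emptying_instruction` runs, the queued blocks stay on disk, which is what leaves room for recovering an accidentally deleted file.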
Abstract
A method, a device, a node and a system for managing file in distributed data warehouse are provided. The method includes: acquiring, by a data node, a deleting instruction carrying a data block identifier, wherein the deleting instruction is sent by a management node; suspending, by the data node, the deleting instruction; and deleting, by the data node, a data block corresponding to the data block identifier after a condition is met. This resolves the technical issue that, in some cases, an accidentally deleted file cannot be recovered by setting up a trash in the management node, and it ensures the data security of the Hadoop system.
4 Citations
10 Claims
1. A method for managing file in distributed data warehouse, as set out in full under First Claim above.
Dependent claims: 2, 3, 4, 5
6. A method for managing file in distributed data warehouse, comprising:
receiving from a client an instruction for deleting a specified file;
determining, by a management node, a data block which belongs to the specified file and is stored in a data node;
sending to the data node, by the management node, a deleting instruction carrying a data block identifier of the data block, wherein the deleting instruction is suspended by the data node, until the data node deletes the data block corresponding to the data block identifier after a condition is met;
receiving a file recovering instruction sent by the client for recovering the specified file;
recovering an eligible first correspondence relation, wherein the eligible first correspondence relation is a first correspondence relation which is backed-up before the deleting instruction is sent and a time point for backup is closest to a time point of sending the deleting instruction, and the first correspondence relation comprises a relation between the specified file and a data block identifier of a data block in the specified file; and
recovering a second correspondence relation, wherein the second correspondence relation is a mapping from the data block identifier of the data block to the data node storing the data block;
wherein the method further comprises:
sending to the data node an emptying instruction for emptying the data blocks corresponding to all the data block identifiers in the delay queue; and
receiving a parameter of occupation sent by the data node, wherein the parameter of occupation comprises a delay deleting storage space and a delay deleting percentage, wherein the delay deleting storage space is a storage space occupied by the data blocks corresponding to all the data block identifiers in the delay queue of the data node, and the delay deleting percentage is a percentage of the entire storage space of the data node occupied by the delay deleting storage space, such that the client determines after checking the parameter of occupation whether to send to the data node an emptying instruction for emptying the data blocks corresponding to all the data block identifiers in the delay queue.
Dependent claims: 7
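The recovery steps of claim 6 can be sketched as follows. This is an illustration under stated assumptions, not the patented method: the class `ManagementNode`, its methods, the integer time points and the choice to restore the second correspondence relation from the same backup snapshot are all hypothetical (the claim does not say where the second relation is recovered from), and the sketch assumes the eligible backup already contains the file.

```python
import bisect
import copy


class ManagementNode:
    """Hypothetical management node keeping the two correspondence relations."""

    def __init__(self):
        self.file_to_blocks = {}   # first relation: file -> block identifiers
        self.block_to_node = {}    # second relation: block identifier -> data node
        self.backups = []          # (time, snapshot of both relations), in time order
        self.deleted_at = {}       # file -> time the deleting instruction was sent

    def add_file(self, name, placement):
        # placement maps each block identifier of the file to its data node.
        self.file_to_blocks[name] = list(placement)
        self.block_to_node.update(placement)

    def backup(self, now):
        snapshot = copy.deepcopy((self.file_to_blocks, self.block_to_node))
        self.backups.append((now, snapshot))

    def delete_file(self, name, now):
        # Drop the metadata and return (data node, block id) deleting instructions.
        block_ids = self.file_to_blocks.pop(name)
        self.deleted_at[name] = now
        return [(self.block_to_node.pop(b), b) for b in block_ids]

    def recover_file(self, name):
        # Eligible backup: the latest one taken before the deleting instruction.
        times = [t for t, _ in self.backups]
        index = bisect.bisect_left(times, self.deleted_at[name]) - 1
        if index < 0:
            raise LookupError("no backup precedes the deletion")
        files, nodes = self.backups[index][1]
        self.file_to_blocks[name] = list(files[name])
        for b in files[name]:
            self.block_to_node[b] = nodes[b]


mn = ManagementNode()
mn.add_file("/logs/a", {"blk_1": "dn1", "blk_2": "dn2"})
mn.backup(now=1)
mn.backup(now=5)
instructions = mn.delete_file("/logs/a", now=7)
mn.backup(now=9)             # taken after the deletion, so not eligible
mn.recover_file("/logs/a")   # restores from the backup taken at time 5
```

Recovering the metadata is only useful while the data nodes still hold the blocks, that is, before an emptying instruction has been sent for the corresponding delay queues.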
8. A method for managing file in distributed data warehouse, wherein the method comprises:
sending to a management node an instruction for deleting a specified file, wherein the instruction for deleting the specified file is utilized by the management node to determine a data block which belongs to the specified file and is stored in a data node, and the management node sends to the data node a deleting instruction carrying a data block identifier of the data block, wherein the deleting instruction is suspended by the data node, until the data node deletes the data block corresponding to the data block identifier after a condition is met;
wherein the data block identifier is stored into a delay queue by the data node, and the method further comprises:
checking a parameter of occupation sent to the management node by each data node, wherein the parameter of occupation comprises a delay deleting storage space and a delay deleting percentage, wherein the delay deleting storage space is a storage space occupied by the data blocks corresponding to all the data block identifiers in the delay queue of the data node, and the delay deleting percentage is a percentage of the entire storage space of the data node occupied by the delay deleting storage space;
determining whether to send to the data node an emptying instruction for emptying the data blocks corresponding to all the data block identifiers in the delay queue of the data node; and
sending the emptying instruction to the data node in a case that it is determined to send the emptying instruction to the data node.
Dependent claims: 9, 10
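The client-side decision in claim 8 can be illustrated with a short Python sketch. The claim leaves the decision criterion to the client, so the threshold rule, the function name `nodes_to_empty` and the shape of the `reports` dictionary are assumptions made purely for illustration.

```python
def nodes_to_empty(reports, max_percentage=20.0):
    """Return the data nodes the client would send an emptying instruction to.

    reports maps a data node name to a pair: (delay deleting storage space in
    bytes, delay deleting percentage). The threshold rule is an assumption;
    the claim only says the client decides after checking the parameters.
    """
    return sorted(
        node for node, (_space, percentage) in reports.items()
        if percentage >= max_percentage
    )


# Parameters of occupation as each data node might report them (hypothetical values).
reports = {"dn1": (250, 25.0), "dn2": (40, 4.0), "dn3": (600, 60.0)}
targets = nodes_to_empty(reports)
```

A percentage-based rule keeps the check independent of node size; a client could just as well decide on the absolute delay deleting storage space, or leave the decision to an operator.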
Specification