Automatic repair of corrupt HBases
Abstract
Systems and methods for checking for region consistency and table integrity problems and automatically repairing a corrupted HBase cluster. The methods and systems operate in a diagnostic mode and a diagnostic and repair mode. The methods include fixing table integrity problems, such as backwards table regions, table region holes, and table region overlaps, to restore the table integrity invariant. Once table integrity has been restored, each row key resolves to exactly one region. The methods further include fixing region inconsistencies, such as bad region assignment, a region missing from the meta table, and region information missing from the Hadoop Distributed File System (HDFS), to restore the region consistency invariant. The information in the HDFS is taken as ground truth, and any meta table or assignment information that is inconsistent with the HDFS is deemed wrong and removed.
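The table integrity invariant (every row key resolves to exactly one region) can be checked with a single sweep over the regions sorted by start key. The Python sketch below is illustrative only and is not taken from the patent: the `Region` class, the `find_integrity_problems` function, and the problem labels are hypothetical. It assumes HBase's key conventions: a start key is inclusive, a stop key is exclusive, an empty start key marks the first region, and an empty stop key marks the last region.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Region:
    name: str
    start: bytes  # inclusive; b"" = beginning of the table
    stop: bytes   # exclusive; b"" = end of the table


# Positions are tuples so that "end of table" sorts after every real key.
END = (1, b"")


def _start_pos(r: Region) -> Tuple[int, bytes]:
    return (0, r.start)


def _stop_pos(r: Region) -> Tuple[int, bytes]:
    return END if r.stop == b"" else (0, r.stop)


def find_integrity_problems(regions: List[Region]) -> List[Tuple[str, str]]:
    """Sweep regions in start-key order; report backwards regions, holes, overlaps."""
    problems: List[Tuple[str, str]] = []
    ordered = []
    for r in regions:
        if _stop_pos(r) < _start_pos(r):
            # Start key sorts after stop key: a backwards region covering no rows.
            problems.append(("BACKWARDS", r.name))
        else:
            ordered.append(r)
    ordered.sort(key=_start_pos)

    covered = (0, b"")  # every row key before this position is already covered
    for r in ordered:
        if _start_pos(r) > covered:
            # Keys in [covered, r.start) belong to no region.
            problems.append(("HOLE", f"{covered[1]!r}..{r.start!r}"))
        elif _start_pos(r) < covered:
            # r starts inside a key range that an earlier region already covers.
            problems.append(("OVERLAP", r.name))
        covered = max(covered, _stop_pos(r))
    if covered != END:
        # No region reaches the end of the table.
        problems.append(("HOLE", f"{covered[1]!r}..end"))
    return problems
```

For example, regions `["", "c")`, `["f", end)` and a backwards region `["z", "m")` produce one `BACKWARDS` finding and one `HOLE` finding for the keys from `c` up to `f`.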
21 Claims
1. A method for maintaining table integrity of a datastore table in a distributed data cluster that relies on the datastore table for locating data, the datastore table having rows and being partitioned into regions, each region having a start key and a stop key for identifying which rows map to which region, the distributed data cluster including (1) a number of region servers, each region server maintaining one or more of the regions and (2) a distributed file system (DFS) that stores the data, the method comprising:
identifying, by scanning all rows in the datastore table, whether each possible row in the datastore table maps to one and only one region;
upon identifying that a particular row does not map to one and only one region, determining that a table integrity problem exists;
determining a type of the table integrity problem;
deciding a repair option based on the type of the table integrity problem, wherein the repair option is to cause the particular row to become mapped to one and only one correct region, in consistency with the data stored in the DFS; and
resolving the table integrity problem by executing the repair option.
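Claim 1 frames the check per row: a row key that falls in no region indicates a hole, and a row key that falls in more than one region indicates an overlap. Because the space of possible row keys is unbounded, this hypothetical sketch probes only the candidate keys the caller supplies (here, every region's start and stop keys). The names `regions_containing`, `classify_row`, `plan_repairs`, and the entries of `REPAIR_OPTIONS` are illustrative assumptions; the claim does not specify particular repair options.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Tuple


@dataclass(frozen=True)
class Region:
    name: str
    start: bytes  # inclusive; b"" = first row of the table
    stop: bytes   # exclusive; b"" = past the last row of the table


def regions_containing(row: bytes, regions: Iterable[Region]) -> List[Region]:
    """Return every region whose [start, stop) range contains the row key."""
    return [r for r in regions
            if r.start <= row and (r.stop == b"" or row < r.stop)]


def classify_row(row: bytes, regions: Iterable[Region]) -> Optional[str]:
    """None if the row maps to exactly one region, else the problem type."""
    hits = regions_containing(row, regions)
    if len(hits) == 1:
        return None
    return "HOLE" if not hits else "OVERLAP"


# Hypothetical repair option per problem type (assumed, not from the claim).
REPAIR_OPTIONS = {
    "HOLE": "create an empty region covering the uncovered key range",
    "OVERLAP": "merge the overlapping regions into a single region",
}


def plan_repairs(candidate_rows: Iterable[bytes],
                 regions: List[Region]) -> List[Tuple[bytes, str, str]]:
    """Probe each candidate row and pair every problem with a repair option."""
    plan = []
    for row in sorted(set(candidate_rows)):
        problem = classify_row(row, regions)
        if problem is not None:
            plan.append((row, problem, REPAIR_OPTIONS[problem]))
    return plan


regions = [Region("a", b"", b"c"), Region("b", b"f", b"m"), Region("c", b"k", b"")]
probes = {r.start for r in regions} | {r.stop for r in regions if r.stop}
plan = plan_repairs(probes, regions)
```

With these regions, probing `c` finds no covering region (a hole) and probing `k` finds two (an overlap between regions `b` and `c`).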
12. A method for maintaining region consistency in a distributed data cluster that relies on a datastore table for locating data, the datastore table having rows and being partitioned into regions, each region having a start key and a stop key for identifying which rows map to which region, the distributed data cluster including (1) a number of region servers, each region server maintaining one or more of the regions and (2) a distributed file system (DFS) that stores the data, the method comprising:
gathering information associated with the regions from locations including the region servers, the DFS, and a meta table that records which region server has a particular row for a particular access request;
identifying, based on the information associated with the regions, whether each available region is assigned to one and only one region server;
upon identifying that a particular region is not assigned to one and only one region server, or that any information associated with the regions from one location is inconsistent with such information from another location, determining that a region consistency problem exists;
determining a type of the region consistency problem;
deciding a repair option based on the type of the region consistency problem, wherein the repair option is to cause region information from the region servers and region information in the meta table to be consistent with the data stored in the DFS; and
resolving the region consistency problem by executing the repair option.
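Claim 12 compares region information gathered from three locations, and the abstract states that the HDFS copy is treated as ground truth. The sketch below is a hypothetical illustration of that comparison, not the patented implementation: it reduces each location to region names (an `hdfs` set, a `meta` mapping from region to the server recorded for it, and a `deployed` mapping from server to the regions it actually serves). The problem labels and repair descriptions are assumptions chosen for readability.

```python
from typing import Dict, List, Optional, Set, Tuple


def check_region_consistency(
    hdfs: Set[str],
    meta: Dict[str, Optional[str]],
    deployed: Dict[str, Set[str]],
) -> List[Tuple[str, str, str]]:
    """Return (problem type, region, repair option) for each inconsistency."""
    # Invert deployments: region -> servers actually serving it.
    serving: Dict[str, List[str]] = {}
    for server, regions in deployed.items():
        for region in regions:
            serving.setdefault(region, []).append(server)

    findings = []
    for region in sorted(hdfs | set(meta) | set(serving)):
        servers = sorted(serving.get(region, []))
        if region not in hdfs:
            # HDFS is ground truth: any meta entry or assignment is wrong.
            findings.append(("NOT_IN_HDFS", region,
                             "remove from meta and close on region servers"))
        elif region not in meta:
            findings.append(("NOT_IN_META", region,
                             "rebuild meta entry from HDFS, then assign"))
        elif not servers:
            findings.append(("NOT_DEPLOYED", region,
                             "assign to one region server"))
        elif len(servers) > 1:
            findings.append(("MULTIPLY_DEPLOYED", region,
                             "close every copy, then assign once"))
        elif meta[region] != servers[0]:
            findings.append(("META_MISMATCH", region,
                             f"update meta to {servers[0]}"))
    return findings


findings = check_region_consistency(
    hdfs={"r1", "r2", "r3", "r4"},
    meta={"r1": "s1", "r2": "s1", "r3": None, "r5": "s2"},
    deployed={"s1": {"r1"}, "s2": {"r2", "r5"}, "s3": {"r1"}},
)
```

Each region is checked against HDFS first, so a region that exists only in the meta table or on a region server is always scheduled for removal rather than repaired into place.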
21. A distributed database system having a datastore table for locating data, the datastore table having rows and being partitioned into regions, each region having a start key and a stop key for identifying which rows map to which region, the system comprising:
a number of region servers, each region server maintaining one or more of the regions;
a distributed file system (DFS) that stores the data; and
a master node configured to:
(A) in a first mode, perform:
identifying, by scanning all rows in the datastore table, whether each possible row in the datastore table maps to one and only one region;
upon identifying that a particular row does not map to one and only one region, determining that a table integrity problem exists;
determining a type of the table integrity problem;
deciding a repair option based on the type of the table integrity problem, wherein the repair option is to cause the particular row to become mapped to one and only one correct region, in consistency with the data stored in the DFS; and
resolving the table integrity problem by executing the repair option;
(B) in a second mode, perform:
gathering information associated with the regions from locations including the region servers, the DFS, and a meta table that records which region server has a particular row for a particular access request;
identifying, based on the information associated with the regions, whether each available region is assigned to one and only one region server;
upon identifying that a particular region is not assigned to one and only one region server, or that any information associated with the regions from one location is inconsistent with such information from another location, determining that a region consistency problem exists;
determining a type of the region consistency problem;
deciding a repair option based on the type of the region consistency problem, wherein the repair option is to cause region information from the region servers and region information in the meta table to be consistent with the data stored in the DFS; and
resolving the region consistency problem by executing the repair option.
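Claim 21 places both checks on a master node that runs them in separate modes, and the abstract adds a diagnostic-only mode alongside a diagnostic-and-repair mode. The sketch below shows one hypothetical way to wire that dispatch; the `Master` class, the `Mode` enum, the `repair_enabled` flag, and the log strings are all assumptions, with the checks and repair options passed in as plain callables.

```python
from enum import Enum
from typing import Callable, Dict, List, Tuple

Finding = Tuple[str, str]  # (problem type, affected key range or region)


class Mode(Enum):
    TABLE_INTEGRITY = "first mode"
    REGION_CONSISTENCY = "second mode"


class Master:
    """Hypothetical master node: runs one check mode, optionally repairs."""

    def __init__(self,
                 checks: Dict[Mode, Callable[[], List[Finding]]],
                 repairs: Dict[str, Callable[[str], None]],
                 repair_enabled: bool) -> None:
        self.checks = checks                  # mode -> function returning findings
        self.repairs = repairs                # problem type -> repair option
        self.repair_enabled = repair_enabled  # False = diagnostic mode only

    def run(self, mode: Mode) -> List[str]:
        log = []
        for kind, target in self.checks[mode]():
            repair = self.repairs.get(kind)
            if repair is None:
                log.append(f"no repair option for {kind} on {target}")
            elif self.repair_enabled:
                repair(target)  # execute the chosen repair option
                log.append(f"repaired {kind} on {target}")
            else:
                log.append(f"found {kind} on {target}")
        return log


applied: List[str] = []
checks = {
    Mode.TABLE_INTEGRITY: lambda: [("HOLE", "c..f")],
    Mode.REGION_CONSISTENCY: lambda: [("NOT_IN_HDFS", "r5"), ("UNKNOWN", "r9")],
}
repairs = {"HOLE": applied.append, "NOT_IN_HDFS": applied.append}
diagnostic = Master(checks, repairs, repair_enabled=False)
fixer = Master(checks, repairs, repair_enabled=True)
```

Keeping detection and repair behind a single flag means the diagnostic mode reports exactly the findings the repair mode would act on, without touching the cluster.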
Specification