Instance-based distributed data recovery method and apparatus

US 10,783,163 B2
Filed: 11/27/2015
Issued: 09/22/2020
Est. Priority Date: 08/20/2015
Status: Active Grant

Abstract

The present application discloses an instance-based distributed data recovery method. A specific implementation of the method includes: detecting a non-master node that has gone down; allocating multiple secondary storage units corresponding to the down node to at least one online node; performing hash grouping on instances stored in the logs of the down node and allocating the instances to multiple threads; and recovering data of multiple primary storage units in parallel inside the online node. Embodiments of the present invention thereby recover the data of a down node in a distributed database in parallel within the online nodes.

14 Claims

1. An instance-based distributed data recovery method for a distributed data system, the distributed data system comprising a database cluster that includes at least one master node and a plurality of non-master nodes, and a distributed file system that includes a database including multiple primary storage units and multiple secondary storage units, wherein, during normal operation when the master node and the non-master nodes are all running online, the primary storage units are managed by respective ones of the plurality of non-master nodes and the multiple secondary storage units are managed by the master node, and wherein each of the multiple secondary storage units stores indexes of multiple primary storage units, and each of the multiple primary storage units stores one instance, the method comprising:
- detecting, by the master node, one of the non-master nodes going down;
- allocating, by the master node, multiple secondary storage units that index multiple primary storage units managed by the non-master node that is going down to at least one online node, an online node being a non-master node that remains online;
- performing, by the online node to which the multiple secondary storage units have been allocated by the master node, hash grouping on instances stored on logs of the non-master node that is going down and allocating the instances to multiple threads inside the online node; and
- recovering, by the online node, data of the multiple primary storage units managed by the non-master node that is going down in parallel in the multiple threads.
- Dependent claims (2, 3, 4, 5, 6, 7):
  - 2. The method according to claim 1, wherein indexes of the secondary storage unit are stored in a tertiary storage unit, and the data stored in the multiple primary storage units is ordered according to the instances.
  - 3. The method according to claim 2, wherein the master node manages the tertiary storage unit and the secondary storage units.
  - 4. The method according to claim 1, wherein the multiple secondary storage units corresponding to the non-master node that is going down are evenly allocated to the at least one online node.
  - 5. The method according to claim 1, wherein the performing, by the online node, hash grouping on instances stored on logs of the non-master node that is going down and allocating the instances to multiple threads inside the online node comprises:
    - performing hash grouping on the instances stored on the logs to map logs of same instances to a same thread of the multiple threads, so as to allocate the logs to the multiple threads according to the different instances.
  - 6. The method according to claim 5, wherein the hash grouping step is:
    - converting an instance name recorded in a log record to convert each character of the character string into an ASCII code, and summing the ASCII codes to obtain a sum which is a 32-bit integer value; and
    - performing a modulo operation on the number of recovery threads by using the value, to obtain a thread ID of a thread for recovering the instance.
  - 7. The method according to claim 1, wherein the at least one online node performs logical recursion in an own process according to content of the log to recover data.
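
The hash grouping described in claims 5 and 6 can be illustrated with a short Python sketch. This is only one possible reading of the claim language, not the patented implementation: the function names, the `(instance_name, key, value)` record layout, and the masking of the sum to an unsigned 32-bit value are all assumptions made for illustration.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical log record layout: (instance_name, key, value).
LogRecord = Tuple[str, str, str]


def thread_id_for_instance(instance_name: str, num_threads: int) -> int:
    """Map an instance name to a recovery thread ID.

    Sums the ASCII codes of the name's characters and takes the sum
    modulo the number of recovery threads.
    """
    if num_threads <= 0:
        raise ValueError("num_threads must be positive")
    # encode("ascii") raises UnicodeEncodeError for non-ASCII names.
    # Assumption: the sum is kept as an unsigned 32-bit integer.
    total = sum(instance_name.encode("ascii")) & 0xFFFFFFFF
    return total % num_threads


def group_logs_by_thread(
    records: List[LogRecord], num_threads: int
) -> Dict[int, List[LogRecord]]:
    """Group log records so that all records of one instance share a thread.

    Record order within each group follows the order of the input log.
    """
    groups: Dict[int, List[LogRecord]] = defaultdict(list)
    for record in records:
        groups[thread_id_for_instance(record[0], num_threads)].append(record)
    return dict(groups)
```

Because the thread ID depends only on the instance name, every log record of a given instance lands in the same group. A character sum also gives names that are anagrams of each other (for example `ab` and `ba`) the same thread ID, which affects load balance but not correctness.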

8. A device, comprising:
- a processor; and
- a memory, storing computer readable instructions thereon, the computer readable instructions when executed by the processor, causing the processor to:
  - detect, by a master node, a non-master node that is going down;
  - allocate, by the master node, multiple secondary storage units that index multiple primary storage units managed by the non-master node that is going down to at least one online node, an online node being a non-master node that remains online, wherein a distributed data system comprises a database cluster that includes at least one master node and a plurality of non-master nodes, and a distributed file system that includes a database including multiple primary storage units and multiple secondary storage units, wherein, during normal operation when the master node and the non-master nodes are all running online, the primary storage units are managed by respective ones of the plurality of non-master nodes and the multiple secondary storage units are managed by the master node, and wherein each of the multiple secondary storage units stores indexes of multiple primary storage units, and each of the multiple primary storage units stores one instance;
  - perform, by the online node to which the multiple secondary storage units have been allocated by the master node, hash grouping on instances stored on logs of the non-master node that is going down and allocate the instances to multiple threads inside the online node; and
  - recover, by the online node, data of multiple primary storage units managed by the non-master node that is going down in parallel in the multiple threads.
- Dependent claims (9, 10, 11, 12, 13):
  - 9. The device according to claim 8, wherein indexes of the secondary storage unit are stored in a tertiary storage unit and the data stored in the multiple primary storage units is ordered according to the instances.
  - 10. The device according to claim 9, wherein the master node manages the tertiary storage unit and the secondary storage units.
  - 11. The device according to claim 8, wherein the multiple secondary storage units corresponding to the non-master node that is going down are evenly allocated to the at least one online node.
  - 12. The device according to claim 8, wherein the perform, by the online node, hash grouping on instances stored on logs of the non-master node that is going down and allocating the instances to multiple threads inside the online node comprises:
    - performing hash grouping on the instances stored on the logs to map logs of same instances to a same thread of the multiple threads, so as to allocate the logs to the multiple threads according to the different instances.
  - 13. The device according to claim 12, wherein the hash grouping step is:
    - converting an instance name recorded in a log record to convert each character of the character string into an ASCII code, and summing the ASCII codes to obtain a sum which is a 32-bit integer value; and
    - performing a modulo operation on the number of recovery threads by using the value, to obtain a thread ID of a thread for recovering the instance.
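
Claims 4 and 11 require that the secondary storage units of the down node be evenly allocated to the online nodes, without specifying a strategy. The Python sketch below uses round-robin assignment as one simple way to achieve this; the function name and the string identifiers for units and nodes are hypothetical.

```python
from typing import Dict, List


def allocate_evenly(
    unit_ids: List[str], online_nodes: List[str]
) -> Dict[str, List[str]]:
    """Assign secondary storage units to online nodes in round-robin order.

    The number of units per node differs by at most one.
    """
    if not online_nodes:
        raise ValueError("at least one online node is required")
    assignment: Dict[str, List[str]] = {node: [] for node in online_nodes}
    for i, unit in enumerate(unit_ids):
        # Cycle through the online nodes so the load stays balanced.
        assignment[online_nodes[i % len(online_nodes)]].append(unit)
    return assignment
```

Round-robin balances only the count of units per node. A real system might also weigh unit size or current node load, which the claims do not address.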

14. A non-transitory computer storage medium storing computer readable instructions, the computer readable instructions when executed by a processor, causing the processor to:
- detect, by a master node, a non-master node that is going down;
- allocate, by the master node, multiple secondary storage units that index multiple primary storage units managed by the non-master node that is going down to at least one online node, an online node being a non-master node that remains online, wherein a distributed data system comprises a database cluster that includes at least one master node and a plurality of non-master nodes, and a distributed file system that includes a database including multiple primary storage units and multiple secondary storage units, wherein, during normal operation when the master node and the non-master nodes are all running online, the primary storage units are managed by respective ones of the plurality of non-master nodes and the multiple secondary storage units are managed by the master node, and wherein each of the multiple secondary storage units stores indexes of multiple primary storage units, and each of the multiple primary storage units stores one instance;
- perform, by the online node to which the multiple secondary storage units have been allocated by the master node, hash grouping on instances stored on logs of the non-master node that is going down and allocate the instances to multiple threads inside the online node; and
- recover, by the online node, data of multiple primary storage units managed by the non-master node that is going down in parallel in the multiple threads.
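
The final step of each independent claim, recovering data in parallel in multiple threads, might look roughly like the following Python sketch. It assumes the log records have already been grouped per thread so that each instance appears in exactly one group, and it models recovery as replaying key/value writes in log order. The record layout and function names are hypothetical, and the claims do not prescribe this replay model.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Tuple

# Hypothetical log record layout: (instance_name, key, value).
LogRecord = Tuple[str, str, str]


def replay_group(records: List[LogRecord]) -> Dict[str, Dict[str, str]]:
    """Replay one thread's log records in order; later writes win."""
    recovered: Dict[str, Dict[str, str]] = {}
    for instance, key, value in records:
        recovered.setdefault(instance, {})[key] = value
    return recovered


def recover_in_parallel(
    groups: Dict[int, List[LogRecord]]
) -> Dict[str, Dict[str, str]]:
    """Replay each group in its own worker thread and merge the results.

    Assumes no instance appears in more than one group, so the merge
    never overwrites another thread's work.
    """
    result: Dict[str, Dict[str, str]] = {}
    with ThreadPoolExecutor(max_workers=max(1, len(groups))) as pool:
        for partial in pool.map(replay_group, groups.values()):
            result.update(partial)
    return result
```

Each worker builds its own partial result, so no locking is needed during replay; the merge happens in the calling thread after the workers finish.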

Current Assignee: Beijing Baidu Netcom Science and Technology Company Limited (Baidu Incorporated)
Original Assignee: Beijing Baidu Netcom Science and Technology Company Limited (Baidu Incorporated)
Inventors: Lai, Chunbo; Xue, Yingfei; Wang, Pu; Zhao, Bo
Primary Examiner(s): Vu, Bai D
Application Number: US15/533,955
Publication Number: US 20180150536A1
Time in Patent Office: 1,761 Days
Field of Search: None
CPC Class Codes:
G06F 16/2246 Trees, e.g. B+trees
G06F 16/27 Replication, distribution o...
