High availability failover manager

US 9,785,525 B2
Filed: 09/24/2015
Issued: 10/10/2017
Est. Priority Date: 09/24/2015
Status: Active Grant

Abstract

A high availability (HA) failover manager maintains data availability of one or more input/output (I/O) resources in a cluster by ensuring that each I/O resource is available (e.g., mounted) on a hosting node of the cluster and that each I/O resource may be available on one or more partner nodes of the cluster if a node (i.e., a local node) were to fail. The HA failover manager (HA manager) processes inputs from various sources of the cluster to determine whether failover is enabled for a local node and each partner node in an HA group, and for triggering failover of the I/O resources to the partner node as necessary. For each I/O resource, the HA manager may track state information including (i) a state of the I/O resource (e.g., mounted or un-mounted); (ii) the partner node(s) ability to service the I/O resource; and (iii) whether a non-volatile log recording I/O requests is synchronized to the partner node(s). The HA manager interacts with various layers of a storage I/O stack to mount and un-mount the I/O resources on one or more nodes of the cluster through the use of well-defined interfaces, e.g., application programming interfaces.
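
The three tracked items listed in the abstract can be pictured as a small per-resource record. The following is a minimal sketch, not the patented implementation: the names `ResourceState`, `failover_enabled`, and `resources_to_fail_over` are hypothetical, and the rule that failover requires both a capable partner and a synchronized log is an assumption drawn from the abstract's wording.

```python
from dataclasses import dataclass


@dataclass
class ResourceState:
    """Hypothetical record of the state the abstract says may be tracked."""
    name: str
    mounted: bool              # (i) mounted or un-mounted on the hosting node
    partner_can_service: bool  # (ii) partner node's ability to service the resource
    log_synchronized: bool     # (iii) non-volatile log synchronized to the partner


def failover_enabled(state: ResourceState) -> bool:
    # Assumption: failover is allowed only when the partner can service the
    # resource and already holds a synchronized copy of the log.
    return state.partner_can_service and state.log_synchronized


def resources_to_fail_over(states):
    """Return the names of resources that could be moved to the partner."""
    return [s.name for s in states if failover_enabled(s)]
```

A caller would rebuild these records as inputs arrive from the cluster and consult `failover_enabled` before triggering any mount on a partner node.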

20 Claims

1. A method comprising:
- receiving a write request directed towards a logical unit (LUN), the write request having a data, a logical block address (LBA) and a length representing an address range of the LUN, the LBA and the length mapped to a volume associated with the LUN, the write request received at a first node of a plurality of nodes of a cluster, each node of the cluster having a memory and attached to a storage array storing the volume;
  
  recording the write request in a first non-volatile log of the first node, the first non-volatile log stored on a storage device different from the storage array storing the volume;
  
  monitoring a state of availability of the first node to service the volume;
  
  in response to a lack of availability of the first node to service the volume, determining whether a second node is able to takeover service of the volume; and
  
  in response to determining that the second node is able to takeover service of the volume, triggering a failover of the volume to the second node of the cluster, wherein the first non-volatile log is mirrored to a second non-volatile log accessible by the second node, and wherein the second non-volatile log is up to date with the first non-volatile log.
- Dependent claims (2, 3, 4, 5, 6, 7, 8, 9):
  - 2. The method of claim 1 further comprising:
    - recording the state of availability of the first node in a cluster database; and
      
      replicating the cluster database from the first node to the second node of the cluster.
  - 3. The method of claim 2 further comprising:
    - registering a callback on the second node to monitor a change to the state of availability of the first node recorded in the cluster database; and
      
      receiving a notification at the second node in response to the change of the state of the availability of the first node.
  - 4. The method of claim 2 wherein transactions to the cluster database are ordered and consistent.
  - 5. The method of claim 2 further comprising:
    - scanning a table of the cluster database at the second node, the table including the state of availability of the first node.
  - 6. The method of claim 1 further comprising:
    - analyzing heuristics for a plurality of network links used to mirror the first non-volatile log to the second non-volatile log, the plurality of network links connecting the first and second nodes of the cluster; and
      
      configuring the mirror to efficiently use the plurality of network links.
  - 7. The method of claim 1 wherein triggering the failover of the volume further comprises:
    - mounting the volume on the second node using an operation having no knowledge of the state of availability of the first node to service the volume.
  - 8. The method of claim 1 wherein recording of the state of availability of the first node further comprises using a consensus protocol involving three or more nodes of the cluster.
  - 9. The method of claim 3 further comprising:
    - determining the state of availability of the first node to service the volume by receiving a notification from the cluster database.
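
The steps of claim 1 can be traced in a few lines of code. This is an illustrative sketch only: `Node`, `write`, and `maybe_failover` are hypothetical names, Python lists stand in for the non-volatile logs, and comparing the two logs directly is a simplification (a real system would more plausibly track mirror status than read the failed node's log).

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    available: bool = True
    log: list = field(default_factory=list)    # stands in for the non-volatile log
    mounted: set = field(default_factory=set)  # volumes this node currently services


def write(first: Node, second: Node, volume: str, lba: int, data: bytes) -> None:
    """Record the write in the first node's log and mirror it to the second."""
    entry = (volume, lba, len(data), data)
    first.log.append(entry)
    second.log.append(entry)  # mirror; in practice this crosses a network link


def maybe_failover(first: Node, second: Node, volume: str) -> bool:
    """Move the volume only if the first node is unavailable and the second
    node is available with an up-to-date mirror of the log."""
    if first.available:
        return False
    if not second.available or second.log != first.log:
        return False
    first.mounted.discard(volume)
    second.mounted.add(volume)
    return True
```

The check on the mirrored log mirrors the claim's final "wherein" clauses: takeover proceeds only when the second log is up to date with the first.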

10. A method comprising:
- receiving a write request directed towards a logical unit (LUN), the write request having a data, a logical block address (LBA) and a length representing an address range of the LUN, the LBA and the length mapped to a volume associated with the LUN, the write request received at a first node of a plurality of nodes of the cluster, each node of the cluster having a memory and attached to a storage array storing the volume;
  
  recording the write request in a first portion of a non-volatile random access memory (NVRAM) of the first node;
  
  recording a state of availability of the first node to service the volume in a cluster database;
  
  in response to a lack of availability of the first node to service the volume, winning a race at the second node against the first node to update the cluster database to mark the first node as being unavailable to service the volume; and
  
  triggering a failover of the volume to the second node of the cluster, wherein the write request is mirrored to a second portion of the NVRAM accessible by the second node, and wherein the second portion of the NVRAM is up to date with the first portion of the NVRAM.
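
Claim 10 has the second node "winning a race" against the first node to update the cluster database. One common way to make such a race safe is a versioned compare-and-set, sketched below. This is an assumption about how the race might be resolved, not the mechanism the patent specifies; `VersionedClusterDB`, `renew_availability`, and `mark_unavailable` are hypothetical names.

```python
import threading


class VersionedClusterDB:
    """Hypothetical stand-in for a cluster database with ordered updates."""

    def __init__(self):
        self._lock = threading.Lock()
        self._rows = {}  # node name -> (status, version)

    def put(self, node, status, version):
        with self._lock:
            self._rows[node] = (status, version)

    def get(self, node):
        with self._lock:
            return self._rows[node]

    def compare_and_set(self, node, expected_version, new_status):
        # The update succeeds only if nobody changed the row since the
        # caller last read it; the loser of the race gets False.
        with self._lock:
            _status, version = self._rows[node]
            if version != expected_version:
                return False
            self._rows[node] = (new_status, version + 1)
            return True


def renew_availability(db, node, seen_version):
    """The first node tries to re-assert that it is available."""
    return db.compare_and_set(node, seen_version, "available")


def mark_unavailable(db, node, seen_version):
    """The second node tries to mark the first node as unavailable."""
    return db.compare_and_set(node, seen_version, "unavailable")
```

Because both updates are conditioned on the same observed version, at most one of them can succeed, so the two nodes never disagree about who services the volume.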

11. A system comprising:
- a cluster having first and second nodes each having a memory connected to a processor via a bus;
  
  a storage array coupled to each node of the cluster;
  
  a storage I/O stack executing on the processor of each node of the cluster, the storage I/O stack configured to;
  
  receive a write request directed towards a logical unit (LUN), the write request having a data, a logical block address (LBA) and a length representing an address range of the LUN, the LBA and the length mapped to a volume associated with the LUN, the write request received at the first node of the cluster, the volume stored on the storage array;
  
  record the write request in a first non-volatile log of the first node, the first non-volatile log stored on a storage device different from the storage array;
  
  monitor a state of availability of the first node to service the volume;
  
  in response to a lack of availability of the first node to service the volume, determine whether the second node is able to takeover service of the volume; and
  
  in response to determining that the second node is able to takeover service of the volume, trigger a failover of the volume to the second node of the cluster, wherein the first non-volatile log is mirrored to a second non-volatile log accessible by the second node, and wherein the second non-volatile log is up to date with the first non-volatile log.
- Dependent claims (12, 13, 14, 15, 16, 17, 18, 19, 20):
  - 12. The system of claim 11 wherein the storage I/O stack is further configured to:
    - record the state of availability of the first node in a cluster database; and
      
      replicate the cluster database from the first node to the second node of the cluster.
  - 13. The system of claim 12 wherein the storage I/O stack is further configured to:
    - register a callback on the second node to monitor a change to the state of availability of the first node recorded in the cluster database; and
      
      receive a notification at the second node in response to the change of the state of the availability of the first node.
  - 14. The system of claim 12 wherein transactions to the cluster database are ordered and consistent.
  - 15. The system of claim 12 wherein the storage I/O stack is further configured to:
    - scan a table of the cluster database at the second node, the table including the state of availability of the first node.
  - 16. The system of claim 11 wherein the storage I/O stack is further configured to:
    - analyze heuristics for a plurality of network links used to mirror the first non-volatile log to the second non-volatile log, the plurality of network links connecting the first and second nodes of the cluster; and
      
      configure the mirror to efficiently use the plurality of network links.
  - 17. The system of claim 11 wherein the storage I/O stack when configured to trigger a failover over of the volume is further configured to mount the volume on the second node using an operation having no knowledge of the state of availability of the first node to service the volume.
  - 18. The system of claim 11 wherein the storage I/O stack when configured to record the state of availability of the first node is further configured to use a consensus algorithm involving three or more nodes of the cluster.
  - 19. The system of claim 13 wherein the storage I/O stack is further configured to determine the state of availability of first node by receiving a notification from the cluster database.
  - 20. The system of claim 11 further comprising:
    - a non-volatile random access memory (NVRAM) on the first node apportioned into a first portion and a second portion, wherein the first portion includes the first non-volatile log, and wherein the second portion includes the second non-volatile log.
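
Claims 12, 13, 15 and 19 (like method claims 2, 3, 5 and 9) describe recording node availability in a cluster database, registering a callback, receiving change notifications, and scanning a table. A minimal in-memory sketch of that pattern follows. `NotifyingClusterDB` and its methods are hypothetical, and replication between nodes is omitted entirely.

```python
class NotifyingClusterDB:
    """Hypothetical single-process stand-in for a replicated cluster database."""

    def __init__(self):
        self._table = {}      # key -> recorded state of availability
        self._callbacks = {}  # key -> list of functions to notify on change

    def register_callback(self, key, fn):
        """Ask to be notified when the value recorded under key changes."""
        self._callbacks.setdefault(key, []).append(fn)

    def set(self, key, value):
        old = self._table.get(key)
        self._table[key] = value
        if old != value:  # notify only on an actual change of state
            for fn in self._callbacks.get(key, []):
                fn(key, old, value)

    def scan(self):
        """Return a snapshot of the whole table, as in claims 5 and 15."""
        return dict(self._table)
```

A partner node would register a callback on the hosting node's key and start its takeover checks when the notification reports a change to unavailable; scanning the table is the polling alternative.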

Current Assignee: NetApp, Inc.
Original Assignee: NetApp, Inc.
Inventors: Watanabe, Steven S.; Strange, Stephen H.; Muth, John; Malone, Kimberly A.; Patel, Kayuri H.
Primary Examiner(s): Iqbal, Nadeem
Application Number: US14/864,026
Publication Number: US 20170091056A1
Time in Patent Office: 747 Days
CPC Class Codes

G06F 11/1441   Resetting or repowering

G06F 11/2033   switching over of hardware ...

G06F 11/2035   without idle spare hardware

G06F 11/2046   where the redundant compone...

G06F 11/2069   Management of state, config...

G06F 11/2094   Redundant storage or storag...

G06F 11/2097   maintaining the standby con...

G06F 2201/805   Real-time
