System and method for managing software upgrades in a distributed computing system
Abstract
A system and method for managing software upgrades in a distributed computing system. The distributed computing system may include a plurality of nodes which provide one or more fault-tolerant services. The system and method perform software upgrades in a sequential or “rolling” manner (e.g., node by node). The rolling upgrade process allows all services and data of the distributed computing system to remain operable and available throughout the upgrade process.
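The node-by-node sequencing described in the abstract can be illustrated with a minimal sketch. The `Node` class and `rolling_upgrade` function are hypothetical names introduced here for illustration, and the assignment to `version` stands in for whatever load-and-reboot procedure a real system would use. The sketch only shows that upgrading sequentially keeps at most one node offline at a time.

```python
from dataclasses import dataclass


@dataclass
class Node:
    # Hypothetical node model, not taken from the patent text.
    name: str
    version: str
    online: bool = True


def rolling_upgrade(nodes, new_version):
    """Upgrade nodes one at a time; return the largest number offline at once."""
    max_offline = 0
    for node in nodes:
        node.online = False  # the node is taken down for its upgrade
        offline = sum(1 for n in nodes if not n.online)
        max_offline = max(max_offline, offline)
        node.version = new_version  # stand-in for loading the release and rebooting
        node.online = True  # the node rejoins before the next one starts
    return max_offline


nodes = [Node("n1", "1.0"), Node("n2", "1.0"), Node("n3", "1.0")]
peak_offline = rolling_upgrade(nodes, "1.1")
```

Because each node rejoins before the next one is taken down, a service replicated on two or more nodes always has at least one copy online during the upgrade in this simplified model.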
19 Claims
1. A system for managing a software upgrade in a distributed computing system having a plurality of nodes that provides a plurality of fault-tolerant services, wherein a first set of nodes providing a first fault-tolerant service can differ from, yet can also overlap with, a second set of nodes providing a second service, and wherein making an upgrade current on a given node can take an amount of time sufficient to be considered a fault if not otherwise masked by fault-tolerance, the system comprising:
at least one node which is communicatively connected to the plurality of nodes and which is configured to receive a software release, and to upgrade each of the plurality of nodes with the software release in a sequential manner, accounting for the possibility of different versions of software running on different nodes of the system, whereby the plurality of fault-tolerant services remain available while the software upgrade is in progress;
a service which is configured to notify the at least one node when a first copy of fault-tolerant service becomes unavailable; and
wherein the at least one node is configured to assign a protected status to a node including any portion of a surviving copy of the fault-tolerant service, the protected status being effective to prevent the node from being upgraded.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11
12. A method for managing a software upgrade in a distributed computing system having a plurality of nodes that provide a plurality of fault-tolerant services, wherein a first set of nodes providing a first fault-tolerant service can differ from, yet can also overlap with, a second set of nodes providing a second service, and wherein making an upgrade current on a given node can take an amount of time sufficient to be considered a fault if not otherwise masked by fault-tolerance, the method comprising the steps of:
receiving a new software release;
upgrading each of the plurality of nodes with the new software release in a sequential manner, accounting for the possibility of different versions of software running on different nodes of the system, whereby the plurality of fault-tolerant services remains available while the software upgrade is in progress;
determining whether a first copy of a fault-tolerant service has become unavailable; and
preventing any node having a surviving copy of the fault-tolerant service from being upgraded while the first copy of the fault-tolerant service is unavailable.
Dependent claims: 13, 14, 15, 16, 17
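The protection step in claims 1 and 12 can be sketched as follows. This is an illustrative reading only: the `placement` mapping, the `unavailable` set, and both function names are assumptions introduced here, not terms from the claims. The sketch marks as protected every node that holds a surviving copy of a service that has lost a copy, and skips those nodes when choosing the next node to upgrade.

```python
def protected_nodes(placement, unavailable):
    """Return nodes holding a surviving copy of any service that lost a copy.

    placement:   service name -> set of node names holding a copy (hypothetical)
    unavailable: set of node names whose copies are currently unavailable
    """
    protected = set()
    for holders in placement.values():
        if holders & unavailable:  # this service has lost at least one copy
            protected |= holders - unavailable  # its surviving copies are protected
    return protected


def next_upgrade_candidate(pending, placement, unavailable):
    """Pick the first pending node that is neither protected nor unavailable."""
    # Assumption: an unavailable node is also skipped, since it cannot be upgraded now.
    blocked = protected_nodes(placement, unavailable) | unavailable
    for node in pending:
        if node not in blocked:
            return node
    return None  # nothing can be upgraded safely at the moment


# Two services on overlapping node sets, as the claim preamble allows.
placement = {"svc1": {"n1", "n2"}, "svc2": {"n2", "n3"}}
protected_now = protected_nodes(placement, {"n1"})
candidate = next_upgrade_candidate(["n1", "n2", "n3"], placement, {"n1"})
```

Here `n1` is unavailable, so `n2` holds the only surviving copy of `svc1` and is protected, while `n3` remains eligible because `svc2` still has both of its copies.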
18. A method for managing a software upgrade in a distributed file system having a plurality of nodes, which provide a plurality of fault-tolerant services, wherein a first set of nodes providing a first fault-tolerant service can differ from, yet can also overlap with, a second set of nodes providing a second service, and wherein making an upgrade current on a given node can take an amount of time sufficient to be considered a fault if not otherwise masked by fault-tolerance, comprising:
receiving a new software release;
determining whether the new software release is compatible with a current release running on the distributed file system;
initiating a rolling upgrade process if the new software release is compatible;
performing the rolling upgrade process by sequentially loading and rebooting each of the plurality of nodes with the new software release;
accounting for the possibility of different versions of software running on different nodes of the system; and
ensuring that the plurality of fault-tolerant services remains available throughout the rolling upgrade process, wherein ensuring that the plurality of fault-tolerant services remains available throughout the rolling upgrade process includes the steps of:
determining whether a first copy of a fault-tolerant service has become unavailable; and
preventing any node having a surviving copy of the fault-tolerant service from being upgraded while the first copy of the fault-tolerant service is unavailable.
Dependent claims: 19
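The compatibility gate in claim 18 can be sketched in the same spirit. The claim does not say how compatibility is determined, so the rule below (same major version number) is purely an assumption for illustration, as are the function names and the `cluster_versions` mapping. The sketch checks the new release against every version currently running, which is one way to account for mixed versions, and only then upgrades node by node.

```python
def is_compatible(current, new):
    # Assumed rule for illustration only: releases with the same major
    # version number are treated as compatible.
    return current.split(".")[0] == new.split(".")[0]


def start_upgrade(cluster_versions, new_release):
    """Run a rolling upgrade only if the new release passes the gate.

    cluster_versions: node name -> version currently running (hypothetical)
    Returns a list of (node, snapshot) pairs, one per upgraded node.
    """
    running = set(cluster_versions.values())
    if not all(is_compatible(v, new_release) for v in running):
        return []  # incompatible: the rolling upgrade is not initiated
    log = []
    for node in sorted(cluster_versions):
        cluster_versions[node] = new_release  # stand-in for load + reboot
        log.append((node, dict(cluster_versions)))  # mixed-version state so far
    return log


versions = {"n1": "2.0", "n2": "2.0"}
log = start_upgrade(versions, "2.1")
rejected = start_upgrade({"n1": "1.9"}, "2.0")
```

The snapshot recorded after the first node shows the cluster running two versions at once, which is the mixed-version window the claim says must be accounted for.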
Specification