Rancher calls RKE (Rancher Kubernetes Engine) as a library when provisioning and editing RKE clusters. For more information on configuring the upgrade strategy for RKE clusters, refer to the RKE documentation.

This section covers the following topics:

Tested Kubernetes Versions
Recommended Best Practice for Upgrades
Rolling Back

New Features

As of Rancher v2.3.0, the Kubernetes metadata feature was added, which allows Rancher to ship Kubernetes patch versions without upgrading Rancher. For details, refer to the

As of Rancher v2.4.0:

The ability to import K3s Kubernetes clusters into Rancher was added, along with the ability to upgrade Kubernetes when editing those clusters. For details, refer to the section on imported clusters.
New advanced options are exposed in the Rancher UI for configuring the upgrade strategy of an RKE cluster: Maximum Worker Nodes Unavailable and Drain nodes. These options use the new cluster upgrade process of RKE v1.1.0, in which worker nodes are upgraded in batches, so that applications can remain available during cluster upgrades, under certain conditions.

Tested Kubernetes Versions

Before a new version of Rancher is released, it’s tested with the latest minor versions of Kubernetes to ensure compatibility. For example, Rancher v2.3.0 was tested with Kubernetes v1.15.4, v1.14.7, and v1.13.11. For details on which versions of Kubernetes were tested on each Rancher version, refer to the

How Upgrades Work

RKE v1.1.0 changed the way that clusters are upgraded.

In this section of the RKE documentation, you’ll learn what happens when you edit or upgrade your RKE Kubernetes cluster.

Recommended Best Practice for Upgrades

As of Rancher v2.4, when upgrading the Kubernetes version of a cluster, we recommend that you:

Take a snapshot.
Initiate a Kubernetes upgrade.
If the upgrade fails, revert the cluster to the pre-upgrade Kubernetes version by selecting the Restore etcd and Kubernetes version option. This returns the cluster to the pre-upgrade Kubernetes version before the etcd snapshot is restored.

The restore operation will work on a cluster that is not in a healthy or active state.

In Rancher versions before v2.4, when upgrading the Kubernetes version of a cluster, we recommend that you:

Take a snapshot.
Initiate a Kubernetes upgrade.
If the upgrade fails, restore the cluster from the etcd snapshot.

The cluster cannot be downgraded to a previous Kubernetes version.
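The snapshot, upgrade, restore-on-failure sequence recommended above can be sketched in Python as follows. This is only an illustration of the control flow: `take_snapshot`, `upgrade`, and `restore` are hypothetical callables supplied by the caller, not Rancher or RKE APIs.

```python
from typing import Callable


def upgrade_with_fallback(
    take_snapshot: Callable[[], str],
    upgrade: Callable[[], None],
    restore: Callable[[str], None],
) -> str:
    """Take a snapshot, attempt the upgrade, and restore on failure.

    All three callables are hypothetical stand-ins for whatever
    mechanism (UI action, script) performs the real operation.
    """
    snapshot_name = take_snapshot()  # step 1: always snapshot first
    try:
        upgrade()  # step 2: initiate the Kubernetes upgrade
    except Exception:
        restore(snapshot_name)  # step 3: fall back to the snapshot
        return "restored"
    return "upgraded"
```

The point of the ordering is that the snapshot exists before anything changes, so a failed upgrade always has a known state to return to.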

Upgrading the Kubernetes Version

Expand Cluster Options.
Click Save.

Result: Kubernetes begins upgrading for the cluster.

Rolling Back

Available as of v2.4

A cluster can be restored to a backup in which the previous Kubernetes version was used. For more information, refer to the following sections:

Restoring a cluster from backup

Configuring the Upgrade Strategy

As of RKE v1.1.0, additional upgrade options became available to give you more granular control over the upgrade process. These options can be used to maintain availability of your applications during a cluster upgrade if certain conditions and requirements are met.

The upgrade strategy can be configured in the Rancher UI, or by editing the cluster.yml. More advanced options are available by editing the cluster.yml.

From the Rancher UI, the maximum number of unavailable worker nodes can be configured. During a cluster upgrade, worker nodes will be upgraded in batches of this size.

By default, the maximum number of unavailable worker nodes is defined as 10 percent of all worker nodes. This number can be configured as a percentage or as an integer. When defined as a percentage, the batch size is rounded down to the nearest node, with a minimum of one node.
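As a rough illustration of the rounding rule, the following Python sketch computes a batch size from either a percentage or an integer. The function name `worker_batch_size` and the `"10%"` string form are assumptions made for this example; it is not RKE's actual implementation.

```python
from typing import Union


def worker_batch_size(total_workers: int, max_unavailable: Union[str, int] = "10%") -> int:
    """Return how many worker nodes are upgraded per batch.

    Hypothetical helper: a percentage is rounded down to a whole node,
    with a minimum of one node; an integer is used as given.
    """
    if isinstance(max_unavailable, str) and max_unavailable.endswith("%"):
        percent = int(max_unavailable[:-1])
        size = total_workers * percent // 100  # integer division rounds down
        return max(1, size)  # never drop below one node
    return int(max_unavailable)


print(worker_batch_size(25))         # 10% of 25 workers, rounded down
print(worker_batch_size(5))          # 10% of 5 rounds to 0, so the minimum applies
print(worker_batch_size(40, "25%"))  # custom percentage
print(worker_batch_size(40, 3))      # explicit node count
```

With the default of 10 percent, a cluster of 25 workers is upgraded two nodes at a time, while a cluster of 5 workers is upgraded one node at a time because of the one-node minimum.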

To change the default number or percentage of worker nodes:

Go to the cluster view in the Rancher UI.
Click ⋮ > Edit.
In the Advanced Options section, go to the Maximum Worker Nodes Unavailable field. Enter the percentage of worker nodes that can be upgraded in a batch. Optionally, select Count from the drop-down menu and enter the maximum unavailable worker nodes as an integer.
Click Save.

Result: The cluster is updated to use the new upgrade strategy.

To enable draining each node during a cluster upgrade:

Go to the cluster view in the Rancher UI.
Click ⋮ > Edit.
In the Advanced Options section, go to the Drain nodes field and click Yes.
Choose a safe or aggressive drain option. For more information about each option, refer to
Optionally, configure a grace period. The grace period is the time given to each pod to clean up so that it has a chance to exit gracefully. Pods might need to finish outstanding requests, roll back transactions, or save state to external storage. If this value is negative, the default value specified in the pod is used.
Optionally, configure a timeout, which is the amount of time the drain should continue to wait before giving up.
Click Save.

Result: The cluster is updated to use the new upgrade strategy.
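The grace period rule from the drain steps above can be sketched as a small Python function. The name `effective_grace_period` is hypothetical, and the 30-second fallback reflects the Kubernetes default for a pod's `terminationGracePeriodSeconds` when the pod spec does not set one.

```python
def effective_grace_period(configured_seconds: int, pod_default_seconds: int = 30) -> int:
    """Return the grace period applied to a pod during a drain.

    Hypothetical helper: a negative configured value means
    "use the value specified in the pod", as described above.
    """
    if configured_seconds < 0:
        return pod_default_seconds
    return configured_seconds


print(effective_grace_period(-1))       # falls back to the pod's own value
print(effective_grace_period(120))      # explicit drain setting wins
print(effective_grace_period(-1, 300))  # pod that asks for a longer shutdown
```

Leaving the value negative is the least surprising choice when workloads already declare their own shutdown time, because the drain then respects each pod's setting.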

Note: As of Rancher v2.4.0, there is a known issue in which the Rancher UI doesn’t show the state of etcd and controlplane nodes as drained, even though they are being drained.

Available as of RKE v1.1.0

In this section of the RKE documentation, you’ll learn the requirements to prevent downtime for your applications when upgrading the cluster.

More advanced upgrade strategy configuration options are available by editing the cluster.yml.

For details, refer to Configuring the Upgrade Strategy in the RKE documentation. The section also includes an example for configuring the upgrade strategy.
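For orientation only, the shape of such a configuration can be sketched as a Python mapping and printed as JSON, which YAML parsers also accept. The key names below (`upgrade_strategy`, `max_unavailable_worker`, `max_unavailable_controlplane`, `drain`, `node_drain_input`, and its fields) are recalled from the RKE documentation and should be treated as assumptions to verify there before use. The values are illustrative.

```python
import json

# Assumed key names; confirm against the RKE documentation before use.
upgrade_strategy = {
    "max_unavailable_worker": "10%",      # percentage or node count
    "max_unavailable_controlplane": "1",  # control plane nodes per batch
    "drain": True,                        # drain each node before upgrading it
    "node_drain_input": {
        "force": False,
        "ignore_daemonsets": True,
        "delete_local_data": False,
        "grace_period": -1,  # negative: use the value specified in each pod
        "timeout": 60,       # seconds to wait before the drain gives up
    },
}

rendered = json.dumps({"upgrade_strategy": upgrade_strategy}, indent=2)
print(rendered)
```

The printed mapping corresponds to a top-level `upgrade_strategy` block in the cluster configuration. The drain fields mirror the grace period and timeout options exposed in the Rancher UI.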

Troubleshooting

If a node doesn’t come up after an upgrade, the command errors out.

No upgrade will proceed if the number of unavailable nodes exceeds the configured maximum.

If an upgrade stops, you may need to fix an unavailable node or remove it from the cluster before the upgrade can continue.

A failed node could be in many different states:

Powered off
Unavailable
A user drains the node while the upgrade is in progress, so there are no kubelets on the node

If the maximum number of unavailable nodes is reached during an upgrade, Rancher user clusters remain stuck in the updating state and do not move forward with upgrading any other control plane nodes. Rancher continues to evaluate the set of unavailable nodes in case one of them becomes available. If the node cannot be fixed, you must remove it from the cluster to continue the upgrade.
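The gating behavior described in this section can be modeled with a short Python sketch. The function `upgrade_can_proceed` and the node names are hypothetical. The check follows the statement above that no upgrade proceeds when the number of unavailable nodes exceeds the configured maximum.

```python
from typing import Set


def upgrade_can_proceed(unavailable_nodes: Set[str], max_unavailable: int) -> bool:
    """Return True while the unavailable count stays within the limit.

    Hypothetical model of the check; the real evaluation happens inside
    RKE and Rancher.
    """
    return len(unavailable_nodes) <= max_unavailable


# Two nodes are down but only one may be unavailable: the upgrade stalls.
unavailable = {"worker-3", "worker-7"}
print(upgrade_can_proceed(unavailable, max_unavailable=1))

# Fixing a node, or removing it from the cluster, shrinks the set,
# and the next evaluation lets the upgrade continue.
unavailable.discard("worker-7")
print(upgrade_can_proceed(unavailable, max_unavailable=1))
```

Because the set of unavailable nodes is re-evaluated continuously, either repairing or removing the failed node is enough to unblock a stalled upgrade.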

Upgrading and Rolling Back Kubernetes