etcd backup and recovery for Rancher-launched Kubernetes clusters can be easily performed. Snapshots of the etcd database are taken and saved either locally onto the etcd nodes or to an S3-compatible target. The advantage of configuring S3 is that if all etcd nodes are lost, your snapshot is saved remotely and can be used to restore the cluster.
Rancher recommends enabling recurring snapshots, but one-time snapshots can easily be taken as well. Rancher allows you to restore from a saved snapshot, or, if you don't have any snapshots, you can still restore etcd.
As of Rancher v2.4.0, clusters can also be restored to a prior Kubernetes version and cluster configuration.
This section covers the following topics:
- Viewing Available Snapshots
- Restoring a Cluster from a Snapshot
- Recovering etcd without a Snapshot
- Enabling snapshot features for clusters created before Rancher v2.2.0
The list of all available snapshots for the cluster is available in the Rancher UI.

1. In the Global view, navigate to the cluster whose snapshots you want to view.
2. Click Tools > Snapshots from the navigation bar to view the list of saved snapshots. These snapshots include a timestamp of when they were created.
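
Snapshots that are stored locally can also be inspected directly on the etcd nodes. This is a minimal sketch, assuming the default RKE snapshot directory `/opt/rke/etcd-snapshots`; your cluster may be configured with a different path:

```
# List the local etcd snapshots and their timestamps on an etcd node
$ ls -lh /opt/rke/etcd-snapshots/
```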
If your Kubernetes cluster is broken, you can restore the cluster from a snapshot.
Restores changed in Rancher v2.4.0.
Snapshots are composed of the cluster data in etcd, the Kubernetes version, and the cluster configuration. These components allow you to select from the following options when restoring a cluster from a snapshot:
- Restore just the etcd contents: This restore is similar to restoring snapshots in Rancher before v2.4.0.
- Restore etcd, Kubernetes version, and cluster configuration: This option should be used if you changed both the Kubernetes version and the cluster configuration when upgrading.

1. In the Global view, navigate to the cluster that you want to restore from a snapshot.
2. Click ⋮ > Restore Snapshot.
3. Select the snapshot that you want to use for restoring your cluster from the dropdown of available snapshots.
4. In the Restoration Type field, choose one of the restore options described above.
5. Click Save.
Result: The cluster will go into an `updating` state and the process of restoring the etcd nodes from the snapshot will start. The cluster is restored when it returns to an `active` state.
Prerequisites:
- Make sure your etcd nodes are healthy (a quick check is sketched after this list). If you are restoring a cluster with unavailable etcd nodes, it's recommended that all etcd nodes are removed from Rancher before attempting to restore. For clusters in which Rancher used node pools to provision nodes, new etcd nodes will automatically be created. For custom clusters, please ensure that you add new etcd nodes to the cluster.
- To restore snapshots from S3, the cluster needs to be configured to take recurring snapshots to an S3 target.
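
As part of checking that your etcd nodes are healthy, you can query etcd directly on a node. This is a minimal sketch, assuming an RKE-launched etcd container named `etcd` with its `ETCDCTL_*` environment variables (endpoints and TLS certificates) already set inside the container; adjust for your setup:

```
# Ask the local etcd member whether it is healthy
$ docker exec etcd etcdctl endpoint health
```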

1. In the Global view, navigate to the cluster that you want to restore from a snapshot.
2. Click ⋮ > Restore Snapshot.
3. Click Save.
Result: The cluster will go into an `updating` state and the process of restoring the etcd nodes from the snapshot will start. The cluster is restored when it returns to an `active` state.
If the group of etcd nodes loses quorum, the Kubernetes cluster will report a failure because no operations, e.g. deploying workloads, can be executed in the Kubernetes cluster. The cluster should have three etcd nodes to prevent a loss of quorum. If you want to recover your set of etcd nodes, follow these instructions:

1. Keep only one etcd node in the cluster by removing all other etcd nodes.
2. On the single remaining etcd node, retrieve the command that is currently used to run the etcd container, and save this command to use later.
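
   One way to reconstruct the full `docker run` command of the running etcd container is the community `runlike` image. This is an illustrative sketch rather than the only option; it assumes the container is named `etcd` and that the tool is allowed to read the Docker socket:

   ```
   $ docker run --rm -v /var/run/docker.sock:/var/run/docker.sock \
       assaflavie/runlike etcd
   ```

   Alternatively, `docker inspect etcd` shows the container's full configuration, from which the same command can be assembled by hand.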
3. Stop the etcd container and rename it to `etcd-old`:

   ```
   $ docker rename etcd etcd-old
   ```
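
   If the container is still running, it can be stopped with a plain `docker stop` (again assuming the container is named `etcd`); `docker rename` works on both running and stopped containers:

   ```
   $ docker stop etcd
   ```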
4. Take the saved command from Step 2 and revise it:
   - If you originally had more than one etcd node, change `--initial-cluster` so that it only contains the node that remains.
   - Add `--force-new-cluster` to the end of the command.
5. Run the revised command.
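
Once the revised command is running, you can confirm that etcd now reports a single healthy member. This is a minimal sketch, assuming an RKE-style etcd container named `etcd` with `etcdctl` and its `ETCDCTL_*` environment variables available inside it:

```
# The member list should now contain only the remaining node
$ docker exec etcd etcdctl member list

# The local endpoint should report as healthy
$ docker exec etcd etcdctl endpoint health
```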