Migrating from single to multi-master

    This is a risky procedure that can lead to data loss in the etcd cluster. Please follow all the backup steps and read the etcd admin guide before attempting it.

    We can migrate from a single-master cluster to a multi-master cluster, but it is a complicated operation; it is much easier to create a multi-master cluster with kops at creation time. If possible, plan for this when you create the cluster.
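    For reference, a new cluster can be created with three masters directly; a minimal sketch, with illustrative zones and cluster name:

    $ kops create cluster --zones eu-west-1a,eu-west-1b,eu-west-1c \
        --master-zones eu-west-1a,eu-west-1b,eu-west-1c example.com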

    During this procedure, you will experience downtime on the API server, but not on the end user services. During this downtime, existing pods will continue to work, but you will not be able to create new pods, and any existing pod that dies will not be restarted.

    1 - Backups

    b - Backup event etcd cluster

    $ kubectl --namespace=kube-system exec -it etcd-server-events-ip-172-20-36-161.ec2.internal -- sh
    / # etcdctl backup --data-dir /var/etcd/data-events --backup-dir /var/etcd/backup
    / # mv /var/etcd/backup/ /var/etcd/data-events/
    / # exit
    $ kubectl --namespace=kube-system get pod etcd-server-events-ip-172-20-36-161.ec2.internal -o json | jq '.spec.volumes[] | select(.name | contains("varetcdata")) | .hostPath.path'
    "/mnt/master-vol-0bb5ad222911c6777/var/etcd/data-events"
    $ ssh admin@<master-node>
    admin@ip-172-20-36-161:~$ sudo -i
    root@ip-172-20-36-161:~# mv /mnt/master-vol-0bb5ad222911c6777/var/etcd/data-events/backup/ /home/admin/backup-events
    root@ip-172-20-36-161:~# chown -R admin: /home/admin/backup-events/
    root@ip-172-20-36-161:~# exit
    admin@ip-172-20-36-161:~$ exit
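    To keep the backup somewhere safer than the master itself, you can copy it to your workstation; a minimal sketch, assuming SSH access as the admin user:

    $ scp -r admin@<master-node>:backup-events/ ./backup-events/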

    2 - Create the new master instance groups

    a - Create new master instance group

    Create one kops instance group for the first of your new masters, in a different AZ from the existing one.

    $ kops create instancegroup master-<availability-zone2> --subnet <availability-zone2> --role Master

    Example:

    $ kops create ig master-eu-west-1b --subnet eu-west-1b --role Master
    • maxSize and minSize should be 1,
    • only one zone should be listed.
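    When the editor opens, the resulting spec should look roughly like this (a sketch; machineType is illustrative, other fields are omitted, and the exact apiVersion depends on your kops version):

    apiVersion: kops.k8s.io/v1alpha2
    kind: InstanceGroup
    metadata:
      labels:
        kops.k8s.io/cluster: example.com
      name: master-eu-west-1b
    spec:
      machineType: m3.medium   # illustrative
      maxSize: 1
      minSize: 1
      role: Master
      subnets:
      - eu-west-1b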

    b - Create the instance group for the third master

    An instance group for the third master, in a different AZ from the existing ones, is also required. However, the actual EC2 instance is not needed until the second master is up and running, so we create this group with a size of 0.

    $ kops create instancegroup master-<availability-zone3> --subnet <availability-zone3> --role Master

    Example:

    $ kops create ig master-eu-west-1c --subnet eu-west-1c --role Master
    • maxSize and minSize should be 0,
    • only one zone should be listed.

    c - Reference the new masters in your cluster configuration

    kops will refuse to run an etcd cluster with only two members, so we have to reference a third one, even though we have not created it yet.

    $ kops edit cluster example.com

    • In .spec.etcdClusters, add two new members to each cluster, one for each new availability zone:

    - instanceGroup: master-<availability-zone2>
      name: <availability-zone2-name>
    - instanceGroup: master-<availability-zone3>
      name: <availability-zone3-name>

    Example, assuming the existing master is in eu-west-1a with etcd member name a, and using the zones from the examples above, each cluster's members list would end up as:
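    etcdClusters:
    - etcdMembers:
      - instanceGroup: master-eu-west-1a
        name: a
      - instanceGroup: master-eu-west-1b
        name: b
      - instanceGroup: master-eu-west-1c
        name: c
      name: main
    - etcdMembers:
      - instanceGroup: master-eu-west-1a
        name: a
      - instanceGroup: master-eu-west-1b
        name: b
      - instanceGroup: master-eu-west-1c
        name: c
      name: events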

    3 - Add a new master

    a - Add a new member to the etcd clusters

    The clusters will stop working until the new member is started: adding a second member raises the quorum to two, and the new member is not up yet.

    $ kubectl --namespace=kube-system exec etcd-server-ip-172-20-36-161.ec2.internal -- etcdctl member add etcd-<availability-zone2-name> http://etcd-<availability-zone2-name>.internal.example.com:2380 \
        && kubectl --namespace=kube-system exec etcd-server-events-ip-172-20-36-161.ec2.internal -- etcdctl --endpoint http://127.0.0.1:4002 member add etcd-events-<availability-zone2-name> http://etcd-events-<availability-zone2-name>.internal.example.com:2381

    Example:

    $ kubectl --namespace=kube-system exec etcd-server-ip-172-20-36-161.ec2.internal -- etcdctl member add etcd-b http://etcd-b.internal.example.com:2380 \
        && kubectl --namespace=kube-system exec etcd-server-events-ip-172-20-36-161.ec2.internal -- etcdctl --endpoint http://127.0.0.1:4002 member add etcd-events-b http://etcd-events-b.internal.example.com:2381
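    If the member is accepted, etcdctl (v2) prints the environment values for the new member; roughly like this (the member ID will differ):

    Added member named etcd-b with ID 823ab4d25d56d353 to cluster

    ETCD_NAME="etcd-b"
    ETCD_INITIAL_CLUSTER="etcd-b=http://etcd-b.internal.example.com:2380,etcd-a=http://etcd-a.internal.example.com:2380"
    ETCD_INITIAL_CLUSTER_STATE="existing"

    Note the ETCD_INITIAL_CLUSTER_STATE value: existing is exactly what you will set in the manifests on the new master below.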
    b - Launch the new master

    $ kops update cluster example.com --yes
    # wait for the new master to boot and initialize
    $ ssh admin@<new-master>
    admin@ip-172-20-116-230:~$ sudo -i
    root@ip-172-20-116-230:~# systemctl stop kubelet
    root@ip-172-20-116-230:~# systemctl stop protokube

    • In both /etc/kubernetes/manifests/etcd-events.manifest and /etc/kubernetes/manifests/etcd.manifest, edit the ETCD_INITIAL_CLUSTER_STATE variable to existing.
    • In the same files, remove the third, not-yet-created member from ETCD_INITIAL_CLUSTER.
    • Delete the containers and both data directories (the main data volume path below is a placeholder; look it up the same way as in the backup step):

    root@ip-172-20-116-230:~# docker stop $(docker ps | grep "gcr.io/google_containers/etcd" | awk '{print $1}')
    root@ip-172-20-116-230:~# rm -r /mnt/master-vol-03b97b1249caf379a/var/etcd/data-events/member/
    root@ip-172-20-116-230:~# rm -r /mnt/master-vol-<main-data-volume>/var/etcd/data/member/

    Launch them again:

    root@ip-172-20-116-230:~# systemctl start kubelet

    At this point, both etcd clusters should be healthy with two members:

    $ kubectl --namespace=kube-system exec etcd-server-ip-172-20-36-161.ec2.internal -- etcdctl member list
    $ kubectl --namespace=kube-system exec etcd-server-ip-172-20-36-161.ec2.internal -- etcdctl cluster-health
    $ kubectl --namespace=kube-system exec etcd-server-events-ip-172-20-36-161.ec2.internal -- etcdctl --endpoint http://127.0.0.1:4002 member list
    $ kubectl --namespace=kube-system exec etcd-server-events-ip-172-20-36-161.ec2.internal -- etcdctl --endpoint http://127.0.0.1:4002 cluster-health
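    For the main cluster, a healthy two-member result from cluster-health looks roughly like this (member IDs and URLs will differ):

    member 521a5ee6be3e80b7 is healthy: got healthy result from http://etcd-a.internal.example.com:4001
    member 823ab4d25d56d353 is healthy: got healthy result from http://etcd-b.internal.example.com:4001
    cluster is healthy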

    If not, check /var/log/etcd.log for problems.

    Restart protokube on the new master:

    root@ip-172-20-116-230:~# systemctl start protokube

    4 - Add the third master

    a - Edit instance group

    Prepare to launch the third master instance:

    $ kops edit instancegroup master-<availability-zone3>

    • Change the maxSize and minSize values to 1.

    b - Add a new member to the etcd clusters

    $ kubectl --namespace=kube-system exec etcd-server-ip-172-20-36-161.ec2.internal -- etcdctl member add etcd-<availability-zone3-name> http://etcd-<availability-zone3-name>.internal.example.com:2380 \
        && kubectl --namespace=kube-system exec etcd-server-events-ip-172-20-36-161.ec2.internal -- etcdctl --endpoint http://127.0.0.1:4002 member add etcd-events-<availability-zone3-name> http://etcd-events-<availability-zone3-name>.internal.example.com:2381

    Example:

    $ kubectl --namespace=kube-system exec etcd-server-ip-172-20-36-161.ec2.internal -- etcdctl member add etcd-c http://etcd-c.internal.example.com:2380 \
        && kubectl --namespace=kube-system exec etcd-server-events-ip-172-20-36-161.ec2.internal -- etcdctl --endpoint http://127.0.0.1:4002 member add etcd-events-c http://etcd-events-c.internal.example.com:2381

    c - Launch the third master

    $ kops update cluster example.com --yes
    # wait for the third master to boot and initialize
    $ ssh admin@<third-master>
    admin@ip-172-20-139-130:~$ sudo -i
    root@ip-172-20-139-130:~# systemctl stop kubelet
    root@ip-172-20-139-130:~# systemctl stop protokube

    Reinitialize the etcd instances:

    • In both /etc/kubernetes/manifests/etcd-events.manifest and /etc/kubernetes/manifests/etcd.manifest, edit the ETCD_INITIAL_CLUSTER_STATE variable to existing.
    • Delete the containers and the data directories:

    root@ip-172-20-139-130:~# docker stop $(docker ps | grep "gcr.io/google_containers/etcd" | awk '{print $1}')
    root@ip-172-20-139-130:~# rm -r /mnt/master-vol-019796c3511a91b4f/var/etcd/data-events/member/
    root@ip-172-20-139-130:~# rm -r /mnt/master-vol-0c89fd6f6a256b686/var/etcd/data/member/

    Launch them again:

    root@ip-172-20-139-130:~# systemctl start kubelet

    At this point, both etcd clusters should be healthy with three members:

    $ kubectl --namespace=kube-system exec etcd-server-ip-172-20-36-161.ec2.internal -- etcdctl member list
    $ kubectl --namespace=kube-system exec etcd-server-ip-172-20-36-161.ec2.internal -- etcdctl cluster-health
    $ kubectl --namespace=kube-system exec etcd-server-events-ip-172-20-36-161.ec2.internal -- etcdctl --endpoint http://127.0.0.1:4002 member list
    $ kubectl --namespace=kube-system exec etcd-server-events-ip-172-20-36-161.ec2.internal -- etcdctl --endpoint http://127.0.0.1:4002 cluster-health

    If not, check /var/log/etcd.log for problems.

    Restart protokube on the third master:

    root@ip-172-20-139-130:~# systemctl start protokube

    5 - Cleanup

    To be sure that everything runs smoothly and is set up correctly, it is advised to terminate the masters one after the other (always keeping two of them up and running). They will be restarted with a clean config and should join the others without any problems.

    While optional, this last step allows you to be sure that your masters are fully configured by kops and that there is no residual manual configuration. If there is any configuration problem, it will be detected during this step and not during a future upgrade or, worse, during a master failure.
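    A minimal sketch with the AWS CLI (the instance ID is a placeholder; terminate one master at a time and wait for etcd to report a healthy three-member cluster before moving on):

    $ aws ec2 terminate-instances --instance-ids <master-instance-id>
    # wait for the replacement to boot, then check health from any surviving etcd pod:
    $ kubectl --namespace=kube-system exec <etcd-server-pod> -- etcdctl cluster-health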

    In case the migration to multi-master fails, you will need to restore from the backups you took previously.

    Take extra care: kops will not necessarily start etcd and etcd-events members with matching IDs on the same master, but may mix them (e.g. etcd-b and etcd-events-c on one master, and etcd-c and etcd-events-b on the other); this can be double-checked in Route53, where kops creates DNS records for these services.

    If the second master you spun up failed and the cluster became inconsistent, edit the corresponding kops master instance group, switch minSize and maxSize back to 0, and run an update on your cluster.
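    That is, roughly (using the instance group name created above):

    $ kops edit instancegroup master-<availability-zone2>   # set minSize and maxSize to 0
    $ kops update cluster example.com --yes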

    Next, ssh into your primary master:

    systemctl stop kubelet
    systemctl stop protokube

    Reinitialize the etcd instances:

    • In both /etc/kubernetes/manifests/etcd-events.manifest and /etc/kubernetes/manifests/etcd.manifest, add the ETCD_FORCE_NEW_CLUSTER variable with value 1.
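    In the env section of the etcd container, that is roughly (a sketch; the surrounding manifest layout depends on your kops version):

    - name: ETCD_FORCE_NEW_CLUSTER
      value: "1"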

    Now start the services back up and watch the logs:

    systemctl start kubelet
    tail -f /var/log/etcd*   # watch for errors; if none are encountered, also restart protokube
    systemctl start protokube

    Test that your master is reboot-proof: reboot it and verify that it rejoins the cluster cleanly.

    Note: it is also recommended to take daily snapshots of all your persistent volumes (for example with a scheduled AWS Lambda function) so that you have something to recover from in case of failure.
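    A minimal sketch of the snapshot step itself with the AWS CLI (assuming the volumes carry the KubernetesCluster tag that kops sets on AWS resources; the cluster name is illustrative):

    $ aws ec2 describe-volumes --filters "Name=tag:KubernetesCluster,Values=example.com" \
          --query "Volumes[].VolumeId" --output text \
        | xargs -n1 aws ec2 create-snapshot --volume-id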