Recover tserver/master

Note: In this scenario we have an N-node setup with replication factor (RF) = 3.

It’s a good idea to have a cron or systemd setup that restarts the yb-master/yb-tserver process if it is not running. This handles transient failures (such as a node rebooting or a process crashing due to a bug or some unexpected behavior).
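As a rough illustration of such a restart check, here is a minimal sketch of a watchdog that a cron job could call. The install location in `YB_HOME`, the per-server flag files under `conf/`, and the function names `is_running` and `ensure_running` are assumptions made for this example, not part of YugabyteDB.

```sh
# Sketch of a restart check for cron. YB_HOME and the conf/<name>.conf
# layout are assumptions; adjust them to match the actual installation.
YB_HOME="${YB_HOME:-/home/yugabyte/yugabyte}"

# Succeeds if a process whose name is exactly $1 is running.
is_running() {
  pgrep -x "$1" >/dev/null 2>&1
}

# Start the server named $1 (yb-master or yb-tserver) in the background
# if it is not already running; otherwise do nothing.
ensure_running() {
  if is_running "$1"; then
    echo "$1 is running"
  else
    echo "$1 is not running, starting it"
    nohup "$YB_HOME/bin/$1" --flagfile "$YB_HOME/conf/$1.conf" >/dev/null 2>&1 &
  fi
}
```

A crontab entry could source this file once a minute and call `ensure_running yb-tserver` (and `ensure_running yb-master` on master nodes). With systemd, a unit with `Restart=always` would serve the same purpose without a polling script.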

If the node failure is permanent, then for the yb-tserver, simply starting another yb-tserver process on the new node is sufficient. It’ll join the cluster, and the load balancer will automatically take the new yb-tserver into consideration and start rebalancing tablets to it.
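For illustration, the helper below prints (without running) a start command for a replacement yb-tserver. It is a sketch: the function name `tserver_start_cmd`, the data directory, and the host names are hypothetical, and a real deployment will likely need further flags.

```sh
# Print the command that would start a replacement yb-tserver (dry run).
# $1: comma-separated master RPC addresses, e.g. n1:7100,n2:7100,n3:7100
# $2: data directory on the new node (hypothetical path)
# $3: address the new yb-tserver should bind to, e.g. n5:9100
tserver_start_cmd() {
  printf './bin/yb-tserver --tserver_master_addrs %s --fs_data_dirs %s --rpc_bind_addresses %s\n' \
    "$1" "$2" "$3"
}
```

For example, `tserver_start_cmd n1:7100,n2:7100,n3:7100 /mnt/d0 n5:9100` prints the command line to review and then run on the new node.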

If a new yb-master needs to be started to replace a failed master, the master quorum needs to be updated. Suppose the original yb-masters were n1, n2, and n3, and n3 needs to be replaced with a new yb-master, n4. Then you’ll use the sub-command:
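The sub-command name did not survive in the text above. The sketch below assumes it is `change_master_config` from the `yb-admin` tool, and that the new yb-master process on n4 has already been started. The function name `replace_master_cmds` and port 7100 are illustrative. The function only prints the two quorum changes, so they can be reviewed before being run.

```sh
# Print (dry run) the yb-admin calls that swap a failed master for a new one.
# Assumes the sub-command in question is yb-admin's change_master_config.
# $1: RPC addresses of the surviving masters, e.g. n1:7100,n2:7100
# $2: host of the new master, e.g. n4
# $3: host of the failed master, e.g. n3
# $4: master RPC port, e.g. 7100
replace_master_cmds() {
  # First add the new master to the quorum...
  echo "./bin/yb-admin -master_addresses $1 change_master_config ADD_SERVER $2 $4"
  # ...then remove the failed one, now addressing the enlarged master set.
  echo "./bin/yb-admin -master_addresses $1,$2:$4 change_master_config REMOVE_SERVER $3 $4"
}
```

Calling `replace_master_cmds n1:7100,n2:7100 n4 n3 7100` prints an `ADD_SERVER` command for n4 followed by a `REMOVE_SERVER` command for n3. After such a change, the master address list configured on each yb-tserver (the `--tserver_master_addrs` flag) would presumably also need to list n1, n2, and n4.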

This is to handle the case if the restarts at some point in the future.

You might also be interested in how to perform planned cluster changes, such as moving the entire cluster to a brand new set of nodes (say, from machines of type A to machines of type B).