High Availability
- Runs three replicas of critical control plane components.
- Sets production-ready CPU and memory resource requests on data plane proxies.
- Requires that the proxy auto-injector be functional for any pods to be scheduled.
- Sets anti-affinity policies on critical control plane components to ensure, if possible, that they are scheduled on separate nodes and in separate zones by default.
You can enable HA mode at control plane installation time with the --ha flag:
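For example, piping the generated manifest to kubectl as with a standard install:

```shell
# Generate the control plane manifest in HA mode and apply it
linkerd install --ha | kubectl apply -f -
```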
The Viz extension also supports an --ha
flag with similar characteristics:
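For example:

```shell
# Install the Viz extension with HA characteristics
linkerd viz install --ha | kubectl apply -f -
```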
You can override certain aspects of the HA behavior at installation time by passing other flags to the install
command. For example, you can override the number of replicas for critical components with the --controller-replicas
flag:
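For example, to run two replicas instead of the default three:

```shell
# Override the replica count for critical control plane components
linkerd install --ha --controller-replicas=2 | kubectl apply -f -
```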
See the full install
CLI documentation for reference.
The linkerd upgrade
command can be used to enable HA mode on an existing control plane:
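For example:

```shell
# Upgrade an existing control plane to HA mode
linkerd upgrade --ha | kubectl apply -f -
```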
Proxy injector failure policy
The HA proxy injector is deployed with a stricter failure policy to enforce automatic proxy injection. This ensures that annotated workloads are never accidentally scheduled to run on your cluster without the Linkerd proxy. (This can happen when the proxy injector is down.)
If the proxy injection process fails due to an unrecognized or timeout error during the admission phase, the workload's admission will be rejected by the Kubernetes API server and the deployment will fail.
Hence, it is very important that there is always at least one healthy replica of the proxy injector running on your cluster.
Note
See the Kubernetes documentation for more information on the admission webhook failure policy.
Per recommendation from the Kubernetes documentation, the proxy injector should be disabled for the kube-system
namespace.
This can be done by labeling the kube-system
namespace with the following label:
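For example:

```shell
# Exclude kube-system from the proxy injector's admission webhook
kubectl label namespace kube-system config.linkerd.io/admission-webhooks=disabled
```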
The Kubernetes API server will not call the proxy injector during the admission phase of workloads in namespaces with this label.
If your Kubernetes cluster has built-in reconcilers that revert any changes made to the kube-system
namespace, you should loosen the proxy injector failure policy by following these instructions.
Pod anti-affinity rules
All critical control plane components are deployed with pod anti-affinity rules to ensure redundancy.
Linkerd uses a requiredDuringSchedulingIgnoredDuringExecution
pod anti-affinity rule to ensure that the Kubernetes scheduler does not colocate replicas of a critical component on the same node. A preferredDuringSchedulingIgnoredDuringExecution
pod anti-affinity rule is also added to try to schedule replicas in different zones, where possible.
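As a sketch, you can inspect these rules on a control plane deployment once it is installed (assuming the deployment name linkerd-destination, which may differ between Linkerd versions):

```shell
# Print the pod anti-affinity rules applied to a control plane component
kubectl -n linkerd get deploy linkerd-destination \
  -o jsonpath='{.spec.template.spec.affinity.podAntiAffinity}'
```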
Note that these anti-affinity rules don’t apply to add-on components like Prometheus and Grafana.
Scaling Prometheus
The Linkerd Viz extension provides a pre-configured Prometheus pod, but for production workloads we recommend setting up your own Prometheus instance. To scrape the data plane metrics, follow these instructions. This will provide you with more control over resource requirements, backup strategy, and data retention.
When planning for memory capacity to store Linkerd timeseries data, the usual guidance is 5MB per meshed pod.
If your Prometheus is experiencing regular OOMKilled events due to the amount of data coming from the data plane, the two key parameters that can be adjusted are:
- storage.tsdb.retention.time
defines how long to retain samples in storage.
- storage.tsdb.retention.size
defines the maximum number of bytes that can be stored for blocks. A lower value will also help to reduce the number of OOMKilled
events.
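A minimal sketch of these retention flags on the Prometheus command line (the 6h and 8GB values are illustrative, not recommendations — tune them to your cluster's data volume):

```shell
# Cap both the retention window and the on-disk size of the TSDB
prometheus \
  --config.file=/etc/prometheus/prometheus.yml \
  --storage.tsdb.retention.time=6h \
  --storage.tsdb.retention.size=8GB
```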
For more information and other supported storage options, see the Prometheus documentation here.
Working with Cluster AutoScaler
The Linkerd proxy stores its mTLS private key in a tmpfs emptyDir volume to ensure that this information never leaves the pod. As a result, with its default configuration, Cluster AutoScaler is unable to scale down nodes running injected workload replicas.
The workaround is to annotate the injected workload with the cluster-autoscaler.kubernetes.io/safe-to-evict: "true"
annotation. If you have full control over the Cluster AutoScaler configuration, you can start the Cluster AutoScaler with the --skip-nodes-with-local-storage=false option.
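For example, to add the annotation to a hypothetical web deployment in an emojivoto namespace (substitute your own workload names):

```shell
# Mark pods of an injected workload as safe for Cluster AutoScaler eviction
kubectl -n emojivoto patch deploy web -p \
  '{"spec":{"template":{"metadata":{"annotations":{"cluster-autoscaler.kubernetes.io/safe-to-evict":"true"}}}}}'
```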
For more information on this, see the Cluster AutoScaler documentation.