This chapter gives you tips & tricks to help you troubleshoot deployments.
In Kubernetes, all resources can be inspected using either the
To get all details of the resource (both specification & status), run the following command:
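A minimal sketch of such a command, wrapped in a shell function so that the placeholders become arguments. The function name `resource_yaml` is hypothetical and not part of any tool; `kubectl get` with `-o yaml` is the standard way to print a resource's full specification and status.

```shell
# Hypothetical helper: print the full specification and status of one
# resource as YAML.
# Arguments: <resource-type> <resource-name> <namespace>
resource_yaml() {
  kubectl get "$1" "$2" -n "$3" -o yaml
}

# Usage (the namespace "default" is an assumption for illustration):
# resource_yaml ArangoDeployment my-arangodb default
```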
For example, to get the entire specification and status of an ArangoDeployment resource named my-arangodb in the
Several types of resources (including all ArangoDB custom resources) support events. These events show what happened to the resource over time.
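One way to view these events is `kubectl describe`, which prints the most important resource data followed by the recorded events. The sketch below assumes a hypothetical helper name, `resource_events`, and takes the resource type, name, and namespace as arguments.

```shell
# Hypothetical helper: show key resource data plus the events recorded
# for the resource.
# Arguments: <resource-type> <resource-name> <namespace>
resource_events() {
  kubectl describe "$1" "$2" -n "$3"
}

# Usage (the namespace "default" is an assumption for illustration):
# resource_events ArangoDeployment my-arangodb default
```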
Another invaluable source of information is the log of containers being run in Kubernetes. These logs are accessible through the
Pods that group these containers.
To fetch the logs of the default container running in a
To inspect the logs of a specific container in
-c <container-name>. You can find the names of the containers in the
kubectl describe pod ….
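The log commands described above can be sketched as follows. The helper names `pod_logs` and `pod_containers` are hypothetical. The second helper uses a JSONPath query as an alternative to reading the container names from the `kubectl describe pod` output.

```shell
# Hypothetical helper: fetch the logs of a Pod.
# Arguments: <pod-name> <namespace> [container-name]
# Without a container name, kubectl uses the Pod's default container.
pod_logs() {
  if [ -n "${3:-}" ]; then
    kubectl logs "$1" -n "$2" -c "$3"
  else
    kubectl logs "$1" -n "$2"
  fi
}

# Hypothetical helper: print the container names of a Pod.
# Arguments: <pod-name> <namespace>
pod_containers() {
  kubectl get pod "$1" -n "$2" -o jsonpath='{.spec.containers[*].name}'
}
```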
Note that the ArangoDB operators are themselves deployed as a Kubernetes
Deployment with 2 replicas. This means that you will have to fetch the logs of the 2
Pods running those replicas.
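A sketch of how the logs of both operator Pods could be collected in one go. The helper name `operator_logs` is hypothetical, and the label selector that matches the operator Pods is an assumption you need to supply yourself (inspect the Pod labels to find a suitable one).

```shell
# Hypothetical helper: print the logs of every Pod matching a label selector.
# Arguments: <namespace> <label-selector>
# The selector that matches your operator Pods is an assumption; this
# sketch does not know it.
operator_logs() {
  namespace="$1"
  selector="$2"
  # "-o name" prints one "pod/<name>" entry per line.
  for pod in $(kubectl get pods -n "$namespace" -l "$selector" -o name); do
    echo "=== $pod ==="
    kubectl logs "$pod" -n "$namespace"
  done
}
```

Printing a header per Pod keeps the output of the replicas apart, which helps when only one of them is reporting errors.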
There are two common causes for this.
Solution: Add more nodes.
1) There are no
PersistentVolumes available to be bound to the
PersistentVolumeClaims created by the operator.
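To check whether this is the case, you can list the volumes and claims and compare their states. This is a minimal sketch with hypothetical helper names (`list_volumes`, `list_claims`). It relies only on standard `kubectl get` calls.

```shell
# Hypothetical helper: list all PersistentVolumes in the cluster.
# PersistentVolumes are cluster-scoped, so no namespace is needed.
list_volumes() {
  kubectl get persistentvolumes
}

# Hypothetical helper: list the PersistentVolumeClaims in one namespace.
# A claim that stays in the "Pending" status has not been bound to a volume.
# Arguments: <namespace>
list_claims() {
  kubectl get persistentvolumeclaims -n "$1"
}
```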
When a Node no longer makes regular calls to the Kubernetes API server, it is marked as not available. Depending on specific settings in your Pods, Kubernetes will at some point decide to terminate the Pod. As long as the Node is not completely removed from the Kubernetes API server, Kubernetes will try to use the Node itself to terminate the Pod.
The ArangoDeployment operator recognizes this condition and will try to replace those Pods with Pods on different nodes. The exact behavior differs per type of server.
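To see which nodes are affected and which Pods are still assigned to them, something like the following sketch may help. The helper names `list_nodes` and `pods_on_node` are hypothetical. The `spec.nodeName` field selector is a standard Kubernetes selector for Pods.

```shell
# Hypothetical helper: list all nodes with their status (for example
# Ready or NotReady).
list_nodes() {
  kubectl get nodes
}

# Hypothetical helper: list the Pods scheduled on one node, across all
# namespaces, including their current status.
# Arguments: <node-name>
pods_on_node() {
  kubectl get pods --all-namespaces -o wide --field-selector "spec.nodeName=$1"
}
```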
When a Node with PersistentVolumes hosted on that Node is broken and cannot be repaired, the data in those PersistentVolumes is lost.
If an ArangoDeployment of type Cluster was using one of those PersistentVolumes, the outcome depends on the type of server that was using the volume.
- If an Agent was using the volume, it can be repaired as long as 2 other Agents are still healthy.
- If a DBServer was using the volume, and the replication factor of all database collections is 2 or higher, and the remaining DB-Servers are still healthy, the cluster will duplicate the remaining replicas to bring the number of replicas back to the original number.
- If a single server of a deployment was using the volume, and the other single server is still healthy, the other single server will become leader. After replacing the failed single server, the new follower will synchronize with the leader.