Diagnostics Tool

    • Verifies that the default registry and router are running and correctly configured.

    • Checks **ClusterRoleBindings** and **ClusterRoles** for consistency with base policy.

    • Checks that all of the client configuration contexts are valid and can be connected to.

    • Checks that SkyDNS is working properly and the pods have SDN connectivity.

    • Validates master and node configuration on the host.

    • Checks that nodes are running and available.

    • Checks that systemd units are configured as expected for the host.

    You can deploy OKD in several ways. These include:

    • Built from source

    • Included within a VM image

    • As a container image

    • Using enterprise RPMs

    Each method is suited for a different configuration and environment. To minimize environment assumptions, the diagnostics tool is included with the openshift binary to provide diagnostics within an OKD server or client.

    To use the diagnostics tool, preferably on a master host and as cluster administrator, run:

    $ oc adm diagnostics

    This runs all available diagnostics and skips any that do not apply to the environment.

    You can also run one or more specific diagnostics by name as you work to address issues.
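    For example, a minimal sketch that runs only the node configuration check by name (NodeConfigCheck, referenced in the next paragraph); substitute whichever diagnostic names you need:

    $ oc adm diagnostics NodeConfigCheck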

    The options for the diagnostics tool require working configuration files. For example, the NodeConfigCheck does not run unless a node configuration is available.

    The diagnostics tool uses the standard configuration file locations by default:

    • Client:

      • As indicated by the $KUBECONFIG environment variable

      • ~/.kube/config file

    • Master:

      • /etc/origin/master/master-config.yaml

    • Node:

      • /etc/origin/node/node-config.yaml

    You can specify non-standard locations with the --config, --master-config, and --node-config options. If a configuration file is not specified, related diagnostics are skipped.
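    For example, a client-side sketch that points the tool at a non-standard kubeconfig location (the path is a placeholder):

    $ oc adm diagnostics --config=<path_to_kubeconfig>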

    Available diagnostics include the checks described at the beginning of this section.

    An Ansible-deployed cluster provides additional diagnostic benefits for nodes within an OKD cluster. These include:

    • Systemd units are configured to manage the server(s).

    • Both master and node configuration files are in standard locations.

    • Systemd units are created and configured for managing the nodes in a cluster.

    • All components log to journald.

    Keeping to the default location of the configuration files placed by an Ansible-deployed cluster ensures that running oc adm diagnostics works without any flags. If you are not using the default location for the configuration files, you must use the --master-config and --node-config options:
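    (The paths below are placeholders; substitute your actual configuration file locations.)

    $ oc adm diagnostics \
        --master-config=<path_to_master_config> \
        --node-config=<path_to_node_config>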

    Systemd units and log entries in journald are required by the current log diagnostic logic. For other deployment types, logs can be stored in single files, stored in files that combine node and master logs, or printed to stdout. If log entries do not use journald, the log diagnostics cannot work and do not run.

    You can run the diagnostics tool as an ordinary user or a cluster-admin, and it runs using the level of permissions granted to the account from which you run it.

    A client with ordinary access can diagnose its connection to the master and run a diagnostic pod. If multiple users or masters are configured, connections are tested for all, but the diagnostic pod only runs against the current user, server, or project.

    A client with cluster-admin access can diagnose the status of infrastructure such as nodes, registry, and router. In each case, running oc adm diagnostics searches for the standard client configuration file in its standard location and uses it if available.

    Additional diagnostic health checks are available through the Ansible-based tooling used to install and manage OKD clusters. They can report common deployment problems for the current OKD installation.

    These checks can be run either using the ansible-playbook command (the same method used during cluster installation) or as a containerized version of openshift-ansible. For the ansible-playbook method, the checks are provided by the openshift-ansible Git repository. For the containerized method, the openshift/origin-ansible container image is distributed via Docker Hub. Example usage for each method is provided in subsequent sections.

    The following health checks are a set of diagnostic tasks that are meant to be run against the Ansible inventory file for a deployed OKD cluster using the provided health.yml playbook.

    A similar set of checks meant to run as part of the installation process can be found in Configuring Cluster Pre-install Checks. A separate set of checks for certificate expiration is documented elsewhere.

    To run the openshift-ansible health checks using the ansible-playbook command, change to the playbook directory, specify your cluster’s inventory file, and run the health.yml playbook:

    $ cd /usr/share/ansible/openshift-ansible
    $ ansible-playbook -i <inventory_file> \
        playbooks/openshift-checks/health.yml

    To set variables in the command line, include the -e flag with any desired variables in key=value format. For example:
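    (The variable names below are placeholders; substitute the variables for the checks you intend to run.)

    $ ansible-playbook -i <inventory_file> \
        playbooks/openshift-checks/health.yml \
        -e <variable_name>=<value>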

    To disable specific checks, include the variable openshift_disable_check with a comma-delimited list of check names in your inventory file before running the playbook. For example:

    openshift_disable_check=etcd_traffic,etcd_volume

    Alternatively, set any checks to disable as variables with -e openshift_disable_check=<check1>,<check2> when running the ansible-playbook command.
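    For example, a sketch that disables the same two checks from the command line:

    $ ansible-playbook -i <inventory_file> \
        playbooks/openshift-checks/health.yml \
        -e openshift_disable_check=etcd_traffic,etcd_volume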

    Running Health Checks via Docker CLI

    You can run the openshift-ansible playbooks in a container on any host that can run the origin-ansible image via the Docker CLI, avoiding the need to install and configure Ansible.

    Run the following as a non-root user that has privileges to run containers:
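    (This sketch assumes the origin-ansible image's convention of reading the inventory path from the INVENTORY_FILE environment variable and the playbook path from PLAYBOOK_FILE; the host and in-container paths are placeholders to adjust for your environment.)

    $ docker run -u `id -u` \
        -v $HOME/.ssh/id_rsa:/opt/app-root/src/.ssh/id_rsa:Z \
        -v /etc/ansible/hosts:/tmp/inventory:ro \
        -e INVENTORY_FILE=/tmp/inventory \
        -e PLAYBOOK_FILE=playbooks/openshift-checks/health.yml \
        openshift/origin-ansible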

    In the previous command, the SSH key is mounted with the :Z option so that the container can read the SSH key from its restricted SELinux context. Adding this option means that your original SSH key file is relabeled similarly to system_u:object_r:container_file_t:s0:c113,c247. For more details about :Z, see the docker-run(1) man page.

    These volume mount specifications can have unexpected consequences. For example, if you mount, and therefore relabel, the $HOME/.ssh directory, sshd becomes unable to access the public keys to allow remote login. To avoid altering the original file labels, mount a copy of the SSH key or directory.
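    For example, a minimal sketch that copies the key and mounts the copy instead (the copy path is arbitrary):

    $ cp $HOME/.ssh/id_rsa $HOME/.ssh/id_rsa.diagnostics-copy
    # then mount $HOME/.ssh/id_rsa.diagnostics-copy in the docker run command instead of the original key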

    Mounting an entire .ssh directory can be helpful for:

    • Allowing you to use an SSH configuration to match keys with hosts or modify other connection parameters.