Debugging

    While there is no single common culprit when debugging, the DNS Proxy shares the least code with the rest of the system and so is often the least audited component in this chain. The cascading caching scheme is also complex in its behaviour. Determining whether an issue is caused by the DNS components, the policy layer, or the datapath is often the first step when debugging toFQDNs-related issues. Generally, working top-down is easiest, as the information needed to verify low-level correctness can be collected in the initial debug invocations.

    REFUSED vs NXDOMAIN responses

    The proxy uses REFUSED DNS responses to indicate a denied request. Some libc implementations, notably musl, which is common in Alpine Linux images, terminate the whole DNS search in these cases. This often manifests as a connect error in applications, because the libc lookup returns no data. To work around this, the proxy can be configured to return NXDOMAIN for denied requests via the --tofqdns-dns-reject-response-code command line argument.
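
    On the wire this is a difference in the DNS response code (RCODE): NXDOMAIN (name error) is RCODE 3 and REFUSED is RCODE 5. The sketch below illustrates the resolver behaviour described above; the per-libc policy encoded here is a simplified assumption for illustration, not taken from the musl or glibc sources:

```go
package main

import "fmt"

// Wire-format DNS response codes (RFC 1035).
const (
	RcodeNXDomain = 3 // name does not exist; resolvers try the next search suffix
	RcodeRefused  = 5 // server refused; musl-style resolvers abort the whole search
)

// continueSearch sketches the behaviour split: on NXDOMAIN a stub
// resolver moves on to the next entry in the search path, while a
// musl-like resolver treats REFUSED as fatal and stops immediately.
func continueSearch(rcode int, muslLike bool) bool {
	switch rcode {
	case RcodeNXDomain:
		return true
	case RcodeRefused:
		return !muslLike
	}
	return false
}

func main() {
	fmt.Println(continueSearch(RcodeRefused, true))  // false: musl aborts, app sees a connect error
	fmt.Println(continueSearch(RcodeNXDomain, true)) // true: the search continues
}
```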

    Monitor Events

    The DNS Proxy emits multiple L7 DNS monitor events: one for the request and one for the response (if allowed). L7 DNS rules are often paired with L3 toFQDNs rules, and events relating to those rules are also relevant.

    Note

    Be sure to run cilium monitor on the same node as the pod being debugged!

    The above is for a simple curl cilium.io in a pod. The L7 DNS request is the first set of messages, and the subsequent L3 connection is the HTTP component. AAAA DNS lookups commonly happen as well but were removed to simplify the example.

    • If no L7 DNS requests appear, the proxy redirect is not in place. This may mean that the policy does not select this endpoint or there is an issue with the proxy redirection. Whether any redirects exist can be checked with cilium status --all-redirects. In the past, a bug occurred with more permissive L3 rules overriding the proxy redirect, causing the proxy to never see the requests.
    • If the L7 DNS request is blocked, with an explicit denied message, then the requests are not allowed by the proxy. This may be due to a typo in the network policy, or the matchPattern rule not allowing this domain. It may also be due to a bug in policy propagation to the DNS Proxy.
    • If the DNS request is allowed, with an explicit message, and it should not be, this may be because a more general policy is in place that allows the request. matchPattern: "*" visibility policies are commonly in place and would supersede all other, more restrictive, policies. If no other policies are in place, incorrect allows may indicate a bug when passing policy information to the proxy. There is no way to dump the rules in the proxy, but a debug log is printed when a rule is added. Look for DNS Proxy updating matchNames in allowed list during UpdateRules. The pkg/proxy/dns.go file contains the DNS proxy implementation.
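
    While there is no way to dump the rules in the proxy, the matching model is straightforward: each matchPattern is compiled into a regular expression. The translation below is an illustrative approximation, not Cilium's actual implementation:

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// patternToRegexp sketches a matchPattern-to-regexp translation:
// literal dots are escaped and "*" becomes a wildcard over DNS name
// characters. Names are expected in lowercase. This is an illustration
// only; the real translation lives in the Cilium agent.
func patternToRegexp(pattern string) *regexp.Regexp {
	p := strings.ToLower(pattern)
	p = strings.ReplaceAll(p, ".", `\.`)
	p = strings.ReplaceAll(p, "*", `[-a-z0-9_]*(\.[-a-z0-9_]+)*`)
	return regexp.MustCompile("^" + p + `\.?$`)
}

func main() {
	re := patternToRegexp("*.cilium.io")
	fmt.Println(re.MatchString("docs.cilium.io.")) // true: subdomain matches
	fmt.Println(re.MatchString("cilium.io."))      // false: no subdomain label present
}
```

    A matchPattern of "*" compiles to a wildcard over any name, which is why such a rule allows (and records) every lookup.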

    If the L7 DNS behaviour seems correct, verify that the lookup results reached the FQDN cache with cilium fqdn cache list: the IPs in the response should appear in the cache for the appropriate endpoint, and the lookup time is included in the JSON output of the command. If the cache also looks correct, see the sections below to further isolate the issue.

    $ kubectl exec pod/cilium-sbp8v -n cilium -- cilium fqdn cache list
    Endpoint   Source   FQDN         TTL    ExpirationTime             IPs
    3459       lookup   cilium.io.   3600   2020-04-21T15:04:27.146Z   104.198.14.52

    DNS Proxy Errors

    REFUSED responses are also returned when the proxy encounters an error during processing. This can be confusing to debug, because REFUSED is also the response when a DNS request is denied by policy. An error log is always printed in these cases. Some of these checks are callbacks provided by other packages via the daemon in cilium-agent.

    • Rejecting DNS query from endpoint due to error: This is the “normal” policy-reject message. It is a debug log.
    • cannot extract endpoint IP from DNS request: The proxy cannot read the socket information to read the source endpoint IP. This could mean an issue with the datapath routing and information passing.
    • cannot extract endpoint ID from DNS request: The proxy cannot use the source endpoint IP to get the cilium-internal ID for that endpoint. This is different from the Security Identity. This could mean that cilium is not managing this endpoint and that something has gone awry. It could also mean a routing problem where a packet has arrived at the proxy incorrectly.
    • cannot extract destination IP:port from DNS request: The proxy cannot read the socket information of the original request to obtain the intended target IP:Port. This could mean an issue with the datapath routing and information passing.
    • cannot find server ip in ipcache: The proxy cannot resolve a Security Identity for the target IP of the DNS request. This should always succeed, as world catches all IPs not set by more specific entries. This can mean a broken ipcache BPF table.
    • Rejecting DNS query from endpoint due to error: While checking if the DNS request was allowed (based on Endpoint ID, destination IP:Port and the DNS query) an error occurred. These errors would come from the internal rule lookup in the proxy, the allowed field.
    • Timeout waiting for response to forwarded proxied DNS lookup: The proxy forwards requests 1:1 and does not cache. It applies a 10s timeout on responses to those requests, as the client will retry within this period (usually). Bursts of these errors can happen if the DNS target server misbehaves and many pods see DNS timeouts. This isn’t an actual problem with cilium or the proxy although it can be caused by policy blocking the DNS target server if it is in-cluster.
    • Timed out waiting for datapath updates of FQDN IP information; returning response: When the proxy updates the DNS caches with response data, it needs to allow some time for that information to get into the datapath. Otherwise, pods would attempt to make the outbound connection (the thing that caused the DNS lookup) before the datapath is ready. Many stacks retry the SYN in such cases but some return an error and some apps further crash as a response. This delay is configurable by setting the --tofqdns-proxy-response-max-delay command line argument but defaults to 100ms. It can be exceeded if the system is under load.

    Identities and Policy

    • A per-Endpoint DNSZombieMapping list of IPs that have expired from the per-Endpoint cache but are waiting for the Connection Tracking GC to mark them in-use or not. This can take up to 12 hours to occur. This list is size-limited by --tofqdns-max-deferred-connection-deletes.
    • A global DNSCache where all endpoint and poller DNS data is collected. It does apply the --tofqdns-min-ttl value but not the --tofqdns-endpoint-max-ip-per-hostname value.
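
    The size limit on the deferred-deletion list can be pictured as a bounded buffer. The real agent decides evictions based on connection-tracking liveness, so the drop-oldest policy in this sketch is purely illustrative:

```go
package main

import "fmt"

// zombieList sketches a size-limited deferred-deletion list of IPs
// awaiting the connection-tracking GC verdict. The cap corresponds to
// --tofqdns-max-deferred-connection-deletes; the drop-oldest eviction
// here is an assumption for illustration only.
type zombieList struct {
	max int
	ips []string
}

func (z *zombieList) add(ip string) {
	if len(z.ips) >= z.max {
		z.ips = z.ips[1:] // at capacity: evict the oldest zombie
	}
	z.ips = append(z.ips, ip)
}

func main() {
	z := &zombieList{max: 2}
	for _, ip := range []string{"10.0.0.1", "10.0.0.2", "10.0.0.3"} {
		z.add(ip)
	}
	fmt.Println(z.ips) // [10.0.0.2 10.0.0.3]
}
```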

    If an IP exists in the FQDN cache (check with cilium fqdn cache list), then toFQDNs rules that select a domain name, either explicitly via matchName or via matchPattern, should cause IPs for that domain to have allocated Security Identities. These can be listed with:

    $ kubectl exec pod/cilium-sbp8v -n cilium -- cilium identity list
    ID         LABELS
    1          reserved:host
    2          reserved:world
    3          reserved:unmanaged
    4          reserved:health
    5          reserved:init
    6          reserved:remote-node
    323        k8s:class=xwing
               k8s:io.cilium.k8s.policy.cluster=default
               k8s:io.cilium.k8s.policy.serviceaccount=default
               k8s:io.kubernetes.pod.namespace=default
               k8s:org=alliance
    ...
    16777217   cidr:104.198.14.52/32
               reserved:world

    Note that CIDR identities are allocated locally on the node and have a high bit set, so they are usually in the 16-million range. This is also the identity that appears in the monitor output for the HTTP connection.
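
    The "16-million range" falls out of the high bit used to mark node-local identities. Assuming bit 24 as that flag, the first locally allocated CIDR identity is 16777217, matching the listing above:

```go
package main

import "fmt"

// Node-local identities carry a high flag bit, which is why CIDR
// identities show up in the 16-million range. Treating bit 24 as the
// flag is an assumption for illustration: 1<<24 = 16777216.
const localIdentityFlag = 1 << 24

func main() {
	// The first locally allocated CIDR identity:
	fmt.Println(localIdentityFlag | 1) // 16777217
}
```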

    In cases where there is no matching identity for an IP in the fqdn cache, it may simply be because no policy selects an associated domain. The policy system represents each toFQDNs: rule with a FQDNSelector instance. These receive updates from a global NameManager in the daemon. They can be listed along with other selectors (roughly corresponding to any L3 rule):

    In this example 16777217 is used by two selectors, one with matchPattern: "*" and another empty one. This is because of the policy in use:

    apiVersion: "cilium.io/v2"
    kind: CiliumNetworkPolicy
    metadata:
      name: "tofqdn-dns-visibility"
    spec:
      endpointSelector:
        matchLabels:
          any:org: alliance
      egress:
      - toPorts:
        - ports:
          - port: "53"
            protocol: ANY
          rules:
            dns:
            - matchPattern: "*"
      - toFQDNs:
        - matchPattern: "*"
    The L7 DNS rule has an implicit L3 allow-all because it defines only L4 and L7 sections. This is the second selector in the list, and it includes all possible L3 identities known in the system. In contrast, the first selector, which corresponds to the toFQDNs: matchPattern: "*" rule, would list only identities for IPs that came from the DNS Proxy. Other CIDR identities would not be included.

    Unintended DNS Policy Drops

    toFQDNs policy enforcement relies on the source pod performing a DNS query before using an IP address returned in the DNS response. Sometimes pods hold on to a DNS response and open new connections to the same IP address at a later time. This may trigger policy drops if the DNS response has expired, as requested by the DNS server via the time-to-live (TTL) value in the response. When DNS is used for service load balancing, the advertised TTL value may be short (e.g., 60 seconds). To allow for reasonable pod behaviour without unintended policy drops, Cilium employs a configurable minimum DNS TTL via --tofqdns-min-ttl, which defaults to 3600 seconds. This setting overrides short TTLs and allows the pod to use the IP address in the DNS response for one hour. Existing connections also keep the IP address allowed in the policy. However, any new connections opened by the pod using the same IP address without performing a new DNS query, after the (possibly extended) DNS TTL has expired, can be dropped by Cilium policy enforcement. To allow pods to use the DNS response for new connections after TTL expiry, the command line option --tofqdns-idle-connection-grace-period may be used to keep the IP-address/name mapping valid in the policy for an extended time after DNS TTL expiry. This option takes effect only if the pod has opened at least one connection during the DNS TTL period.
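
    The minimum-TTL override reduces to taking the maximum of the advertised and configured values. A sketch with illustrative names:

```go
package main

import "fmt"

// effectiveTTL sketches the --tofqdns-min-ttl override: a short TTL
// advertised by the DNS server is raised to the configured minimum,
// while longer TTLs pass through unchanged.
func effectiveTTL(responseTTL, minTTL int) int {
	if responseTTL < minTTL {
		return minTTL
	}
	return responseTTL
}

func main() {
	// A 60s load-balancing TTL is extended to the default 3600s minimum.
	fmt.Println(effectiveTTL(60, 3600)) // 3600
	// A day-long TTL is left as-is.
	fmt.Println(effectiveTTL(86400, 3600)) // 86400
}
```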

    Datapath Plumbing

    For a policy to be fully realized the datapath for an Endpoint must be updated. In the case of a new DNS-source IP, the CIDR identity associated with it must propagate from the selectors to the Endpoint specific policy. Unless a new policy is being added, this often only involves updating the Policy Map of the Endpoint with the new CIDR Identity of the IP. This can be verified:

    $ kubectl exec pod/cilium-sbp8v -n cilium -- cilium bpf policy get 3459
    DIRECTION   LABELS (source:key[=value])   PORT/PROTO   PROXY PORT   BYTES   PACKETS
    Ingress     reserved:unknown              ANY          NONE         1367    7
    Ingress     reserved:host                 ANY          NONE         0       0
    Egress      reserved:unknown              53/TCP       36447        0       0
    Egress      reserved:unknown              53/UDP       36447        138     2
    Egress      cidr:104.198.14.52/32         ANY          NONE         477     6
                reserved:world

    Note that the labels for identities are resolved here. This resolution can be skipped, and there may be cases where it does not occur.

    An identity missing here can be an error in various places:

    • Policy doesn’t actually allow this Endpoint to connect. A sanity check is to use cilium endpoint list to see if cilium thinks it should have policy enforcement.
    • Endpoint regeneration is slow and the Policy Map has not been updated yet. This can occur in cases where we have leaked IPs from the DNS cache (i.e. they were never deleted correctly) or when there are legitimately many IPs. It can also simply mean an overloaded node or even a deadlock within cilium.
    • A more permissive policy has removed the need to include this identity. This is likely a bug, however, as the IP would still have an identity allocated and it would be included in the Policy Map. In the past, a similar bug occurred with the L7 redirect and that would stop this whole process at the beginning.

    Mutexes / Locks and Data Races

    Note

    This section only applies to Golang code.

    There are a few options available to debug Cilium data races and deadlocks.

    To debug data races, Golang allows -race to be passed to the compiler to compile Cilium with race detection. Additionally, the -race flag can be provided to go test to detect data races in a testing context.

    To compile a Cilium binary with race detection, you can do:

    $ make RACE=1

    To run unit tests with race detection, you can do:

    $ make RACE=1 unit-tests
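
    As a quick way to exercise the detector, the following program is race-free as written; removing the mutex around the shared counter and running it under a race-enabled build makes the detector report the conflicting writes:

```go
package main

import (
	"fmt"
	"sync"
)

// raceFreeCount increments a shared counter from n goroutines under a
// mutex. Remove the Lock/Unlock pair and run with "go run -race" to
// see the race detector flag the unguarded writes.
func raceFreeCount(n int) int {
	var (
		wg      sync.WaitGroup
		mu      sync.Mutex
		counter int
	)
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			mu.Lock()
			counter++
			mu.Unlock()
		}()
	}
	wg.Wait()
	return counter
}

func main() {
	fmt.Println(raceFreeCount(2)) // 2
}
```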

    Deadlock detection

    Cilium can be compiled with the build tag lockdebug, which provides a seamless wrapper over the standard mutex types in Golang via the sasha-s/go-deadlock library. No action is required besides building the binary with this tag.