Telemetry and Monitoring
Linkerd’s telemetry and monitoring features function automatically, without requiring any work on the part of the developer. These features include:
- Recording of top-line (“golden”) metrics (request volume, success rate, and latency distributions) for HTTP, HTTP/2, and gRPC traffic.
- Recording of TCP-level metrics (bytes in/out, etc.) for other TCP traffic.
- Generating topology graphs that display the runtime relationship between services.
- Live, on-demand request sampling.
This data can be consumed in several ways:
- Directly from Linkerd’s built-in Prometheus instance
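As a rough illustration of consuming data directly from a Prometheus instance, the sketch below builds an instant-query URL for the standard Prometheus HTTP API (`/api/v1/query`) and parses the JSON it returns. The address `http://localhost:9090`, the metric name `response_total`, and the labels `classification` and `deployment` are assumptions made for this example; check them against your own installation before relying on them.

```python
import json
from urllib.parse import urlencode

# Hypothetical address; adjust to wherever the Prometheus instance is reachable.
PROMETHEUS_URL = "http://localhost:9090"

# Assumed metric and label names; verify them against your installation.
SUCCESS_RATE_QUERY = (
    'sum(rate(response_total{classification="success"}[1m])) by (deployment)'
    " / "
    "sum(rate(response_total[1m])) by (deployment)"
)


def build_query_url(base_url, promql):
    """Return the URL for a Prometheus instant query (GET /api/v1/query)."""
    return f"{base_url}/api/v1/query?{urlencode({'query': promql})}"


def parse_instant_vector(body):
    """Map each returned series' `deployment` label to its float value."""
    payload = json.loads(body)
    if payload.get("status") != "success":
        raise ValueError(f"query failed: {payload.get('error', 'unknown error')}")
    return {
        series["metric"].get("deployment", ""): float(series["value"][1])
        for series in payload["data"]["result"]
    }
```

The response body could be fetched with `urllib.request.urlopen(build_query_url(PROMETHEUS_URL, SUCCESS_RATE_QUERY)).read()` and passed to `parse_instant_vector`; the network call is left out of the block so the parsing logic can be exercised offline.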
Success rate is the percentage of successful requests during a time window (1 minute by default).
In the output of the command, this metric is split into EFFECTIVE_SUCCESS and ACTUAL_SUCCESS. For routes configured with retries, the former calculates the percentage of success after retries (as perceived by the client side), and the latter before retries (which can expose potential problems with the service).
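The difference between the two figures is easiest to see with numbers. The following is a simplified model, not Linkerd's implementation: the `RouteStats` type and its field names are hypothetical, and it assumes effective success is counted per client request while actual success is counted per attempt on the wire.

```python
from dataclasses import dataclass


@dataclass
class RouteStats:
    """Hypothetical per-route counters for one time window."""
    logical_requests: int   # requests as issued by the client
    logical_successes: int  # requests that eventually succeeded, retries included
    attempts: int           # every attempt sent, including retries
    attempt_successes: int  # attempts that succeeded


def success_rates(stats):
    """Return (effective, actual) success rates as fractions in [0, 1]."""
    effective = (
        stats.logical_successes / stats.logical_requests
        if stats.logical_requests else 0.0
    )
    actual = (
        stats.attempt_successes / stats.attempts
        if stats.attempts else 0.0
    )
    return effective, actual


# 100 client requests; 20 fail on the first attempt and succeed when retried.
stats = RouteStats(
    logical_requests=100,
    logical_successes=100,
    attempts=120,
    attempt_successes=100,
)
effective, actual = success_rates(stats)
print(f"effective={effective:.1%} actual={actual:.1%}")
```

In this scenario the client sees every request succeed, while roughly one attempt in six fails at the service, which is the kind of problem the before-retries figure is meant to surface.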
Times taken to service requests per service/route are split into 50th, 95th and 99th percentiles. Lower percentiles give you an overview of the average performance of the system, while tail percentiles help catch outlier behavior.
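To illustrate why the tail percentiles matter, the sketch below computes nearest-rank percentiles over a list of raw latency samples. This is an assumption-laden simplification: the sample data is invented for the example, and a metrics system would typically derive percentiles from histogram buckets rather than from raw samples.

```python
import math


def percentile(samples, p):
    """Nearest-rank percentile: the smallest sample with at least p% of samples at or below it."""
    if not samples:
        raise ValueError("samples must not be empty")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


# Invented latencies in milliseconds: mostly fast, with a few slow outliers.
latencies_ms = [10] * 94 + [50] * 4 + [900] * 2

for p in (50, 95, 99):
    print(f"p{p} = {percentile(latencies_ms, p)} ms")
```

Here the median stays at the fast value while the 99th percentile lands on the slow outliers, so a dashboard showing only the median would hide them.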
Lifespan of Linkerd metrics
Linkerd is not designed as a long-term historical metrics store. While Linkerd’s control plane does include a Prometheus instance, this instance expires metrics at a short, fixed interval (currently 6 hours).
See for more.