Statistics

    The cluster manager has a statistics tree rooted at cluster_manager. with the following statistics. Any : character in the stats name is replaced with _. Stats include all clusters managed by the cluster manager, including both clusters used for data plane upstreams and control plane xDS clusters.
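    The name sanitization mentioned above can be illustrated with a minimal sketch. It assumes the replaced character is ":" and the replacement is "_", and it builds a per-cluster prefix of the form cluster.<name>. The helper name cluster_stat_prefix and the cluster name "backend:8080" are hypothetical and are not part of Envoy.

```python
def cluster_stat_prefix(cluster_name):
    """Return the assumed stats prefix for a cluster name.

    Assumption: ':' in the name is replaced with '_' before the
    name is embedded in the stats tree.
    """
    return "cluster." + cluster_name.replace(":", "_") + "."


# Hypothetical cluster name containing a colon.
prefix = cluster_stat_prefix("backend:8080")
print(prefix)
```

    A lookup tool that builds stat names from configured cluster names would need to apply the same substitution before searching the stats output.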

    Every cluster has a statistics tree rooted at cluster.<name>. with the following statistics:

    Name

    Type

    Description

    upstream_cx_total

    Counter

    Total connections

    upstream_cx_active

    Gauge

    Total active connections

    upstream_cx_http1_total

    Counter

    Total HTTP/1.1 connections

    upstream_cx_http2_total

    Counter

    Total HTTP/2 connections

    upstream_cx_connect_fail

    Counter

    Total connection failures

    upstream_cx_connect_timeout

    Counter

    Total connection connect timeouts

    upstream_cx_idle_timeout

    Counter

    Total connection idle timeouts

    upstream_cx_connect_attempts_exceeded

    Counter

    Total consecutive connection failures exceeding configured connection attempts

    upstream_cx_overflow

    Counter

    Total times that the cluster’s connection circuit breaker overflowed

    upstream_cx_connect_ms

    Histogram

    Connection establishment milliseconds

    upstream_cx_length_ms

    Histogram

    Connection length milliseconds

    upstream_cx_destroy

    Counter

    Total destroyed connections

    upstream_cx_destroy_local

    Counter

    Total connections destroyed locally

    upstream_cx_destroy_remote

    Counter

    Total connections destroyed remotely

    upstream_cx_destroy_with_active_rq

    Counter

    Total connections destroyed with 1+ active request

    upstream_cx_destroy_local_with_active_rq

    Counter

    Total connections destroyed locally with 1+ active request

    upstream_cx_destroy_remote_with_active_rq

    Counter

    Total connections destroyed remotely with 1+ active request

    upstream_cx_close_notify

    Counter

    Total connections closed via HTTP/1.1 connection close header or HTTP/2 GOAWAY

    upstream_cx_rx_bytes_total

    Counter

    Total received connection bytes

    upstream_cx_rx_bytes_buffered

    Gauge

    Received connection bytes currently buffered

    upstream_cx_tx_bytes_total

    Counter

    Total sent connection bytes

    upstream_cx_tx_bytes_buffered

    Gauge

    Send connection bytes currently buffered

    upstream_cx_pool_overflow

    Counter

    Total times that the cluster’s connection pool circuit breaker overflowed

    upstream_cx_protocol_error

    Counter

    Total connection protocol errors

    upstream_cx_max_requests

    Counter

    Total connections closed due to maximum requests

    upstream_cx_none_healthy

    Counter

    Total times connection not established due to no healthy hosts

    upstream_rq_total

    Counter

    Total requests

    upstream_rq_active

    Gauge

    Total active requests

    upstream_rq_pending_total

    Counter

    Total requests pending a connection pool connection

    upstream_rq_pending_overflow

    Counter

    Total requests that overflowed connection pool circuit breaking and were failed

    upstream_rq_pending_failure_eject

    Counter

    Total requests that were failed due to a connection pool connection failure

    upstream_rq_pending_active

    Gauge

    Total active requests pending a connection pool connection

    upstream_rq_cancelled

    Counter

    Total requests cancelled before obtaining a connection pool connection

    upstream_rq_maintenance_mode

    Counter

    Total requests that resulted in an immediate 503 due to maintenance mode

    upstream_rq_timeout

    Counter

    Total requests that timed out waiting for a response

    upstream_rq_per_try_timeout

    Counter

    Total requests that hit the per try timeout

    upstream_rq_rx_reset

    Counter

    Total requests that were reset remotely

    upstream_rq_tx_reset

    Counter

    Total requests that were reset locally

    upstream_rq_retry

    Counter

    Total request retries

    upstream_rq_retry_success

    Counter

    Total request retry successes

    upstream_rq_retry_overflow

    Counter

    Total requests not retried due to circuit breaking

    upstream_flow_control_paused_reading_total

    Counter

    Total number of times flow control paused reading from upstream

    upstream_flow_control_resumed_reading_total

    Counter

    Total number of times flow control resumed reading from upstream

    upstream_flow_control_backed_up_total

    Counter

    Total number of times the upstream connection backed up and paused reads from downstream

    upstream_flow_control_drained_total

    Counter

    Total number of times the upstream connection drained and resumed reads from downstream

    upstream_internal_redirect_failed_total

    Counter

    Total number of times failed internal redirects resulted in redirects being passed downstream.

    upstream_internal_redirect_succeed_total

    Counter

    Total number of times internal redirects resulted in a second upstream request.

    membership_change

    Counter

    Total cluster membership changes

    membership_healthy

    Gauge

    Current cluster healthy total (inclusive of both health checking and outlier detection)

    membership_degraded

    Gauge

    Current cluster degraded total

    membership_total

    Gauge

    Current cluster membership total

    retry_or_shadow_abandoned

    Counter

    Total number of times shadowing or retry buffering was canceled due to buffer limits

    config_reload

    Counter

    Total API fetches that resulted in a config reload due to a different config

    update_attempt

    Counter

    Total cluster membership update attempts

    update_success

    Counter

    Total cluster membership update successes

    update_failure

    Counter

    Total cluster membership update failures

    update_empty

    Counter

    Total cluster membership updates ending with empty cluster load assignment and continuing with previous config

    update_no_rebuild

    Counter

    Total successful cluster membership updates that didn’t result in any cluster load balancing structure rebuilds

    version

    Gauge

    Hash of the contents from the last successful API fetch

    max_host_weight

    Gauge

    Maximum weight of any host in the cluster

    bind_errors

    Counter

    Total errors binding the socket to the configured source address

    assignment_timeout_received

    Counter

    Total assignments received with endpoint lease information.

    assignment_stale

    Counter

    Number of times the received assignments went stale before new assignments arrived.
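    As a rough illustration of how the counters above might be consumed, the sketch below parses stats text in a "name: value" line format and derives a connection failure ratio for one cluster. The line format, the sample values, and the cluster name "backend" are assumptions made for the example. Whether failed attempts are included in upstream_cx_total is not stated here, so the ratio should be treated as indicative only.

```python
def parse_stats(text):
    """Parse 'name: value' lines into a dict, keeping integer values only."""
    stats = {}
    for line in text.splitlines():
        name, sep, value = line.partition(": ")
        if not sep:
            continue  # not a 'name: value' line
        try:
            stats[name.strip()] = int(value.strip())
        except ValueError:
            # Non-integer values (e.g. histogram summaries) are skipped.
            continue
    return stats


def connect_failure_ratio(stats, cluster):
    """Return upstream_cx_connect_fail / upstream_cx_total for a cluster."""
    prefix = f"cluster.{cluster}."
    total = stats.get(prefix + "upstream_cx_total", 0)
    failed = stats.get(prefix + "upstream_cx_connect_fail", 0)
    return failed / total if total else 0.0


# Illustrative sample, not real output.
SAMPLE = """\
cluster.backend.upstream_cx_total: 200
cluster.backend.upstream_cx_connect_fail: 5
cluster.backend.upstream_cx_active: 12
cluster.backend.upstream_cx_connect_ms: P0(nan,1) P50(nan,3)
"""

stats = parse_stats(SAMPLE)
ratio = connect_failure_ratio(stats, "backend")
print(f"connect failure ratio: {ratio:.3f}")
```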

    Health check statistics

    If health check is configured, the cluster has an additional statistics tree rooted at cluster.<name>.health_check. with the following statistics:

    Name

    Type

    Description

    attempt

    Counter

    Number of health checks

    success

    Counter

    Number of successful health checks

    failure

    Counter

    Number of immediately failed health checks (e.g. HTTP 503) as well as network failures

    passive_failure

    Counter

    Number of health check failures due to passive events (e.g. x-envoy-immediate-health-check-fail)

    network_failure

    Counter

    Number of health check failures due to network error

    verify_cluster

    Counter

    Number of health checks that attempted cluster name verification

    healthy

    Gauge

    Number of healthy members
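    Because attempt and success are counters, a rate is more meaningful over an interval than over the process lifetime. The sketch below computes a health check success rate from two snapshots of those counters. The snapshot dictionaries and their values are hypothetical, and how the snapshots are collected is left out.

```python
def health_check_success_rate(before, after):
    """Return the success rate between two counter snapshots.

    Each snapshot is a dict holding the 'attempt' and 'success' counters.
    Returns None when no health checks ran in the interval.
    """
    attempts = after["attempt"] - before["attempt"]
    successes = after["success"] - before["success"]
    if attempts <= 0:
        return None
    return successes / attempts


# Hypothetical snapshots taken some time apart.
before = {"attempt": 100, "success": 98}
after = {"attempt": 120, "success": 113}

rate = health_check_success_rate(before, after)
print(rate)  # 15 successes out of 20 attempts in the interval
```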

    Outlier detection statistics

    If outlier detection is configured for a cluster, statistics will be rooted at cluster.<name>.outlier_detection.

    Circuit breakers statistics

    Circuit breakers statistics will be rooted at cluster.<name>.circuit_breakers.<priority>. and contain the following:

    Name

    Type

    Description

    cx_open

    Gauge

    Whether the connection circuit breaker is closed (0) or open (1)

    cx_pool_open

    Gauge

    Whether the connection pool circuit breaker is closed (0) or open (1)

    rq_pending_open

    Gauge

    Whether the pending requests circuit breaker is closed (0) or open (1)

    rq_open

    Gauge

    Whether the requests circuit breaker is closed (0) or open (1)

    rq_retry_open

    Gauge

    Whether the retry circuit breaker is closed (0) or open (1)

    remaining_cx

    Gauge

    Number of remaining connections until the circuit breaker opens

    remaining_pending

    Gauge

    Number of remaining pending requests until the circuit breaker opens

    remaining_rq

    Gauge

    Number of remaining requests until the circuit breaker opens

    remaining_retries

    Gauge

    Number of remaining retries until the circuit breaker opens
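    The open/closed gauges above lend themselves to a simple scan for tripped breakers. The following sketch assumes stat names of the form cluster.<name>.circuit_breakers.<priority>.<gauge>, cluster names without dots, and illustrative priority labels ("default", "high"). The input dictionary is hypothetical sample data.

```python
OPEN_GAUGES = (
    "cx_open",
    "cx_pool_open",
    "rq_pending_open",
    "rq_open",
    "rq_retry_open",
)


def open_breakers(stats):
    """Return sorted (cluster, priority, gauge) tuples for open breakers."""
    found = []
    for name, value in stats.items():
        parts = name.split(".")
        # Expect: cluster.<name>.circuit_breakers.<priority>.<gauge>
        if len(parts) != 5:
            continue
        if parts[0] != "cluster" or parts[2] != "circuit_breakers":
            continue
        if parts[4] in OPEN_GAUGES and value == 1:
            found.append((parts[1], parts[3], parts[4]))
    return sorted(found)


# Hypothetical gauge values.
stats = {
    "cluster.backend.circuit_breakers.default.cx_open": 0,
    "cluster.backend.circuit_breakers.default.rq_pending_open": 1,
    "cluster.backend.circuit_breakers.default.remaining_pending": 0,
    "cluster.backend.circuit_breakers.high.rq_open": 0,
}

tripped = open_breakers(stats)
print(tripped)
```

    The remaining_* gauges are ignored here. They could be added to the report to show how close a closed breaker is to opening.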

    Dynamic HTTP statistics

    If HTTP is used, dynamic HTTP response code statistics are also available. These are emitted by various internal systems as well as some filters such as the router filter and rate limit filter. They are rooted at cluster.<name>. and contain the following statistics:

    Name

    Type

    Description

    upstream_rq_completed

    Counter

    Total upstream requests completed

    upstream_rq_<*xx>

    Counter

    Aggregate HTTP response codes (e.g., 2xx, 3xx, etc.)

    upstream_rq_<*>

    Counter

    Specific HTTP response codes (e.g., 201, 302, etc.)

    upstream_rq_time

    Histogram

    Request time milliseconds

    canary.upstream_rq_completed

    Counter

    Total upstream canary requests completed

    canary.upstream_rq_<*xx>

    Counter

    Upstream canary aggregate HTTP response codes

    canary.upstream_rq_<*>

    Counter

    Upstream canary specific HTTP response codes

    canary.upstream_rq_time

    Histogram

    Upstream canary request time milliseconds

    internal.upstream_rq_completed

    Counter

    Total internal origin requests completed

    internal.upstream_rq_<*xx>

    Counter

    Internal origin aggregate HTTP response codes

    internal.upstream_rq_<*>

    Counter

    Internal origin specific HTTP response codes

    internal.upstream_rq_time

    Histogram

    Internal origin request time milliseconds

    external.upstream_rq_completed

    Counter

    Total external origin requests completed

    external.upstream_rq_<*xx>

    Counter

    External origin aggregate HTTP response codes

    external.upstream_rq_<*>

    Counter

    External origin specific HTTP response codes

    Counter

    External origin specific HTTP response codes

    external.upstream_rq_time

    Histogram

    External origin request time milliseconds
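    One common use of the aggregate response code counters is to express each class as a share of completed requests. The sketch below does this for a single cluster, assuming the aggregate counters are named upstream_rq_1xx through upstream_rq_5xx under the cluster prefix. The cluster name "backend" and all counter values are hypothetical.

```python
RESPONSE_CLASSES = ("1xx", "2xx", "3xx", "4xx", "5xx")


def response_class_shares(stats, cluster):
    """Return each response class as a fraction of completed requests."""
    prefix = f"cluster.{cluster}.upstream_rq_"
    completed = stats.get(prefix + "completed", 0)
    shares = {}
    for cls in RESPONSE_CLASSES:
        count = stats.get(prefix + cls, 0)
        shares[cls] = count / completed if completed else 0.0
    return shares


# Hypothetical counter values.
stats = {
    "cluster.backend.upstream_rq_completed": 1000,
    "cluster.backend.upstream_rq_2xx": 950,
    "cluster.backend.upstream_rq_4xx": 30,
    "cluster.backend.upstream_rq_5xx": 20,
}

shares = response_class_shares(stats, "backend")
print(f"5xx share: {shares['5xx']:.1%}")
```

    The same function could be pointed at the canary., internal., or external. variants by adjusting the prefix. For alerting, the shares are better computed from counter deltas over a window than from lifetime totals.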

    Alternate tree dynamic HTTP statistics

    If alternate tree statistics are configured, they will be present in the cluster.<name>.<alt name>. namespace. The statistics produced are the same as documented in the dynamic HTTP statistics section above.

    If the service zone is available for the local service (via --service-zone) and the upstream cluster, Envoy will track the following statistics in the cluster.<name>.zone.<from_zone>.<to_zone>. namespace.

    Load balancer statistics

    Statistics for monitoring load balancer decisions. Stats are rooted at cluster.<name>. and contain the following statistics:

    Name

    Type

    Description

    lb_recalculate_zone_structures

    Counter

    The number of times locality aware routing structures are regenerated for fast decisions on upstream locality selection

    lb_healthy_panic

    Counter

    Total requests load balanced with the load balancer in panic mode

    lb_zone_cluster_too_small

    Counter

    No zone aware routing because of small upstream cluster size

    lb_zone_routing_all_directly

    Counter

    Sending all requests directly to the same zone

    lb_zone_routing_sampled

    Counter

    Sending some requests to the same zone

    lb_zone_routing_cross_zone

    Counter

    Zone aware routing mode but have to send cross zone

    lb_local_cluster_not_ok

    Counter

    Local host set is not set, or the local cluster is in panic mode

    lb_zone_number_differs

    Counter

    Number of zones differs between the local and upstream clusters

    lb_zone_no_capacity_left

    Counter

    Total number of times load balancing ended with random zone selection due to rounding error

    original_dst_host_invalid

    Counter

    Total number of invalid hosts passed to original destination load balancer
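    To get a feel for how zone aware routing is behaving, the three lb_zone_routing_* counters can be compared against each other. The sketch below treats them as mutually exclusive outcomes and reports each as a share of their sum. That interpretation is an assumption made for illustration, and the cluster name "backend" and counter values are hypothetical.

```python
ZONE_COUNTERS = (
    "lb_zone_routing_all_directly",
    "lb_zone_routing_sampled",
    "lb_zone_routing_cross_zone",
)


def zone_routing_shares(stats, cluster):
    """Return each zone routing counter as a share of their combined total."""
    prefix = f"cluster.{cluster}."
    counts = {name: stats.get(prefix + name, 0) for name in ZONE_COUNTERS}
    total = sum(counts.values())
    if total == 0:
        return {name: 0.0 for name in ZONE_COUNTERS}
    return {name: count / total for name, count in counts.items()}


# Hypothetical counter values.
stats = {
    "cluster.backend.lb_zone_routing_all_directly": 60,
    "cluster.backend.lb_zone_routing_sampled": 30,
    "cluster.backend.lb_zone_routing_cross_zone": 10,
}

zone_shares = zone_routing_shares(stats, "backend")
for name, share in zone_shares.items():
    print(f"{name}: {share:.0%}")
```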

    Load balancer subset statistics

    Statistics for monitoring load balancer subset decisions. Stats are rooted at cluster.<name>. and contain the following statistics:

    Name

    Type

    Description

    lb_subsets_active

    Gauge

    Number of currently available subsets

    lb_subsets_created

    Counter

    Number of subsets created

    lb_subsets_removed

    Counter

    Number of subsets removed due to no hosts

    lb_subsets_selected

    Counter

    Number of times any subset was selected for load balancing

    lb_subsets_fallback

    Counter

    Number of times the fallback policy was invoked

    lb_subsets_fallback_panic

    Counter

    Number of times the subset panic mode triggered

    Ring hash load balancer statistics

    Statistics for monitoring the size and effective distribution of hashes when using the ring hash load balancer. Stats are rooted at cluster.<name>.ring_hash_lb. and contain the following statistics:

    Maglev load balancer statistics

    Statistics for monitoring effective host weights when using the Maglev load balancer. Stats are rooted at cluster.<name>.maglev_lb. and contain the following statistics:

    Name

    Type

    Description

    min_entries_per_host

    Gauge

    Minimum number of entries for a single host

    max_entries_per_host

    Gauge

    Maximum number of entries for a single host
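    The two gauges above can be combined into a single skew figure that shows how unevenly table entries are spread across hosts. The sketch below divides the maximum by the minimum. The helper name maglev_entry_skew and the sample values are hypothetical, and what counts as an acceptable skew depends on the configured host weights.

```python
def maglev_entry_skew(min_entries, max_entries):
    """Return max/min entries per host, or None if min is not positive."""
    if min_entries <= 0:
        return None
    return max_entries / min_entries


# Hypothetical gauge values for one cluster.
skew = maglev_entry_skew(min_entries=500, max_entries=600)
print(f"entry skew: {skew:.2f}")
```

    With equally weighted hosts a value close to 1.0 would be expected. With unequal weights the ratio should roughly track the ratio of the largest to the smallest weight.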