Performance and Scalability

    The Istio data plane components, the Envoy proxies, handle data flowing through the system. The Istio control plane component, Istiod, configures the data plane. The data plane and control plane have distinct performance concerns.

    The test mesh consists of 1000 services and 2000 sidecars with 70,000 mesh-wide requests per second. Tests run with Istio 1.10 produced the following results:

    • The Envoy proxy uses 0.35 vCPU and 40 MB memory per 1000 requests per second going through the proxy.
    • Istiod uses 1 vCPU and 1.5 GB of memory.
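
    As a rough illustration of how the per-proxy figure above might be used for capacity planning, the sketch below scales the measured 0.35 vCPU per 1000 requests per second with the request rate. Linear scaling and the function name `estimate_proxy_cpu` are assumptions made for this example and are not part of Istio. The 40 MB memory figure is deliberately not extrapolated.

```python
# Per-proxy CPU figure quoted above for Istio 1.10.
VCPU_PER_1000_RPS = 0.35


def estimate_proxy_cpu(requests_per_second: float) -> float:
    """Estimate vCPU for one proxy.

    Assumes CPU use scales linearly with request rate, which is a
    simplification for illustration, not a measured property.
    """
    if requests_per_second < 0:
        raise ValueError("requests_per_second must be non-negative")
    return VCPU_PER_1000_RPS * requests_per_second / 1000.0


if __name__ == "__main__":
    for rps in (500, 1000, 4000):
        print(f"{rps} rps -> ~{estimate_proxy_cpu(rps):.2f} vCPU")
```

    Real consumption also depends on the factors listed for the data plane, such as protocol and payload size, so treat the result as a starting point only.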

    Istiod configures sidecar proxies based on user-authored configuration files and the current state of the system. In a Kubernetes environment, Custom Resource Definitions (CRDs) and deployments constitute the configuration and state of the system. Istio configuration objects, such as gateways and virtual services, provide the user-authored configuration. To produce the configuration for the proxies, Istiod combines the system state from the Kubernetes environment with the user-authored configuration.

    The control plane supports thousands of services, spread across thousands of pods, with a similar number of user-authored virtual services and other configuration objects. Istiod’s CPU and memory requirements scale with the amount of configuration and the number of possible system states. The CPU consumption scales with the following factors:

    • The rate of deployment changes.
    • The rate of configuration changes.
    • The number of proxies connecting to Istiod.

    However, this part is inherently horizontally scalable.

    When namespace isolation is enabled, a single Istiod instance can support 1000 services and 2000 sidecars with 1 vCPU and 1.5 GB of memory. You can increase the number of Istiod instances to reduce the time it takes for the configuration to reach all proxies.

    Data plane performance depends on many factors, for example:

    • Number of client connections
    • Target request rate
    • Request size and response size
    • Number of proxy worker threads
    • Protocol
    • CPU cores

    Since the sidecar proxy performs additional work on the data path, it consumes CPU and memory. As of Istio 1.7, a proxy consumes about 0.5 vCPU per 1000 requests per second.

    The memory consumption of the proxy depends on the total configuration state the proxy holds. A large number of listeners, clusters, and routes can increase memory usage. Istio 1.1 introduced namespace isolation to limit the scope of the configuration sent to a proxy. In a large namespace, the proxy consumes approximately 50 MB of memory.
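
    A minimal sketch of namespace isolation, assuming the Istio `Sidecar` resource is used to restrict which services a proxy receives configuration for. Python is used here only to build the manifest. The helper `sidecar_manifest` and the namespace name `my-namespace` are hypothetical, and the choice of egress hosts is an example rather than a recommendation.

```python
import json


def sidecar_manifest(namespace: str) -> dict:
    """Build a Sidecar resource that limits a namespace's proxies to
    services in the same namespace plus istio-system (example scope)."""
    return {
        "apiVersion": "networking.istio.io/v1beta1",
        "kind": "Sidecar",
        # A Sidecar without a workload selector applies to every
        # workload in its namespace.
        "metadata": {"name": "default", "namespace": namespace},
        "spec": {
            "egress": [
                # "./*" means all services in the Sidecar's own namespace.
                {"hosts": ["./*", "istio-system/*"]}
            ]
        },
    }


if __name__ == "__main__":
    # "my-namespace" is a placeholder; the JSON output can be piped
    # to `kubectl apply -f -`.
    print(json.dumps(sidecar_manifest("my-namespace"), indent=2))
```

    With fewer hosts in scope, the proxy holds fewer listeners, clusters, and routes, which is what drives the memory figure discussed above.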

    Since the proxy normally doesn’t buffer the data passing through, request rate doesn’t affect the memory consumption.

    Since Istio injects a sidecar proxy on the data path, latency is an important consideration. Istio adds an authentication filter, a telemetry filter, and a metadata exchange filter to the proxy. Every additional filter adds to the path length inside the proxy and affects latency.

    The Envoy proxy collects raw telemetry data after a response is sent to the client. The time spent collecting raw telemetry for a request does not contribute to the total time taken to complete that request. However, since the worker is busy handling the request, the worker won’t start handling the next request immediately. This process adds to the queue wait time of the next request and affects average and tail latencies. The actual tail latency depends on the traffic pattern.
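
    The queueing effect described above can be illustrated with a toy single-worker model. This is a simplified sketch, not a model of Envoy itself: the function `simulate_worker`, the arrival times, and the service and telemetry durations are all made up for illustration. Times are integers in microseconds.

```python
from typing import List


def simulate_worker(arrivals_us: List[int], service_us: int,
                    telemetry_us: int) -> List[int]:
    """Return per-request latency for one worker handling requests in order.

    Telemetry work runs after the response is sent, so it is not part of
    the request's own latency, but it keeps the worker busy and can delay
    the start of the next request.
    """
    worker_free_at = 0
    latencies = []
    for arrival in arrivals_us:
        start = max(arrival, worker_free_at)      # wait if the worker is busy
        response_sent = start + service_us
        latencies.append(response_sent - arrival)  # telemetry not included
        worker_free_at = response_sent + telemetry_us
    return latencies


if __name__ == "__main__":
    arrivals = [0, 1000, 1200, 5000]  # hypothetical arrival times
    print("no telemetry:  ", simulate_worker(arrivals, 1000, 0))
    print("with telemetry:", simulate_worker(arrivals, 1000, 500))
```

    In this toy run, requests that arrive while the worker is idle see no change, while requests that arrive close behind another one wait longer. This is why the effect shows up mostly in average and tail latencies and depends on the traffic pattern.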

    Note: Istio release 1.7 introduced a new way of measuring performance by enabling jitter in the load generator. Jitter helps model random traffic from the client side when using connection pools. The next section presents both jitter and non-jitter performance measurements.
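
    To make the idea of jitter concrete, the sketch below generates the delays between consecutive requests for a hypothetical load generator, with and without jitter. The uniform distribution, the 10% jitter fraction, and the function name `request_intervals` are assumptions for illustration and do not describe how any particular tool implements jitter.

```python
import random
from typing import List


def request_intervals(rate_per_second: float, count: int,
                      jitter_fraction: float = 0.0,
                      seed: int = 0) -> List[float]:
    """Return `count` delays (in seconds) between consecutive requests.

    Without jitter every delay equals 1 / rate. With jitter each delay is
    perturbed uniformly by up to +/- jitter_fraction of the nominal delay
    (an assumed jitter model).
    """
    if rate_per_second <= 0:
        raise ValueError("rate_per_second must be positive")
    rng = random.Random(seed)  # seeded so runs are repeatable
    nominal = 1.0 / rate_per_second
    return [
        nominal * (1.0 + rng.uniform(-jitter_fraction, jitter_fraction))
        for _ in range(count)
    ]


if __name__ == "__main__":
    print("fixed:   ", request_intervals(1000, 5))
    print("jittered:", request_intervals(1000, 5, jitter_fraction=0.1))
```

    Without jitter, clients sharing a connection pool tend to send requests in lockstep. Perturbing the delays spreads arrivals out, which changes how often a request has to queue behind another one.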

    Inside the mesh, a request traverses the client-side proxy and then the server-side proxy. In the default configuration of Istio 1.10 (i.e. Istio with telemetry v2), the two proxies add about 2.65 ms and 2.91 ms to the 90th and 99th percentile latency, respectively, over the baseline data plane latency. With jitter enabled, those numbers drop to 1.7 ms and 2.69 ms, respectively. We obtained these results for the HTTP/1.1 protocol, with a 1 kB payload at 1000 requests per second, using 16 client connections, 2 proxy workers, and mutual TLS enabled.
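
    For readers unfamiliar with percentile latency, the sketch below shows one common way to compute P90 and P99 from raw samples and to express proxy overhead as the difference from a baseline. It uses the nearest-rank method, which is an assumption here since the text does not say how the percentiles were calculated. The sample values are synthetic and are not measurement results.

```python
from typing import Sequence


def percentile(samples: Sequence[float], p: int) -> float:
    """Nearest-rank percentile: the smallest sample such that at least
    p percent of all samples are less than or equal to it."""
    if not samples:
        raise ValueError("samples must not be empty")
    if not 0 < p <= 100:
        raise ValueError("p must be in the range 1..100")
    ordered = sorted(samples)
    rank = (p * len(ordered) + 99) // 100  # ceil(p * n / 100) in integers
    return ordered[rank - 1]


def added_latency(baseline: Sequence[float],
                  with_proxies: Sequence[float], p: int) -> float:
    """Latency added at percentile p, relative to the baseline run."""
    return percentile(with_proxies, p) - percentile(baseline, p)


if __name__ == "__main__":
    # Synthetic latencies in milliseconds, for illustration only.
    baseline = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
    with_proxies = [3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
    print("added P90 latency (ms):", added_latency(baseline, with_proxies, 90))
```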

    [Figure: P90 latency vs client connections without jitter]

    [Figure: P99 latency vs client connections without jitter]

    [Figure: P90 latency vs client connections with jitter]

    [Figure: P99 latency vs client connections with jitter]

    • Client pod directly calls the server pod, no sidecars are present.
    • none_both: Istio proxy with no Istio-specific filters configured.
    • v2-stats-wasm_both: Client and server sidecars are present with telemetry v2 v8 configured.
    • v2-stats-nullvm_both: Client and server sidecars are present with telemetry v2 nullvm configured by default.
    • v2-sd-full-nullvm_both: Exports Stackdriver metrics, access logs, and edges with telemetry v2 nullvm configured.
    • v2-sd-nologging-nullvm_both: Same as above, but does not export access logs.
    • - a constant throughput load testing tool.
    • - a realistic cloud native application.
    • - a synthetic application with configurable topology.