Configuring concurrency

For per-revision concurrency, configure the autoscaling.knative.dev/target annotation for a soft limit, or the containerConcurrency field for a hard limit.

For global concurrency, you can set the container-concurrency-target-default value.

It is possible to set either a soft or hard concurrency limit.

NOTE: If both a soft and a hard limit are specified, the smaller of the two values will be used. This prevents the Autoscaler from having a target value that is not permitted by the hard limit value.

The soft limit is a targeted limit rather than a strictly enforced bound. In some situations, particularly if there is a sudden burst of requests, this value can be exceeded.

The hard limit is an enforced upper bound. If concurrency reaches the hard limit, surplus requests will be buffered and must wait until enough capacity is free to execute the requests.
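
To illustrate the note above, here is a minimal sketch of a Service that sets both limits on one revision. The Service name and image are placeholders chosen for illustration, not values from this document. Because the hard limit (50) is smaller than the soft limit (200), the Autoscaler would target 50.

```yaml
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: helloworld-go        # hypothetical name
  namespace: default
spec:
  template:
    metadata:
      annotations:
        # Soft limit: a target, can be exceeded during bursts
        autoscaling.knative.dev/target: "200"
    spec:
      # Hard limit: enforced upper bound per replica
      # Smaller than the soft limit, so this value (50) is the one used
      containerConcurrency: 50
      containers:
        - image: ghcr.io/knative/helloworld-go:latest   # placeholder image
```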

Global key: container-concurrency-target-default
Per-revision annotation key: autoscaling.knative.dev/target
Possible values: An integer.
Default: "100"

Example:

Global (ConfigMap)

apiVersion: v1
kind: ConfigMap
metadata:
  name: config-autoscaler
  namespace: knative-serving
data:
  container-concurrency-target-default: "200"

Global (Operator)

apiVersion: operator.knative.dev/v1alpha1
kind: KnativeServing
metadata:
  name: knative-serving
spec:
  config:
    autoscaler:
      container-concurrency-target-default: "200"
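
The examples above cover the global setting. For a single revision, the soft limit is set with the autoscaling.knative.dev/target annotation on the revision template. The following is a sketch only: the Service name and image are placeholders, and it assumes the revision uses the concurrency metric.

```yaml
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: helloworld-go        # hypothetical name
  namespace: default
spec:
  template:
    metadata:
      annotations:
        # Per-revision soft limit: target 200 concurrent requests per replica
        autoscaling.knative.dev/target: "200"
    spec:
      containers:
        - image: ghcr.io/knative/helloworld-go:latest   # placeholder image
```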

Hard limit

The hard limit is specified per Revision using the containerConcurrency field on the Revision spec. This setting is not an annotation.

There is no global setting for the hard limit in the autoscaling ConfigMap, because containerConcurrency has implications outside of autoscaling, such as on buffering and queuing of requests. However, a default value can be set for the Revision’s containerConcurrency field in config-defaults.yaml.

The default value is 0, meaning that there is no limit on the number of requests that are allowed to flow into the revision.
A value greater than 0 specifies the exact number of requests that are allowed to flow to the replica at any one time.
Global key: container-concurrency (in config-defaults.yaml)
Possible values: integer
Default: 0, meaning no limit

Example:

Global (Defaults ConfigMap)

apiVersion: v1
kind: ConfigMap
metadata:
  name: config-defaults
  namespace: knative-serving
data:
  container-concurrency: "50"

Global (Operator)

apiVersion: operator.knative.dev/v1alpha1
kind: KnativeServing
metadata:
  name: knative-serving
spec:
  config:
    defaults:
      container-concurrency: "50"
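
For a single revision, the hard limit goes in the Revision spec rather than in the annotations. A minimal sketch, with a placeholder Service name and image:

```yaml
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: helloworld-go        # hypothetical name
  namespace: default
spec:
  template:
    spec:
      # Hard limit: at most 50 requests reach a replica at any one time;
      # surplus requests are buffered until capacity is free
      containerConcurrency: 50
      containers:
        - image: ghcr.io/knative/helloworld-go:latest   # placeholder image
```

Note that containerConcurrency sits next to containers under spec.template.spec, not under metadata.annotations.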

Target utilization

In addition to the literal settings explained previously, concurrency values can be further adjusted by using a target utilization value.

This value specifies what percentage of the previously specified target should actually be targeted by the Autoscaler. This is also known as specifying the hotness at which a replica runs, which causes the Autoscaler to scale up before the defined hard limit is reached.

Global key: container-concurrency-target-percentage
Per-revision annotation key: autoscaling.knative.dev/targetUtilizationPercentage
Possible values: float
Default: 70
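
As a worked example of the per-revision annotation, the sketch below combines a soft limit of 100 with a target utilization of 70. The Service name and image are placeholders. Under these assumed values, the Autoscaler would aim for 100 × 0.70 = 70 concurrent requests per replica, and would scale up once the average exceeds that.

```yaml
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: helloworld-go        # hypothetical name
  namespace: default
spec:
  template:
    metadata:
      annotations:
        # Soft limit of 100 concurrent requests per replica
        autoscaling.knative.dev/target: "100"
        # Aim for 70% of that target: 100 * 0.70 = 70
        autoscaling.knative.dev/targetUtilizationPercentage: "70"
    spec:
      containers:
        - image: ghcr.io/knative/helloworld-go:latest   # placeholder image
```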

Example:

Global (ConfigMap)

apiVersion: v1
kind: ConfigMap
metadata:
  name: config-autoscaler
  namespace: knative-serving
data:
  container-concurrency-target-percentage: "80"

Global (Operator)

apiVersion: operator.knative.dev/v1alpha1
kind: KnativeServing
metadata:
  name: knative-serving
spec:
  config:
    autoscaler: