Configuring concurrency
For per-revision concurrency, you must configure autoscaling.knative.dev/target for a soft limit, or containerConcurrency for a hard limit.
For global concurrency, you can set the container-concurrency-target-default value.
It is possible to set either a soft or hard concurrency limit.
NOTE: If both a soft and a hard limit are specified, the smaller of the two values will be used. This prevents the Autoscaler from having a target value that is not permitted by the hard limit value.
The soft limit is a targeted limit rather than a strictly enforced bound. In some situations, particularly if there is a sudden burst of requests, this value can be exceeded.
The hard limit is an enforced upper bound. If concurrency reaches the hard limit, surplus requests will be buffered and must wait until enough capacity is free to execute the requests.
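As a rough illustration of the note above, the following manifest sets both limits on one revision. The Service name and container image are hypothetical placeholders; the annotation key and the containerConcurrency field are the ones named in this section. Because the hard limit (50) is smaller than the soft limit (200), the smaller value is the one the Autoscaler would be expected to use.

```yaml
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: example-service                      # hypothetical name
spec:
  template:
    metadata:
      annotations:
        # Soft limit: a target that bursts of requests can exceed.
        autoscaling.knative.dev/target: "200"
    spec:
      # Hard limit: an enforced upper bound per replica.
      containerConcurrency: 50
      containers:
        - image: example.com/my-app:latest   # hypothetical image
```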
Soft limit
- Global key: container-concurrency-target-default
- Per-revision annotation key: autoscaling.knative.dev/target
- Possible values: An integer.
- Default: "100"
Example (ConfigMap):
apiVersion: v1
kind: ConfigMap
metadata:
  name: config-autoscaler
  namespace: knative-serving
data:
  container-concurrency-target-default: "200"

Example (Operator, KnativeServing custom resource):
apiVersion: operator.knative.dev/v1alpha1
kind: KnativeServing
metadata:
  name: knative-serving
spec:
  config:
    autoscaler:
      container-concurrency-target-default: "200"
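The global examples above have a per-revision counterpart. A minimal sketch, assuming a Knative Service whose name and image are placeholders, sets the soft limit through the annotation on the revision template:

```yaml
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: example-service                      # hypothetical name
spec:
  template:
    metadata:
      annotations:
        # Per-revision soft limit; takes the place of the global default.
        autoscaling.knative.dev/target: "200"
    spec:
      containers:
        - image: example.com/my-app:latest   # hypothetical image
```

The value is quoted because annotation values are strings.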
Hard limit
The hard limit is specified per Revision using the containerConcurrency field on the Revision spec. This setting is not an annotation.
There is no global setting for the hard limit in the autoscaling ConfigMap, because containerConcurrency has implications outside of autoscaling, such as on buffering and queuing of requests. However, a default value can be set for the Revision’s containerConcurrency field in config-defaults.yaml.
The default value is 0, meaning that there is no limit on the number of requests that are allowed to flow into the revision. A value greater than 0 specifies the exact number of requests that are allowed to flow to the replica at any one time.
- Global key: container-concurrency (in config-defaults.yaml)
- Possible values: integer
- Default: 0, meaning no limit
Example (Defaults ConfigMap):
apiVersion: v1
kind: ConfigMap
metadata:
  name: config-defaults
  namespace: knative-serving
data:
  container-concurrency: "50"

Example (Operator, KnativeServing custom resource):
apiVersion: operator.knative.dev/v1alpha1
kind: KnativeServing
metadata:
  name: knative-serving
spec:
  config:
    defaults:
      container-concurrency: "50"
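For a single revision, the same limit can be set directly on the Revision spec instead of through the defaults ConfigMap. The sketch below assumes a Knative Service with a placeholder name and image. Unlike the soft limit, containerConcurrency is a field in the spec rather than an annotation, so it is written as an integer.

```yaml
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: example-service                      # hypothetical name
spec:
  template:
    spec:
      # Per-revision hard limit: at most 50 requests reach a replica at once.
      # A value of 0 would mean no limit.
      containerConcurrency: 50
      containers:
        - image: example.com/my-app:latest   # hypothetical image
```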
Target utilization
In addition to the literal settings explained previously, concurrency values can be further adjusted by using a target utilization value.
This value specifies what percentage of the previously specified target should actually be targeted by the Autoscaler. This is also known as specifying the hotness at which a replica runs, which causes the Autoscaler to scale up before the defined hard limit is reached.
- Global key: container-concurrency-target-percentage
- Per-revision annotation key: autoscaling.knative.dev/targetUtilizationPercentage
- Possible values: float
- Default: 70
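To make the arithmetic concrete, here is a sketch assuming a revision with a hard limit of 10 and a target utilization of 70 percent. The Service name and image are placeholders, and the annotation key is the per-revision key listed above. Under the description in this section, the Autoscaler would aim for roughly 7 concurrent requests per replica (70% of 10) and scale up before the hard limit of 10 is reached.

```yaml
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: example-service                      # hypothetical name
spec:
  template:
    metadata:
      annotations:
        # Target 70% of the configured concurrency: 0.70 * 10 = 7.
        autoscaling.knative.dev/targetUtilizationPercentage: "70"
    spec:
      containerConcurrency: 10
      containers:
        - image: example.com/my-app:latest   # hypothetical image
```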
Example (ConfigMap):
apiVersion: v1
kind: ConfigMap
metadata:
  name: config-autoscaler
  namespace: knative-serving
data:
  container-concurrency-target-percentage: "80"

Example (Operator, KnativeServing custom resource):
apiVersion: operator.knative.dev/v1alpha1
kind: KnativeServing
metadata:
  name: knative-serving
spec:
  config:
    autoscaler: