Autoscale Sample App - Go
Prerequisites
A Kubernetes cluster with Knative Serving installed.
A metrics installation (such as Grafana) for viewing scaling graphs (optional).
The hey load generator installed (go get -u github.com/rakyll/hey).
Clone this repository, and move into the sample directory:
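A minimal sketch of that step, assuming the sample lives in the knative/docs repository referenced by the kubectl path below (pick the branch that matches your Knative version):
git clone https://github.com/knative/docs.git knative-docs
cd knative-docs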
Deploy the Service
Deploy the Knative Service:
kubectl apply --filename docs/serving/samples/autoscale-go/service.yaml
Obtain the URL of the service (once Ready):
$ kubectl get ksvc autoscale-go
NAME           URL                                          LATESTCREATED        LATESTREADY          READY   REASON
autoscale-go   http://autoscale-go.default.1.2.3.4.xip.io   autoscale-go-96dtk   autoscale-go-96dtk   True
Make a request to the autoscale app to see it consume some resources.
curl "http://autoscale-go.default.1.2.3.4.xip.io?sleep=100&prime=10000&bloat=5"
Allocated 5 Mb of memory.
The largest prime less than 10000 is 9973.
Slept for 100.13 milliseconds.
Send 30 seconds of traffic maintaining 50 in-flight requests.
hey -z 30s -c 50 \
"http://autoscale-go.default.1.2.3.4.xip.io?sleep=100&prime=10000&bloat=5" \
&& kubectl get pods
NAME                                             READY   STATUS    RESTARTS   AGE
autoscale-go-00001-deployment-78cdc67bf4-2w4sk   3/3     Running   0          26s
autoscale-go-00001-deployment-78cdc67bf4-pg55p   3/3     Running   0          18s
autoscale-go-00001-deployment-78cdc67bf4-q8bf9   3/3     Running   0          1m
autoscale-go-00001-deployment-78cdc67bf4-thjbq   3/3     Running   0          26s
Analysis
Panic
The autoscaler calculates average concurrency over a 60-second window, so it takes a minute for the system to stabilize at the desired level of concurrency. However, the autoscaler also calculates a 6-second panic window and enters panic mode if that window reaches 2x the target concurrency. In panic mode the autoscaler operates on the shorter, more sensitive panic window. Once the panic conditions are no longer met for 60 seconds, the autoscaler returns to the initial 60-second stable window.
                                                       |
                                  Panic Target--->  +--| 20
                                                    |  |
                                                    | <------Panic Window
                                                    |  |
       Stable Target--->  +-------------------------|--| 10   CONCURRENCY
                          |                         |  |
                          |                      <-----------Stable Window
                          |                         |  |
--------------------------+-------------------------+--+ 0
120                       60                           0
                          TIME
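To watch the panic-driven scale-up as it happens, keep a pod watch running in a second terminal while hey sends the same burst of load used earlier (a sketch; it assumes the default namespace and the example URL from above):
# Terminal 1: watch pods appear as the autoscaler reacts to the burst.
kubectl get pods --watch
# Terminal 2: 50 in-flight requests push the 6-second panic window past 2x the
# per-pod target, so extra pods show up within seconds rather than a minute.
hey -z 30s -c 50 \
"http://autoscale-go.default.1.2.3.4.xip.io?sleep=100&prime=10000&bloat=5"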
Customization
The autoscaler supports customization through annotations. There are two autoscaler classes built into Knative: kpa.autoscaling.knative.dev, the concurrency-based autoscaler described above (the default), and hpa.autoscaling.knative.dev, which delegates to the Kubernetes Horizontal Pod Autoscaler (HPA) and scales on CPU usage.
Example of a Service scaled on CPU:
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: autoscale-go
  namespace: default
spec:
  template:
    metadata:
      annotations:
        # Standard Kubernetes CPU-based autoscaling.
        autoscaling.knative.dev/class: hpa.autoscaling.knative.dev
        autoscaling.knative.dev/metric: cpu
    spec:
      containers:
        - image: gcr.io/knative-samples/autoscale-go:0.1
Additionally, the autoscaler targets and scaling bounds can be specified in annotations. Example of a Service with custom targets and scale bounds:
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: autoscale-go
spec:
  template:
    metadata:
      annotations:
        # Knative concurrency-based autoscaling (default).
        autoscaling.knative.dev/class: kpa.autoscaling.knative.dev
        autoscaling.knative.dev/metric: concurrency
        # Target 10 requests in-flight per pod.
        autoscaling.knative.dev/target: "10"
        # Disable scale to zero with a minScale of 1.
        autoscaling.knative.dev/minScale: "1"
        # Limit scaling to 100 pods.
        autoscaling.knative.dev/maxScale: "100"
    spec:
      containers:
        - image: gcr.io/knative-samples/autoscale-go:0.1
Note: for an hpa.autoscaling.knative.dev class service, the autoscaling.knative.dev/target annotation specifies the CPU percentage target (default "80").
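For instance, the CPU-scaled Service shown earlier could state that target explicitly; the value below is simply the documented default of 80 percent:
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: autoscale-go
  namespace: default
spec:
  template:
    metadata:
      annotations:
        autoscaling.knative.dev/class: hpa.autoscaling.knative.dev
        autoscaling.knative.dev/metric: cpu
        # Scale out when average CPU utilization across pods reaches 80%.
        autoscaling.knative.dev/target: "80"
    spec:
      containers:
        - image: gcr.io/knative-samples/autoscale-go:0.1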
Demo
View the KubeCon demo of Knative autoscaler customization (32 minutes).
View the Knative Serving scaling dashboards (if Grafana monitoring is installed) by port-forwarding Grafana and browsing to localhost:3000:
kubectl port-forward --namespace knative-monitoring $(kubectl get pods --namespace knative-monitoring --selector=app=grafana --output=jsonpath="{.items..metadata.name}") 3000
Other Experiments
Send 60 seconds of traffic maintaining 100 concurrent requests.
hey -z 60s -c 100 \
"http://autoscale-go.default.1.2.3.4.xip.io?sleep=100&prime=10000&bloat=5"
Send 60 seconds of traffic maintaining 100 qps with short requests (10 ms).
hey -z 60s -q 100 \
"http://autoscale-go.default.1.2.3.4.xip.io?sleep=10"
Send 60 seconds of traffic maintaining 100 qps with long requests (1 sec).
hey -z 60s -q 100 \
"http://autoscale-go.default.1.2.3.4.xip.io?sleep=1000"
Send 60 seconds of traffic with heavy CPU usage (~1 cpu/sec/request, total 100 cpus).
hey -z 60s -q 100 \
"http://autoscale-go.default.1.2.3.4.xip.io?prime=40000000"
Send 60 seconds of traffic with heavy memory usage (1 gb/request, total 5 gb).
hey -z 60s -c 5 \
"http://autoscale-go.default.1.2.3.4.xip.io?bloat=1000"