# Autoscale Sample App - Go

## Prerequisites

1. A Kubernetes cluster with Knative Serving installed.
2. The `hey` load generator installed (`go get -u github.com/rakyll/hey`).
3. Clone this repository, and move into the sample directory.

## Deploy the Service

1. Deploy the sample Knative Service:

    ```bash
    kubectl apply -f docs/serving/autoscaling/autoscale-go/service.yaml
    ```
2. Obtain the URL of the Service (once it is `Ready`):

    ```bash
    $ kubectl get ksvc autoscale-go
    NAME           URL                                            LATESTCREATED        LATESTREADY          READY   REASON
    autoscale-go   http://autoscale-go.default.1.2.3.4.sslip.io   autoscale-go-96dtk   autoscale-go-96dtk   True
    ```
3. Make a request to the autoscale app to see it consume some resources:

    ```bash
    curl "http://autoscale-go.default.1.2.3.4.sslip.io?sleep=100&prime=10000&bloat=5"
    ```

    ```
    Allocated 5 Mb of memory.
    The largest prime less than 10000 is 9973.
    Slept for 100.13 milliseconds.
    ```
4. Send 30 seconds of traffic maintaining 50 in-flight requests:

    ```bash
    hey -z 30s -c 50 \
      "http://autoscale-go.default.1.2.3.4.sslip.io?sleep=100&prime=10000&bloat=5"
    ```

    ```
    Summary:
      Total:        30.3379 secs
      Slowest:      0.7433 secs
      Fastest:      0.1672 secs
      Average:      0.2778 secs
      Requests/sec: 178.7861

      Total data:   542038 bytes
      Size/request: 99 bytes

    Response time histogram:
      0.167 [1]    |
      0.282 [1303] |■■■■■■■■■■■■■■■■■■■■■■■■■■■■
      0.340 [1894] |■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■
      0.398 [471]  |■■■■■■■■■■
      0.455 [159]  |■■■
      0.513 [68]   |■
      0.570 [18]   |
      0.628 [14]   |
      0.686 [21]   |
      0.743 [13]   |

    Latency distribution:
      10% in 0.1805 secs
      25% in 0.2197 secs
      50% in 0.2801 secs
      75% in 0.3129 secs
      90% in 0.3596 secs
      95% in 0.4020 secs
      99% in 0.5457 secs

    Details (average, fastest, slowest):
      DNS+dialup:  0.0007 secs, 0.1672 secs, 0.7433 secs
      DNS-lookup:  0.0000 secs, 0.0000 secs, 0.0000 secs
      req write:   0.0001 secs, 0.0000 secs, 0.0045 secs
      resp wait:   0.2766 secs, 0.1669 secs, 0.6633 secs
      resp read:   0.0002 secs, 0.0000 secs, 0.0065 secs

    Status code distribution:
      [200] 5424 responses
    ```
5. Observe the pods that were created to serve the load:

    ```bash
    $ kubectl get pods
    NAME                                             READY   STATUS    RESTARTS   AGE
    autoscale-go-00001-deployment-78cdc67bf4-2w4sk   3/3     Running   0          26s
    autoscale-go-00001-deployment-78cdc67bf4-dd2zb   3/3     Running   0          24s
    autoscale-go-00001-deployment-78cdc67bf4-pg55p   3/3     Running   0          18s
    autoscale-go-00001-deployment-78cdc67bf4-q8bf9   3/3     Running   0          1m
    autoscale-go-00001-deployment-78cdc67bf4-thjbq   3/3     Running   0          26s
    ```

## Analysis

### Panic

The autoscaler calculates average concurrency over a 60-second window, so it takes a minute for the system to stabilize at the desired level of concurrency. However, the autoscaler also calculates a 6-second panic window and enters panic mode if that window reaches 2x the target concurrency. In panic mode, the autoscaler operates on the shorter, more sensitive panic window. Once the panic conditions are no longer met for 60 seconds, the autoscaler returns to the 60-second stable window.

```
                                                       |
                                  Panic Target--->  +--| 20
                                                    |  |
                                                    | <------Panic Window
Stable Target--->  +-------------------------|--|   |  | 10   CONCURRENCY
                   |                         |  |   |  |
                   |                         |  |   |  |
--------------------------+-------------------------+--+ 0
120                       60                            0
                          TIME
```
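The panic transition can be sketched in Go. This is a simplified illustration of the rule described above, not Knative's actual implementation; the windows are reduced to pre-computed averages and the scale calculation to a plain ratio:

```go
package main

import "fmt"

// desiredPods is a simplified sketch of the panic rule, not Knative's
// actual code. stableAvg and panicAvg are the average concurrency over
// the 60 s stable window and the 6 s panic window respectively.
func desiredPods(stableAvg, panicAvg, target float64, inPanic bool) (pods float64, panicking bool) {
	// Enter panic mode when the panic window reaches 2x the target.
	if panicAvg >= 2*target {
		inPanic = true
	}
	if inPanic {
		// Panic mode: scale on the shorter, more sensitive window.
		return panicAvg / target, true
	}
	// Stable mode: scale on the 60 s window.
	return stableAvg / target, false
}

func main() {
	// Steady traffic: no panic, scale on the stable window.
	fmt.Println(desiredPods(8, 9, 10, false)) // 0.8 false

	// Sudden burst: the panic window crosses 2x the target of 10.
	fmt.Println(desiredPods(8, 25, 10, false)) // 2.5 true
}
```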

## Customization

    The autoscaler supports customization through annotations. There are two autoscaler classes built into Knative:

1. `kpa.autoscaling.knative.dev`, the concurrency-based autoscaler described earlier (the default), and
2. `hpa.autoscaling.knative.dev`, which delegates to the Kubernetes HPA and autoscales on CPU usage.

Example of a Service scaled on CPU:

```yaml
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: autoscale-go
  namespace: default
spec:
  template:
    metadata:
      annotations:
        # Standard Kubernetes CPU-based autoscaling.
        autoscaling.knative.dev/class: hpa.autoscaling.knative.dev
        autoscaling.knative.dev/metric: cpu
    spec:
      containers:
        - image: gcr.io/knative-samples/autoscale-go:0.1
```

Additionally, the autoscaler targets and scaling bounds can be specified in annotations. Example of a Service with custom targets and scale bounds:
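The example itself is missing here; a minimal sketch using the standard Knative annotations (`autoscaling.knative.dev/target` for the per-pod concurrency target, `autoscaling.knative.dev/min-scale` and `autoscaling.knative.dev/max-scale` for the bounds; the specific values are illustrative) might look like:

```yaml
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: autoscale-go
  namespace: default
spec:
  template:
    metadata:
      annotations:
        # Target 10 in-flight requests per pod.
        autoscaling.knative.dev/target: "10"
        # Keep between 1 and 100 pods.
        autoscaling.knative.dev/min-scale: "1"
        autoscaling.knative.dev/max-scale: "100"
    spec:
      containers:
        - image: gcr.io/knative-samples/autoscale-go:0.1
```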


## Demo

View the demo of Knative autoscaler customization (32 minutes).

## Other Experiments

1. Send 60 seconds of traffic maintaining 100 concurrent requests:

    ```bash
    hey -z 60s -c 100 \
      "http://autoscale-go.default.1.2.3.4.sslip.io?sleep=100&prime=10000&bloat=5"
    ```

2. Send 60 seconds of traffic maintaining 100 qps with short requests (10 ms):

    ```bash
    hey -z 60s -q 100 \
      "http://autoscale-go.default.1.2.3.4.sslip.io?sleep=10"
    ```

3. Send 60 seconds of traffic maintaining 100 qps with long requests (1 sec):

    ```bash
    hey -z 60s -q 100 \
      "http://autoscale-go.default.1.2.3.4.sslip.io?sleep=1000"
    ```

4. Send 60 seconds of traffic with heavy CPU usage (~1 cpu/sec/request, total 100 cpus):

    ```bash
    hey -z 60s -q 100 \
      "http://autoscale-go.default.1.2.3.4.sslip.io?prime=40000000"
    ```
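A rough way to reason about the qps experiments above: by Little's law, average concurrency ≈ arrival rate × request duration, and concurrency is what the default (`kpa`) autoscaler scales on. A quick sketch, with rates and latencies taken from the experiments:

```go
package main

import "fmt"

// littleConcurrency applies Little's law: average concurrency equals
// the arrival rate times the average request duration.
func littleConcurrency(qps, latencySecs float64) float64 {
	return qps * latencySecs
}

func main() {
	// 100 qps of 10 ms requests: only ~1 request in flight,
	// so few pods are needed.
	fmt.Println(littleConcurrency(100, 0.01)) // 1

	// 100 qps of 1 s requests: ~100 requests in flight,
	// so the autoscaler needs far more pods for the same qps.
	fmt.Println(littleConcurrency(100, 1.0)) // 100
}
```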
## Cleanup

```bash
kubectl delete -f docs/serving/autoscaling/autoscale-go/service.yaml
```

## Further reading

    Autoscaling Developer Documentation