Kubernetes HPAv2 with OpenFaaS

    We'll use the hey tool which also features in the OpenFaaS workshop for auto-scaling.

    Download the latest release for your OS from this page and rename it to hey, or hey.exe on Windows:

    Helm

    Some of the dependencies in this tutorial may rely on helm. The client-side CLI called helm is not insecure, but some people have concerns about using the server-side component named tiller. If you consider helm's tiller component insecure, then you should use the helm template command instead.
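As a sketch of the tiller-free approach (the chart repo URL and release names here are assumptions based on the OpenFaaS faas-netes chart, not taken from this post):

```shell
# Render the chart locally with no tiller involved, then apply the output.
# Assumption: the openfaas chart repo has been added as shown.
helm repo add openfaas https://openfaas.github.io/faas-netes/
helm template openfaas/openfaas \
    --name openfaas \
    --namespace openfaas > openfaas.yaml
kubectl apply -f openfaas.yaml
```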

    Install OpenFaaS

    Use the Deployment guide or an existing installation. You should also install the CLI.

    HPAv2 relies on the Kubernetes metrics-server which can be installed using a helm chart.
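A typical install might look like this, assuming the chart is available in the stable repo (as it was at the time of writing):

```shell
# Install metrics-server from the stable chart repo into kube-system
helm install stable/metrics-server \
    --name metrics-server \
    --namespace kube-system
```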

    Most cloud providers are compatible with the metrics-server, but some are not, or require an additional "insecure" flag to be configured.

    You can see the chart's documentation for more options.

    When you see metrics appear from this command, continue with the tutorial.

    kubectl get --raw "/apis/metrics.k8s.io/v1beta1/nodes"

    kubectl get --raw /metrics

    Check the logs of the metrics-server for any errors:

    kubectl logs deploy/metrics-server -n kube-system

    If you don't see any metrics, then you may be using a cloud which needs the "insecure" work-around.

    helm del --purge metrics-server

    helm install --name metrics-server --namespace kube-system \
      stable/metrics-server \
      --set args="{--kubelet-insecure-tls,--kubelet-preferred-address-types=InternalIP,ExternalIP,Hostname}"

    You will now get Pod metrics along with Node metrics; these take a while to propagate.

    kubectl top node
    NAME   CPU(cores)   CPU%   MEMORY(bytes)   MEMORY%
    nuc7   377m         4%     7955Mi          24%

    Find out the usage across Pods:

    # View the functions
    kubectl top pod -n openfaas-fn
    NAME                        CPU(cores)   MEMORY(bytes)
    nodeinfo-6f48f9b548-gbtr4   2m           3Mi

    # View the core services
    kubectl top pod -n openfaas
    NAME                                 CPU(cores)   MEMORY(bytes)
    alertmanager-666c65c694-k8q6h        2m           10Mi
    basic-auth-plugin-6d97c6dc5b-rbw29   1m           4Mi
    faas-idler-67f9dcd4fc-85rbt          1m           4Mi
    gateway-7c687d498f-nvjh8             4m           21Mi
    nats-d4c9d8d95-fxrc2                 1m           6Mi
    prometheus-549c7d687d-zznmq          9m           41Mi
    queue-worker-544bcb7c67-72l72        1m           3Mi

    Disable auto-scaling with OpenFaaS

    You can either disable the OpenFaaS auto-scaling functionality completely, or just on a per-function basis.

    Disable auto-scaling from OpenFaaS on a per function basis

    If you want to mix OpenFaaS auto-scaling and HPAv2, then add the additional label when deploying your functions:
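The label itself was not preserved in this copy of the post. In the OpenFaaS documentation, setting the com.openfaas.scale.factor label to 0 disables the built-in auto-scaling for a function, so the deployment might look like this (treat the exact label as an assumption about what the post intended):

```shell
# Assumption: com.openfaas.scale.factor=0 disables OpenFaaS auto-scaling
# for this function, per the OpenFaaS docs
faas-cli deploy --image functions/nodeinfo:latest \
  --name nodeinfo \
  --label com.openfaas.scale.factor=0
```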

      Or add the label to your stack.yml YAML file:
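Assuming the com.openfaas.scale.factor label documented by OpenFaaS is the one intended here, the stack.yml entry might look like:

```yaml
functions:
  nodeinfo:
    image: functions/nodeinfo:latest
    labels:
      # Assumption: this label disables OpenFaaS auto-scaling for the function
      com.openfaas.scale.factor: "0"
```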

      Disable auto-scaling from OpenFaaS on all functions

      Disable auto-scaling by scaling alertmanager down to zero replicas; this will stop it from firing alerts.

      kubectl scale -n openfaas deploy/alertmanager --replicas=0

      Deploy your first function

      For HPAv2 to work, we need to deploy a function with a minimum request value for CPU.

      Let's create a stack.yml file:

      version: 1.0
      provider:
        name: openfaas
        gateway: http://127.0.0.1:31112
      functions:
        nodeinfo:
          image: functions/nodeinfo:latest
          skip_build: true
          requests:
            cpu: 10m

      Then deploy the function:

      faas-cli deploy

      Check that the CPU request was created on the Deployment:

      kubectl describe deploy/nodeinfo -n openfaas-fn | grep cpu

      cpu:  10m

      Note: to find the Docker image for other store functions, use the following:

      faas-cli store list

      faas-cli store inspect FUNCTION_NAME

      You can create a rule via YAML, or create one via the CLI:

      kubectl autoscale deployment -n openfaas-fn \
        nodeinfo \
        --cpu-percent=50 \
        --min=1 \
        --max=10

      horizontalpodautoscaler.autoscaling/nodeinfo autoscaled
      • -n openfaas-fn refers to the namespace where the function is deployed
      • nodeinfo is the name of the function
      • --cpu-percent is the level of CPU the Pod should reach before additional replicas are added
      • --min is the minimum number of Pods
      • --max is the maximum number of Pods
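The same rule can be expressed as YAML and applied with kubectl apply -f. This is a sketch using the autoscaling/v1 API, which matches what kubectl autoscale creates:

```yaml
apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  name: nodeinfo
  namespace: openfaas-fn
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: nodeinfo
  minReplicas: 1
  maxReplicas: 10
  targetCPUUtilizationPercentage: 50
```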

      You can use kubectl describe hpa/nodeinfo -n openfaas-fn to get detailed information including any events such as scaling up and down.

      kubectl describe hpa/nodeinfo -n openfaas-fn

      Name:                          nodeinfo
      Namespace:                     openfaas-fn
      Labels:                        <none>
      Annotations:                   <none>
      CreationTimestamp:             Sat, 10 Aug 2019 11:48:42 +0000
      Reference:                     Deployment/nodeinfo
      Metrics:                       ( current / target )
        resource cpu on pods (as a percentage of request):  20% (2m) / 50%
      Min replicas:                  1
      Max replicas:                  10
      Conditions:
        Type            Status  Reason               Message
        ----            ------  ------               -------
        AbleToScale     True    ScaleDownStabilized  recent recommendations were higher than current one, applying the highest recent recommendation
        ScalingActive   True    ValidMetricFound     the HPA was able to successfully calculate a replica count from cpu resource utilization (percentage of request)
        ScalingLimited  False   DesiredWithinRange   the desired count is within the acceptable range
      Events:                        <none>
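To interpret the numbers in the output above: per the Kubernetes docs, the HPA controller computes desiredReplicas = ceil(currentReplicas * currentMetric / targetMetric). A quick sketch of the arithmetic in shell (the metric values are illustrative, not from this cluster):

```shell
# Illustrative values, as would be read from `kubectl describe hpa`
current_replicas=1
current_cpu=150   # current CPU as a percentage of the 10m request
target_cpu=50     # the --cpu-percent target
# Integer ceiling division: ceil(a/b) == (a + b - 1) / b
desired=$(( (current_replicas * current_cpu + target_cpu - 1) / target_cpu ))
echo "desired replicas: $desired"   # prints: desired replicas: 3
```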

      Generate some traffic

      Use the following command with hey:

      export OPENFAAS_URL=http://127.0.0.1:31112

      hey -c 5 \
        -z 5m \
        $OPENFAAS_URL/function/nodeinfo
      • -c will simulate 5 concurrent users
      • -z will run for 5m and then complete

      You should note that HPA is designed to react slowly to changes in traffic, both for scaling up and for scaling down. In some instances you may wait 20 minutes for all your Pods to scale back down to a normal level after the load has stopped.

      Now in a new window monitor the progress:

      kubectl describe hpa/nodeinfo -n openfaas-fn

      Or in an automated fashion:

      watch -n 5 "kubectl describe hpa/nodeinfo -n openfaas-fn"

      You can also monitor the replicas of your function in the OpenFaaS UI or via the CLI:

      watch -n 5 "faas-cli list"

      Here is an example of the replicas scaling up in response to the traffic created by hey:

      Name:                          nodeinfo
      Namespace:                     openfaas-fn
      Labels:                        <none>
      Annotations:                   <none>
      CreationTimestamp:             Sat, 10 Aug 2019 11:48:42 +0000
      Reference:                     Deployment/nodeinfo
      Metrics:                       ( current / target )
        resource cpu on pods (as a percentage of request):  12325% (1232m) / 50%
      Min replicas:                  1
      Max replicas:                  10
      Deployment pods:               8 current / 10 desired
      Conditions:
        Type            Status  Reason            Message
        ----            ------  ------            -------
        AbleToScale     True    SucceededRescale  the HPA controller was able to update the target scale to 10
        ScalingActive   True    ValidMetricFound  the HPA was able to successfully calculate a replica count from cpu resource utilization (percentage of request)
        ScalingLimited  True    TooManyReplicas   the desired replica count is more than the maximum replica count
      Events:
        Type    Reason             Age   From                       Message
        ----    ------             ----  ----                       -------
        Normal  SuccessfulRescale  18s   horizontal-pod-autoscaler  New size: 8; reason: cpu resource utilization (percentage of request) above target
        Normal  SuccessfulRescale  3s    horizontal-pod-autoscaler  New size: 10; reason: cpu resource utilization (percentage of request) above target

      Note that whilst the scaling up was relatively quick, the scale-down may take significantly longer.

      In this tutorial we disabled the auto-scaling built into OpenFaaS which uses Prometheus and Alertmanager, and added in Kubernetes' own HPAv2 mechanism and its metrics-server.

      Notes and caveats

      • Additional resource usage by Kubernetes

      The additional services and workloads mentioned above do not come for free, and you may notice an increase in CPU and memory consumption across your cluster. Generally the auto-scaler in OpenFaaS is more efficient and lightweight.

      • Scale to zero

      You may not be able to scale your functions to zero if they are being managed by a HPAv2 rule with a minimum value of 1; this is because after they are scaled down, the HPA rule will override the setting and scale them back up.