Using CPU Manager

    CPU Manager is useful for workloads that have some of these attributes:

    • Require as much CPU time as possible.

    • Are sensitive to processor cache misses.

    • Are low-latency network applications.

    • Coordinate with other processes and benefit from sharing a single processor cache.

    Setting up CPU Manager

    1. Optionally, label a node:
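
      For example, assuming the node name that appears later in this procedure:

      # oc label node perf-node.example.com cpumanager=true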

    2. Enable CPU manager support on the target node:

      For example:

      # oc edit cm node-config-compute -n openshift-node
      ...
      kubeletArguments:
      ...
        feature-gates:
        - CPUManager=true
        cpu-manager-policy:
        - static
        cpu-manager-reconcile-period:
        - 5s
        kube-reserved:
        - cpu=500m

      # systemctl restart atomic-openshift-node
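
      After the restart, you can confirm that the node service came back up cleanly, for example:

      # systemctl status atomic-openshift-node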
    3. Create a pod that requests one or more cores. Both limits and requests must set the CPU value to a whole integer; that integer is the number of cores that will be dedicated to this pod:

      # cat cpumanager.yaml
      apiVersion: v1
      kind: Pod
      metadata:
        generateName: cpumanager-
      spec:
        containers:
        - name: cpumanager
          image: gcr.io/google_containers/pause-amd64:3.0
          resources:
            requests:
              cpu: 1
              memory: "1G"
            limits:
              cpu: 1
              memory: "1G"
        nodeSelector:
          cpumanager: "true"
    4. Create the pod and verify that it is scheduled to the labeled node:
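
      Using the cpumanager.yaml file from the previous step:

      # oc create -f cpumanager.yaml

      Then confirm where the pod was scheduled: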

      # oc describe pod cpumanager
      Name:         cpumanager-4gdtn
      Namespace:    test
      Node:         perf-node.example.com/172.31.62.105
      ...
          Limits:
            cpu:     1
            memory:  1G
          Requests:
            cpu:     1
            memory:  1G
      ...
      QoS Class:       Guaranteed
      Node-Selectors:  cpumanager=true
                       region=primary
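
      If you only need the QoS class, a jsonpath query is a quick check (a sketch; status.qosClass is the standard pod status field):

      # oc get pod cpumanager-4gdtn -n test -o jsonpath='{.status.qosClass}'
      Guaranteed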
    5. Verify that the cgroups are set up correctly. Get the PID of the pause process:

      # systemd-cgls -l
      ├─1 /usr/lib/systemd/systemd --system --deserialize 20
      ├─kubepods.slice
      │ ├─kubepods-pod0ec1ab8b_e1c4_11e7_bb22_027b30990a24.slice
      │ │ ├─docker-b24e29bc4021064057f941dc5f3538595c317d294f2c8e448b5e61a29c026d1c.scope
      │ │ │ └─44216 /pause

      Pods of QoS tier Guaranteed are placed within the kubepods.slice. Pods of other QoS tiers end up in child cgroups of kubepods.

      # cd /sys/fs/cgroup/cpuset/kubepods.slice/kubepods-pod0ec1ab8b_e1c4_11e7_bb22_027b30990a24.slice/docker-b24e29bc4021064057f941dc5f3538595c317d294f2c8e448b5e61a29c026d1c.scope
      # for i in `ls cpuset.cpus tasks` ; do echo -n "$i "; cat $i ; done
      cpuset.cpus 2
      tasks 44216
    6. Check the allowed CPU list for the task:
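
      For example, using the pause-process PID found in the previous step (the PID differs on every system):

      # grep ^Cpus_allowed_list /proc/44216/status
      Cpus_allowed_list:    2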

    7. Verify that another pod (in this case, the pod in the burstable QoS tier) on the system cannot run on the core allocated to the Guaranteed pod:

      # cat /sys/fs/cgroup/cpuset/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-podbe76ff22_dead_11e7_b99e_027b30990a24.slice/docker-da621bea7569704fc39f84385a179923309ab9d832f6360cccbff102e73f9557.scope/cpuset.cpus
      0-1,3

      Compare this with the CPU accounting for the node as a whole:

      # oc describe node perf-node.example.com
      ...
      Capacity:
       cpu:     4
       memory:  16266720Ki
       pods:    40
      Allocatable:
       cpu:     3500m
       memory:  16164320Ki
       pods:    40
      ---
        Namespace  Name              CPU Requests  CPU Limits  Memory Requests  Memory Limits
        ---------  ----              ------------  ----------  ---------------  -------------
        test       cpumanager-4gdtn  1 (28%)       1 (28%)     1G (6%)          1G (6%)
        test       cpumanager-hczts  1 (28%)       1 (28%)     1G (6%)          1G (6%)
        test       cpumanager-r9wrq  1 (28%)       1 (28%)     1G (6%)          1G (6%)
      ...
      Allocated resources:
        (Total limits may be over 100 percent, i.e., overcommitted.)
        CPU Requests  CPU Limits  Memory Requests  Memory Limits
        ------------  ----------  ---------------  -------------
        3 (85%)       3 (85%)     5437500k (32%)   9250M (55%)

      This VM has four CPU cores. You set kube-reserved to 500 millicores, so half of one core is subtracted from the total capacity of the node to arrive at the Node Allocatable amount: 4000m - 500m = 3500m, which matches the Allocatable CPU shown above.

      The three Guaranteed pods each hold one dedicated core, so only 500m of allocatable CPU remains. If you try to schedule a fourth such pod, the system accepts it, but it will never be scheduled because a whole core is no longer available:

      # oc get pods --all-namespaces | grep test
      test              cpumanager-4gdtn       1/1       Running       0          8m
      test              cpumanager-hczts       1/1       Running       0          8m
      test              cpumanager-r9wrq       1/1       Running       0          8m
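
      To see why such a pod would remain Pending, describe it and inspect its events (the pod name below is a placeholder):

      # oc describe pod <pending-pod-name> -n test

      The events typically report a FailedScheduling message indicating insufficient CPU.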