Using CPU Manager

    CPU Manager is useful for workloads that have some of these attributes:

    • Require as much CPU time as possible.

    • Are sensitive to processor cache misses.

    • Are low-latency network applications.

    • Coordinate with other processes and benefit from sharing a single processor cache.

    Setting up CPU Manager

    1. Optionally, label a node:
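
      For example, assuming the node name that appears later in this procedure:

      # oc label node perf-node.example.com cpumanager=true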

    2. Enable CPU manager support on the target node:

      For example:

      # oc edit cm node-config-compute -n openshift-node
      ...
      kubeletArguments:
      ...
        feature-gates:
        - CPUManager=true
        cpu-manager-policy:
        - static
        cpu-manager-reconcile-period:
        - 5s
        kube-reserved:
        - cpu=500m

      # systemctl restart atomic-openshift-node
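
      After the restart, you can confirm that the node service came back up cleanly, for example:

      # systemctl status atomic-openshift-node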
    3. Create a pod that requests one or more cores. Both limits and requests must set the CPU value to a whole integer; that integer is the number of cores that will be dedicated to this pod:

      # cat cpumanager.yaml
      apiVersion: v1
      kind: Pod
      metadata:
        generateName: cpumanager-
      spec:
        containers:
        - name: cpumanager
          image: gcr.io/google_containers/pause-amd64:3.0
          resources:
            requests:
              cpu: 1
              memory: "1G"
            limits:
              cpu: 1
              memory: "1G"
        nodeSelector:
          cpumanager: "true"
    4. Create the pod and verify that it is scheduled to the labeled node:
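
      Using the cpumanager.yaml file from the previous step:

      # oc create -f cpumanager.yaml

      Then confirm where the pod was scheduled: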

      # oc describe pod cpumanager
      Name:         cpumanager-4gdtn
      Namespace:    test
      Node:         perf-node.example.com/172.31.62.105
      ...
          Limits:
            cpu:     1
            memory:  1G
          Requests:
            cpu:     1
            memory:  1G
      ...
      QoS Class:       Guaranteed
      Node-Selectors:  cpumanager=true
                       region=primary
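
      If you only need the QoS class, a jsonpath query is a quick check (a sketch; status.qosClass is the standard pod status field):

      # oc get pod cpumanager-4gdtn -n test -o jsonpath='{.status.qosClass}'
      Guaranteed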
    5. Verify that the cgroups are set up correctly. Get the PID of the pause process:

      # systemd-cgls -l
      ├─1 /usr/lib/systemd/systemd --system --deserialize 20
      ├─kubepods.slice
      │ ├─kubepods-pod0ec1ab8b_e1c4_11e7_bb22_027b30990a24.slice
      │ │ ├─docker-b24e29bc4021064057f941dc5f3538595c317d294f2c8e448b5e61a29c026d1c.scope
      │ │ │ └─44216 /pause

      Pods of QoS tier Guaranteed are placed within the kubepods.slice. Pods of other QoS tiers end up in child cgroups of kubepods.

      # cd /sys/fs/cgroup/cpuset/kubepods.slice/kubepods-pod0ec1ab8b_e1c4_11e7_bb22_027b30990a24.slice/docker-b24e29bc4021064057f941dc5f3538595c317d294f2c8e448b5e61a29c026d1c.scope
      # for i in `ls cpuset.cpus tasks` ; do echo -n "$i "; cat $i ; done
      cpuset.cpus 2
      tasks 44216
    6. Check the allowed CPU list for the task:
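
      For example, using the pause-process PID found in the previous step (the PID differs on every system):

      # grep ^Cpus_allowed_list /proc/44216/status
      Cpus_allowed_list:    2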

    7. Verify that another pod (in this case, the pod in the burstable QoS tier) on the system cannot run on the core allocated to the Guaranteed pod:

      # cat /sys/fs/cgroup/cpuset/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-podbe76ff22_dead_11e7_b99e_027b30990a24.slice/docker-da621bea7569704fc39f84385a179923309ab9d832f6360cccbff102e73f9557.scope/cpuset.cpus
      0-1,3

      Compare this with the CPU accounting for the node as a whole:

      # oc describe node perf-node.example.com
      ...
      Capacity:
       cpu:     4
       memory:  16266720Ki
       pods:    40
      Allocatable:
       cpu:     3500m
       memory:  16164320Ki
       pods:    40
      ---
        Namespace  Name              CPU Requests  CPU Limits  Memory Requests  Memory Limits
        ---------  ----              ------------  ----------  ---------------  -------------
        test       cpumanager-4gdtn  1 (28%)       1 (28%)     1G (6%)          1G (6%)
        test       cpumanager-hczts  1 (28%)       1 (28%)     1G (6%)          1G (6%)
        test       cpumanager-r9wrq  1 (28%)       1 (28%)     1G (6%)          1G (6%)
      ...
      Allocated resources:
        (Total limits may be over 100 percent, i.e., overcommitted.)
        CPU Requests  CPU Limits  Memory Requests  Memory Limits
        ------------  ----------  ---------------  -------------
        3 (85%)       3 (85%)     5437500k (32%)   9250M (55%)

      This VM has four CPU cores. You set kube-reserved to 500 millicores, so half of one core is subtracted from the total capacity of the node to arrive at the Node Allocatable amount: 4000m - 500m = 3500m, which matches the Allocatable CPU shown above.

      The three Guaranteed pods each hold one dedicated core, so only 500m of allocatable CPU remains. If you try to schedule a fourth such pod, the system accepts it, but it will never be scheduled because a whole core is no longer available:

      # oc get pods --all-namespaces | grep test
      test              cpumanager-4gdtn       1/1       Running       0          8m
      test              cpumanager-hczts       1/1       Running       0          8m
      test              cpumanager-r9wrq       1/1       Running       0          8m
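
      To see why such a pod would remain Pending, describe it and inspect its events (the pod name below is a placeholder):

      # oc describe pod <pending-pod-name> -n test

      The events typically report a FailedScheduling message indicating insufficient CPU.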