Depending on your workloads, you may want to have workers with a realtime kernel in your cluster.
How to configure the realtime kernel and how to enroll the nodes into an OpenShift cluster is not in the scope of this article; we will assume that you already have a worker node up and running with a realtime kernel.
Once you have it, how do you properly schedule workloads that make use of it?
CPU manager and kubelet static policy - https://docs.openshift.com/container-platform/4.2/scalability_and_performance/using-cpu-manager.html
CPU Manager manages groups of CPUs and constrains workloads to specific CPUs. This is useful for several cases; in our case, for low-latency applications. In order to enable CPU Manager, the following steps are needed:
- Label a node with cpumanager:
# oc label node perf-node.example.com cpumanager=true
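To verify that the label has been applied (perf-node.example.com follows the example above), you can list the nodes by that label:

# oc get nodes -l cpumanager=true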
- Edit the machineconfigpool worker, and add a label to reference a custom kubelet:
# oc edit machineconfigpool worker
metadata:
  creationTimestamp: 2019-xx-xxx
  generation: 3
  labels:
    custom-kubelet: cpumanager-enabled
- Create this KubeletConfig custom resource to enable the static policy:
apiVersion: machineconfiguration.openshift.io/v1
kind: KubeletConfig
metadata:
  name: cpumanager-enabled
spec:
  machineConfigPoolSelector:
    matchLabels:
      custom-kubelet: cpumanager-enabled
  kubeletConfig:
    cpuManagerPolicy: static
    cpuManagerReconcilePeriod: 5s
    systemReserved:
      cpu: 1
      memory: 512Mi
    kubeReserved:
      cpu: 2
      memory: 512Mi

# oc create -f cpumanager-kubeletconfig.yaml
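The machine config pool needs some time to roll the new kubelet config out to the workers; you can follow the progress with a watch (the exact output columns vary by version):

# oc get machineconfigpool worker -w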
- This adds the CPU Manager feature to the kubelet config. The machine config pool will apply it, and the affected nodes will be rebooted after the config change. After a node has been rebooted, you can check that the policy has been correctly applied:
# oc debug node/perf-node.example.com
sh-4.4# cat /host/etc/kubernetes/kubelet.conf | grep cpuManager
cpuManagerPolicy: static
cpuManagerReconcilePeriod: 5s
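As an additional check, the kubelet keeps its CPU assignments in a state file under its root directory; from the same debug shell you can inspect it (the path assumes the default kubelet root dir):

sh-4.4# cat /host/var/lib/kubelet/cpu_manager_state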
- Now, in order to make use of it, a pod with CPU and memory settings for limits and requests needs to be created. This will create a pod with guaranteed resources:
# cat cpumanager-pod.yaml
apiVersion: v1
kind: Pod
metadata:
  generateName: cpumanager-
spec:
  containers:
  - name: cpumanager
    image: gcr.io/google_containers/pause-amd64:3.0
    resources:
      requests:
        cpu: 1
        memory: "1G"
      limits:
        cpu: 1
        memory: "1G"
  nodeSelector:
    cpumanager: "true"

# oc create -f cpumanager-pod.yaml
# oc describe pod cpumanager
Name:               cpumanager-6cqz7
Namespace:          default
Priority:           0
PriorityClassName:  <none>
Node:               perf-node.example.com/xxx.xx.xx.xxx
...
Limits:
  cpu:     1
  memory:  1G
Requests:
  cpu:     1
  memory:  1G
...
QoS Class:       Guaranteed
Node-Selectors:  cpumanager=true
- See that the pod has been scheduled with the Guaranteed QoS class. Other pods scheduled with the Burstable class cannot run on the cores allocated to the Guaranteed pod.
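If you want to double-check the pinning from the node itself, one option is to look at the CPU affinity of the container process. A minimal sketch, assuming the pause container from the example above is the only pause process on the node (oc debug node pods share the host PID namespace):

# oc debug node/perf-node.example.com
sh-4.4# grep Cpus_allowed_list /proc/$(pgrep -n pause)/status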
How to measure realtime
Once the kubelet has been correctly configured, it is time to measure the realtime performance. This can be achieved with 2 steps:
- Put the system under stress, to execute some tests under heavy load
- Run cyclictest on a guaranteed pod inside the OpenShift cluster, and collect the results there
Put the system under stress
This can be achieved by running stress pods on the free cores. Imagine that we have a worker node with 36 cores. We reserve 1 core for the system and 2 cores for kube (matching the systemReserved and kubeReserved values above), which leaves 33 cores. Of those, we keep 4 cores free for cyclictest, so we need to stress the remaining 29 cores. You need to do your own calculations for your system:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: stress
spec:
  replicas: 29
  selector:
    matchLabels:
      name: stress
  template:
    metadata:
      labels:
        name: stress
    spec:
      containers:
      - name: stress
        image: "docker.io/cscojianzhan/stress"
        resources:
          limits:
            memory: "200Mi"
            cpu: "1"
          requests:
            memory: "200Mi"
            cpu: "1"
      nodeSelector:
        node-role.kubernetes.io/worker-rt: ""
# oc apply -f stress_pod.yaml
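Before measuring, it is worth confirming that all 29 replicas are running on the realtime worker (the label matches the deployment above):

# oc get pods -l name=stress -o wide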
Measure with cyclictest
This measurement needs to be executed on a pod with guaranteed class, so we will run a guaranteed pod with 4 CPUs and collect the results from it:

apiVersion: v1
kind: Pod
metadata:
  name: cyclictest
spec:
  restartPolicy: Never
  containers:
  - name: cyclictest
    image: docker.io/cscojianzhan/cyclictest
    resources:
      limits:
        cpu: 4
        memory: "400Mi"
      requests:
        cpu: 4
        memory: "400Mi"
    env:
    - name: DURATION
      value: "10m"
    securityContext:
      capabilities:
        add:
        - SYS_NICE
        - SYS_RAWIO
        - IPC_LOCK
    volumeMounts:
    - mountPath: /tmp
      name: results-volume
    - mountPath: /dev/cpu_dma_latency
      name: cstate
  nodeSelector:
    node-role.kubernetes.io/worker-rt: ""
  volumes:
  - name: results-volume
    hostPath:
      path: /tmp
  - name: cstate
    hostPath:
      path: /dev/cpu_dma_latency
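Assuming the manifest above is saved as cyclictest-pod.yaml (the filename is our choice), create the pod and wait for it to start:

# oc create -f cyclictest-pod.yaml
# oc get pod cyclictest -w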
See that we have a pod with limits and requests of 4 CPUs and 400Mi of memory. This guarantees that the pod is allocated those resources, giving it the Guaranteed QoS class. See also the capabilities needed to schedule the pod.
cyclictest is executed for 10 minutes, according to the DURATION env var. This can be modified; a run of over 24h gives more consistent results.
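For example, for a 24-hour run you would only change the env var in the pod spec above (the value format follows the "10m" example):

    env:
    - name: DURATION
      value: "24h"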
The final results are published to the /tmp directory of the pod (see results-volume). This is mapped with hostPath to a directory on the worker itself. To see the output of the log, you can enter the worker node and examine the /tmp/cyclictest* files there. There you can check the latencies (max, min, average) and also whether there have been any histogram overflows. The longer you run the test, the more accurate the results will be.
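A quick way to reach those files is again oc debug; the node name here is just an example:

# oc debug node/worker-rt.example.com
sh-4.4# ls /host/tmp/cyclictest*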
To get more information look at: https://wiki.linuxfoundation.org/realtime/documentation/howto/tools/cyclictest/start?s[]=cyclictest#cyclictest