Depending on your workloads, you may want to have workers with a realtime kernel in your cluster.
How to configure the realtime kernel and how to enroll the nodes into an OpenShift cluster is not in the scope of this article; we will assume that you already have a worker node up and running with a realtime kernel.
Once you have it, how do you properly schedule workloads that make use of it?
CPU manager and kubelet static policy - https://docs.openshift.com/container-platform/4.2/scalability_and_performance/using-cpu-manager.html
CPU Manager manages groups of CPUs and constrains workloads to specific CPUs. This is useful for several cases; in our case, for low-latency applications. In order to enable CPU Manager, the following steps are needed:
- Label a node with cpumanager:
# oc label node perf-node.example.com cpumanager=true
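To verify that the label has been applied (perf-node.example.com follows the example above), you can list the nodes by that label:

# oc get nodes -l cpumanager=true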
- Edit the machineconfigpool worker, and add a label to reference a custom kubelet:
# oc edit machineconfigpool worker
metadata:
  creationTimestamp: 2019-xx-xxx
  generation: 3
  labels:
    custom-kubelet: cpumanager-enabled
- Create this KubeletConfig custom resource to enable the static policy:
apiVersion: machineconfiguration.openshift.io/v1
kind: KubeletConfig
metadata:
  name: cpumanager-enabled
spec:
  machineConfigPoolSelector:
    matchLabels:
      custom-kubelet: cpumanager-enabled
  kubeletConfig:
    cpuManagerPolicy: static
    cpuManagerReconcilePeriod: 5s
    systemReserved:
      cpu: 1
      memory: 512Mi
    kubeReserved:
      cpu: 2
      memory: 512Mi

# oc create -f cpumanager-kubeletconfig.yaml
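The machine config pool needs some time to roll the new kubelet config out to the workers; you can follow the progress with a watch (the exact output columns vary by version):

# oc get machineconfigpool worker -w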
- This adds the CPU Manager feature to the kubelet config. The machine config pool will apply it, and the affected nodes will be rebooted after the config change. After a node has been rebooted, you can check that the policy has been correctly applied:
# oc debug node/perf-node.example.com
sh-4.4# cat /host/etc/kubernetes/kubelet.conf | grep cpuManager
cpuManagerPolicy: static
cpuManagerReconcilePeriod: 5s
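As an additional check, the kubelet keeps its CPU assignments in a state file under its root directory; from the same debug shell you can inspect it (the path assumes the default kubelet root dir):

sh-4.4# cat /host/var/lib/kubelet/cpu_manager_state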
- Now, in order to make use of it, a pod with CPU and memory settings for limits and requests needs to be created. This will create a pod with guaranteed resources:
# cat cpumanager-pod.yaml
apiVersion: v1
kind: Pod
metadata:
  generateName: cpumanager-
spec:
  containers:
  - name: cpumanager
    image: gcr.io/google_containers/pause-amd64:3.0
    resources:
      requests:
        cpu: 1
        memory: "1G"
      limits:
        cpu: 1
        memory: "1G"
  nodeSelector:
    cpumanager: "true"

# oc create -f cpumanager-pod.yaml
# oc describe pod cpumanager
Name:               cpumanager-6cqz7
Namespace:          default
Priority:           0
PriorityClassName:  <none>
Node:               perf-node.example.com/xxx.xx.xx.xxx
...
Limits:
  cpu:     1
  memory:  1G
Requests:
  cpu:     1
  memory:  1G
...
QoS Class:       Guaranteed
Node-Selectors:  cpumanager=true
- See that the pod has been scheduled with the Guaranteed QoS class. Other pods scheduled with the Burstable class cannot run on the cores allocated to the Guaranteed pod.
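If you want to double-check the pinning from the node itself, one option is to look at the CPU affinity of the container process. A minimal sketch, assuming the pause container from the example above is the only pause process on the node (oc debug node pods share the host PID namespace):

# oc debug node/perf-node.example.com
sh-4.4# grep Cpus_allowed_list /proc/$(pgrep -n pause)/status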
How to measure realtime
Once the kubelet has been correctly configured, it is time to measure the realtime performance. This can be achieved with 2 steps:
- Put the system under stress, to execute some tests under heavy load
- Run cyclictest on a guaranteed pod inside the OpenShift cluster, and collect the results there
Put the system under stress
This can be achieved by running stress pods on the free cores. Imagine that we have a worker node with 36 cores. We reserve 1 core for the system and 2 cores for kube (matching the systemReserved and kubeReserved values above), which leaves 33 cores. Of those, we keep 4 cores free for cyclictest, so we need to stress the remaining 29 cores. You need to do your own calculations for your system:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: stress
spec:
  replicas: 29
  selector:
    matchLabels:
      name: stress
  template:
    metadata:
      labels:
        name: stress
    spec:
      containers:
      - name: stress
        image: "docker.io/cscojianzhan/stress"
        resources:
          limits:
            memory: "200Mi"
            cpu: "1"
          requests:
            memory: "200Mi"
            cpu: "1"
      nodeSelector:
        node-role.kubernetes.io/worker-rt: ""
# oc apply -f stress_pod.yaml
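Before measuring, it is worth confirming that all 29 replicas are running on the realtime worker (the label matches the deployment above):

# oc get pods -l name=stress -o wide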
Measure with cyclictest
This measurement needs to be executed on a pod with guaranteed class, so we will run a guaranteed pod with 4 CPUs and collect the results from it:

apiVersion: v1
kind: Pod
metadata:
  name: cyclictest
spec:
  restartPolicy: Never
  containers:
  - name: cyclictest
    image: docker.io/cscojianzhan/cyclictest
    resources:
      limits:
        cpu: 4
        memory: "400Mi"
      requests:
        cpu: 4
        memory: "400Mi"
    env:
    - name: DURATION
      value: "10m"
    securityContext:
      capabilities:
        add:
        - SYS_NICE
        - SYS_RAWIO
        - IPC_LOCK
    volumeMounts:
    - mountPath: /tmp
      name: results-volume
    - mountPath: /dev/cpu_dma_latency
      name: cstate
  nodeSelector:
    node-role.kubernetes.io/worker-rt: ""
  volumes:
  - name: results-volume
    hostPath:
      path: /tmp
  - name: cstate
    hostPath:
      path: /dev/cpu_dma_latency
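Assuming the manifest above is saved as cyclictest-pod.yaml (the filename is our choice), create the pod and wait for it to start:

# oc create -f cyclictest-pod.yaml
# oc get pod cyclictest -w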
See that we have a pod with limits and requests of 4 CPUs and 400Mi of memory. This guarantees that the pod is allocated those resources, giving it the Guaranteed QoS class. See also the capabilities needed to schedule the pod.
cyclictest is executed for 10 minutes, according to the DURATION env var. This can be modified; a run of over 24h gives more consistent results.
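For example, for a 24-hour run you would only change the env var in the pod spec above (the value format follows the "10m" example):

    env:
    - name: DURATION
      value: "24h"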
The final results are published to the /tmp directory of the pod (see results-volume). This is mapped with hostPath to a directory on the worker itself. To see the output of the log, you can enter the worker node and examine the /tmp/cyclictest* files there. There you can check the latencies (max, min, average) and also whether there have been any histogram overflows. The longer you run the test, the more accurate the results will be.
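A quick way to reach those files is again oc debug; the node name here is just an example:

# oc debug node/worker-rt.example.com
sh-4.4# ls /host/tmp/cyclictest*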
To get more information look at: https://wiki.linuxfoundation.org/realtime/documentation/howto/tools/cyclictest/start?s[]=cyclictest#cyclictest