Configure and measure realtime worker performance on OpenShift

Depending on your workloads, you may want to have workers with a realtime kernel in your cluster.
How to configure those nodes and enroll them into an OpenShift cluster is out of the scope of this article; we will assume that you have a worker node up and running with a realtime kernel.
Once you have it, how do you properly schedule workloads that make use of it?

CPU manager and kubelet static policy - https://docs.openshift.com/container-platform/4.2/scalability_and_performance/using-cpu-manager.html

CPU Manager manages groups of CPUs and constrains workloads to specific CPUs. This is useful in several cases; in ours, for low-latency applications. To enable CPU Manager, the following steps are needed:
  • Label a node with cpu manager:
    # oc label node perf-node.example.com cpumanager=true
  • Edit the machineconfigpool worker, and add a label to reference a custom kubelet:
    # oc edit machineconfigpool worker
    metadata:
      creationTimestamp: 2019-xx-xxx
      generation: 3
      labels:
        custom-kubelet: cpumanager-enabled
     
  • Create a KubeletConfig custom resource to enable the static policy:
    apiVersion: machineconfiguration.openshift.io/v1
    kind: KubeletConfig
    metadata:
      name: cpumanager-enabled
    spec:
      machineConfigPoolSelector:
        matchLabels:
          custom-kubelet: cpumanager-enabled
      kubeletConfig:
        cpuManagerPolicy: static
        cpuManagerReconcilePeriod: 5s
        systemReserved:
          cpu: "1"
          memory: 512Mi
        kubeReserved:
          cpu: "2"
          memory: 512Mi
    # oc create -f cpumanager-kubeletconfig.yaml
  •  This adds the CPU Manager feature to the kubelet config. The machine config pool applies the change, and the affected nodes are rebooted. After a node has rebooted, you can check that the policy has been correctly applied:
    # oc debug node/perf-node.example.com
    sh-4.4# cat /host/etc/kubernetes/kubelet.conf | grep cpuManager
    cpuManagerPolicy: static        
    cpuManagerReconcilePeriod: 5s   
  •  Now, to make use of it, create a pod whose CPU and memory requests are equal to its limits; the CPU value must be a whole number for the static policy to assign exclusive cores. This creates a pod with guaranteed resources:
    # cat cpumanager-pod.yaml
    apiVersion: v1
    kind: Pod
    metadata:
      generateName: cpumanager-
    spec:
      containers:
      - name: cpumanager
        image: gcr.io/google_containers/pause-amd64:3.0
        resources:
          requests:
            cpu: 1
            memory: "1G"
          limits:
            cpu: 1
            memory: "1G"
      nodeSelector:
        cpumanager: "true"
    # oc create -f cpumanager-pod.yaml
    # oc describe pod cpumanager
    Name:               cpumanager-6cqz7
    Namespace:          default
    Priority:           0
    PriorityClassName:  <none>
    Node:  perf-node.example.com/xxx.xx.xx.xxx
    ...
        Limits:
          cpu:     1
          memory:  1G
        Requests:
          cpu:        1
          memory:     1G
    ...
    QoS Class:       Guaranteed
    Node-Selectors:  cpumanager=true
  •  Note that the pod runs with the Guaranteed QoS class. Pods scheduled with the Burstable class cannot run on the cores allocated to a Guaranteed pod.
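One way to double-check the pinning is to look at the Cpus_allowed_list field in /proc/<pid>/status for the container's process on the node: for the pod above, with cpu: 1 under the static policy, it should list a single CPU. The helper below is only a sketch; count_cpus is a hypothetical name, not part of oc or the kubelet, and it simply counts the CPUs in a kernel-style list such as "2-5,8".

```shell
#!/usr/bin/env bash
# Count the logical CPUs in a kernel-style CPU list such as "1" or "2-5,8".
# count_cpus is a hypothetical helper written for this illustration.
count_cpus() {
  local list="$1" total=0 part lo hi
  local IFS=','            # split the list on commas (scoped to this function)
  for part in $list; do
    if [[ "$part" == *-* ]]; then
      lo="${part%-*}"      # text before the dash
      hi="${part#*-}"      # text after the dash
      total=$(( total + hi - lo + 1 ))
    else
      total=$(( total + 1 ))
    fi
  done
  echo "$total"
}

# Possible usage on the node, where <pid> is the container's process id:
#   count_cpus "$(awk '/^Cpus_allowed_list/ {print $2}' /proc/<pid>/status)"
```

A result equal to the pod's CPU limit suggests that the container has been given exclusive cores; a much larger number suggests it is still running on the shared pool.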

How to measure realtime performance

Once the kubelet has been correctly configured, it is time to measure the realtime performance. This can be achieved in 2 steps:
  1. Put the system under stress, to execute the tests under heavy load
  2. Run cyclictest on a guaranteed pod inside the OpenShift cluster, and collect the results there

Put the system under stress

This can be achieved by running pods on the free cores. Imagine that we have a worker node with 36 cores. We reserve 1 core for the system and 2 cores for kube, which leaves 33 cores. Of those, we keep 4 cores for cyclictest, so we need to stress the remaining 29 cores. You need to do your own calculations for your system:


apiVersion: apps/v1
kind: Deployment
metadata:
  name: stress 
spec:
  replicas: 29
  selector:
    matchLabels:
      name: stress 
  template:
    metadata:
      labels:
        name: stress 
    spec:
      containers:
      - name: stress 
        image: "docker.io/cscojianzhan/stress"
        resources:
          limits:
            memory: "200Mi"
            cpu: "1"
          requests:
            memory: "200Mi"
            cpu: "1"
      nodeSelector:
        node-role.kubernetes.io/worker-rt: ""
# oc apply -f stress_pod.yaml 
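
The replica count in the Deployment above is plain subtraction, and a small shell sketch makes it easy to redo for another node size. The function name stress_replicas and its argument order are assumptions made for this illustration:

```shell
#!/usr/bin/env bash
# Cores left to stress = total - system reserved - kube reserved - cyclictest.
# stress_replicas is a hypothetical helper, not an oc or kubelet command.
stress_replicas() {
  local total="$1" system="$2" kube="$3" cyclictest="$4"
  local free=$(( total - system - kube - cyclictest ))
  if (( free < 0 )); then
    echo "reserved cores exceed total cores" >&2
    return 1
  fi
  echo "$free"
}

# The numbers from this article: 36 cores, 1 system, 2 kube, 4 cyclictest.
stress_replicas 36 1 2 4   # prints 29
```

The result can be written into the replicas field of the manifest, or applied to a running deployment with oc scale deployment/stress --replicas=<n>.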

Measure with cyclictest

This measurement needs to be executed on a pod with the Guaranteed class. So we will run a guaranteed pod with 4 CPUs, and we will collect its results:

apiVersion: v1 
kind: Pod 
metadata:
  name: cyclictest 
spec:
  restartPolicy: Never 
  containers:
  - name: cyclictest
    image:  docker.io/cscojianzhan/cyclictest
    resources:
      limits:
        cpu: 4
        memory: "400Mi"
      requests:
        cpu: 4
        memory: "400Mi"
    env:
    - name: DURATION
      value: "10m"
    securityContext:
      capabilities:
        add:
          - SYS_NICE
          - SYS_RAWIO
          - IPC_LOCK
    volumeMounts:
    - mountPath: /tmp
      name: results-volume
    - mountPath: /dev/cpu_dma_latency
      name: cstate
  nodeSelector:
    node-role.kubernetes.io/worker-rt: ""
  volumes:
  - name: results-volume
    hostPath:
      path: /tmp
  - name: cstate
    hostPath:
      path: /dev/cpu_dma_latency

Note that the pod has limits and requests of 4 CPUs and 400Mi of memory. This guarantees that the pod is allocated those resources. Note also the capabilities that the pod needs in order to run cyclictest.
cyclictest runs for 10 minutes, as set by the DURATION env var. This can be modified; running it for 24h or more gives more consistent results.
The final results are written to the /tmp directory of the pod (see results-volume). This is mapped with hostPath to a directory on the worker itself. To see the output, you can log in to the worker node and examine the /tmp/cyclictest* files there. There you can check the latencies (max, min, average) and also whether there have been any histogram overflows. The longer you run the test, the more accurate the results will be.
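If the result files contain the summary lines that cyclictest prints in histogram mode ("# Max Latencies:" and "# Histogram Overflows:"), a small awk filter can pull out the worst case. This is a sketch under that assumption: the exact content of the /tmp/cyclictest* files depends on how the container image invokes cyclictest, and summarize_cyclictest is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Report the worst per-thread max latency and the total histogram overflows.
# Assumes cyclictest histogram-mode summary lines, for example:
#   # Max Latencies: 00021 00037
#   # Histogram Overflows: 00000 00002
# Reads the files given as arguments, or stdin when none are given.
summarize_cyclictest() {
  awk '
    /^# Max Latencies:/ {
      # Fields 4..NF hold one value per measurement thread.
      for (i = 4; i <= NF; i++) if ($i + 0 > max) max = $i + 0
      seen = 1
    }
    /^# Histogram Overflows:/ {
      for (i = 4; i <= NF; i++) overflows += $i
    }
    END {
      if (!seen) exit 1    # no summary found: input is not in this format
      printf "max_latency=%d overflows=%d\n", max, overflows
    }
  ' "$@"
}

# Possible usage on the worker node:
#   summarize_cyclictest /tmp/cyclictest*
```

A non-zero overflow count means that some samples exceeded the histogram range, so the maximum is the number to look at first.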
To get more information look at: https://wiki.linuxfoundation.org/realtime/documentation/howto/tools/cyclictest/start?s[]=cyclictest#cyclictest
