Have you seen your application get stuck or fail to respond to health check requests, and you can’t find any explanation? It might be because of the CPU quota limit. We will explain more here.
We highly recommend removing CPU limits in Kubernetes (or disabling the CFS quota in the kubelet) if you are using a kernel version in which the CFS quota bug is unpatched. There is a serious, known CFS bug in the kernel that causes unnecessary throttling and stalls.
At Omio, we are 100% Kubernetes. All our stateful and stateless workloads run completely on Kubernetes (hosted using Google’s Kubernetes Engine). For the last 6 months, we’ve been seeing random stalls: applications stuck or failing to respond to health checks, broken network connections and so on. This sent us down a deep rabbit hole.
This article covers the following topics.
- A primer on containers & kubernetes
- How CPU request and limit is implemented
- How CPU limit works in multi-core environments
- How do you monitor CPU throttling
- How do you recover
Kubernetes (abbreviated as k8s) is pretty much a de-facto standard in the infrastructure world now. It is a container orchestrator.
In the past, we used to create artifacts such as Java JARs/WARs, Python eggs or executables, and throw them over the wall for someone to run on servers. But running them takes more work: the application runtime (Java/Python) has to be installed, the appropriate files have to be in the appropriate places, specific OSes are needed, and so on. It takes a lot of configuration management, and is a frequent source of pain between developers and sysadmins. Containers change that. Here, the artifact is a container image. Imagine it as a fat executable with not only your program, but also the complete runtime (Java/Python/…), necessary files and packages pre-installed and ready to run. This can be shipped and run on a variety of servers without any further customized installations.
Containers also run in their own sandboxed environment. They have their own virtual network adapter, their own restricted filesystem, their own process hierarchy, their own CPU and memory limits, etc. This isolation is provided by the kernel features called namespaces and cgroups.
Kubernetes is a Container orchestrator. You give it a pool of machines. Then you tell it: “Hey kubernetes — run 10 instances of my container image with 2 cpus and 3GB RAM each, and keep it running!”. Kubernetes orchestrates the rest. It will run them wherever it finds free CPU capacity, restart them if they are unhealthy, do a rolling update if we change the versions, and so on.
Essentially, Kubernetes abstracts away the concept of machines, and makes all of them a single deployment target.
OK, we understand what containers and Kubernetes are. We also see that multiple containers can fit inside the same machine.
This is like flat sharing. You take some big flats (machines/nodes) and share them with multiple, diverse tenants (containers). Kubernetes is our rental broker. But how does it keep all those tenants from squabbling with each other? What if one of them takes over the bathroom for half a day? ;)
This is where request and limit come into the picture. The CPU “request” is just for scheduling purposes. It’s like the container’s wishlist, used mainly to find the most suitable node for it. The CPU “limit”, on the other hand, is the rental contract. Once we find a node for the container, it absolutely cannot go over the limit.
And here is where the problem arises…
Kubernetes uses kernel throttling to implement the CPU limit. If an application goes above the limit, it gets throttled (i.e. it receives fewer CPU cycles). Memory requests and limits, on the other hand, are implemented differently, and exceeding them is easier to detect: you only need to check whether your pod’s last restart status is OOMKilled. But CPU throttling is not easy to identify, because k8s only exposes usage metrics and not cgroup-related metrics.
For the sake of simplicity, let’s discuss how it is organized on a four-core machine.
k8s uses cgroups to control resource allocation (for both memory and CPU). Cgroups form a hierarchy, and a child can only use the resources allocated to its parent. The details are stored in a virtual filesystem (/sys/fs/cgroup); the CPU settings live in the CPU controller’s directory under it.
k8s uses the cpu.shares file to allocate CPU resources. In this case, the root cgroup inherits 4096 CPU shares, which is 100% of the available CPU power (1 core = 1024; this is a fixed value). The root cgroup allocates its shares proportionally based on its children’s cpu.shares, and they do the same with their children, and so on. On typical Kubernetes nodes, there are three cgroups under the root cgroup, the last of which is kubepods. The first two are used to allocate resources for critical system workloads and non-k8s user-space programs. The last one, kubepods, is created by k8s to allocate resources to pods.
If you look at the above graph, you can see that the first and second cgroups have 1024 shares each, and kubepods has 4096. Now, you may be wondering why the children’s shares add up to 6144 when the root only has 4096 CPU shares available. The answer is that this value is logical (relative): the Linux scheduler (CFS) uses it to allocate the CPU proportionally. In this case, the first two cgroups get about 680 (16.6% of 4096) each, and kubepods gets the remaining 2736. But when idle, the first two cgroups would not be using all their allocated resources. The scheduler has a mechanism to avoid wasting unused CPU shares: it releases the unused CPU to the global pool so that it can be allocated to the cgroups that are demanding more CPU power (it does this in batches to avoid the accounting penalty). The same workflow applies to all grandchildren as well.
This mechanism will make sure that CPU power is shared fairly, and no one can steal the CPU from others.
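The proportional split described above can be sketched in a few lines of Python. This is only an illustration of the arithmetic under full contention, not the scheduler's actual code; the names `system` and `user` for the first two cgroups and the helper `effective_shares` are hypothetical, while the share values come from the text. The exact results (about 683 and 2731) differ slightly from the rounded figures quoted above.

```python
def effective_shares(children: dict[str, int], parent_capacity: float) -> dict[str, float]:
    """Split parent_capacity among sibling cgroups in proportion to their cpu.shares."""
    total = sum(children.values())
    return {name: parent_capacity * shares / total for name, shares in children.items()}


# Share values from the text; "system" and "user" are placeholder names (assumption).
siblings = {"system": 1024, "user": 1024, "kubepods": 4096}

# A four-core node: 4 * 1024 = 4096 shares of CPU power at the root.
split = effective_shares(siblings, parent_capacity=4096)

for name, value in split.items():
    print(f"{name}: {value:.0f} shares ({value / 4096:.1%} of the node)")
```

The sketch only models the case where every cgroup wants all the CPU it can get; when a cgroup is idle, its unused portion is handed to the others as described above.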
Even though the k8s config for limits and requests looks similar, the implementation is entirely different; this is the most misleading and least documented part.
k8s uses CFS’s quota mechanism to implement the limit. The limit is configured in two files, cfs_period_us and cfs_quota_us (alongside cpu.shares), under the cgroup directory.
Unlike cpu.shares, the quota is based on a time period and not on the available CPU power. cfs_period_us defines the time period; it defaults to 100000us (100ms). k8s has an option to change this value, but it is still alpha and feature-gated. The scheduler uses this time period to reset the used quota. The second file, cfs_quota_us, denotes the allowed quota within the quota period.
Note that it is also configured in microseconds (us). The quota can exceed the quota period, which means you can configure a quota of more than 100ms.
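As a rough illustration of how a CPU limit maps onto these two files, here is a minimal Python sketch of the arithmetic. The helpers `parse_cpu` and `cfs_quota_us` are hypothetical names introduced for this example, and the code assumes the default 100000us period mentioned above; it is not the kubelet's implementation.

```python
CFS_PERIOD_US = 100_000  # default period from the text: 100000us (100ms)


def parse_cpu(value: str) -> float:
    """Convert a Kubernetes CPU quantity such as "2" or "500m" into cores."""
    if value.endswith("m"):
        return int(value[:-1]) / 1000  # millicpu notation: 1000m = 1 core
    return float(value)


def cfs_quota_us(cpu_limit: str, period_us: int = CFS_PERIOD_US) -> int:
    """CPU time (in us) the cgroup may consume in each period."""
    return int(round(parse_cpu(cpu_limit) * period_us))


for limit in ("500m", "1", "2"):
    print(f"limit {limit} -> quota {cfs_quota_us(limit)}us per {CFS_PERIOD_US}us period")
```

A limit above one core yields a quota larger than the period, which is the case the paragraph above points out.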
Let’s discuss two scenarios on 16 core machines (Omio’s most common machine type).
Let’s say you have configured 2 cores as the CPU limit; k8s will translate this to 200ms. That means the container can use a maximum of 200ms of CPU time per 100ms period without getting throttled.
And here is where all the misunderstanding starts. As I said above, the allowed quota is 200ms, which means that if you are running ten parallel threads on a 12-core machine (see the second figure) where all other pods are idle, you will use up the quota in 20ms (i.e. 10 * 20ms = 200ms), and all threads running under that pod will get throttled for the next 80ms (stop the world). To make the situation worse, the scheduler has a bug that causes unnecessary throttling and prevents the container from reaching the allowed quota.
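The arithmetic in this scenario can be checked with a small Python sketch. It assumes every thread is fully CPU-bound and has a free core to itself, and it ignores the scheduler bug mentioned above; `throttle_profile` is a hypothetical helper written for this illustration.

```python
def throttle_profile(quota_ms: float, period_ms: float, busy_threads: int) -> tuple[float, float]:
    """Return (wall-clock ms until the quota is used up, ms spent throttled) per period.

    Assumes each busy thread runs on its own core, so the cgroup burns
    busy_threads milliseconds of quota for every millisecond of wall-clock time.
    """
    run_ms = min(period_ms, quota_ms / busy_threads)
    return run_ms, period_ms - run_ms


# 2-core limit (200ms quota per 100ms period), ten parallel threads.
run_ms, stalled_ms = throttle_profile(quota_ms=200, period_ms=100, busy_threads=10)
print(f"runs for {run_ms:.0f}ms, then throttled for {stalled_ms:.0f}ms")

# With only two busy threads the same quota lasts the whole period.
print(throttle_profile(quota_ms=200, period_ms=100, busy_threads=2))
```

The point of the sketch is that the number of busy threads, not the number of cores in the limit, decides how quickly the quota runs out.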
Just log in to the pod and read the cpu.stat file in its cgroup directory. It reports three counters:
- nr_periods: total number of scheduler periods
- nr_throttled: number of throttled periods out of nr_periods
- throttled_time: total throttled time in ns
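To turn these counters into a throttle rate, something along the following lines could be used. It is a minimal sketch: the sample values are made up for illustration, and `parse_cpu_stat` and `throttle_ratio` are hypothetical helper names, not part of any Kubernetes tooling.

```python
def parse_cpu_stat(text: str) -> dict[str, int]:
    """Parse "name value" lines, as found in a cgroup's cpu.stat file."""
    stats = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2:
            stats[parts[0]] = int(parts[1])
    return stats


def throttle_ratio(stats: dict[str, int]) -> float:
    """Fraction of scheduler periods in which the cgroup was throttled."""
    periods = stats.get("nr_periods", 0)
    return stats.get("nr_throttled", 0) / periods if periods else 0.0


# Made-up sample contents for illustration only.
sample = """nr_periods 1000
nr_throttled 250
throttled_time 12500000000
"""

stats = parse_cpu_stat(sample)
print(f"throttled in {throttle_ratio(stats):.1%} of periods")
print(f"total throttled time: {stats['throttled_time'] / 1e9:.1f}s")
```

In practice you would pass in the contents of the container's cpu.stat file instead of the sample string, and compare two readings taken some time apart, since the counters are cumulative.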
We end up with a high throttle rate on multiple applications — up to 50% more than what we assumed the limits were set for!
This cascades into various errors: readiness probe failures, container stalls, network disconnections and timeouts within service calls, all in all leading to increased latency and increased error rates.
Simple. We disabled CPU limits until the latest kernel with bugfix was deployed across all our clusters.
Immediately, we found a huge reduction in error rates (HTTP 5xx) of our services:
We said at the beginning of this article:
This is like flat sharing. Kubernetes is our rental broker. But how does it keep all those tenants from squabbling with each other? What if one of them takes over the bathroom for half a day? ;)
This is the catch. We risk some containers hogging all the CPUs in a machine. If you have a good application stack in place (e.g. proper JVM tuning, Go tuning, Node VM tuning), then this is not a problem, and you can live with this for a long time. But if you have applications that are either poorly optimized or simply not optimized (FROM java:latest), then results can backfire. At Omio we have automated base Dockerfiles with sane defaults for our primary language stacks, so this was not an issue for us.
Please do monitor USE (Utilization, Saturation and Errors) metrics, API latencies and error rates, and make sure your results match expectations.
This was a wild ride and discovery. The following resources helped us a lot in understanding:
- https://k8s.af/ — search for cpu throttling.
- Kubernetes bug reports: https://github.com/kubernetes/kubernetes/issues/51135#issuecomment-373454012 | https://github.com/kubernetes/kubernetes/issues/67577 | https://gist.github.com/bobrik/2030ff040fad360327a5fab7a09c4ff1
Did you encounter similar issues or want to share your experiences with throttling in containerized production environments? Let us know in the Comments below!
If you liked this article and want to work on similar challenges at scale, why not consider joining forces :)