Troubleshooting Pod Evictions in Kubernetes

Pods vanishing from your cluster? Here's what triggers evictions and how to stop them.

If you've ever found yourself puzzled by pods suddenly disappearing from your Kubernetes cluster, you're not alone. One common but often misunderstood reason is pod eviction. It usually comes with a vague message like “Evicted,” leaving developers scrambling to determine what went wrong.

In this post, we’ll break down what pod evictions are, why they happen, how to investigate them, and what you can do to prevent them from disrupting your workloads.

What Is a Pod Eviction?

In Kubernetes, a pod eviction is when the kubelet or the API server terminates a pod to protect the node or the cluster. This typically occurs when a node is under pressure, running low on memory, disk, or process IDs (PIDs), or when a higher-priority pod needs to be scheduled.

You'll often notice an evicted pod in a Failed state with a reason like this:

status:
  phase: Failed
  reason: Evicted
  message: "The node had condition: [MemoryPressure]"

In effect, Kubernetes is telling you: “I had to free up resources, and this pod was chosen.”
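
If you just want the reason and message without the full object, here's a quick sketch (replace <pod-name> with the name of the evicted pod):

kubectl get pod <pod-name> -o jsonpath='{.status.reason}{": "}{.status.message}{"\n"}'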

Common Causes & Handling

1. Resource Pressure (Memory, Disk, or PIDs)

One of the most common reasons Kubernetes evicts a pod is because the node it's running on is low on critical resources. MemoryPressure, DiskPressure, and PIDPressure are all signals that the node is struggling.

How to identify it:

  • Run kubectl describe pod <pod-name> to see the eviction reason.

  • Run kubectl describe node <node-name> to check the node conditions (a quick one-liner for this is shown after this list).

  • Use kubectl get events --sort-by=.metadata.creationTimestamp to view eviction-related events.
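
To make the node-condition check quicker, here is a small sketch (replace <node-name> with the node in question):

# Print each node condition and its status, e.g. MemoryPressure=True
kubectl get node <node-name> -o jsonpath='{range .status.conditions[*]}{.type}={.status}{"\n"}{end}'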

How to resolve it:

  • Adjust the pod’s resource requests and limits to reflect actual usage (see the example after this list).

  • Scale out the workload across more nodes to balance the load.

  • Clean up disk usage, particularly if your applications are writing large logs or temporary files.

  • Use monitoring tools like Prometheus or Datadog to track node-level resource usage in real-time.
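
For the first point, here is a minimal sketch of a container spec with requests and limits; the values are placeholders and should be derived from observed usage (e.g., from your monitoring):

# Under spec.containers[] in the pod or deployment template
resources:
  requests:
    memory: "256Mi"
    cpu: "250m"
  limits:
    memory: "512Mi"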

2. Node Failures or Unavailability

If a node goes into a NotReady state or becomes unreachable, Kubernetes will eventually evict all pods on that node and try to reschedule them elsewhere.

How to identify it:

  • Run kubectl get nodes to check node statuses.

  • Use journalctl -u kubelet on the node to check kubelet logs for issues.

  • Look for events mentioning NodeNotReady or NodeUnreachable.

How to resolve it:

  • Investigate the root cause of the node failure: was it a cloud provider outage, a hardware issue, or a kubelet crash?

  • Make use of node pools and auto-repair mechanisms where available (e.g., in GKE, EKS).

  • Implement taints and tolerations to better manage pod placement during node issues.
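
For context: when a node becomes NotReady or unreachable, Kubernetes adds node.kubernetes.io/not-ready or node.kubernetes.io/unreachable taints with the NoExecute effect, and pods are evicted once their toleration for those taints runs out (300 seconds by default). A minimal sketch for tuning that window on a specific workload:

tolerations:
- key: "node.kubernetes.io/unreachable"
  operator: "Exists"
  effect: "NoExecute"
  tolerationSeconds: 60   # evict after 60s on an unreachable node instead of the default 300s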

3. Priority-Based Preemption

Kubernetes supports priority classes, and when resources are tight, lower-priority pods may be preempted (evicted) to make room for higher-priority ones.

How to identify it:

  • Use kubectl describe pod and look for preemption messages.

  • The message may reference another pod that caused the eviction.

How to resolve it:

  • Review and adjust your use of priorityClassName in deployments.

  • Note that preemptionPolicy: Never prevents a pod from preempting others, not from being preempted itself; to protect a pod from preemption, give it a higher priority (see the sketch after this list).

  • Allocate more resources to your cluster or add node pools for high-priority workloads.
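
A minimal PriorityClass sketch tying these together; the name, value, and description are placeholders:

apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: batch-low-priority
value: 1000
globalDefault: false
preemptionPolicy: Never   # pods in this class never preempt others (they can still be preempted)
description: "Low-priority batch workloads."

Pods opt in by setting priorityClassName: batch-low-priority in their spec.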

4. Taints and Tolerations

Taints are a way to mark a node as having a special condition, and only pods with a matching toleration can be scheduled (or keep running) there. If a taint with the NoExecute effect is applied to a node, running pods that don’t tolerate it are evicted; NoSchedule and PreferNoSchedule taints only affect where new pods are placed.

How to identify it:

  • Run kubectl describe node and look for taints.

  • Check whether the evicted pods had tolerations for those taints.
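
A quick way to do both checks from the command line (the node and pod names are placeholders):

# List the taints currently set on a node
kubectl describe node <node-name> | grep -i taints

# Show a pod's tolerations for comparison
kubectl get pod <pod-name> -o jsonpath='{.spec.tolerations}{"\n"}'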

How to resolve it:

  • Either remove unnecessary taints from nodes or add the appropriate tolerations to your pod spec:

    tolerations:
    - key: "example-key"
      operator: "Equal"
      value: "example-value"
      effect: "NoExecute"
    

5. Manual or System-Initiated Evictions

Pods can also be evicted as part of cluster operations like autoscaling, node draining, or via policies enforced by tools like the Kubernetes descheduler.

How to identify it:

  • Review recent events with kubectl get events.

  • Check for eviction events triggered by autoscalers, node drains, or policies.

How to resolve it:

  • Use PodDisruptionBudgets to control how many pods can be evicted at a time (a minimal example follows this list).

  • Add nodeSelector or affinity rules to guide scheduling behavior more precisely.

  • Monitor and tune autoscaler configurations if they’re scaling down too aggressively.
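
For the first point, a minimal PodDisruptionBudget sketch; the name, selector, and minAvailable value are placeholders to adapt to your workload:

apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: my-app-pdb
spec:
  minAvailable: 2          # voluntary evictions are refused if they would drop below this
  selector:
    matchLabels:
      app: my-app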

Summarised Commands for Troubleshooting

Here are some commands you’ll find useful when investigating evictions:

# See recent eviction events
kubectl get events --field-selector reason=Evicted

# List failed pods (including evicted ones) across all namespaces
kubectl get pods --all-namespaces --field-selector=status.phase=Failed

# Check node status
kubectl describe node <node-name>

# View kubelet logs (on the node itself)
journalctl -u kubelet

Best Practices to Prevent Evictions

While evictions aren’t always avoidable, here are a few best practices to reduce their frequency and impact:

  • Always set realistic resource requests and limits for your pods.

  • Use Horizontal or Vertical Pod Autoscalers to adapt to changing workloads.

  • Implement PodDisruptionBudgets for critical or stateful applications.

  • Use PriorityClasses thoughtfully and understand the implications of preemption.

  • Continuously monitor cluster health and node capacity.

What Are OOM Kills?

It’s important to note that Out-of-Memory (OOM) kills are not the same as evictions, but they can be equally disruptive. These occur when a container exceeds its memory limit, leading the kernel to terminate it.

You can check for OOMKilled containers using:

kubectl get pod <pod-name> -o jsonpath='{.status.containerStatuses[*].lastState.terminated.reason}'

To avoid this, ensure your memory limits reflect real-world usage and consider enabling the Vertical Pod Autoscaler.
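
If you do adopt the Vertical Pod Autoscaler, here is a minimal sketch; it assumes the VPA add-on is installed in your cluster, and the names are placeholders:

apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: my-app-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  updatePolicy:
    updateMode: "Auto"   # let VPA apply its recommendations automatically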

Conclusion

Pod evictions are Kubernetes’ way of protecting the overall health of the cluster. But as a developer or operator, it's essential to understand the underlying causes so you can prevent them from affecting your applications. With the right monitoring, resource planning, and scheduling strategies, you can significantly reduce the chances of unexpected pod evictions.

EzyInfra.dev – Expert DevOps & Infrastructure consulting! We help you set up, optimize, and manage cloud (AWS, GCP) and Kubernetes infrastructure—efficiently and cost-effectively. Need a strategy? Get a free consultation now!
