Inside Kubernetes [Part 2]: Raft Algorithm & Backup in ETCD Demystified

Brushing Up on the etcd Basics

etcd is a distributed key-value store that helps store and access configuration data reliably across a cluster. It's mainly used for critical system configs, service discovery, and distributed coordination.

A typical etcd cluster consists of three, five, or seven nodes. It's commonly used in microservices architectures where containers need service registration and discovery—writing key-value pairs for registration and reading them for discovery.
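
As a rough sketch of that pattern with etcdctl (the key layout and value here are hypothetical, chosen only for illustration; depending on your setup you may also need --endpoints and TLS flags):

# Register: a service instance writes a key describing itself (hypothetical key and value)
etcdctl put /services/payments/instance-1 '10.0.0.12:8080'

# Discover: clients read every instance registered under the prefix
etcdctl get --prefix /services/payments/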

Since etcd is a distributed system, it follows the CAP theorem and prioritizes Consistency and Partition Tolerance (CP) over Availability (A).

CAP THEOREM

  • etcd always ensures that only the latest committed data is read, preventing split-brain scenarios (see the read example right after this list).

  • It needs persistent storage to keep data safe, even if nodes fail or restart. Losing etcd data can result in complete Kubernetes cluster failure.

  • Uses the Raft Consensus Algorithm to manage highly available replicated logs and ensure consistency.
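
The read guarantee from the first bullet can be seen with etcdctl, which defaults to linearizable (quorum) reads and only returns possibly stale serializable reads if you ask for them. This reuses the hypothetical key from the sketch above, with the same caveats about endpoints and TLS:

# Linearizable read (default): goes through the quorum and returns only committed data
etcdctl get /services/payments/instance-1 --consistency=l

# Serializable read: answered locally by any member, may be slightly stale
etcdctl get /services/payments/instance-1 --consistency=s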

Raft Consensus Algorithm

Consensus is a key challenge in fault-tolerant distributed systems. It ensures multiple servers agree on values, and once a decision is made, it’s final. Consensus protocols allow clusters to keep working as long as a majority of nodes are available. For example, in a 5-node cluster, etcd can still operate even if 2 nodes fail. If more than half fail, progress stops, but incorrect data is never returned.

For High Availability (HA), etcd clusters are always set up with an odd number of replicas (3, 5, 7, etc.) to maintain quorum.
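
The quorum behind this rule is a simple majority, quorum = floor(N/2) + 1, which is why adding an even-numbered node buys no extra fault tolerance:

  • 3 nodes → quorum 2 → tolerates 1 failure

  • 4 nodes → quorum 3 → still tolerates only 1 failure

  • 5 nodes → quorum 3 → tolerates 2 failures

  • 7 nodes → quorum 4 → tolerates 3 failures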

A reliable consensus protocol must have these properties:

  • Validity: A value is only decided if it was proposed by a correct process.

  • Agreement: All correct nodes must agree on the same value.

  • Termination: The process must be completed in a finite number of steps.

  • Integrity: If all correct nodes propose the same value, then any correct node must decide that value.

Roles in ETCD

  • Leader – The elected leader interacts with clients and coordinates all operations. There can only be one leader at a time.

  • Follower – Followers receive log entries and heartbeats from the leader and stay in sync with it. If the leader fails, a follower can step up and become a candidate for election.

  • Candidate – When an election is triggered, a follower becomes a candidate and requests votes from other nodes to become the leader. (All servers start out as followers; a node only becomes a candidate when its election timeout fires.)

Note: The leader is responsible for log replication and coordination across the cluster.

Raft Process

  1. Leader Election: The cluster starts with an election where nodes compete for leadership. If a leader isn't heard from within a timeout period, a new election is triggered. The node that gets a majority of votes becomes the new leader.

  2. Log Replication: The leader accepts client requests, logs them, and replicates the log entries to followers. Followers apply these updates to stay in sync.

  3. Commitment and Consistency: A log entry is considered committed once it's replicated to a majority of nodes, ensuring consistency and preventing data loss.

  4. Leader Failure and Recovery: If the leader fails, a new leader is elected using the same process, ensuring high availability and consistency in the cluster.

Raft Algorithm Workflow
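
You can observe the outcome of this process on a running cluster with etcdctl, which reports the leader flag, Raft term, and Raft index per endpoint (the certificate paths below assume a kubeadm-style setup and may differ in your environment):

# Show leader status, Raft term, and Raft index (add --cluster to query every member)
ETCDCTL_API=3 etcdctl endpoint status --write-out=table --cacert /etc/kubernetes/pki/etcd/ca.crt --cert /etc/kubernetes/pki/etcd/server.crt --key /etc/kubernetes/pki/etcd/server.key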

Types of Failures in etcd

Failures happen all the time in big systems. These can be due to hardware issues, software crashes, network failures, or power outages. In etcd, different types of failures affect the cluster in different ways. Knowing these failures helps in keeping etcd running smoothly.
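
A quick way to see how a failure is affecting the cluster is to list the members and probe their health (again assuming kubeadm-style certificate paths):

# List the members the cluster knows about
ETCDCTL_API=3 etcdctl member list --write-out=table --cacert /etc/kubernetes/pki/etcd/ca.crt --cert /etc/kubernetes/pki/etcd/server.crt --key /etc/kubernetes/pki/etcd/server.key

# Probe the health of every member
ETCDCTL_API=3 etcdctl endpoint health --cluster --cacert /etc/kubernetes/pki/etcd/ca.crt --cert /etc/kubernetes/pki/etcd/server.crt --key /etc/kubernetes/pki/etcd/server.key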

1. Minor Follower Failure

  • Happens when fewer than half of the cluster's nodes fail, so quorum is preserved.

  • The cluster keeps working fine and can still accept requests.

  • No major issues as long as most nodes are still running.

2. Leader Failure

  • If the leader node fails, etcd needs to elect a new leader.

  • This election process takes some time (about an election timeout).

  • During this time:

    • The cluster cannot process writes.

    • Write requests are queued until a new leader is chosen.

    • Some uncommitted writes to the old leader may be lost.

    • However, committed writes are never lost.

  • The new leader extends all leases so nothing expires unexpectedly.
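
Leadership can also be handed over deliberately, for example before taking the current leader down for maintenance. A sketch with etcdctl (the member ID below is a placeholder; take the real transferee ID from etcdctl member list and run the command against the current leader's endpoint):

# Transfer leadership to another member (placeholder ID shown)
ETCDCTL_API=3 etcdctl move-leader 8e9e05c52164694d --cacert /etc/kubernetes/pki/etcd/ca.crt --cert /etc/kubernetes/pki/etcd/server.crt --key /etc/kubernetes/pki/etcd/server.key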

3. Majority Failure

  • Happens when more than half of the nodes fail.

  • The cluster stops working and cannot accept any writes.

  • Recovery is only possible if a majority of nodes come back online.

  • If the majority is gone permanently, disaster recovery (e.g., restoring from backup) is needed.

  • Once a majority is restored, a new leader is elected, and everything goes back to normal.

4. Network Partition

  • This happens when the cluster gets split into two groups:

    • One group has a majority of nodes (keeps working).

    • The other group has a minority of nodes (becomes unavailable).

  • etcd ensures there’s no split brain by letting only the majority group make changes.

  • If the leader is in the majority group, everything works as usual.

  • If the leader is in the minority group:

    • It steps down.

    • The majority group elects a new leader.

  • When the network is fixed, the minority group syncs up with the majority and resumes normal operation.

ETCD Backup

In Unix-like systems, the /etc directory holds system configuration data. Similarly, Kubernetes stores cluster configurations and state in etcd. The name etcd is inspired by /etc, with an added "d" for distributed.

To interact with etcd for backup and restore, we use the command-line tool: etcdctl.

By default, etcd data is stored in the directory "/var/lib/etcd" on Linux systems. 

The Write-Ahead Log (WAL) ensures data persistence by recording every change before it is applied to the main database. Because the WAL is critical for durability, etcd keeps it inside the data directory by default (under /var/lib/etcd/member/wal) and also lets you place it on a dedicated disk with the --wal-dir flag.
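
On a typical kubeadm control-plane node you can see this layout directly (paths will differ if --data-dir or --wal-dir was customized):

# Default layout: snapshots under snap/ and the write-ahead log under wal/
ls /var/lib/etcd/member/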

Step-by-Step ETCD Backup Guide

  1. Create a temporary directory and download the ETCD binaries

# Create a temporary folder for etcd
mkdir -p /tmp/etcd && cd /tmp/etcd

# Download the latest linux-amd64 release binaries
curl -s https://api.github.com/repos/etcd-io/etcd/releases/latest | grep browser_download_url | grep linux-amd64 | cut -d '"' -f 4 | wget -qi -

  2. Extract the compressed binaries

    tar xvf *.tar.gz

  3. Moving the etcd binaries to /usr/local/bin/ makes them globally accessible on your system, simplifying the process of running etcd commands.

    cd etcd-*/
    mv etcd* /usr/local/bin/
    cd ~
    rm -rf /tmp/etcd
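
You can optionally confirm the binaries are on your PATH before moving on:

etcd --version
etcdctl version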


4. In the cluster, you can find the default static pod manifest location with the help of the kubelet config file.

cat /var/lib/kubelet/config.yaml

The staticPodPath in this file points to the static pod manifest directory. There you will find the kube-apiserver and etcd pod manifests, and inside the etcd manifest you can look up the certificate file paths and the data-dir location.
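
For example, on a kubeadm cluster (the paths below are the usual defaults and may differ in your environment):

# Where does the kubelet look for static pod manifests? (usually /etc/kubernetes/manifests)
grep staticPodPath /var/lib/kubelet/config.yaml

# Pull the certificate and data-dir settings out of the etcd static pod manifest
grep -E 'cert-file|key-file|trusted-ca-file|data-dir' /etc/kubernetes/manifests/etcd.yaml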

How to Back Up etcd & Restore It

The etcd server is the only stateful component of the Kubernetes cluster: Kubernetes stores all API objects and settings in etcd. Backing up this store is enough to restore the Kubernetes cluster's state completely.

Taking a Snapshot and Verifying It

Check which flags you need to include in the snapshot save command:

ETCDCTL_API=3 etcdctl snapshot save -h

Take a snapshot of the etcd datastore using etcdctl:

ETCDCTL_API=3 etcdctl snapshot save snapshot.db --cacert /etc/kubernetes/pki/etcd/ca.crt --cert /etc/kubernetes/pki/etcd/server.crt --key /etc/kubernetes/pki/etcd/server.key

Check whether the snapshot was successful:

ETCDCTL_API=3 etcdctl snapshot status --write-out=table snapshot.db

Important note: If you are backing up in order to restore the cluster, do not run the status command against the snapshot file after taking the backup; it can tamper with the backup and cause the restore to fail.

Restoring etcd From a Snapshot & Verifying

Check the current state of the cluster, which is what the snapshot taken in the task above captured:

kubectl get all


To verify, we will now create a pod. Since the new pod is not present in the snapshot, it will not be available when we restore the content using the restore command.

kubectl run testing-restore --image=nginx
kubectl get pods

Check which flags you need to include in the restore command:

ETCDCTL_API=3 etcdctl snapshot restore -h

To restore, we first have to delete the current etcd content. Before doing that, grab all the details needed for the restore command from the etcd static pod manifest:

cat /etc/kubernetes/manifests/etcd.yaml

Now delete the current etcd data and execute the restore command:

rm -rf /var/lib/etcd
ETCDCTL_API=3 etcdctl snapshot restore snapshot.db --endpoints=https://127.0.0.1:2379 --cacert=/etc/kubernetes/pki/etcd/ca.crt --cert=/etc/kubernetes/pki/etcd/server.crt --key=/etc/kubernetes/pki/etcd/server.key --name=kubeadm-master --data-dir=/var/lib/etcd --initial-cluster=kubeadm-master=https://10.0.0.4:2380 --initial-cluster-token=etcd-cluster-1 --initial-advertise-peer-urls=https://10.0.0.4:2380
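
Because the snapshot is restored into the same data-dir the etcd static pod already uses, the etcd pod has to restart to pick up the restored data. One common way on a kubeadm node (treat this as a sketch and adapt it to your setup) is to briefly move the manifest out of the static pod directory and back:

# Moving the manifest out stops the static etcd pod; moving it back recreates it
mv /etc/kubernetes/manifests/etcd.yaml /tmp/
sleep 20
mv /tmp/etcd.yaml /etc/kubernetes/manifests/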

Once the etcd pod is running again, verify that the cluster is back to the state it was in when the snapshot was taken:

kubectl get pods

Here, you can verify that the ‘testing-restore’ pod is not present because it was not saved in snapshot.db. The rest of the data saved in snapshot.db has been successfully restored.

Conclusion

etcd provides consistency, fault tolerance, and high availability (as long as quorum is maintained) for distributed systems like Kubernetes. To keep it stable:

  • Maintain quorum availability

  • Take regular backups

  • Use SSDs for storage

  • Monitor cluster health

Following these best practices ensures a reliable and resilient etcd cluster.


EzyInfra.dev is a DevOps and infrastructure consulting company helping clients set up cloud infrastructure (AWS, GCP), optimize cloud costs, and manage Kubernetes-based infrastructure. If you have any requirements or want a free consultation for your infrastructure or architecture, feel free to schedule a call here.
