Baremetal Kubernetes with kubeadm in 10 minutes? Easy!

Photo by Toni Vuohelainen on Flickr
Refill!

A while ago, being on sick leave, I challenged myself with setting up a baremetal K8s cluster. I chose Kubeadm back then, and so now I am sharing the results.

Yeh, maybe you already have the following question popped up in your mind:

Well, you may want to consider a bare-metal cluster, because

๐Ÿ‘‰ you are setting up an internal infrastructure for your company, and using cloud providers is not an option,
๐Ÿ‘‰ you wish to receive close-to-production experience with you own pet K8s,
๐Ÿ‘‰ you are well-aware of Minikube, MicroK8s and others, but always wanted to have a real multi-node cluster out there,
๐Ÿ‘‰ you don't want to pay for a pre-configured AKS/GKE/EKS/whatever solution just yet,
๐Ÿ‘‰ you want to set everything up from scratch, at least once, out of curiosity,
๐Ÿ‘‰ you have free time :)

Why making another article while there is plenty of them already? Well because most of the material is obviously not for beginners. Sometimes the material feels fragmented, ripped ouf of the context of something bigger or simply not covering the entire process.

What you should understand just before we continue.

โ˜๏ธ It still won't be a production-ready environment for mature projects! ๐Ÿšจ๐Ÿšจ๐Ÿšจ
โ˜๏ธ However, with extra hours dedicated, the cluster can become production-ready.
โ˜๏ธ But even so, it will be good enough for hobby projects, proof of concepts, learning and exploration.
โ˜๏ธ You will not receive any GUI control panel, only raw kubectl.

Resonates with your needs? Then let's do it!

Nodes

Kubernetes is a cluster, it means that typically more than one machine is involved. I got for myself two instances of VDS (VPS), with 2 Cores and 4 GB of RAM each (those are the minimum requirements), Centos7 on-board. It was a cheap russian hosting with a data center in Moscow and unlimited bandwidth. It works for me just fine, but you can pick AWS's EC2, DigitalOcean or Vultr, or any other provider.

You may as well get two virtual machines with VirtualBox and Vagrant, but it will not give you the same experience.

  • X.X.X.X - IP address of Node A
  • Y.Y.Y.Y - IP address of Node B

Just as a part of pre-flight preparations, I run the following commands to make the whole process a bit easier.

On my machine:

$
printf "\nX.X.X.X k8s-master\nY.Y.Y.Y k8s-node01\n" | sudo tee -a /etc/hosts;
scp ~/.ssh/id_rsa.pub root@X.X.X.X:~
scp ~/.ssh/id_rsa.pub root@Y.Y.Y.Y:~

On each node under root:

#
yum update
# create a non-root user
useradd admin
# enable sudo
usermod -aG wheel admin
passwd admin
passwd -l admin
# enable key-based sign-in
cd /home/admin/
mkdir ./.ssh
chmod 711 ./.ssh
cat ~/id_rsa.pub >> ./.ssh/authorized_keys
chmod 600 ./.ssh/authorized_keys
chown -R admin:admin ./.ssh

You may also want to disable password-based SSH authentication, disable root login and switch SSH to a different port to ward off brute-forcers.

Now I can sign-in the nodes by typing ssh admin@k8s-node01 without password and able to run sudo.

Common setup

Kubernetes logically consists of two parts: Control plane and Compute.

  • Control plane operates the cluster, and it is hosted on so-called master node(s).
  • Compute runs containerized applications and is hosted on worker nodes.

Therefore, I choose Node A to be the master node, whereas Node B - worker node.

The following set of commands I need to run on each node:

โžก๏ธ Allow nodes to address each other by name:

$
printf "\nX.X.X.X master\nY.Y.Y.Y node01\n" | sudo tee -a /etc/hosts;

Make sure nodes can actually talk to each other:

$
ping master
ping node01

โžก๏ธ K8s does not play nicely with SELinux, switching it off:

$
sudo setenforce 0
sudo sed -i --follow-symlinks 's/SELINUX=enforcing/SELINUX=disabled/g' /etc/sysconfig/selinux

โžก๏ธ K8s uses br_netfilter kernel module for it's internal networking. So I check if module even exists, load it, and enable on-boot load:

$
sudo modprobe br_netfilter
sudo echo "br_netfilter" | sudo tee -a /etc/modules-load.d/br_netfilter.conf

โžก๏ธ Tell the bridge module to use iptables:

$
echo '1' | sudo tee -a /proc/sys/net/bridge/bridge-nf-call-iptables
printf "\nnet.bridge.bridge-nf-call-iptables = 1\nnet.ipv4.ip_forward=1\n" | sudo tee -a /etc/sysctl.conf
sudo sysctl -p

โžก๏ธ Disable firewalld ๐Ÿ‘ป ๐Ÿ˜ฑ

Some out-of-the-box firewall rules are set in a way it paralyzes cluster networking. In production it totally makes sense to find what causes that and fix the issue. Since I am building a non-production environment, I don't bother investing time. So I just disable firewalld and by doing so I wipe out all firewall rules.

$
sudo systemctl stop firewalld
sudo systemctl disable firewalld
sudo systemctl mask --now firewalld
# making sure there is not a single rule left, and the policy is "ACCEPT" on every chain
sudo iptables -L -v -n

โžก๏ธ K8s requires the swap partition to be disabled for good:

$
sudo swapoff -a
sudo sed -i '/ swap / s/^\(.*\)$/#\1/g' /etc/fstab;

โžก๏ธ Install and configure Docker

$
sudo yum install -y yum-utils device-mapper-persistent-data lvm2
sudo yum-config-manager --add-repo https://download.docker.com/linux/centos/docker-ce.repo
sudo yum install -y docker-ce
# changing docker cgroupdriver to "systemd"
sudo mkdir /etc/docker
sudo tee /etc/docker/daemon.json > /dev/null <<'EOF'
{
"exec-opts": ["native.cgroupdriver=systemd"],
"log-driver": "json-file",
"log-opts": {
"max-size": "100m"
},
"storage-driver": "overlay2",
"storage-opts": [
"overlay2.override_kernel_check=true"
]
}
EOF

โžก๏ธ Install kubelet, kubectl and kubeadm

$
sudo tee /etc/yum.repos.d/kubernetes.repo > /dev/null <<'EOF'
[kubernetes]
name=Kubernetes
baseurl=https://packages.cloud.google.com/yum/repos/kubernetes-el7-x86_64
enabled=1
gpgcheck=1
repo_gpgcheck=1
gpgkey=https://packages.cloud.google.com/yum/doc/yum-key.gpg
https://packages.cloud.google.com/yum/doc/rpm-package-key.gpg
EOF
sudo yum install -y kubelet kubeadm kubectl

โžก๏ธ Launch Docker daemon

$
sudo systemctl start docker
sudo systemctl enable docker

โžก๏ธ Launch Kubelet daemon

Kubeled daemon in charge of doing all K8s-related stuff on this particular node.

$
sudo systemctl start kubelet
sudo systemctl enable kubelet

Allright, hopefully no errors there, and we can proceed to more neat stuff.

Master-specific setup

The commands below should be executed on the master node only.

โžก๏ธ Let kubeadm do all the heavy lifting:

$
sudo kubeadm init --apiserver-advertise-address=X.X.X.X --pod-network-cidr=10.244.0.0/16 --apiserver-cert-extra-sans=localhost,127.0.0.1

If everything went well, I just need to do a few more extra steps.

โžก๏ธ Create a configuration file for the current user (admin)

$
mkdir -p ${HOME}/.kube
sudo cp -i /etc/kubernetes/admin.conf ${HOME}/.kube/config
sudo chown $(id -u):$(id -g) ${HOME}/.kube/config

โžก๏ธ Deploy a pod network

So we have containers. A container is a containerized application written in any language (Node, Java, Go, ...) and running inside of the cluster. A pod is a group of containers, which share common resources. Pods are exposed to the outer world via services.

Yeh, sounds complicated. But this is what makes K8s quite abstracted and therefore unbiased and flexible.

So the pod network then is an implementation of K8s networking model: it enables pods talking to each other, to services, and via services - to the outer world.

There are numerous implementations out there, but I will stick to the kinda default one, which is called flannel. It is basic and proven to be working.

$
kubectl apply -f https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml

Joining the nodes

Hooray! Technically nothing stops us from joining the nodes together now.

On the master node I create a new token for joining the cluster:

$
sudo kubeadm token create

This will get me the token of format xxxxxx.xxxxxxxxxxxxxxxx.

Worker-specific setup

The commands below should be executed on the worker node(s) only.

โžก๏ธ Add a worker node to the cluster

Member that token I made? I need that one to execute the join command on each worker node:

$
sudo kubeadm join X.X.X.X:6443 --token xxxxxx.xxxxxxxxxxxxxxxx --discovery-token-unsafe-skip-ca-verification

From this moment on, there will be no need to do something else on the worker node, so I may end that ssh session.

Checking the status of the cluster

I can now go back to the master node and check how the cluster is doing:

$
kubectl get nodes

If I see something like this...

NAME STATUS ROLES AGE VERSION
node01 Ready <none> 4h7m v1.19.3
master Ready master 45h v1.19.3

... that means: "Congratulations ๐ŸŽ‰๐ŸŽ‰๐ŸŽ‰ You and I just got our own K8s clusters up and running!"

Most of the tutorials usually end here, or they maybe show how to expose an Apache container via a NodePort-type service (make a container accessible on the worker node directly via a random port greater than 1023). This obviously is not commonly-used IRL, as it defeats one of the key advantages of K8s. So instead let's go a bit further and setup ingress.

โžก๏ธ Ingress

Basically, ingress tells the cluster ยซHey, for this given domain, path and protocol route all requests to the service A. For other domain, path and protocol - to the service B (and so on)ยป. So ingress works as a router for the incoming requests.

Sounds quite powerful, doesn't it? I am gonna use ingress-nginx, the default kubernetes ingress, as it is pretty simple and easily configurable:

$
kubectl apply -f https://raw.githubusercontent.com/kubernetes/ingress-nginx/controller-v0.40.2/deploy/static/provider/baremetal/deploy.yaml

Let's check how ingress is doing:

$
kubectl get pods -n ingress-nginx

In a few minutes, the status should look somewhat like this:

NAME READY STATUS RESTARTS AGE
ingress-nginx-admission-create-7dhcj 0/1 Completed 0 22m
ingress-nginx-admission-patch-kcsvt 0/1 Completed 0 22m
ingress-nginx-controller-785557f9c9-7ldcl 1/1 Running 0 22m

โžก๏ธ Load balancer

What ingress does not know, obviously, is how exactly the cluster is connected to the outer world. This task is solved by a load balancer that typically works outside of the cluster. Every cloud provider implements it's own one, and their load balancers vary from one to another. Since I don't have any cloud provider here, I need a substitution.

So there is a project called Metallb, which I am going to leverage for this matter. Please find which version is the latest. Also, sometimes they do backward-incompatible installation instruction updates, so if something does not work, feel free to check their docs out!

$
kubectl apply -f https://raw.githubusercontent.com/metallb/metallb/v0.9.4/manifests/namespace.yaml
kubectl apply -f https://raw.githubusercontent.com/metallb/metallb/v0.9.4/manifests/metallb.yaml
kubectl create secret generic -n metallb-system memberlist --from-literal=secretkey="$(openssl rand -base64 128)"

I also need to provide a config file, otherwise the balancer stays dormant.

$
tee /tmp/metallb-conf.yml > /dev/null <<'EOF'
apiVersion: v1
kind: ConfigMap
metadata:
namespace: metallb-system
name: config
data:
config: |
address-pools:
- name: default
protocol: layer2
addresses:
- Y.Y.Y.Y
EOF
kubectl apply -f /tmp/metallb-conf.yml;
rm /tmp/metallb-conf.yml;

To see how the balancer is doing:

$
kubectl get pods -n metallb-system

It should look like this:

NAME READY STATUS RESTARTS AGE
controller-8687cdc65-h6d5m 1/1 Running 0 14s
speaker-b4p95 1/1 Running 0 14s
speaker-xhm72 1/1 Running 0 14s

Okay, so this is pretty it. I have set the cluster up, and it is working. In the next article (which is coming soon) I will deploy an infrastructure using Terraform, with actual domains and SSL support! Stay tuned.

Troubleshooting

๐Ÿ‘‰๐Ÿ› When joining a new node, the process hangs at the "[preflight] Running pre-flight checks" stage forever.

Endless process or timeout error sometimes lie in the I/O plane. Please make sure that nodes can ping each other by IP and the master node has only kubelet-produced rules in iptables.

๐Ÿ‘‰๐Ÿ› Pods are stuck at "ContainerCreating" step

This problem may have many causes. In general, to see the events of a pod and where it stuck, a describe pods command is used:

$
kubectl describe pods -n ingress-nginx

If something goes totally pear-shaped, you may consider running the following piece of commands on each node to revert the cluster to pre kubectl init (join) state:

$
sudo kubeadm reset -f
sudo rm -rf /etc/cni/net.d
sudo iptables -P INPUT ACCEPT
sudo iptables -P FORWARD ACCEPT
sudo iptables -P OUTPUT ACCEPT
sudo iptables -t nat -F
sudo iptables -t mangle -F
sudo iptables -F
sudo iptables -X
sudo systemctl restart docker
sudo systemctl restart kubelet
rm -rf ${HOME}/.kube

Useful links

Sergei Gannochenko

Business-oriented fullstack engineer, in โค๏ธ with Tech.
JS / JS stack: React, Node, Docker, AWS.
15+ years in dev.