Switched to k3os and removed cilium

Karim Naufal 2021-12-05 10:50:51 +01:00
parent 8113016f86
commit 8de7171cf5
7 changed files with 35 additions and 341 deletions

README.md (107 changed lines)

@@ -15,7 +15,7 @@
 <h2 align="center">Kube-Hetzner</h2>
 <p align="center">
-A fully automated, optimized and auto-upgradable, HA-able, k3s cluster on <a href="https://hetzner.com" target="_blank">Hetzner Cloud</a> 🤑
+A fully automated, highly optimized and auto-upgradable, HA-able, Kubernetes - k3s on k3os - cluster on <a href="https://hetzner.com" target="_blank">Hetzner Cloud</a> 🥳
 </p>
 <hr />
 <br />
@@ -25,35 +25,31 @@
 ![Product Name Screen Shot][product-screenshot]
-[Hetzner Cloud](https://hetzner.com) is a good cloud provider that offers very affordable prices for cloud instances. The goal of this project was to create an optimal Kubernetes installation with it. We wanted functionality that was as close as possible to GKE's auto-pilot.
+[Hetzner Cloud](https://hetzner.com) is a good cloud provider that offers very affordable prices for cloud instances. The goal of this project was to create a highly optimized Kubernetes installation that is easy to maintain, secure, and automatically upgrades itself. We aimed for functionality that was as close as possible to GKE's auto-pilot.
 Here's what is working at the moment:
-- Lightweight and resource-efficient Kubernetes with [k3s](https://github.com/k3s-io/k3s), and Fedora nodes to take advantage of the latest Linux kernels.
+- Lightweight and resource-efficient Kubernetes with [k3s](https://github.com/k3s-io/k3s).
-- Optimal [Cilium](https://github.com/cilium/cilium) CNI with full BPF support, and Kube-proxy replacement. It uses the Hetzner private subnet underneath to communicate between the nodes, as for the tunneling we use Geneve by default, but native routing also works.
+- Powered by k3OS nodes to take advantage of an auto-upgradable and hardened OS, designed specifically to run k3s. That means that both the OS and your kube cluster will stay up to date.
-- Automatic OS upgrades, supported by [kured](https://github.com/weaveworks/kured) that initiate a reboot of the node only when necessary and after having drained it properly.
 - Automatic HA by setting the required number of servers and agents nodes.
-- Automatic k3s upgrade by using Rancher's [system-upgrade-controller](https://github.com/rancher/system-upgrade-controller) and tracking the latest 1.x stable branch.
 - Optional [Nginx ingress controller](https://kubernetes.github.io/ingress-nginx/) that will automatically use Hetzner's private network to allocate a Hetzner load balancer.
-It uses Terraform to deploy as it's easy to use, and Hetzner provides a great [Hetzner Terraform Provider](https://registry.terraform.io/providers/hetznercloud/hcloud/latest/docs).
+_It uses Terraform to deploy as it's easy to use, and Hetzner provides a great [Hetzner Terraform Provider](https://registry.terraform.io/providers/hetznercloud/hcloud/latest/docs)._
 <!-- GETTING STARTED -->
 ## Getting started
-Follow those simple steps and your world cheapest Kube cluster will be up and running in no time.
+Follow these simple steps, and the world's cheapest and coolest Kube cluster will be up and running in no time.
 ### Prerequisites
 First and foremost, you need to have a Hetzner Cloud account. You can sign up for free [here](https://hetzner.com/cloud/).
-Then you'll need you have both the [terraform](https://learn.hashicorp.com/tutorials/terraform/install-cli) and [helm](https://helm.sh/docs/intro/install/), and [kubectl](https://kubernetes.io/docs/tasks/tools/) cli installed. The easiest way is to use the [gofish](https://gofi.sh/#install) package manager to install them.
+Then you'll need to have the [terraform](https://learn.hashicorp.com/tutorials/terraform/install-cli), [helm](https://helm.sh/docs/intro/install/), and [kubectl](https://kubernetes.io/docs/tasks/tools/) CLIs installed. The easiest way is to use the [gofish](https://gofi.sh/#install) package manager to install them.
 ```sh
-gofish install terraform
-gofish install kubectl
-gofish install helm
+gofish install terraform && gofish install kubectl && gofish install helm
 ```
 ### Creating terraform.tfvars
@@ -64,7 +60,7 @@ gofish install helm
 ### Customize other variables (Optional)
-The number of control plane nodes and worker nodes, and the Hetzner datacenter location, can be customized by adding the variables to your newly created terraform.tfvars file.
+The number of control plane nodes and worker nodes, the [Hetzner datacenter location](https://docs.hetzner.com/general/others/data-centers-and-connection/) (e.g. nbg1, fsn1, hel1, etc.), and the [Hetzner server types](https://www.hetzner.com/cloud) (e.g. cpx31, cpx41, etc.) can be customized by adding the corresponding variables to your newly created terraform.tfvars file.
 See the default values in the [variables.tf](variables.tf) file, they correspond to (you can copy-paste and customize):
@@ -72,8 +68,8 @@ See the default values in the [variables.tf](variables.tf) file, they correspond
 servers_num = 2
 agents_num = 2
 location = "fsn1"
-agent_server_type = "cx21"
-control_plane_server_type = "cx11"
+agent_server_type = "cpx21"
+control_plane_server_type = "cpx11"
 ```
 ### Installation
@@ -100,7 +96,7 @@ To have a complete and useful setup, it is ideal to have an ingress controller r
 ```sh
 helm repo add ingress-nginx https://kubernetes.github.io/ingress-nginx
 helm repo update
-helm install --values=manifests/helm/nginx/values.yaml ingress-nginx ingress-nginx/ingress-nginx -n kube-system
+helm install --values=manifests/helm/nginx/values.yaml ingress-nginx ingress-nginx/ingress-nginx -n kube-system --kubeconfig kubeconfig.yaml
 ```
 _Note that the default geographic location and instance type of the load balancer can be changed by editing the [values.yaml](manifests/helm/nginx/values.yaml) file._
@@ -129,83 +125,33 @@ hcloud network describe k3s-net
 - Log into one of your nodes (replace the location of your private key if needed):
 ```sh
-ssh root@xxx.xxx.xxx.xxx -i ~/.ssh/id_ed25519 -o StrictHostKeyChecking=no
+ssh rancher@xxx.xxx.xxx.xxx -i ~/.ssh/id_ed25519 -o StrictHostKeyChecking=no
 ```
-### Cilium commands
-- Check the status of cilium with the following commands (get the cilium pod name first and replace it in the command):
-```sh
-kubectl -n kube-system exec --stdin --tty cilium-xxxx -- cilium status
-kubectl -n kube-system exec --stdin --tty cilium-xxxx -- cilium status --verbose
-```
-- Monitor cluster traffic with:
-```sh
-kubectl -n kube-system exec --stdin --tty cilium-xxxx -- cilium monitor
-```
-- See the list of kube services with:
-```sh
-kubectl -n kube-system exec --stdin --tty cilium-xxxx -- cilium service list
-```
-_For more cilium commands, please refer to their corresponding [Documentation](https://docs.cilium.io/en/latest/cheatsheet)._
 ### Automatic upgrade
-The nodes and k3s versions are configured to self-upgrade unless you turn that feature off.
-- To turn OS upgrade off, log in to each node and issue:
+By default, k3os and its embedded k3s instance get upgraded automatically on each node in an HA and non-disruptive way, thanks to the embedded system upgrade controller. If you wish to turn that feature off, remove the `k3os.io/upgrade=latest` label with the following command:
 ```sh
-systemctl disable --now dnf-automatic.timer
-```
-- To turn off k3s upgrade, use kubectl to set the k3s_upgrade label to false for each node (replace the node-name in the command):
-```sh
-kubectl label node node-name k3s_upgrade=false
+kubectl label node <nodename> 'k3os.io/upgrade'- --kubeconfig kubeconfig.yaml
 ```
 ### Individual components upgrade
 To upgrade individual components, you can use the following commands:
-- Hetzner CCM
+- Hetzner CCM and CSI
 ```sh
-kubectl apply -f https://raw.githubusercontent.com/mysticaltech/kube-hetzner/master/manifests/hcloud-ccm-net.yaml
+kubectl apply -f https://raw.githubusercontent.com/mysticaltech/kube-hetzner/master/manifests/hcloud-ccm-net.yaml --kubeconfig kubeconfig.yaml
+kubectl apply -f https://raw.githubusercontent.com/hetznercloud/csi-driver/master/deploy/kubernetes/hcloud-csi.yml --kubeconfig kubeconfig.yaml
 ```
-- Hetzner CSI
-```sh
-kubectl apply -f https://raw.githubusercontent.com/hetznercloud/csi-driver/master/deploy/kubernetes/hcloud-csi.yml
-```
-- Rancher's system upgrade controller
-```sh
-kubectl apply -f https://raw.githubusercontent.com/rancher/system-upgrade-controller/master/manifests/system-upgrade-controller.yaml
-```
-- Kured (used to reboot the nodes after upgrading and draining them)
-```sh
-latest=$(curl -s https://api.github.com/repos/weaveworks/kured/releases | jq -r '.[0].tag_name')
-kubectl apply -f https://github.com/weaveworks/kured/releases/download/$latest/kured-$latest-dockerhub.yaml
-```
-- Cilium and the Nginx ingress controller
+- (Optional, if installed) Nginx ingress controller
 ```sh
 helm repo update
-helm upgrade --values=manifests/helm/cilium/values.yaml cilium cilium/cilium -n kube-system
-helm upgrade --values=manifests/helm/nginx/values.yaml ingress-nginx ingress-nginx/ingress-nginx -n kube-system
+helm upgrade --values=manifests/helm/nginx/values.yaml ingress-nginx ingress-nginx/ingress-nginx -n kube-system --kubeconfig kubeconfig.yaml
 ```
 ## Takedown
@@ -213,21 +159,17 @@ helm upgrade --values=manifests/helm/nginx/values.yaml ingress-nginx ingress-ngi
 If you chose to install the Nginx ingress controller, you need to delete it first to release the load balancer, as follows:
 ```sh
-helm delete ingress-nginx -n kube-system
+helm delete ingress-nginx -n kube-system --kubeconfig kubeconfig.yaml
 ```
 Then you can proceed to taking down the rest of the cluster with:
 ```sh
+kubectl delete -f https://raw.githubusercontent.com/mysticaltech/kube-hetzner/master/manifests/hcloud-ccm-net.yaml --kubeconfig kubeconfig.yaml
+kubectl delete -f https://raw.githubusercontent.com/hetznercloud/csi-driver/master/deploy/kubernetes/hcloud-csi.yml --kubeconfig kubeconfig.yaml
 terraform destroy -auto-approve
 ```
-Sometimes, the Hetzner network is still in use and refused to be deleted via terraform, in that case you can force delete it with:
-```sh
-hcloud network delete k3s-net
-```
 Also, if you had a full-blown cluster in use, it's best to delete the whole project in your Hetzner account directly, as there may be other resources created via operators that are not part of this project.
 <!-- ROADMAP -->
@@ -268,8 +210,7 @@ Project Link: [https://github.com/mysticaltech/kube-hetzner](https://github.com/
 - [k-andy](https://github.com/StarpTech/k-andy) was the starting point for this project. It wouldn't have been possible without it.
 - [Best-README-Template](https://github.com/othneildrew/Best-README-Template) that made writing this readme a lot easier.
+- [k3os-hetzner](https://github.com/hughobrien/k3os-hetzner) was the inspiration for the k3os installation method.
-<!-- MARKDOWN LINKS & IMAGES -->
-<!-- https://www.markdownguide.org/basic-syntax/#reference-style-links -->
 [contributors-shield]: https://img.shields.io/github/contributors/mysticaltech/kube-hetzner.svg?style=for-the-badge
 [contributors-url]: https://github.com/mysticaltech/kube-hetzner/graphs/contributors
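The new upgrade workflow in the README is label-driven, so it can be switched back on per node by restoring the label. A minimal sketch, assuming the kubeconfig.yaml generated by this module and a node name taken from `kubectl get nodes`:

```sh
# Re-enable automatic k3os/k3s upgrades on a node by restoring the label
# (the reverse of the `'k3os.io/upgrade'-` removal shown above).
kubectl label node <nodename> k3os.io/upgrade=latest --kubeconfig kubeconfig.yaml
```

The trailing dash in `'k3os.io/upgrade'-` is kubectl's standard label-removal syntax, so opting a node in and out are symmetrical operations on the same key.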

manifests/helm/cilium/values.yaml (deleted)

@@ -1,117 +0,0 @@
upgradeCompatibility: '1.10'

debug:
  # -- Enable debug logging
  enabled: false
  # verbose:

# gke:
#   enabled: true

ipam:
  # -- Configure IP Address Management mode.
  # ref: https://docs.cilium.io/en/stable/concepts/networking/ipam/
  mode: kubernetes

# -- Configure the encapsulation configuration for communication between nodes.
# Possible values:
#   - disabled (native routing works, however I feel that geneve is more stable, but I may be wrong)
#   - vxlan
#   - geneve
tunnel: geneve

# -- Specify the IPv4 CIDR for native routing (ie to avoid IP masquerade for).
# This value corresponds to the configured cluster-cidr.
nativeRoutingCIDR: 10.0.0.0/8

# When enabled, causes legacy routing
# endpointRoutes:
#   -- Enable use of per endpoint routes instead of routing via
#   the cilium_host interface.
#   enabled: false

# -- Enable installation of PodCIDR routes between worker
# nodes if worker nodes share a common L2 network segment.
autoDirectNodeRoutes: false

bpf:
  # -- Allow cluster external access to ClusterIP services.
  lbExternalClusterIP: false
  # -- Enable native IP masquerade support in eBPF
  masquerade: true

endpointHealthChecking:
  # -- Enable connectivity health checking between virtual endpoints.
  enabled: true

# -- Configure ClusterIP service handling in the host namespace (the node).
hostServices:
  # -- Enable host reachable services.
  enabled: true
  # -- Supported list of protocols to apply ClusterIP translation to.
  protocols: tcp,udp

externalIPs:
  # -- Enable ExternalIPs service support.
  enabled: true

hostPort:
  # -- Enable hostPort service support.
  enabled: true

# -- Configure N-S k8s service loadbalancing
nodePort:
  # -- Enable the Cilium NodePort service implementation.
  enabled: true
  # -- Enable connectivity health checking.
  healthChecking: true

ipv4:
  # -- Enable IPv4 support.
  enabled: true

ipv6:
  # -- Enable IPv6 support.
  enabled: false

# -- Configure Kubernetes specific configuration
k8s:
  # -- requireIPv4PodCIDR enables waiting for Kubernetes to provide the PodCIDR
  # range via the Kubernetes node resource
  requireIPv4PodCIDR: true

# -- Configure the kube-proxy replacement in Cilium BPF datapath
# Valid options are "disabled", "probe", "partial", "strict".
# ref: https://docs.cilium.io/en/stable/gettingstarted/kubeproxy-free/
kubeProxyReplacement: strict

# -- Enables masquerading of IPv4 traffic leaving the node from endpoints.
enableIPv4Masquerade: true

monitor:
  # -- Enable the cilium-monitor sidecar.
  enabled: false

# -- Configure service load balancing
loadBalancer:
  # -- standalone enables the standalone L4LB which does not connect to
  # kube-apiserver.
  # standalone: false
  # -- algorithm is the name of the load balancing algorithm for backend
  # selection e.g. random or maglev
  algorithm: maglev
  # dsr mode probably caused packet drops, so falling back to snat
  mode: snat

# -- The agent can be put into one of the three policy enforcement modes:
# default, always and never.
# ref: https://docs.cilium.io/en/stable/policy/intro/#policy-enforcement-modes
policyEnforcementMode: never

# -- Enables the enforcement of host policies in the eBPF datapath.
hostFirewall: false
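With this values file deleted, nothing installs Cilium anymore and the cluster falls back to k3s's built-in networking (flannel as CNI plus the stock kube-proxy). A rough smoke test for the switch, assuming the module's kubeconfig.yaml; the `k8s-app=cilium` selector is the label the Cilium chart puts on its agent pods:

```sh
# After the migration, no Cilium agent pods should remain.
kubectl -n kube-system get pods -l k8s-app=cilium --kubeconfig kubeconfig.yaml

# Networking now comes from k3s's embedded flannel; coredns and the other
# kube-system workloads reaching Ready is a reasonable sanity check.
kubectl -n kube-system get pods --kubeconfig kubeconfig.yaml
```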

Kured manifest (deleted; filename not shown in the diff view)

@@ -1,139 +0,0 @@
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: kured
rules:
  # Allow kured to read spec.unschedulable
  # Allow kubectl to drain/uncordon
  #
  # NB: These permissions are tightly coupled to the bundled version of kubectl; the ones below
  # match https://github.com/kubernetes/kubernetes/blob/v1.19.4/staging/src/k8s.io/kubectl/pkg/cmd/drain/drain.go
  #
  - apiGroups: [""]
    resources: ["nodes"]
    verbs: ["get", "patch"]
  - apiGroups: [""]
    resources: ["pods"]
    verbs: ["list", "delete", "get"]
  - apiGroups: ["apps"]
    resources: ["daemonsets"]
    verbs: ["get"]
  - apiGroups: [""]
    resources: ["pods/eviction"]
    verbs: ["create"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: kured
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: kured
subjects:
  - kind: ServiceAccount
    name: kured
    namespace: kube-system
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  namespace: kube-system
  name: kured
rules:
  # Allow kured to lock/unlock itself
  - apiGroups: ["apps"]
    resources: ["daemonsets"]
    resourceNames: ["kured"]
    verbs: ["update"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  namespace: kube-system
  name: kured
subjects:
  - kind: ServiceAccount
    namespace: kube-system
    name: kured
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: kured
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: kured
  namespace: kube-system
---
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: kured            # Must match `--ds-name`
  namespace: kube-system # Must match `--ds-namespace`
spec:
  selector:
    matchLabels:
      name: kured
  updateStrategy:
    type: RollingUpdate
  template:
    metadata:
      labels:
        name: kured
    spec:
      serviceAccountName: kured
      tolerations:
        - key: node-role.kubernetes.io/master
          effect: NoSchedule
      hostPID: true # Facilitate entering the host mount namespace via init
      restartPolicy: Always
      containers:
        - name: kured
          image: docker.io/weaveworks/kured:1.8.0
          # If you find yourself here wondering why there is no
          # :latest tag on Docker Hub, see the FAQ in the README
          imagePullPolicy: IfNotPresent
          securityContext:
            privileged: true # Give permission to nsenter /proc/1/ns/mnt
          env:
            # Pass in the name of the node on which this pod is scheduled
            # for use with drain/uncordon operations and lock acquisition
            - name: KURED_NODE_ID
              valueFrom:
                fieldRef:
                  fieldPath: spec.nodeName
          command:
            - /usr/bin/kured
            - --reboot-sentinel-command="/usr/bin/needs-restarting -r"
            # - --force-reboot=false
            # - --drain-grace-period=-1
            # - --skip-wait-for-delete-timeout=0
            # - --drain-timeout=0
            # - --period=1h
            # - --ds-namespace=kube-system
            # - --ds-name=kured
            # - --lock-annotation=weave.works/kured-node-lock
            # - --lock-ttl=0
            # - --prometheus-url=http://prometheus.monitoring.svc.cluster.local
            # - --alert-filter-regexp=^RebootRequired$
            # - --alert-firing-only=false
            # - --reboot-sentinel=/var/run/reboot-required
            # - --prefer-no-schedule-taint=""
            # - --slack-hook-url=https://hooks.slack.com/...
            # - --slack-username=prod
            # - --slack-channel=alerting
            # - --notify-url="" # See also shoutrrr url format
            # - --message-template-drain=Draining node %s
            # - --message-template-reboot=Rebooting node %s
            # - --blocking-pod-selector=runtime=long,cost=expensive
            # - --blocking-pod-selector=name=temperamental
            # - --blocking-pod-selector=...
            # - --reboot-days=sun,mon,tue,wed,thu,fri,sat
            # - --start-time=0:00
            # - --end-time=23:59:59
            # - --time-zone=UTC
            # - --annotate-nodes=false
            # - --lock-release-delay=30m

Terraform definition of the first control plane server (filename not shown in the diff view)

@@ -47,11 +47,12 @@ resource "hcloud_server" "first_control_plane" {
   EOT
   }

+  # Install the Hetzner Cloud cloud controller and cloud storage interface
   provisioner "local-exec" {
     command = <<-EOT
-      kubectl -n kube-system create secret generic hcloud --from-literal=token=${random_password.k3s_token.result} --from-literal=network=${hcloud_network.k3s.name} --kubeconfig ${path.module}/kubeconfig.yaml
+      kubectl -n kube-system create secret generic hcloud --from-literal=token=${var.hcloud_token} --from-literal=network=${hcloud_network.k3s.name} --kubeconfig ${path.module}/kubeconfig.yaml
       kubectl apply -f ${path.module}/manifests/hcloud-ccm-net.yaml --kubeconfig ${path.module}/kubeconfig.yaml
-      kubectl -n kube-system create secret generic hcloud-csi --from-literal=token=${random_password.k3s_token.result} --kubeconfig ${path.module}/kubeconfig.yaml
+      kubectl -n kube-system create secret generic hcloud-csi --from-literal=token=${var.hcloud_token} --kubeconfig ${path.module}/kubeconfig.yaml
       kubectl apply -f https://raw.githubusercontent.com/hetznercloud/csi-driver/master/deploy/kubernetes/hcloud-csi.yml --kubeconfig ${path.module}/kubeconfig.yaml
     EOT
   }
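The provisioner now stores the actual Hetzner API token (`var.hcloud_token`) in the `hcloud` and `hcloud-csi` secrets instead of the random k3s cluster token, which the CCM and CSI driver could never authenticate with. A quick way to sanity-check the wiring after `terraform apply`; a sketch assuming the kubeconfig.yaml written by this module:

```sh
# Decode the token stored for the cloud controller manager; it should match
# the hcloud_token variable fed to Terraform, not the random k3s token.
kubectl -n kube-system get secret hcloud \
  -o jsonpath='{.data.token}' --kubeconfig kubeconfig.yaml | base64 -d; echo

# The CCM and CSI workloads should come up against that token.
kubectl -n kube-system get pods --kubeconfig kubeconfig.yaml | grep hcloud
```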

k3os config template for agent nodes (filename not shown in the diff view)

@@ -22,6 +22,8 @@ k3os:
- "--kubelet-arg" - "--kubelet-arg"
- "cloud-provider=external" - "cloud-provider=external"
- "--flannel-iface=eth1" - "--flannel-iface=eth1"
- "--node-label"
- "k3os.io/upgrade=latest"
token: ${k3s_token} token: ${k3s_token}
ntp_servers: ntp_servers:
- 0.de.pool.ntp.org - 0.de.pool.ntp.org

k3os config template for the first control plane (filename not shown in the diff view)

@@ -17,6 +17,7 @@ k3os:
     - server
     - "--cluster-init"
     - "--disable-cloud-controller"
+    - "--disable-network-policy"
     - "--disable=traefik"
     - "--disable=servicelb"
     - "--disable=local-storage"
@@ -29,6 +30,8 @@ k3os:
     - "${master_ip}"
     - "--kubelet-arg"
     - "cloud-provider=external"
+    - "--node-label"
+    - "k3os.io/upgrade=latest"
   token: ${k3s_token}
   ntp_servers:
     - 0.de.pool.ntp.org

k3os config template for additional control plane servers (filename not shown in the diff view)

@@ -18,6 +18,7 @@ k3os:
- "--server" - "--server"
- "https://${master_ip}:6443" - "https://${master_ip}:6443"
- "--disable-cloud-controller" - "--disable-cloud-controller"
- "--disable-network-policy"
- "--disable=traefik" - "--disable=traefik"
- "--disable=servicelb" - "--disable=servicelb"
- "--disable=local-storage" - "--disable=local-storage"
@ -30,6 +31,8 @@ k3os:
- "${node_ip}" - "${node_ip}"
- "--kubelet-arg" - "--kubelet-arg"
- "cloud-provider=external" - "cloud-provider=external"
- "--node-label"
- "k3os.io/upgrade=latest"
token: ${k3s_token} token: ${k3s_token}
ntp_servers: ntp_servers:
- 0.de.pool.ntp.org - 0.de.pool.ntp.org
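All three node templates now stamp the `k3os.io/upgrade=latest` label at registration time via `--node-label`, and both server templates additionally pass `--disable-network-policy`. A short verification sketch for a fresh cluster, assuming the kubeconfig.yaml produced by the module:

```sh
# Every node should carry the upgrade opt-in label set via --node-label.
kubectl get nodes -L k3os.io/upgrade --kubeconfig kubeconfig.yaml

# With --disable-network-policy, k3s skips its network policy controller, so
# NetworkPolicy objects will be accepted by the API server but not enforced.
kubectl get networkpolicies -A --kubeconfig kubeconfig.yaml
```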