m/node: switch to cgroupv2

This switches us from legacy cgroup (v1) to cgroup v2 aka unified
cgroup. Our versions of Kubernetes, containerd and runc/gVisor all
support this by now.

cgroup_bpf needs to be enabled in the kernel for containerd with cgroup
v2. Also enable swap, as this now works with cgroup v2; this gets rid
of a warning for every pod being started.

We are not really using cgroups ourselves, but as the root cgroup in v2
is special, move our own process into a subgroup at startup.

Change-Id: I8d63b2ad672568c052c3fe1a2306182f033667fa
Reviewed-on: https://review.monogon.dev/c/monogon/+/3207
Tested-by: Jenkins CI
Reviewed-by: Jan Schär <jan@monogon.tech>
diff --git a/metropolis/node/kubernetes/kubelet.go b/metropolis/node/kubernetes/kubelet.go
index 19a79b2..2c46080 100644
--- a/metropolis/node/kubernetes/kubelet.go
+++ b/metropolis/node/kubernetes/kubelet.go
@@ -102,8 +102,12 @@
 		EnableControllerAttachDetach: reconciler.False(),
 		HairpinMode:                  "none",
 		MakeIPTablesUtilChains:       reconciler.False(), // We don't have iptables
-		FailSwapOn:                   reconciler.False(), // Our kernel doesn't have swap enabled which breaks Kubelet's detection
-		CgroupRoot:                   "/",
+		FailSwapOn:                   reconciler.False(),
+		MemorySwap: kubeletconfig.MemorySwapConfiguration{
+			// Only allow burstable pods to use swap
+			SwapBehavior: "LimitedSwap",
+		},
+		CgroupRoot: "/",
 		KubeReserved: map[string]string{
 			"cpu":    "200m",
 			"memory": "300Mi",
@@ -114,7 +118,7 @@
 		VolumePluginDir: s.EphemeralDirectory.FlexvolumePlugins.FullPath(),
 		// Currently we allocate a /24 per node, so we can have a maximum of
 		// 253 pods per node.
-		MaxPods: 253,
+		MaxPods:    253,
 		PodLogsDir: "/data/kubelet/logs",
 	}
 }