Installing Kubernetes with a Windows Node

Eviatar Gerzi
Nov 6, 2022

Installing a Kubernetes cluster is fairly easy, but joining a Windows node to it is a different story. In this blog post, we walk through how we joined a Windows machine to a Kubernetes cluster, including the files we had to patch to make it work.

Special thanks to Rory McCune and other Kubernetes and Calico Slack members for their help along the way.

Prerequisites

We used the following settings for our cluster:

Kubernetes cluster version: 1.25.3

Kubernetes Linux master node IP: 10.0.6.223

Kubernetes Windows worker node IP: 10.0.6.162

Installing Kubernetes Master Node (Linux)

Kubeadm Installation

We started by installing our master node on an Ubuntu machine with kubeadm, a tool for bootstrapping Kubernetes clusters. The first step is to install kubeadm, kubelet, and kubectl by following the kubeadm installation article. In our case, it was enough to run these commands, but you can read that article for more information:

sudo apt-get update
sudo apt-get install -y apt-transport-https ca-certificates curl
sudo curl -fsSLo /usr/share/keyrings/kubernetes-archive-keyring.gpg https://packages.cloud.google.com/apt/doc/apt-key.gpg
echo "deb [signed-by=/usr/share/keyrings/kubernetes-archive-keyring.gpg] https://apt.kubernetes.io/ kubernetes-xenial main" | sudo tee /etc/apt/sources.list.d/kubernetes.list
sudo apt-get update
sudo apt-get install -y kubelet kubeadm kubectl
sudo apt-mark hold kubelet kubeadm kubectl
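
Because the Windows node is later installed with an explicit -KubeVersion, it can be worth confirming that the three packages really are pinned on the master. The following is a minimal sketch, assuming a Debian-based host; missing_holds is a hypothetical helper name, not part of apt or Kubernetes. It reads the output of apt-mark showhold and prints whichever of the three packages is not on hold:

```shell
# Hypothetical helper: print each of kubelet/kubeadm/kubectl that is
# missing from the list of held packages read on stdin.
missing_holds() {
  held="$(cat)"
  for pkg in kubelet kubeadm kubectl; do
    printf '%s\n' "$held" | grep -qx "$pkg" || echo "$pkg"
  done
}

# Usage (prints nothing when all three packages are held):
#   apt-mark showhold | missing_holds
```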

Creating Kubernetes Cluster

Usually, we install the cluster through the standard guide, by running kubeadm init, but in our case we used Calico’s guide because we need this plug-in to support the Windows node. That guide explains how to install a single-host Kubernetes cluster, so we skip some of its stages because we want a multi-host cluster:

sudo kubeadm init --pod-network-cidr=192.168.0.0/16
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
kubectl create -f https://raw.githubusercontent.com/projectcalico/calico/v3.24.4/manifests/tigera-operator.yaml
kubectl create -f https://raw.githubusercontent.com/projectcalico/calico/v3.24.4/manifests/custom-resources.yaml
# Wait for all the pods to run with STATUS of Running
watch kubectl get pods -n calico-system

When done, we want to make sure that our node is ready and all the pods are running without errors:

kubectl get pods --all-namespaces
kubectl get nodes -o wide
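
On a cluster with many pods, scanning this output by eye gets tedious. As a rough sketch, the two hypothetical helpers below (names are ours, not part of kubectl) filter the --no-headers output and print only what still needs attention. They look only at the STATUS column, so a pod that is Running but not yet ready will not be flagged:

```shell
# Hypothetical helpers that read `kubectl ... --no-headers` output on stdin.

# Print the names of nodes whose STATUS column is not exactly "Ready".
not_ready_nodes() {
  awk '$2 != "Ready" { print $1 }'
}

# Print "namespace/name status" for pods that are neither Running nor Completed.
unhealthy_pods() {
  awk '$4 != "Running" && $4 != "Completed" { print $1 "/" $2, $4 }'
}

# Usage:
#   kubectl get nodes --no-headers | not_ready_nodes
#   kubectl get pods --all-namespaces --no-headers | unhealthy_pods
```

Empty output from both commands means every node is Ready and every pod is Running or Completed.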

Joining Windows Node To The Cluster

We installed a Windows Server 2019 (build 17763) node and added all the required container components for Windows.

Prerequisites:

  • curl for Windows
  • Container components (for Windows components make sure you install “Containers”, “Hyper-V”, “Virtual Machine Platform”, “Windows Hypervisor Platform” and “Windows Subsystem for Linux”)
  • 7zip

*We are not sure that all of these container components are actually required.

Installing Container Runtime — Containerd 1.6.9

We start by installing the container runtime: containerd. We followed the instructions in the containerd GitHub repository and changed the version to the latest one (1.6.9). To do this, open PowerShell and run:

# Download and extract desired containerd Windows binaries
$Version="1.6.9"
curl.exe -L https://github.com/containerd/containerd/releases/download/v$Version/containerd-$Version-windows-amd64.tar.gz -o containerd-windows-amd64.tar.gz
tar.exe xvf .\containerd-windows-amd64.tar.gz

# Copy and configure
Copy-Item -Path ".\bin\" -Destination "$Env:ProgramFiles\containerd" -Recurse -Force
cd $Env:ProgramFiles\containerd\
.\containerd.exe config default | Out-File config.toml -Encoding ascii

# Review the configuration. Depending on setup you may want to adjust:
# - the sandbox_image (Kubernetes pause image)
# - cni bin_dir and conf_dir locations
Get-Content config.toml

# Register and start service
.\containerd.exe --register-service
Start-Service containerd

*I don’t remember whether it was required, but I also installed crictl for Windows. Go to the release page and look for the Windows build (in our case, crictl-v1.25.0-windows-amd64.tar.gz). Extract the single file named crictl.exe and add its location to the PATH or, if you are lazy like me and this is just a test node, copy it to C:\Windows\System32.

Installing Calico for Windows

We used Calico’s guide for the installation. First, on the Linux master, we configure strict affinity for the cluster and disable BGP:

kubectl patch ipamconfigurations default --type merge --patch='{"spec": {"strictAffinity": true}}'
kubectl patch installation default --type=merge -p '{"spec": {"calicoNetwork": {"bgp": "Disabled"}}}'
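
Before moving on to the Windows side, it may be useful to confirm that both patches were applied. This is a minimal sketch that reads back the same two fields the patches set; calico_windows_settings_ok is a hypothetical helper name, and the check assumes the resources are named default, as in the commands above:

```shell
# Hypothetical helper: exit 0 only if both patched values are visible
# in the cluster; also prints what it found.
calico_windows_settings_ok() {
  affinity="$(kubectl get ipamconfigurations default -o jsonpath='{.spec.strictAffinity}')"
  bgp="$(kubectl get installation default -o jsonpath='{.spec.calicoNetwork.bgp}')"
  echo "strictAffinity=$affinity bgp=$bgp"
  [ "$affinity" = "true" ] && [ "$bgp" = "Disabled" ]
}

# Usage:
#   calico_windows_settings_ok || echo "patches not applied yet"
```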

On the Windows node, open PowerShell and run the following:

mkdir c:\k
Invoke-WebRequest https://projectcalico.docs.tigera.io/scripts/install-calico-windows.ps1 -OutFile c:\install-calico-windows.ps1
C:\install-calico-windows.ps1 -KubeVersion 1.25.3 -ServiceCidr 10.96.0.0/12 -DNSServerIPs 10.96.0.10

At this point you might get the following error:

The term ‘c:\CalicoWindows\libs\calico\..\..\..\nssm.exe’ is not recognized as the name of cmdlet, function,….

The solution is already documented, and to apply it we need to add these two lines:

$nssmDir = Get-ChildItem $RootDir -filter "nssm*" -Directory
mv $nssmDir.fullname $RootDir\nssm-2.24

to the install-calico-windows.ps1 script, at the <here> marker in this block (lines 449-450 in v3.24.3):

Remove-Item $RootDir -Force  -Recurse -ErrorAction SilentlyContinue
Write-Host "Unzip Calico for Windows release..."
Expand-Archive -Force $CalicoZip c:\
# This is a temporary fix to make sure nssm binary is in the correct path.
<here>
ipmo -force $RootDir\libs\calico\calico.psm1

Even if the script runs without any errors, it might get stuck printing the following:

Waiting for Calico initialisation to finish...StoredLastBootTime , CurrentLastBootTime 11/3/2022 4:14:58PM
Waiting for Calico initialisation to finish...StoredLastBootTime , CurrentLastBootTime 11/3/2022 4:14:59PM
....

We checked that the Calico services were up and running, and then installed and started the Kubernetes services:

Get-Service -Name CalicoNode
Get-Service -Name CalicoFelix
C:\CalicoWindows\kubernetes\install-kube-services.ps1
Start-Service -Name kubelet
Start-Service -Name kube-proxy
# Verify kubelet/kube-proxy services are running.
Get-Service -Name kubelet
Get-Service -Name kube-proxy

We checked the kubelet logs (C:\k\kubelet.out.log) and saw lots of these:

Waiting for interface named 'vEthernet (Ethernet...'.
Waiting for interface named 'vEthernet (Ethernet...'.
Waiting for interface named 'vEthernet (Ethernet...'.
Waiting for interface named 'vEthernet (Ethernet...'.
Waiting for interface named 'vEthernet (Ethernet...'.
Waiting for interface named 'vEthernet (Ethernet...'.
Waiting for interface named 'vEthernet (Ethernet...'.

A fix for this had already been suggested, so we went to the problematic file (kubelet-service.ps1), commented out the rows responsible for the wait, and set the node IP explicitly:

"--node-ip=10.0.6.162", `

We re-ran the script and, when it finished, ran the join command:

# On the Linux master: print the join command
kubeadm token create --print-join-command
# On the Windows node: run the printed command
kubeadm join 10.0.6.223:6443 --token <REDACTED> --discovery-token-ca-cert-hash sha256:<REDACTED> --v=5

*On the Windows node, kubeadm can be found in C:\k or C:\k\kubernetes\node\bin.

Once the join finished, we checked that the node appears with a Ready status:

root@manager2:/home/cyber# kubectl get nodes -o wide
NAME           STATUS   ROLES           AGE     VERSION   INTERNAL-IP   EXTERNAL-IP   OS-IMAGE                       KERNEL-VERSION      CONTAINER-RUNTIME
manager2       Ready    control-plane   9d      v1.25.3   10.0.6.223    <none>        Ubuntu 22.04 LTS               5.15.0-52-generic   containerd://1.6.8
tmp-w2019x64   Ready    <none>          2d20h   v1.25.3   10.0.6.162    <none>        Windows Server 2019 Standard   10.0.17763.3532     containerd://1.6.9

Troubleshooting Errors

Failed to join node with a container runtime error: “invalid character ‘P’ in string escape code”

When we tried to join the Windows node:

C:\Users\Administrator>kubeadm join 10.0.6.223:6443 --token <REDACTED>

We received the error:

[ERROR CRI]: container runtime is not running: output: time="2022-11-03T10:53:14+02:00" level=fatal msg="getting status of runtime: invalid character 'P' in string escape code"

We saw the same error when using crictl:

C:\Users\Administrator>"C:\Program Files\containerd\crictl.exe" -r "npipe:////./pipe/containerd-containerd" info
time="2022-11-03T10:53:14+02:00" level=fatal msg="getting status of runtime: invalid character 'P' in string escape code"

The containerd service was up and running, but the join still failed. We solved it by reinstalling containerd. We recommend installing it from GitHub as we did, but you can also try the script from Calico’s guide:

Invoke-WebRequest https://projectcalico.docs.tigera.io/scripts/Install-Containerd.ps1 -OutFile c:\Install-Containerd.ps1
c:\Install-Containerd.ps1 -ContainerDVersion 1.6.2 -CNIConfigPath "c:/etc/cni/net.d" -CNIBinPath "c:/opt/cni/bin"

Failed to join node with container runtime error: “container runtime is not running”

When we tried to join the node without containerd we received the following:

C:\Users\Administrator>kubeadm join 10.0.6.223:6443 --token <REDACTED>
[preflight] Running pre-flight checks
error execution phase preflight: [preflight] Some fatal errors occurred:
[ERROR FileAvailable-\etc\kubernetes\bootstrap-kubelet.conf]: \etc\kubernetes\bootstrap-kubelet.conf already exists
[ERROR CRI]: container runtime is not running: output: time="2022-11-02T16:17:51+02:00" level=fatal msg="unable to determine runtime API version: rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing open //./pipe/containerd-containerd: The system cannot find the file specified.\""
, error: exit status 1
[ERROR FileAvailable-C:-etc-kubernetes-pki-ca.crt]: C:/etc/kubernetes/pki/ca.crt already exists
[preflight] If you know what you are doing, you can make a check non-fatal with `--ignore-preflight-errors=...`
To see the stack trace of this error execute with --v=5 or higher

The “already exists” errors can be solved by simply removing the listed files. The major error:

container runtime is not running

can be solved by installing containerd from its GitHub releases, as we did above. Make sure to change the version to the latest one (1.6.9 in my case).

PowerShell script fails with …nssm.exe is not recognized as the name of a cmdlet

We already explained this earlier in the article; there is also a Slack thread about it.

When we ran:

C:\install-calico-windows.ps1 -KubeVersion 1.25.3 -ServiceCidr 10.96.0.0/12 -DNSServerIPs 10.96.0.10

We received:

The term ‘c:\CalicoWindows\libs\calico\..\..\..\nssm.exe’ is not recognized as the name of cmdlet, function,….

The fix is to add the two nssm lines right after the Expand-Archive call:

Remove-Item $RootDir -Force  -Recurse -ErrorAction SilentlyContinue
Write-Host "Unzip Calico for Windows release..."
Expand-Archive -Force $CalicoZip c:\
$nssmDir = Get-ChildItem $RootDir -filter "nssm*" -Directory
mv $nssmDir.fullname $RootDir\nssm-2.24
ipmo -force $RootDir\libs\calico\calico.psm1

API is Down, kubelet failed to run: “running with swap on is not supported”

When listing pods, or running any other kubectl command, you get:

root@manager2:/home/cyber# kubectl get pods
The connection to the server 10.0.6.223:6443 was refused - did you specify the right host or port?

We first checked kubelet by running:

# This showed errors, so we went on to check the logs
service kubelet status
journalctl -eu kubelet

We received the following errors:

Nov 06 10:56:20 manager2 systemd[1]: kubelet.service: Scheduled restart job, restart counter is at 3274.
Nov 06 10:56:20 manager2 systemd[1]: Stopped kubelet: The Kubernetes Node Agent.
Nov 06 10:56:20 manager2 systemd[1]: Started kubelet: The Kubernetes Node Agent.
Nov 06 10:56:20 manager2 kubelet[30915]: Flag --container-runtime has been deprecated, will be removed in 1.27 as the only valid value is 'remote'
Nov 06 10:56:20 manager2 kubelet[30915]: Flag --pod-infra-container-image has been deprecated, will be removed in 1.27. Image garbage collector will get sandbox image information from CRI.
Nov 06 10:56:20 manager2 kubelet[30915]: I1106 10:56:20.843366 30915 server.go:200] "--pod-infra-container-image will not be pruned by the image garbage collector in kubelet and should also be set in the remote runtime"
Nov 06 10:56:20 manager2 kubelet[30915]: Flag --container-runtime has been deprecated, will be removed in 1.27 as the only valid value is 'remote'
Nov 06 10:56:20 manager2 kubelet[30915]: Flag --pod-infra-container-image has been deprecated, will be removed in 1.27. Image garbage collector will get sandbox image information from CRI.
Nov 06 10:56:20 manager2 kubelet[30915]: I1106 10:56:20.847307 30915 server.go:413] "Kubelet version" kubeletVersion="v1.25.3"
Nov 06 10:56:20 manager2 kubelet[30915]: I1106 10:56:20.847329 30915 server.go:415] "Golang settings" GOGC="" GOMAXPROCS="" GOTRACEBACK=""
Nov 06 10:56:20 manager2 kubelet[30915]: I1106 10:56:20.847544 30915 server.go:825] "Client rotation is on, will bootstrap in background"
Nov 06 10:56:20 manager2 kubelet[30915]: I1106 10:56:20.848779 30915 certificate_store.go:130] Loading cert/key pair from "/var/lib/kubelet/pki/kubelet-client-current.pem".
Nov 06 10:56:20 manager2 kubelet[30915]: I1106 10:56:20.850378 30915 dynamic_cafile_content.go:157] "Starting controller" name="client-ca-bundle::/etc/kubernetes/pki/ca.crt"
Nov 06 10:56:20 manager2 kubelet[30915]: I1106 10:56:20.852182 30915 server.go:660] "--cgroups-per-qos enabled, but --cgroup-root was not specified. defaulting to /"
Nov 06 10:56:20 manager2 kubelet[30915]: E1106 10:56:20.852295 30915 run.go:74] "command failed" err="failed to run Kubelet: running with swap on is not supported, please disable swap! or set --fail-swap-on flag to false. /proc/swaps >
Nov 06 10:56:20 manager2 systemd[1]: kubelet.service: Main process exited, code=exited, status=1/FAILURE
Nov 06 10:56:20 manager2 systemd[1]: kubelet.service: Failed with result 'exit-code'.
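
Most of the journal output above is informational. A small filter can surface only the lines that kubelet logged at error (E) or fatal (F) severity, which in this log format start with the severity letter followed by the month and day. This is just a sketch; kubelet_errors is a hypothetical helper name:

```shell
# Hypothetical helper: keep only klog error (E) and fatal (F) lines
# from kubelet journal output read on stdin.
kubelet_errors() {
  grep -E 'kubelet\[[0-9]+\]: [EF][0-9]{4} '
}

# Usage:
#   journalctl -u kubelet --no-pager -n 200 | kubelet_errors
```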

One specific line caught our attention:

running with swap on is not supported

After running:

swapoff -a

The kubelet service started working.
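
Note that swapoff -a only lasts until the next reboot. To keep swap off permanently, the swap entries in /etc/fstab are usually commented out as well. The sketch below assumes swap is configured through /etc/fstab (some systems use other mechanisms, which this does not cover); disable_swap_in_fstab is a hypothetical helper name. It leaves a .bak copy of the file next to the original:

```shell
# Hypothetical helper: comment out every active swap entry in an
# fstab-style file (default /etc/fstab), keeping a .bak backup.
disable_swap_in_fstab() {
  fstab="${1:-/etc/fstab}"
  sed -i.bak -E '/^[^#].*[[:space:]]swap[[:space:]]/ s/^/#/' "$fstab"
}

# Usage (as root), after `swapoff -a`:
#   disable_swap_in_fstab
```

Lines that are already commented out are left untouched, so running it twice is harmless.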

Getting “Waiting for Calico initialisation to finish…”

While trying to run a pod after not using the cluster for a long time, the pod stayed in the “ContainerCreating” or “Pending” status. The logs (“calico-confd.log” and “calico-felix.log”) showed many repetitions of the same error:

Waiting for Calico initialisation to finish...StoredLastBootTime , CurrentLastBootTime 1/3/2023 12:15:24 PM

The Calico documentation mentions this error and suggests reinstalling Calico:

This can be caused by Window’s Execution protection feature. Exit the install using Ctrl-C, unblock the scripts, run uninstall-calico.ps1, followed by install-calico.ps1.

In my case that didn’t solve it, but the pod did start running after some time.
