tree e33a710087b67dfe7f34e8434cff8885dc38420c
parent 456961d6589c1afec75954ca94ed631e1f380566
author Lorenz Brun <lorenz@monogon.tech> 1708451106 +0100
committer Lorenz Brun <lorenz@monogon.tech> 1708452732 +0000

m/n/kubernetes: improve CSI registration reliability

Kubelet's plugin registration mechanism is quite awful: it relies on
being notified via inotify that a new registration socket has been
placed into a specific path. Kubelet then interrogates that socket and
reports back whether the registration succeeded.

That registration sometimes involves network operations, which can
fail. Kubelet reports such a failure back to the registration server
asynchronously, but it does not retry the registration itself.

To actually get Kubelet to retry, one needs to remove and recreate the
registration socket.

This change implements such a mechanism, recreating the socket and
registration server on every reported registration failure.

The supervisor's restart backoff prevents busy-looping on non-transient
errors.

Change-Id: I79eaf0efdf55ccdede15d8cee42cda7c276e4b50
Reviewed-on: https://review.monogon.dev/c/monogon/+/2785
Reviewed-by: Serge Bazanski <serge@monogon.tech>
Reviewed-by: Tim Windelschmidt <tim@monogon.tech>
Tested-by: Jenkins CI
