m/n/kubernetes: improve CSI registration reliability

Kubelet's plugin registration mechanism is quite awful: it relies on
inotify to notice that a new registration socket has been placed into a
specific path, which it then interrogates, reporting back whether the
registration succeeded.

That registration sometimes involves network operations which are prone
to failure. Kubelet reports such a failure back to the registration
server asynchronously but does not retry the registration.

To actually get Kubelet to retry, one needs to remove and recreate the
registration socket.

This change implements such a mechanism, recreating the socket and
registration server on every reported registration failure.

Supervisor backoff is used to prevent busy-looping on non-transient
errors.

Change-Id: I79eaf0efdf55ccdede15d8cee42cda7c276e4b50
Reviewed-on: https://review.monogon.dev/c/monogon/+/2785
Reviewed-by: Serge Bazanski <serge@monogon.tech>
Reviewed-by: Tim Windelschmidt <tim@monogon.tech>
Tested-by: Jenkins CI
1 file changed
tree: e33a710087b67dfe7f34e8434cff8885dc38420c
README.md

Monogon Monorepo

This is the main repository containing the source code for the Monogon Platform.

This is pre-release software - take a look, and check back later!

Environment

Our build environment is self-contained and requires only minimal host dependencies:

  • A Linux machine or VM.
  • Bazelisk >= v1.15.0 (or a working Nix environment).
  • A reasonably recent kernel with user namespaces enabled.
  • Working KVM with access to /dev/kvm (if you want to run tests).

Our docs assume that Bazelisk is available as bazel on your PATH.

Refer to SETUP.md for detailed instructions.

Monogon OS

The source code lives in //metropolis (Metropolis is the codename of Monogon OS).

See //metropolis/README.md for a developer quick start guide, or see the Monogon OS Handbook for user documentation.

Run a single node demo cluster

Build CLI and node image:

bazel build //metropolis/cli/dbg //:launch --config dbg

Launch an ephemeral test node:

bazel test //:launch --config dbg --test_output=streamed

Run a kubectl command while the test is running:

bazel-bin/metropolis/cli/dbg/dbg_/dbg kubectl describe node

Test suite

Run full test suite:

bazel test --config dbg //...