m/n/core/consensus: work around etcd dial timeout

Observed in an E2E test:

  consensus  ready to serve client requests
  supervisor Runnable root.role.controlplane.launcher.consensus died: returned
             error when NODE_STATE_NEW: bootstrap failed: when getting bootstrap
             client: context deadline exceeded
  supervisor rescheduling supervised node root.role.controlplane.launcher.consensus
             with backoff 681.402139ms
  consensus  data absent, bootstrapping.
  consensus  Bootstrapping PKI: starting etcd...
  supervisor Runnable root.role.controlplane.launcher.consensus died: returned
             error when NODE_STATE_NEW: bootstrap failed: failed to start etcd:
             listen tcp 127.0.0.1:7834: bind: address already in use

I'm not sure what caused the original client timeout. Let's bump the
timeout from one second to two.

In addition, let's properly stop the bootstrap etcd server on failure.
Previously it was left running forever, which kept the port bound and
prevented any subsequent etcd server from starting up.

Change-Id: Icbcc31cb1e0b9e619360cbd71c5ee81396c79724
Reviewed-on: https://review.monogon.dev/c/monogon/+/1352
Tested-by: Jenkins CI
Reviewed-by: Lorenz Brun <lorenz@monogon.tech>
2 files changed
tree: d04ffe4c6866be5139dbc87424d14cac2baea6cd
  1. .github/
  2. build/
  3. cloud/
  4. go/
  5. intellij/
  6. metropolis/
  7. net/
  8. third_party/
  9. tools/
  10. .bazelignore
  11. .bazelproject
  12. .bazelrc
  13. .bazelrc.sandboxroot
  14. .bazelversion
  15. .git-ignore-revs
  16. .gitignore
  17. BUILD.bazel
  18. CODING_STANDARDS.md
  19. go.mod
  20. go.sum
  21. LICENSE
  22. README.md
  23. SETUP.md
  24. WORKSPACE
README.md

Monogon Monorepo

This is the main repository containing the source code for the Monogon Platform.

This is pre-release software - take a look, and check back later!

Environment

Our build environment is self-contained and requires only minimal host dependencies:

  • A Linux machine or VM.
  • Bazelisk >= v1.15.0
  • A reasonably recent kernel with user namespaces enabled.
  • Working KVM with access to /dev/kvm (if you want to run tests).

Our docs assume that Bazelisk is available as bazel on your PATH.

Refer to SETUP.md for detailed instructions.

Monogon OS

Run a single-node demo cluster

Build CLI and node image:

bazel build //metropolis/cli/dbg //:launch -c dbg

Launch an ephemeral test node:

bazel test //:launch -c dbg --test_output=streamed

Run a kubectl command while the test is running:

bazel-bin/metropolis/cli/dbg/dbg_/dbg kubectl describe node

Test suite

Run full test suite:

bazel test -c dbg //...