tree d04ffe4c6866be5139dbc87424d14cac2baea6cd
parent 05f813bf2d311f94dbc8021a85b37ff7c2e33242
author Serge Bazanski <serge@monogon.tech> 1678924016 +0100
committer Serge Bazanski <serge@monogon.tech> 1679000699 +0000

m/n/core/consensus: work around etcd dial timeout

Observed in an E2E test:

  consensus  ready to serve client requests
  supervisor Runnable root.role.controlplane.launcher.consensus died: returned
             error when NODE_STATE_NEW: bootstrap failed: when getting bootstrap
             client: context deadline exceeded
  supervisor rescheduling supervised node root.role.controlplane.launcher.consensus
             with backoff 681.402139ms
  consensus  data absent, bootstrapping.
  consensus  Bootstrapping PKI: starting etcd...
  supervisor Runnable root.role.controlplane.launcher.consensus died: returned
             error when NODE_STATE_NEW: bootstrap failed: failed to start etcd:
             listen tcp 127.0.0.1:7834: bind: address already in use

I'm not sure what caused the client's original timeout. As a workaround,
let's bump the etcd client dial timeout from one second to two.

In addition, let's properly stop the bootstrap etcd server on failure,
instead of letting it run forever and holding the listen address, which
prevents any subsequent etcd server from starting up.

Change-Id: Icbcc31cb1e0b9e619360cbd71c5ee81396c79724
Reviewed-on: https://review.monogon.dev/c/monogon/+/1352
Tested-by: Jenkins CI
Reviewed-by: Lorenz Brun <lorenz@monogon.tech>
