cloud/shepherd/equinix: implement recoverer

This implements basic recovery functionality for 'stuck' agents. The
shepherd will notice machines with an agent that either never sent a
heartbeat or stopped sending heartbeats, remove their 'agent started'
tags, and reboot them. After that, the main agent start logic should
kick in again.

More complex recovery flows can be implemented later; this will do for
now.

Change-Id: I2c1b0d0465e4e302cdecce950a041581c2dc8548
Reviewed-on: https://review.monogon.dev/c/monogon/+/1560
Tested-by: Jenkins CI
Reviewed-by: Tim Windelschmidt <tim@monogon.tech>
7 files changed
tree: 3dff4cdf264bed17e66f7aed2c8085b67738104d
  1. .github/
  2. build/
  3. cloud/
  4. go/
  5. intellij/
  6. metropolis/
  7. net/
  8. third_party/
  9. tools/
  10. .bazelignore
  11. .bazelproject
  12. .bazelrc
  13. .bazelrc.sandboxroot
  14. .bazelversion
  15. .git-ignore-revs
  16. .gitignore
  17. BUILD.bazel
  18. CODING_STANDARDS.md
  19. go.mod
  20. go.sum
  21. LICENSE
  22. README.md
  23. SETUP.md
  24. WORKSPACE
README.md

Monogon Monorepo

This is the main repository containing the source code for the Monogon Platform.

This is pre-release software - take a look, and check back later!

Environment

Our build environment is self-contained and requires only minimal host dependencies:

  • A Linux machine or VM.
  • Bazelisk >= v1.15.0
  • A reasonably recent kernel with user namespaces enabled.
  • Working KVM with access to /dev/kvm (if you want to run tests).

Our docs assume that Bazelisk is available as bazel on your PATH.

Refer to SETUP.md for detailed instructions.

Monogon OS

Run a single-node demo cluster

Build CLI and node image:

bazel build //metropolis/cli/dbg //:launch -c dbg

Launch an ephemeral test node:

bazel test //:launch -c dbg --test_output=streamed

Run a kubectl command while the test is running:

bazel-bin/metropolis/cli/dbg/dbg_/dbg kubectl describe node

Test suite

Run full test suite:

bazel test -c dbg //...