m/pkg/supervisor: close connections when grpc server exits
When the Listener is closed, Serve will return with an error, but
already established connections will continue to serve requests. Stop or
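GracefulStop must be called to close these connections. Previously,
GRPCServer only did this when the context was canceled, not when Serve
returned on its own. The call is now deferred so it runs on both paths.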
This bug often caused the metropolis e2e test to fail on my machine with
the same symptoms as in #276: Node commit always failed with "lost
leadership". This happened because the nodes kept sending requests on a
connection that was established before the leader was re-elected and the
gRPC listener restarted. Those requests were still handled by the old
server, which had the old leadership info.
Change-Id: I797ffa4a40914e5d45e7e4cd15fbb7655e930fa3
Reviewed-on: https://review.monogon.dev/c/monogon/+/2885
Reviewed-by: Serge Bazanski <serge@monogon.tech>
Tested-by: Jenkins CI
Vouch-Run-CI: Serge Bazanski <serge@monogon.tech>
diff --git a/metropolis/pkg/supervisor/supervisor_support.go b/metropolis/pkg/supervisor/supervisor_support.go
index e7b5d34..8d836f2 100644
--- a/metropolis/pkg/supervisor/supervisor_support.go
+++ b/metropolis/pkg/supervisor/supervisor_support.go
@@ -41,17 +41,19 @@
 func GRPCServer(srv *grpc.Server, lis net.Listener, graceful bool) Runnable {
 	return func(ctx context.Context) error {
 		Signal(ctx, SignalHealthy)
+		defer func() {
+			if graceful {
+				srv.GracefulStop()
+			} else {
+				srv.Stop()
+			}
+		}()
 		errC := make(chan error)
 		go func() {
 			errC <- srv.Serve(lis)
 		}()
 		select {
 		case <-ctx.Done():
-			if graceful {
-				srv.GracefulStop()
-			} else {
-				srv.Stop()
-			}
 			return ctx.Err()
 		case err := <-errC:
 			return err