metropolis/node/c/localstorage: make writes durable
With just a write() followed by a rename(), we can end up with an empty
file after an outage. The write() needs to be followed by an fsync()
before the rename() to avoid that.

Since our localstorage framework only holds rarely written,
'configuration'-style files, we now add such an fsync() to every
write().

This should fix some flakes in tests and low-load clusters where, e.g., a
node can't read back its key material or persisted node roles after a
reboot.
Change-Id: Iefae4f8bd68ee2972860a7c58326442c80d8aa8c
Reviewed-on: https://review.monogon.dev/c/monogon/+/2411
Tested-by: Jenkins CI
Reviewed-by: Lorenz Brun <lorenz@monogon.tech>
diff --git a/metropolis/node/core/localstorage/declarative/placement_local.go b/metropolis/node/core/localstorage/declarative/placement_local.go
index f12f654..77d407c 100644
--- a/metropolis/node/core/localstorage/declarative/placement_local.go
+++ b/metropolis/node/core/localstorage/declarative/placement_local.go
@@ -68,10 +68,29 @@
// TODO(q3k): ensure that these do not collide with an existing sibling file, or generate this suffix randomly.
tmp := f.FullPath() + ".__metropolis_tmp"
defer os.Remove(tmp)
- if err := os.WriteFile(tmp, d, mode); err != nil {
+
+ tf, err := os.OpenFile(tmp, os.O_WRONLY|os.O_CREATE|os.O_TRUNC, mode)
+ if err != nil {
+ return fmt.Errorf("temporary file open failed: %w", err)
+ }
+ defer tf.Close()
+ if _, err := tf.Write(d); err != nil {
return fmt.Errorf("temporary file write failed: %w", err)
}
+ // Fsync the temporary file before the rename to guarantee that the write is durable. Per Theodore Ts'o:
+ //
+ // > data=ordered only guarantees the avoidance of stale data (e.g., the previous
+ // > contents of a data block showing up after a crash, where the previous data
+ // > could be someone's love letters, medical records, etc.). Without the fsync(2)
+ // > a zero-length file is a valid and possible outcome after the rename.
+ if err := tf.Sync(); err != nil {
+ return fmt.Errorf("temporary file sync failed: %w", err)
+ }
+ if err := tf.Close(); err != nil {
+ return fmt.Errorf("temporary file close failed: %w", err)
+ }
+
if err := unix.Rename(tmp, f.FullPath()); err != nil {
return fmt.Errorf("renaming target file failed: %w", err)
}