handbook: structure, introduction, chapter 3 root

This is a first pass at the Metropolis Handbook. We write out the
expected outline of the documentation, and start filling in some
chapters.

The introduction pages and Chapter 1 are a high-level overview of what
Metropolis is and how it's expected to be used.

Chapter 3 will delve deep into the technical details of how Metropolis
is implemented. For now, the introduction/overview section of that
chapter is written.

Change-Id: Ic16aa91ed9a127f4f3791eeaf8d7f5e1a24b1018
Reviewed-on: https://review.monogon.dev/c/monogon/+/453
Tested-by: Jenkins CI
Reviewed-by: Sergiusz Bazanski <serge@monogon.tech>
diff --git a/metropolis/handbook/src/SUMMARY.md b/metropolis/handbook/src/SUMMARY.md
index 7390c82..64639b8 100644
--- a/metropolis/handbook/src/SUMMARY.md
+++ b/metropolis/handbook/src/SUMMARY.md
@@ -1,3 +1,36 @@
-# Summary
+[Metropolis, a cluster operating system](./introduction-00-title.md)
+[How to use this Handbook](./introduction-01-how-to-use.md)
 
-- [Chapter 1](./chapter_1.md)
+- [Metropolis in your Organization](./ch01-00-metropolis-organization.md)
+
+- [Demo Cluster](ch02-00-local-demo-cluster.md)
+  - [Launch locally](ch02-01-launch-locally.md)
+  - [Single node test cluster](ch02-02-single-node-test-cluster.md)
+
+- [Cluster Architecture](ch03-00-cluster-architecture.md)
+  - [Node](ch03-01-node.md)
+  - [Node Runnables and Logging](ch03-02-node-runnables.md)
+  - [Node Storage](ch03-03-node-storage.md)
+  - [Cluster](ch03-04-cluster.md)
+  - [Cluster API](ch03-05-cluster-api.md)
+  - [Identity and Authentication](ch03-06-identity-and-authentication.md)
+  - [Cluster Policy](ch03-07-cluster-policy.md)
+  - [Hardware Attestation](ch03-08-hardware-attestation.md)
+
+- [Production Deployment](ch04-00-production-deployment.md)
+  - [Stability and Releases](ch04-01-stability-and-releases.md)
+  - [Hardware Requirements](ch04-03-requirements.md)
+  - [Organizational Requirements](ch04-04-organizational-requirements.md)
+  - [The metroctl tool](ch04-02-the-metroctl-tool.md)
+  - [Configuration and Deployment](ch04-05-configuration-and-deployment.md)
+  - [API and Daily Management](ch04-06-api-and-management.md)
+  - [Giving access to Users](ch04-07-giving-access-to-users.md)
+  - [Metrics and Monitoring](ch04-08-metrics-and-monitoring.md)
+  - [Troubleshooting and runbooks](ch04-09-troubleshooting-runbooks.md)
+
+- [Developing Metropolis](ch05-00-developing-metropolis.md)
+  - [Checking out and Building](ch05-01-checking-out-and-building.md)
+  - [A Bazel crash course](ch05-02-a-bazel-crash-course.md)
+  - [Codebase Structure](ch05-03-codebase-structure.md)
+  - [Running Tests](ch05-04-running-tests.md)
+  - [Design Process](ch05-05-design-process.md)
diff --git a/metropolis/handbook/src/ch01-00-metropolis-organization.md b/metropolis/handbook/src/ch01-00-metropolis-organization.md
new file mode 100644
index 0000000..e39323f
--- /dev/null
+++ b/metropolis/handbook/src/ch01-00-metropolis-organization.md
@@ -0,0 +1,65 @@
+## Metropolis in your Organization
+
+> *Note*: In this chapter, 'developers' means product developers, i.e. Metropolis **Users**, not Metropolis **Developers**. Whenever you see **User**, **Operator** or **Developer**, think of Metropolis roles. However, whenever you see **developer**, think of product development teams acting as **Users**.
+
+As outlined in [How to use this Handbook](introduction-01-how-to-use.md), Metropolis has at its core the concept of separate Users and Operators of a Metropolis cluster.
+
+This split might, at first glance, seem antithetical to the spirit of 'DevOps'. However, this distinction **doesn't exist to take away operational tasks from software developers** (Users), but to let Metropolis scale to large organizations where developers cannot be expected to be responsible for operations all the way from physical hardware (or a public cloud) up to their product. We believe product teams should be able to focus on the operational aspects specific to their product, and not have to deal with low-level concerns such as cluster-level backups, monitoring and security.
+
+This chapter aims to explain the reasoning behind this split, and to tie it into how Metropolis expects to be managed in different kinds of organizations.
+
+### Platform Teams
+
+Metropolis allows large organizations to build internal Platform Teams. These exist to bring a 'PaaS'-style experience to multiple internal product development teams. Metropolis neatly fits into this scenario by exposing only a standard Kubernetes API to these development teams (acting as Metropolis Users), while also exposing a powerful, Metropolis-specific API to the platform team (Metropolis Operators) that concerns only operational work. The two APIs are separate and do not overlap in functionality.
+
+In the following example, the Platform Team are Metropolis Operators, while Product Teams A and B are Metropolis Users. The Platform Team runs two multi-tenant Metropolis clusters, both of which can be used by any Product Team for any purpose.
+
+```
+.----------------.      Manages     .--------------------------.
+| Product Team A | ---------------> | Product A                |
+'----------------'       (k8s)      '--------------------------'
+                                         |            |
+.----------------.      Manages     .--------------------------.
+| Product Team B | ---------------> | Product B                |
+'----------------'       (k8s)      '--------------------------'
+                                         | |          | |
+                                         | |          | | Runs on
+                                         V V          V V
+.----------------.      Manages     .-----------.  .-----------.
+| Platform Team  | ---------------> | Cluster X |  | Cluster Y |
+|                | -----------------|           |->|           |
+'----------------' (Metropolis API) '-----------'  '-----------'
+```
+
+At large scales, Product Teams benefit from Metropolis by using a product that does not require them to be aware of implementation details below the Kubernetes API layer, letting them focus on day-to-day operations of their core products. They do not need to coordinate with other Product Teams on sharing the underlying resources, nor do they need to take care of managing or scaling the cluster. Platform Teams likewise benefit, as Metropolis has been designed for multi-tenant use, where product teams can share a cluster safely.
+
+### Smaller Organizations
+
+While the above Platform Team system works great for larger organizations, smaller organizations usually do not benefit from having distinct teams of dozens of people responsible just for clusters and other organization-wide resources.
+
+In these cases, there is nothing that prevents the lone backend developer at a company from acting as both a Metropolis Operator and User, managing both the Metropolis cluster and the actual workloads running on it:
+
+```
+.--------------.       Manages      .---------.
+| Backend Team | -----------------> | Product |
+|              |        (k8s)       |         |
+|              | --.                '---------'
+'--------------'   |                     | Runs on
+                   |                     V
+                   |   Manages      .---------.
+                   '--------------> | Cluster |
+                  (Metropolis API)  '---------'
+```
+
+As the organization grows, Metropolis will keep gently guiding the Backend team's workflows (by way of the User/Operator role separation) towards not mixing these two roles together. From the beginning, the Product can be deployed using only the Kubernetes API, without needing to touch Metropolis-specific APIs. As new products and projects are developed, they can continue to use the existing Metropolis infrastructure without the overhead of having each team manage its own production stack from the ground up.
+
+### Organizational anti-patterns
+
+Monogon believes that organizational issues cannot simply be fixed by applying technical solutions. Thus, Metropolis explicitly avoids supporting use cases that stem from heavy internal siloization of organizations, or from the broken incentives of a sysadmin-style platform team. We believe that Metropolis can be used as a catalyst to build better teams and workflows, but it is not by itself a fix for organizational problems.
+
+We would like to refer you to the following sources for more information on these organizational patterns and anti-patterns.
+
+1. [Infra teams: good, bad or none at all](https://rachelbythebay.com/w/2020/05/19/abc/), which describes the typical emerging ways organizations deal with infrastructure work. Metropolis leans heavily towards a “Company A” environment.
+1. [The SRE Book](https://sre.google/sre-book/table-of-contents/), which describes Google's “implementation” of DevOps. While the processes described work best for extremely large companies, a significant amount of high-level observations and judgements can be pertinent to even the smallest organizations.
+1. [The SRE Workbook](https://sre.google/workbook/table-of-contents/) chapter [“How SRE Relates to DevOps”](https://sre.google/workbook/how-sre-relates/), which describes an organizational approach to development and operation teams in which Metropolis works best.
+
diff --git a/metropolis/handbook/src/ch03-00-cluster-architecture.md b/metropolis/handbook/src/ch03-00-cluster-architecture.md
new file mode 100644
index 0000000..9f5669c
--- /dev/null
+++ b/metropolis/handbook/src/ch03-00-cluster-architecture.md
@@ -0,0 +1,141 @@
+Introduction
+---
+
+Each Metropolis deployment (Cluster) is fully self-contained and independent of other Clusters.
+
+A Cluster is made up of Nodes. Nodes are machines (be it physical or virtual) running an instance of Metropolis. A Node can be part of only one Cluster.
+
+
+```
+             Cluster
+ .-----------._.'._.------------.
+ '                              '
+ .--------. .--------. .--------.
+ | Node A | | Node B | | Node C |
+ '--------' '--------' '--------'
+```
+
+Nodes
+---
+
+Each Node runs the Linux kernel and the Metropolis userspace. The userspace consists of Metropolis code on a signed read-only partition, and of persistent user data on an encrypted read-write partition. The signed read-only filesystem (the System filesystem) is verified by the Linux kernel, which in turn is signed and verified by the Node's firmware (EFI) via Secure Boot.
+
+```
+          
+.--------------------.         .--------------------.
+| Platform Firmware  |-------->| Secure Boot Image  |
+|       (EFI)        | Checks  |--------------------|        
+|--------------------|         |  .--------------.  |        .-------------------.
+|      PK/KEK        |         |  | Linux Kernel |---------->| System FS (erofs) |
+| Signature Database |         |  |--------------|  | Checks |-------------------|
+'--------------------'         |  |  System FS   |  |        |    Node Code      |
+                               |  |  Signature   |  |        '-------------------'
+                               |  |  (dm-verity) |  |        
+                               |  '--------------'  |        
+                               '--------------------'
+
+```
+
+When booting, a Node needs to become part of a cluster (by either Bootstrapping a new one, Registering into an existing one for the first time, or Joining after reboot) to gather all the key material needed to mount the encrypted data partition. One part of the key is stored on the EFI System Partition encrypted by the TPM (sealed), and will only decrypt correctly if the Node's Secure Boot settings have not been tampered with. The other part of the key is stored by the Cluster, enforcing active communication (and possibly hardware attestation) with the Cluster before a Node can boot.
+
+```
+.-------------------.  Measures Secure Boot settings
+| Platform Firmware |<----------.
+'-------------------'           |
+         | Checks               |
+         v                      |
+.-------------------.           |
+| Secure Boot Image |           |
+'-------------------'           |
+         | Checks           .-------.
+         v                  |  TPM  |
+.-------------------.       '-------'
+|     System FS     |           |
+'-------------------'           | Seals/Unseals
+         | Mounts               v
+         |           .---------------------.        .------------------------.
+         | .---------| Node Encryption Key |        |    Running Cluster     |
+         |/          '---------------------'        |------------------------|
+         | .----------------------------------------| Cluster Encryption Key |
+         |/                                         |       (per node)       |
+         |                                          '------------------------'
+         v
+.---------------------------.
+| Data Partition (xfs/LUKS) |
+'---------------------------'
+ 
+```
+
+The Node boot, disk setup and security model are described in more detail in the [Node](ch03-01-node.md) chapter.
+
+Each Node has the same minimal userland, implemented in Go. However, this userland is unlike a typical GNU/Linux distribution, or most Linux-based operating systems for that matter. Metropolis does not have an LSB-compliant filesystem root (no /bin, /etc...) and does not run a standard init system or syslog daemon. Instead, all process management is performed within a supervision tree (where supervised processes are called Runnables), and logging is performed within that supervision tree as well.
+
+The supervision tree and log tree have some strict properties that are unlike a traditional Unix-like init system. Most importantly, any time a runnable restarts due to some unhandled error (or when it explicitly exits), all subordinate runnables will also be restarted.
+
+As a practical example, when working with Metropolis, you will see log messages like the following:
+
+```
+root.enrolment                   I0228 13:30:45.996960 cluster_bootstrap.go:48] Bootstrapping: mounting new storage...
+root.network.interfaces          I0228 13:30:45.997359 main.go:252] Starting network interface management
+root.time                        R 2022-02-28T13:30:45Z chronyd version 4.1-monogon starting (NTP RTC SCFILTER ASYNCDNS)
+root.network.interfaces.dhcp     I0228 13:30:46.006082 dhcpc.go:632] DISCOVERING => REQUESTING
+root.network.interfaces.dhcp     I0228 13:30:46.008871 dhcpc.go:632] REQUESTING => BOUND
+```
+
+The first column represents a runnable's Distinguished Name. It shows, for example, that the `DISCOVERING => REQUESTING` log line was emitted by a supervision tree runnable named `dhcp`, which was spawned by another runnable named `interfaces`, which in turn was spawned by a runnable named `network`, which in turn was started by the root of the Metropolis Node code.
+
+The Node runnable model, its supervision tree and its log tree are described in more detail in the [Node Runnables and Logging](ch03-02-node-runnables.md) chapter.
+
+Node roles and control plane
+---
+
+Each Node has a set of Roles assigned to it. These roles include, for example, running the cluster control plane or running Kubernetes workloads. At runtime, Nodes continuously retrieve the set of roles assigned to them by the cluster and maintain the services required to fulfill those roles. For example, if a node has the 'kubernetes worker' role, it will attempt to run the Kubelet service, amongst others.
+
+```
+
+   .-----------------------.
+   | Cluster Control Plane |
+   |-----------------------|
+   |  Node Configuration   |
+   |    & Node Status      |
+   '-----------------------'
+ Assigned   |      ^ Status
+    roles   v      | updates
+         .------------.
+         |   Node A   |
+         |------------|
+         |            |
+         |  Kubelet   |
+         |            |
+         '------------'
+     
+```
+
+Nodes which have the 'control plane' role run core cluster services which other nodes depend on. These services make up a multi-node consensus which manages cluster configuration and management state. This effectively makes the cluster self-managed and self-contained. That is, the control plane block pictured above is in fact running on nodes in the same way as the Kubelet.
+
+```
+
+.---------------. .---------------. .---------------.
+|    Node A     | |     Node B    | |    Node C     |
+|---------------| |---------------| |---------------|
+| Control Plane | | Control Plane | |       Kubelet |
+| ^             | | ^     Kubelet | |               |
+'-|-------------' '-|-------------' '---------------'
+  |       |         |       |                  |
+  '-------+---------+-------+------------------'
+           Assigned roles & Status updates
+```
+
+The control plane services are described in more detail in the [Cluster](ch03-04-cluster.md) chapter.
+
+The Control Plane services serve requests from Nodes (like the aforementioned retrieval of roles) and Users/Operators (like management requests) over gRPC, via an API named [Cluster API](ch03-05-cluster-api.md).
+
+Identity & Authentication
+---
+
+When Nodes or Users/Operators contact the Cluster API, they need to prove their identity to the Node handling the request. In addition, nodes handling these requests need to prove their identity to the client. This is performed by providing both sides of the connection with TLS certificates, with some early communication (when certificates are not yet available) being performed over self-signed certificates to prove ownership of a key.
+
+The TLS Public Key Infrastructure (CA and certificates) is fully self-managed by the Cluster Control Plane, and Users or Operators never have access to the underlying private keys of nodes or the CA. These keys are also stored encrypted within the Node's data partition, and are thus only available to nodes that have successfully become part of the Cluster. This model is explained and documented further in the [Identity and Authentication](ch03-06-identity-and-authentication.md) chapter.
+
+In the future, we plan to implement TPM-based Hardware Attestation as part of the early connections of a Node to a Cluster, allowing full cross-node verification, and optionally also of connections from a User/Operator to a Cluster.
+
diff --git a/metropolis/handbook/src/chapter_1.md b/metropolis/handbook/src/chapter_1.md
deleted file mode 100644
index 6786da2..0000000
--- a/metropolis/handbook/src/chapter_1.md
+++ /dev/null
@@ -1,2 +0,0 @@
-# Chapter 1
-
diff --git a/metropolis/handbook/src/introduction-00-title.md b/metropolis/handbook/src/introduction-00-title.md
new file mode 100644
index 0000000..ac92fd2
--- /dev/null
+++ b/metropolis/handbook/src/introduction-00-title.md
@@ -0,0 +1,25 @@
+# Metropolis, a cluster operating system
+
+> Note: Metropolis is currently in **heavy development**. This documentation is written to *reflect our goals*, not necessarily the current state of the product. You are welcome to give Metropolis a try, but we cannot recommend running it anywhere near production workloads.
+
+Welcome to the *Metropolis Handbook*, the primary documentation resource for Metropolis. Metropolis is a cluster operating system, meaning its goal is to run on a fleet of machines (be it physical or virtual) and pool their resources together into a unified API for operations and developer teams.
+
+Metropolis stands on the shoulders of giants, and takes the best of battle-tested software like the Linux kernel and Kubernetes to build a cohesive, stable, reliable and secure platform.
+
+## What makes Metropolis unique
+
+ 1. **A self-contained operating system**: Metropolis is a full software stack, including the Linux kernel, userspace code, Kubernetes distribution and cluster management system. In contrast to traditional cluster administration, there are no puzzles to put together from a dozen vendors. The entire stack is tested as a single deployable unit.
+ 1. **Eliminates state**: Metropolis nodes don't have a traditional read-write filesystem; all of their state is contained on a separate partition with clear per-component ownership of data. All node configuration is managed declaratively on a per-node basis, and all cluster operations are performed via a gRPC API.
+ 1. **No shell, no one-off hacks, no configuration drift**: Metropolis nodes do not run SSH, nor do they depend on low-level system administration tools for day-to-day operations, even debugging.
+ 1. **Opinionated on production readiness**: Metropolis does not attempt to support every possible software configuration, instead focusing on scenarios that make for a high quality production experience.
+ 1. **Robust**: Metropolis builds upon proven technology and does not take risks. Cluster consensus is maintained using the Raft protocol, user and node communication use well-defined gRPC services, while system services are limited in complexity and purpose-built for Metropolis.
+ 1. **Secure at rest**: Metropolis nodes by default encrypt their data partitions and check the integrity of running code, providing tamper resistance and preventing data exfiltration even if an attacker can access a node's disk drives.
+ 1. **Self-locking**: Metropolis can be configured to use TPM hardware attestation, in which cluster membership is limited to nodes that are running trusted versions of the software on trusted hardware.
+ 1. **Not magic**: Metropolis clusters are complex, distributed systems. Managing any distributed system like Metropolis requires some knowledge of the core concepts and components involved, and Metropolis does not attempt to hide that complexity away. Limited internal abstractions and well-documented source code let anyone easily troubleshoot deeper issues.
+
+## Kubernetes on Metropolis
+
+While we aim to make Metropolis run various kinds of workloads in the future, Metropolis strongly focuses on using Kubernetes as the application platform for users. Workloads can be scheduled on Metropolis using standard Kubernetes tools like kubecfg, Tanka or even Helm.
+
+In comparison to other Kubernetes distributions, *Metropolis does not attempt to simplify Kubernetes* by providing extra wrappers or shortcuts for users. Instead, we believe that users should understand the Kubernetes production model and aim to be proficient in its API, as high-level wrappers paradoxically only introduce more complexity.
+
diff --git a/metropolis/handbook/src/introduction-01-how-to-use.md b/metropolis/handbook/src/introduction-01-how-to-use.md
new file mode 100644
index 0000000..4c82367
--- /dev/null
+++ b/metropolis/handbook/src/introduction-01-how-to-use.md
@@ -0,0 +1,52 @@
+# How to use this Handbook
+
+This handbook is the canonical documentation for Metropolis. It aims to document all aspects of Metropolis, from a quick demo, through production deployment, to architecture internals.
+
+> Note: **This section is critical to understand the Handbook structure** and must be read by anyone looking to use Metropolis. At the bottom of this page you will find information about which sections to read next, depending on how you want to use Metropolis.
+
+## Who is this book for?
+
+Throughout this book, we will keep using the following terminology for the groups of people who interact with Metropolis. Note that these names *do NOT imply that these groups are disjoint*. Instead, think of them as *different hats* that people can wear when using Metropolis.
+
+Metropolis does not enforce these roles explicitly, but is designed and engineered in a way that makes the most sense for this kind of organizational structure.
+
+### Operators
+
+Operators are responsible for **managing Metropolis clusters** - for example, bringing nodes into the cluster and decommissioning old nodes, monitoring resource usage and performing capacity planning, ensuring that the cluster is healthy, and handling cluster-level outages.
+
+Operators are usually part of an organization-wide 'platform' team which acts as an internal 'as-a-Service' provider, offering services to Users (like a workload scheduling system, a database service, a storage service...). They might also manage some workloads running on Metropolis themselves, usually parts of the platform provided to Users (like running organization-wide database clusters on Metropolis).
+
+Operators mostly act as system administrators, but are also expected to be able to use Metropolis APIs from a programming or scripting language of their choice to automate their work and make the most of Metropolis. Metropolis provides a set of management tools that allow management from the command line, but these are only thin wrappers around the underlying API, which should be the primary way to think about interacting with Metropolis.
+
+### Users
+
+Users **run workloads** on Metropolis clusters, via the Kubernetes API. They might know that a cluster runs Metropolis, but this is generally not something they need to worry about - instead, they should be aware of the abstractions which Kubernetes provides. Some limited amount of interaction with Metropolis-specific APIs might exist for purposes of authentication or accounting, but would usually be hidden away by Operators as part of some organizational integration code.
+
+Critically, however, Metropolis does not provide Users with some 'friendly' higher-level API or tooling which duplicates the functionality of the Metropolis API used by Operators. Instead, the Kubernetes API is provided to both Users and Operators in the same fashion, so that any internal tooling built on top of it can be shared between Users and Operators.
+
+In addition, Metropolis makes no attempt to hide that both it and Kubernetes are distributed systems, and that applications running on top of a cluster need to be engineered to handle such an environment.
+
+In most organizations, Users will be part of product teams, both developing and operating the organization's product or service.
+
+### Developers
+
+Developers **work on the Metropolis codebase**. Metropolis is an open source project which welcomes external contributions and strives for a fully open design process. However, any changes introduced must be carefully reviewed and tested - not only external contributions, but also contributions from Monogon employees.
+
+Metropolis comes with high quality developer tooling to work on the codebase - all tests, including full cluster tests, can be run without any special software straight from a Monogon repository checkout.
+
+People who wish to build Metropolis from source (for security, to reproduce official artifacts, or to apply internal organization patches) are also expected to fall into this category. In the future, purpose-specific documentation might be built for software packagers or people who wish to ensure Metropolis builds are reproducible, but that is not the case yet.
+
+## Which sections should be read, and in what order?
+
+If you just want to try out Metropolis, head over to **2. Demo Cluster** and come back here later.
+
+If you're considering deploying Metropolis, you must read **1. Metropolis in your Organization**. It lays out some foundational concepts of how Metropolis will fit into your organization, what it's good at, and what it's not. It's aimed at future **Operators** who wish to better understand the relationship between them and Users, but should also be read by organization management teams that will oversee future Operator and User teams/roles.
+
+**Operators** must read the following chapters:
+
+  - **3. Cluster Architecture**, which describes how Metropolis is designed. The information contained therein is crucial to properly plan, deploy and manage a Metropolis cluster. Individual sections of the chapter are marked when their information is optional for some kinds of deployments; these parts can be skipped and read as needed later.
+  - **4. Production Deployment**, which describes the standard procedures used to manage Metropolis clusters, as well as troubleshooting procedures and runbooks.
+ 
+**Users** do not need to read any Metropolis-specific documentation to use Metropolis, and instead should rely on information provided by cluster Operators and the upstream Kubernetes documentation. However, we encourage users to skim through **3. Cluster Architecture** if they are interested in knowing more about the internal Metropolis architecture.
+
+**Developers** are generally expected to start out as **Operators** and thus have read all relevant documentation for Operators already. In addition to that, they are provided with information on how to develop Metropolis in **5. Developing Metropolis**, which gives an introduction to the Metropolis codebase and how to get started writing code.