metropolis/build: add kube-code-generator

This implements kube-code-generator, a set of Bazel rules for generating
Kubernetes resource APIs based on a Go library, using
k8s.io/code-generator.
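
As a rough illustration of the intended wiring (a sketch only: every target
name, importpath and package layout below is a hypothetical assumption, and
the real usage in the next change may differ), the four steps documented in
defs.bzl could look something like this:

```starlark
# Hypothetical sketch. Labels, importpaths and the 'widget' API are
# illustrative assumptions, not taken from this change.
load("@io_bazel_rules_go//go:def.bzl", "go_library", "go_path")
load(
    "//metropolis/build/kube-code-generator:defs.bzl",
    "go_kubernetes_library",
    "go_kubernetes_resource_bundle",
)

# --- In the BUILD file of the generated root package ---

# Step 1: a go_path containing all hand-written API packages.
go_path(
    name = "go_path",
    deps = ["//example/kube/apis/widget/v1:go_default_library"],
)

# Step 2: run the generators against that go_path.
go_kubernetes_resource_bundle(
    name = "bundle",
    gopath = ":go_path",
    importpath = "example.com/project/generated",
    apipath = "example.com/project/apis",
    # API (group/version) mapped to its lowercased type names.
    apis = {"widget/v1": ["widget"]},
)

# --- In the BUILD file matching a generated importpath ---

# Step 3: pick a single generated package out of the bundle. A real target
# would most likely also need 'deps' for whatever the generated code imports.
go_kubernetes_library(
    name = "go_kubernetes_library",
    bundle = "//example/kube/generated:bundle",
    importpath = "example.com/project/generated/clientset/versioned",
)

# Step 4: a plain go_library with the same importpath, visible to Gazelle.
go_library(
    name = "go_default_library",
    embed = [":go_kubernetes_library"],
    importpath = "example.com/project/generated/clientset/versioned",
)

# --- In the BUILD file of the hand-written API package ---

# Deepcopy special case: only embed the generated code outside of the
# preprocessing traversal, which breaks the dependency cycle. The
# go_kubernetes_library referenced here is assumed to be declared next to
# this target with importpath example.com/project/apis/widget/v1.
go_library(
    name = "go_default_library",
    srcs = ["doc.go", "types.go"],
    embed = select({
        "//metropolis/build/kube-code-generator:embed_deepcopy": [":go_kubernetes_library"],
        "//conditions:default": [],
    }),
    importpath = "example.com/project/apis/widget/v1",
)
```

The three sections are meant to live in separate BUILD files, which is why the
go_default_library name appears twice; the load() statements are shown once
for brevity.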

Test Plan: Considered adding a dedicated test, but this is best tested by the next change in the stack, which actually uses it to implement the VM hypervisor kube API.

X-Origin-Diff: phab/D751
GitOrigin-RevId: 31e3b632c2e83282c8b2c415402cddea66d4ce51
diff --git a/metropolis/build/kube-code-generator/BUILD.bazel b/metropolis/build/kube-code-generator/BUILD.bazel
new file mode 100644
index 0000000..52d6327
--- /dev/null
+++ b/metropolis/build/kube-code-generator/BUILD.bazel
@@ -0,0 +1,21 @@
+load("@bazel_skylib//rules:common_settings.bzl", "string_flag")
+
+exports_files(["boilerplate.go.txt"])
+
+# Flag determining whether the current build graph traversal is happening for
+# preprocessing by kube-code-generator ('yes'), or not ('no'). Set by
+# preprocessing_transition.
+string_flag(
+    name = "preprocessing",
+    build_setting_default = "no",
+)
+
+# Config setting that go_libraries embedding go_kubernetes_libraries can
+# select() on to break a potential cycle (eg. deepcopy, which is embedded in
+# the same go_library from which it is generated).
+config_setting(
+    name = "embed_deepcopy",
+    flag_values = {
+        ":preprocessing": "no",
+    },
+)
diff --git a/metropolis/build/kube-code-generator/README.md b/metropolis/build/kube-code-generator/README.md
new file mode 100644
index 0000000..7c18289
--- /dev/null
+++ b/metropolis/build/kube-code-generator/README.md
@@ -0,0 +1,17 @@
+kube-code-generator
+===================
+
+A small Bazel rule library for dealing with k8s.io/code-generator.
+
+See defs.bzl for documentation, and `//metropolis/vm/kube/apis` for an example of usage.
+
+Current Limitations
+-------------------
+
+ - client-gen's `versioned/fake` is not generated.
+ - Only the following generators are run: deepcopy, clientset, informer, lister.
+ - Bazel BUILD files for the generated structure must be crafted manually.
+ - Go packages must follow upstream format (group/version). This influences
+   Bazel target structure, which can then look somewhat awkward in a
+   project-oriented monorepo (eg. //foo/bar/widget/kube/apis/widget/v1 has a
+   'widget' stutter).
diff --git a/metropolis/build/kube-code-generator/boilerplate.go.txt b/metropolis/build/kube-code-generator/boilerplate.go.txt
new file mode 100644
index 0000000..ef05f6f
--- /dev/null
+++ b/metropolis/build/kube-code-generator/boilerplate.go.txt
@@ -0,0 +1 @@
+// Code generated by //metropolis/build/kube-code-generator. Do not commit to source control.
diff --git a/metropolis/build/kube-code-generator/defs.bzl b/metropolis/build/kube-code-generator/defs.bzl
new file mode 100644
index 0000000..294fc11
--- /dev/null
+++ b/metropolis/build/kube-code-generator/defs.bzl
@@ -0,0 +1,544 @@
+#  Copyright 2020 The Monogon Project Authors.
+#
+#  SPDX-License-Identifier: Apache-2.0
+#
+#  Licensed under the Apache License, Version 2.0 (the "License");
+#  you may not use this file except in compliance with the License.
+#  You may obtain a copy of the License at
+#
+#      http://www.apache.org/licenses/LICENSE-2.0
+#
+#  Unless required by applicable law or agreed to in writing, software
+#  distributed under the License is distributed on an "AS IS" BASIS,
+#  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+#  See the License for the specific language governing permissions and
+#  limitations under the License.
+
+# Bazel rules for generating Kubernetes-style API types using
+# github.com/kubernetes/code-generator.
+#
+# k8s.io/code-generator generators target a peculiar filesystem/package
+# structure for generated code:
+#
+#    example.com/project/apis/goodbye/v1/doc.go                   - hand written
+#                                       /types.go                 - hand written
+#                                       /zz_generated.deepcopy.go - generated
+#    example.com/project/apis/hello/v1/doc.go                     - hand written
+#                                     /types.go                   - hand written
+#                                     /zz_generated.deepcopy.go   - generated
+#    example.com/project/generated/clientset/...                  - generated
+#    example.com/project/generated/informers/...                  - generated
+#    example.com/project/generated/listers/...                    - generated
+#
+# This means that generated files are both colocated directly with the
+# package (for zz_generated.deepcopy.go, generated by deepcopy-gen) and
+# placed in packages of their own (for files generated by clientset,
+# informers, listers).
+#
+# Most importantly, however, multiple Go packages (in the above example,
+# goodbye/v1 and hello/v1) are used to in turn generate multiple output Go
+# packages. This proves problematic when generating code for Bazel, as we have
+# to consume the result of code generation from multiple Bazel targets (each
+# representing a different generated Go package).
+#
+# To handle this, we split up the code generation into four main steps. These
+# need to be manually instantiated by any user who wants to consume these rules.
+#
+#  1. Create a rules_go go_path that contains all hand-written API packages.
+#  2. Parse hand written API packages using go_kubernetes_resource_bundle, via
+#     the generated go_path. This prepares outputs (ie., 'runs' generators),
+#     and creates an internal provider (KubeResourceBundle) that contains
+#     information about generated Go packages.
+#  3. For every output package, create a go_kubernetes_library target,
+#     specifying the bundle from which it's supposed to be created, and which
+#     particular library from that bundle should be used.
+#  4. Next to every go_kubernetes_library, create a go_library with the same
+#     importpath which embeds the go_kubernetes_library. The split between
+#     go_kubernetes_library and go_library is required to let Gazelle know
+#     about the availability of given importpaths at given Bazel targets
+#     (unfortunately, it seems like Gazelle is unable to parse rules that emit
+#     the same providers as go_library does, requiring us to use a full
+#     go_library rule instead).
+#
+# Point 3 is somewhat different for the generated deepcopy code, which has to
+# live alongside (in the same importpath as) the hand-written API go_library,
+# and needs to be embedded into that. Special care has to be taken to not cause
+# a cycle (handwritten API -> go_path -> bundle -> kubernetes_library ->
+# go_library) in this case. This is done via a select() which selectively
+# enables the inclusion of the generated deepcopy code within the hand-written
+# API library, only enabling it for the target build, not the preprocessing
+# done by go_kubernetes_resource_bundle.
+#
+# Or, in graphical form:
+#
+#   .------------.   .------------.
+#   | go_library |   | go_library |
+#   |------------|   |------------|
+#   | goodbye/v1 |   | hello/v1   |
+#   '------------'   '------------'
+#        '------. .--------'
+#           .---------.
+#           | go_path |
+#           '---------'
+#                | (preprocessing transition)
+#  .-------------------------------.
+#  | go_kubernetes_resource_bundle |
+#  '-------------------------------'
+#              |   '--------------------------.
+#  .---------------------------.  .--------------------------. 
+#  |   go_kubernetes_library   |  |   go_kubernetes_library  |
+#  |---------------------------|  |--------------------------| ... others ...
+#  | clientset/versioned/typed |  | clientset/versioned/fake |
+#  '---------------------------'  '--------------------------'
+#              |                               |
+#  .---------------------------.  .--------------------------. 
+#  |       go_library          |  |       go_library         |
+#  |---------------------------|  |--------------------------| ... others ...
+#  | clientset/versioned/typed |  | clientset/versioned/fake |
+#  '---------------------------'  '--------------------------'
+#
+
+load("@io_bazel_rules_go//go:def.bzl", "go_context", "GoPath", "GoLibrary")
+
+def _preprocessing_transition_impl(settings, attr):
+    return { "//metropolis/build/kube-code-generator:preprocessing": "yes" }
+
+
+# preprocessing_transition is attached to the incoming go_path in
+# go_kubernetes_resource_bundle, and unsets the
+# //metropolis/build/kube-code-generator:embed_deepcopy config setting.
+# This allows go_libraries that make up the handwritten API libraries to only
+# embed the generated deepcopy when they are pulled in for build reasons, not
+# when the graph is being traversed in order to generate the deepcopy itself.
+# This breaks up the cycle that would happen otherwise.
+preprocessing_transition = transition(
+    implementation = _preprocessing_transition_impl,
+    inputs = [],
+    outputs = ["//metropolis/build/kube-code-generator:preprocessing"],
+)
+
+# KubeResourceBundle is emitted by go_kubernetes_resource_bundle and contains
+# information about libraries generated by the kubernetes code-generators.
+KubeResourceBundle = provider(
+    "Information about the generated Go sources of a k8s.io/code-generator-built library.",
+    fields = {
+        "libraries": "List of GoLibrary objects, one per generated Go importpath.",
+    },
+)
+
+def _go_kubernetes_library_impl(ctx):
+    go = go_context(ctx)
+    bundle = ctx.attr.bundle[KubeResourceBundle]
+    libraries = bundle.libraries
+
+    found_importpaths = [l.importpath for l in libraries]
+
+    libraries = [l for l in libraries if l.importpath == ctx.attr.importpath]
+    if len(libraries) < 1:
+        fail("importpath {} not found in bundle (have {})".format(ctx.attr.importpath, ", ".join(found_importpaths)))
+    if len(libraries) > 1:
+        fail("internal error: multiple libraries with importpath {} found in bundle".format(ctx.attr.importpath))
+    library = libraries[0]
+
+    source = go.library_to_source(go, ctx.attr, library, ctx.coverage_instrumented())
+    return [library, source]
+
+
+# go_kubernetes_library picks a single Go library from a KubeResourceBundle
+# and prepares it for being embedded into a go_library.
+go_kubernetes_library = rule(
+    implementation = _go_kubernetes_library_impl,
+    attrs = {
+        "bundle": attr.label(
+            mandatory = True,
+            providers = [KubeResourceBundle],
+            doc = "A go_kubernetes_resource_bundle that contains the result of a kubernetes code-generation run.",
+        ),
+        "importpath": attr.string(
+            mandatory = True,
+            doc = "The importpath of the library picked from the bundle, same as the importpath of the go_library that embeds it.",
+        ),
+        "deps": attr.label_list(
+            providers = [GoLibrary],
+            doc = "All build dependencies of this library.",
+        ),
+
+        "_go_context_data": attr.label(
+            default = "@io_bazel_rules_go//:go_context_data",
+        ),
+
+    },
+    toolchains = ["@io_bazel_rules_go//go:toolchain"],
+)
+
+# _gotool_run is a helper function which runs an executable under
+# //metropolis/build/gotoolwrap, effectively setting up everything required to
+# use standard Go tooling on the monogon workspace (ie. GOPATH/GOROOT). This is
+# required by generators to run 'go fmt'.
+def _gotool_run(ctx, executable, arguments, **kwargs):
+    go = go_context(ctx)
+    gopath = ctx.attr.gopath[0][GoPath]
+
+    inputs = [
+        gopath.gopath_file,
+    ] + kwargs.get('inputs', [])
+
+    tools = [
+        executable,
+        go.sdk.go,
+    ] + go.sdk.tools + kwargs.get('tools', [])
+
+    env = {
+        "GOTOOLWRAP_GOPATH": gopath.gopath_file.path,
+        "GOTOOLWRAP_GOROOT": go.sdk.root_file.dirname,
+    }
+    env.update(kwargs.get('env', {}))
+
+    kwargs_ = dict([(k, v) for (k,v) in kwargs.items() if k not in [
+        'executable', 'arguments', 'inputs', 'env', 'tools',
+    ]])
+
+    ctx.actions.run(
+        executable = ctx.executable._gotoolwrap,
+        arguments = [ executable.path ] + arguments,
+        env = env,
+        inputs = inputs,
+        tools = tools,
+        **kwargs_,
+    )
+
+
+# _output_directory returns the relative path into which
+# ctx.actions.declare_file writes are rooted. This is needed because the
+# code-generators require a root path for all emitted files, instead of a list
+# of files to emit.
+def _output_directory(ctx):
+    # We combine bin_dir, the BUILDfile path and the target name. This seems
+    # wrong. Is there no simpler way to do this?
+    buildfile_path = ctx.build_file_path
+    parts = buildfile_path.split('/')
+    if not parts[-1].startswith('BUILD'):
+        fail("internal error: unexpected BUILD file path: {}".format(parts[-1]))
+    package_path = '/'.join(parts[:-1])
+    return '/'.join([ctx.bin_dir.path, package_path, ctx.attr.name])
+
+
+# _cg returns a 'codegen context', a struct that's used to accumulate the
+# results of code generation. It assumes all output will be rooted in a
+# generated importpath (with more 'shortened' importpaths underneath the root),
+# and collects outputs to pass to the codegen execution action. It also
+# collects a map of importpaths to outputs that make it up.
+def _cg(ctx, importpath):
+    output_root = _output_directory(ctx)
+
+    return struct(
+        # The 'root' importpath, under which 'shortened' importpaths reside.
+        importpath = importpath,
+        # The prefix into which all files will be emitted. We use the target
+        # name for convenience.
+        output_prefix = ctx.attr.name,
+        # The full relative path visible to the codegen, pointing to the same
+        # directory as output_prefix (just from the point of view of the
+        # runtime filesystem, not the ctx.actions filepath declaration API).
+        output_root = output_root,
+        # The list of outputs that have to be generated by the codegen.
+        outputs = [],
+        # A map of importpath to list of outputs (from the above list) that
+        # make up a generated Go package/library.
+        libraries = {},
+
+        ctx = ctx,
+    )
+
+
+# _declare_library adds a single Go package/library at importpath to the
+# codegen context with the given file paths (rooted in the importpath).
+def _declare_library(cg, importpath, files):
+    importpath = cg.importpath + "/" + importpath
+    cg.libraries[importpath] = []
+    for f in files:
+        output = cg.ctx.actions.declare_file("{}/{}/{}".format(
+            cg.output_prefix,
+            importpath,
+            f,
+        ))
+        cg.outputs.append(output)
+        cg.libraries[importpath].append(output)
+
+
+# _declare_libraries declares multiple Go package/libraries to the codegen
+# context. Each key of the dictionary is the importpath of a library, and each
+# value is the list of file names of its generated outputs.
+def _declare_libraries(cg, libraries):
+    for k, v in libraries.items():
+        _declare_library(cg, k, v)
+
+
+# _codegen_clientset runs the clientset codegenerator.
+def _codegen_clientset(ctx):
+    cg = _cg(ctx, ctx.attr.importpath)
+
+    _declare_libraries(cg, {
+        "clientset/versioned": ["clientset.go", "doc.go"],
+        "clientset/versioned/fake": ["register.go", "clientset_generated.go"],
+        "clientset/versioned/scheme": ["register.go", "doc.go"],
+    })
+
+    for api, types in ctx.attr.apis.items():
+        client_name = api.split("/")[-2]
+        _declare_libraries(cg, {
+            "clientset/versioned/typed/{}".format(api): [
+                "doc.go", "generated_expansion.go",
+                "{}_client.go".format(client_name),
+            ] + [
+                "{}.go".format(t) for t in types
+            ],
+            "clientset/versioned/typed/{}/fake".format(api): [
+                "doc.go",
+                "fake_{}_client.go".format(client_name),
+            ],
+        })
+
+    _gotool_run(ctx,
+        mnemonic = "ClientsetGen",
+        executable = ctx.executable._client_gen,
+        arguments = [
+            "--clientset-name", "versioned",
+            "--input-base", ctx.attr.apipath,
+            "--input", ",".join(ctx.attr.apis),
+            "--output-package", cg.importpath + "/clientset",
+            "--output-base", cg.output_root,
+            "--go-header-file", ctx.file.boilerplate.path,
+        ],
+        inputs = [
+            ctx.file.boilerplate,
+        ],
+        outputs = cg.outputs,
+    )
+
+    return cg.libraries
+
+
+# _codegen_deepcopy runs the deepcopy codegenerator (outputting to the apipath,
+# not the importpath).
+def _codegen_deepcopy(ctx):
+    cg = _cg(ctx, ctx.attr.apipath)
+
+    for api, types in ctx.attr.apis.items():
+        _declare_libraries(cg, {
+            api: ["zz_generated.deepcopy.go"],
+        })
+
+    _gotool_run(
+        ctx,
+        mnemonic = "DeepcopyGen",
+        executable = ctx.executable._deepcopy_gen,
+        arguments = [
+            "--input-dirs", ",".join(["{}/{}".format(ctx.attr.apipath, api) for api in ctx.attr.apis]),
+            "--go-header-file", ctx.file.boilerplate.path,
+            "--stderrthreshold", "0",
+            "-O", "zz_generated.deepcopy",
+            "--output-base", cg.output_root,
+            ctx.attr.apipath,
+        ],
+        inputs = [
+            ctx.file.boilerplate,
+        ],
+        outputs = cg.outputs,
+    )
+    return cg.libraries
+
+
+# _codegen_informer runs the informer codegenerator.
+def _codegen_informer(ctx):
+    cg = _cg(ctx, ctx.attr.importpath)
+
+    _declare_libraries(cg, {
+        "informers/externalversions": [ "factory.go", "generic.go" ],
+        "informers/externalversions/internalinterfaces": [ "factory_interfaces.go" ],
+    })
+
+    for api, types in ctx.attr.apis.items():
+        client_name = api.split("/")[-2]
+        _declare_libraries(cg, {
+            "informers/externalversions/{}".format(client_name): [ "interface.go" ],
+            "informers/externalversions/{}".format(api): [
+                "interface.go",
+            ] + [
+                "{}.go".format(t)
+                for t in types
+            ],
+        })
+
+    _gotool_run(
+        ctx,
+        mnemonic = "InformerGen",
+        executable = ctx.executable._informer_gen,
+        arguments = [
+            "--input-dirs", ",".join(["{}/{}".format(ctx.attr.apipath, api) for api in ctx.attr.apis]),
+            "--versioned-clientset-package", "{}/clientset/versioned".format(ctx.attr.importpath),
+            "--listers-package", "{}/listers".format(ctx.attr.importpath),
+            "--output-package", "{}/informers".format(ctx.attr.importpath),
+            "--output-base", cg.output_root,
+            "--go-header-file", ctx.file.boilerplate.path,
+        ],
+        inputs = [
+            ctx.file.boilerplate,
+        ],
+        outputs = cg.outputs,
+    )
+
+    return cg.libraries
+
+
+# _codegen_lister runs the lister codegenerator.
+def _codegen_lister(ctx):
+    cg = _cg(ctx, ctx.attr.importpath)
+
+    for api, types in ctx.attr.apis.items():
+        client_name = api.split("/")[-2]
+        _declare_libraries(cg, {
+            "listers/{}".format(api): [
+                "expansion_generated.go",
+            ] + [
+                "{}.go".format(t)
+                for t in types
+            ]
+        })
+
+    _gotool_run(
+        ctx,
+        mnemonic = "ListerGen",
+        executable = ctx.executable._lister_gen,
+        arguments = [
+            "--input-dirs", ",".join(["{}/{}".format(ctx.attr.apipath, api) for api in ctx.attr.apis]),
+            "--output-package", "{}/listers".format(ctx.attr.importpath),
+            "--output-base", cg.output_root,
+            "--go-header-file", ctx.file.boilerplate.path,
+            "-v", "10",
+        ],
+        inputs = [
+            ctx.file.boilerplate,
+        ],
+        outputs = cg.outputs,
+    )
+
+    return cg.libraries
+
+
+# _update_dict_check is a helper function that updates dict a with dict b,
+# failing if any key would be overwritten.
+def _update_dict_check(a, b):
+    for k in b.keys():
+        if k in a:
+            fail("internal error: repeat importpath {}".format(k))
+    a.update(b)
+
+
+def _go_kubernetes_resource_bundle_impl(ctx):
+    go = go_context(ctx)
+
+    all_gens = {}
+    _update_dict_check(all_gens, _codegen_clientset(ctx))
+    _update_dict_check(all_gens, _codegen_deepcopy(ctx))
+    _update_dict_check(all_gens, _codegen_informer(ctx))
+    _update_dict_check(all_gens, _codegen_lister(ctx))
+
+    libraries = []
+    for importpath, srcs in all_gens.items():
+        library = go.new_library(
+            go,
+            srcs = srcs,
+            importpath = importpath,
+        )
+        libraries.append(library)
+
+    return [KubeResourceBundle(libraries=libraries)]
+
+
+# go_kubernetes_resource_bundle runs kubernetes code-generators on a codepath
+# for some requested APIs, producing output that can be made into Go library
+# targets via go_kubernetes_library. This bundle corresponds to a single
+# Kubernetes API resource group.
+go_kubernetes_resource_bundle = rule(
+    implementation = _go_kubernetes_resource_bundle_impl,
+    attrs = {
+        "gopath": attr.label(
+            mandatory = True,
+            providers = [GoPath],
+            cfg = preprocessing_transition,
+            doc = "A rules_go go_path that contains all the API libraries for which codegen should be run.",
+        ),
+
+        "importpath": attr.string(
+            mandatory = True,
+            doc = """
+                The root importpath of the generated code (apart from deepcopy
+                codegen). The Bazel target path corresponding to this
+                importpath needs to contain the go_kubernetes_library and
+                go_library targets that make it possible to actually build
+                against the generated code.
+            """,
+        ),
+
+        "apipath": attr.string(
+            mandatory = True,
+            doc = "The root importpath of the APIs for which to generate code.",
+        ),
+        "apis": attr.string_list_dict(
+            mandatory = True,
+            doc = """
+                The APIs underneath apipath for which to generate code,
+                eg. foo/v1, mapping into a list of lowercased types generated
+                from each (eg. widget for `type Widget struct`).
+            """,
+        ),
+
+        "boilerplate": attr.label(
+            default = "//metropolis/build/kube-code-generator:boilerplate.go.txt",
+            allow_single_file = True,
+            doc = "Header that will be used in the generated code.",
+        ),
+
+        "_go_context_data": attr.label(
+            default = "@io_bazel_rules_go//:go_context_data",
+        ),
+
+        "_gotoolwrap": attr.label(
+            default = Label("//metropolis/build/gotoolwrap"),
+            allow_single_file = True,
+            executable = True,
+            cfg = "exec",
+        ),
+
+        "_deepcopy_gen": attr.label(
+            default = Label("@io_k8s_code_generator//cmd/deepcopy-gen"),
+            allow_single_file = True,
+            executable = True,
+            cfg = "exec",
+        ),
+        "_client_gen": attr.label(
+            default = Label("@io_k8s_code_generator//cmd/client-gen"),
+            allow_single_file = True,
+            executable = True,
+            cfg = "exec",
+        ),
+        "_informer_gen": attr.label(
+            default = Label("@io_k8s_code_generator//cmd/informer-gen"),
+            allow_single_file = True,
+            executable = True,
+            cfg = "exec",
+        ),
+        "_lister_gen": attr.label(
+            default = Label("@io_k8s_code_generator//cmd/lister-gen"),
+            allow_single_file = True,
+            executable = True,
+            cfg = "exec",
+        ),
+
+        "_allowlist_function_transition": attr.label(
+            default = "@bazel_tools//tools/allowlists/function_transition_allowlist"
+        )
+    },
+    toolchains = ["@io_bazel_rules_go//go:toolchain"],
+)