Part III: The Beam Connects

Architecting existence - How the 'Beam' broke the container's isolation to connect it to the world

Still learning new tech with the curiosity of a junior developer, but with the battle scars of a 50-year-old senior. Currently letting AI write my boilerplate while I aggressively judge its architecture.

"The world has moved on. But the Tower stands and the Beams hold. How long the Beams will hold I cannot say, but they hold now. And that, for now, is enough."

— Stephen King, Wizard and Glass

"It was the turtle. It had to do with the turtle."

— Stephen King, It

"All things serve the Beam."

— Roland Deschain, The Dark Tower series


Prologue: The Void

The container breathed. That was the achievement of Part II—a container process, alive inside its namespaces, with a correct rootfs mounted, a correct OCI Runtime Spec describing its identity, a correct crun invocation initiating its lifecycle. maestro run alpine:latest echo hello returned hello. maestro run alpine:latest sh -c "cat /etc/os-release" returned the Alpine release information. The container was alive.

But it was alone. Completely, architecturally, involuntarily alone.

In Linux, a container's network namespace is, by default, empty: no interfaces except the loopback, no routes, no DNS, no way to send or receive packets across the boundary of its own namespace. The loopback interface allowed a container to communicate with itself—a server and a client on the same 127.0.0.1, useful for testing but useless for production. The outside world did not exist inside the network namespace. ping 8.8.8.8 returned Network unreachable. curl https://example.com returned Failed to connect. The container was a mind in a sealed room—fully functional by every internal measure, completely cut off from every external one.

The networking subsystem that would correct this was the beam package and its satellite packages: todash (network namespace management), guardian (CNI plugin lifecycle), doorway (port mapping), mejis (rootless networking via pasta or slirp4netns). These packages formed the Beam—one of the great structural pillars of the container runtime, the mechanism by which a container's inner world connected to the host's outer world, and through the host, to the internet, to other containers, to the services that made containerization worthwhile.

The Beams in Stephen King's Dark Tower series hold the Tower upright: six lines of force crossing at the Tower, each anchored at its two ends by one of twelve Guardians, each one a pillar of cosmic structure. If the Beams fail, the Tower falls. If the Tower falls, reality ends. The metaphor held: without networking, the container runtime would serve no production purpose. HTTP servers with no inbound connections, databases with no client access, web crawlers with no outbound routes: all were inert. The Beam was not optional infrastructure. It was existence.

And building it was the hardest work of Phase 1.

The difficulty was not in any single component of the network stack. It was in the composition—the fact that correctly connecting a container to the host network required simultaneous correctness across six independent mechanisms: user namespace setup, network namespace creation, the namespace holder process, the pasta invocation sequencing, port forwarding direction, and rootfs permission configuration. Each mechanism could be individually correct while the composition was broken. The integration testing—the nginx smoke test—was the only verification that all six were simultaneously correct.

There was no unit test that could verify the composition. You could test that todash created a network namespace, that guardian correctly invoked CNI plugins, that doorway parsed port specifications, that mejis constructed the correct pasta command line. But the test that said "a request sent to port 8080 on the host reaches port 80 inside the container and nginx responds with the welcome page" was by definition an integration test—it required all six mechanisms to work together in their real kernel environment, with real namespaces, real pasta processes, and real HTTP connections.

This was the lesson that component-level testing alone could not teach: integration failures were qualitatively different from component failures. Component failures were detectable in isolation. Integration failures were invisible in isolation and only appeared when the full stack was assembled and exercised end-to-end. The three smoke tests—alpine echo, volume mount, nginx welcome—were not supplementary to the unit test suite. They were necessary in a way the unit tests were not. Without them, the project would have shipped a 100%-covered binary that couldn't actually run a web server.


Chapter 1: Beam Core — The Architecture of Connection

The beam package's architecture was designed around four distinct concerns, each handled by a dedicated subpackage:

todash managed network namespace creation and entry: creating a new named network namespace for a container, storing its path in the waystation, entering it (with runtime.LockOSThread() discipline, as the Todash bug had taught) before invoking network setup tools, and exiting it cleanly via the cleanup function pattern.

guardian managed the CNI plugin lifecycle: downloading the required CNI plugins if they were not already present (via an HTTP downloader with SHA-256 verification), discovering the correct plugin configuration for a container's requested network mode, and executing the CNI ADD/DEL operations by invoking the plugin binaries with the correct JSON input.

doorway handled port mapping: parsing user-supplied port specifications like 8080:80/tcp into structured PortMapping structs, translating them into the format expected by the pasta networking tool, and recording the mappings in the container state for maestro port inspection.
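The parsing step can be sketched as follows. This is a minimal illustration, not doorway's actual code: the PortMapping fields and the ParsePortSpec and parsePort names are assumptions, and only the HOST:CONTAINER[/PROTO] form is handled (no host IP, no port ranges).

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// PortMapping is a hypothetical shape for one parsed port specification.
type PortMapping struct {
	HostPort      int
	ContainerPort int
	Protocol      string // "tcp" or "udp"
}

// ParsePortSpec parses "HOST:CONTAINER[/PROTO]"; the protocol defaults to tcp.
func ParsePortSpec(spec string) (PortMapping, error) {
	ports, proto, hasProto := strings.Cut(spec, "/")
	if !hasProto {
		proto = "tcp"
	}
	if proto != "tcp" && proto != "udp" {
		return PortMapping{}, fmt.Errorf("port spec %q: unsupported protocol %q", spec, proto)
	}
	hostStr, ctrStr, ok := strings.Cut(ports, ":")
	if !ok {
		return PortMapping{}, fmt.Errorf("port spec %q: expected HOST:CONTAINER", spec)
	}
	host, err := parsePort(hostStr)
	if err != nil {
		return PortMapping{}, fmt.Errorf("port spec %q: host port: %w", spec, err)
	}
	ctr, err := parsePort(ctrStr)
	if err != nil {
		return PortMapping{}, fmt.Errorf("port spec %q: container port: %w", spec, err)
	}
	return PortMapping{HostPort: host, ContainerPort: ctr, Protocol: proto}, nil
}

// parsePort converts a decimal string to a port number in 1..65535.
func parsePort(s string) (int, error) {
	n, err := strconv.Atoi(s)
	if err != nil {
		return 0, err
	}
	if n < 1 || n > 65535 {
		return 0, fmt.Errorf("port %d out of range 1-65535", n)
	}
	return n, nil
}

func main() {
	m, err := ParsePortSpec("8080:80/tcp")
	fmt.Println(m, err) // {8080 80 tcp} <nil>
}
```

Keeping the parsed form as a struct rather than a string means the same value can later be rendered into whatever syntax the networking tool expects and stored in the container state for inspection.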

mejis implemented the rootless networking integration: running the pasta or slirp4netns process that connected the container's network namespace to the host's network via userspace TCP/IP emulation, with correct namespace entry sequencing, correct command-line argument construction for each supported tool's different CLI, and cleanup on container teardown. The mejis name came from the Mejis region in Roland's world—the grasslands and river valleys where Roland spent his adolescence, connected to the larger world by the roads that passed through it. The network namespace was the container's Mejis: bounded, self-contained, but connected to the larger world by the pasta process that threaded through it.

The wiring between these subpackages was orchestrated by the beam.Attach and beam.Detach operations, called by gan during container creation and removal. Attach created the namespace, ran the CNI plugins if configured, started the pasta process, waited for the network interface to appear in the container's namespace (verified via the mountinfo polling pattern learned from the FUSE integration), and recorded the full network state in waystation. Detach ran the CNI DEL operation to clean up any CNI-managed interfaces, sent SIGTERM to the pasta process, waited for it to exit, deleted the network namespace mount, and removed the network state from waystation. The lifecycle was symmetric and complete: every resource created by Attach was destroyed by Detach.

[Engineering Sidebar: The CNI Specification]

The Container Network Interface (CNI) specification is a standard for container networking plugins. A CNI plugin is a binary that receives a JSON configuration on stdin and an operation in the CNI_COMMAND environment variable: ADD performs network setup and DEL performs network teardown. The configuration describes the network parameters (subnet, gateway, IPAM mode), and the plugin is responsible for creating the necessary virtual network interfaces, assigning IP addresses, setting up routing, and configuring NAT masquerade.
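A plugin invocation under the CNI specification looks roughly like the sketch below. The CNI_* environment variable names come from the specification; the cniCommand helper, the plugin directory, the container ID, and the namespace path are illustrative assumptions, not guardian's code. The command is only built here, not run.

```go
package main

import (
	"bytes"
	"fmt"
	"os/exec"
	"path/filepath"
)

// cniCommand builds (but does not run) the invocation of one CNI plugin.
// The operation travels in CNI_COMMAND; the network config goes to stdin.
func cniCommand(pluginDir, plugin, command, containerID, netnsPath, ifName string, conf []byte) *exec.Cmd {
	cmd := exec.Command(filepath.Join(pluginDir, plugin))
	cmd.Env = []string{
		"CNI_COMMAND=" + command, // ADD or DEL
		"CNI_CONTAINERID=" + containerID,
		"CNI_NETNS=" + netnsPath,
		"CNI_IFNAME=" + ifName,
		"CNI_PATH=" + pluginDir, // where chained plugins (e.g. IPAM) are found
	}
	cmd.Stdin = bytes.NewReader(conf)
	return cmd
}

func main() {
	// Hypothetical config and paths, for illustration only.
	conf := []byte(`{"cniVersion":"1.0.0","name":"example","type":"bridge"}`)
	cmd := cniCommand("/opt/cni/bin", "bridge", "ADD", "abc123", "/run/netns/abc123", "eth0", conf)
	fmt.Println(cmd.Path)
	for _, kv := range cmd.Env {
		fmt.Println(kv)
	}
}
```

A DEL call is the same invocation with CNI_COMMAND=DEL and the same configuration, which is what makes a symmetric Attach/Detach pair straightforward to express.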

CNI was designed for orchestrated environments (Kubernetes, Nomad) where each container gets a routable IP address within a pod network. In a rootless single-host environment like Maestro's primary use case, full CNI setup (bridge networks, IPAM, routing) is heavier than necessary. The beam package supported CNI primarily as an optional path for advanced networking configurations and for compatibility with environments where CNI was already configured; the primary rootless path used pasta directly, bypassing CNI entirely.

The CNI plugin binaries—bridge, host-local, loopback, portmap, bandwidth—were downloaded by guardian at first run from the official CNI releases, verified by SHA-256 against a pinned checksum, and stored in the Maestro data directory. The download logic used a secure extraction protocol: the release tarball was downloaded to a temporary file, its SHA-256 was verified before extraction began, and extraction was performed with path sanitization to prevent zip-slip attacks (path components starting with ../ or / were rejected before writing).


Chapter 2: The CNI Downloader and Secure Extraction

The guardian package's CNI downloader was the first component in Phase 1 that needed to download and extract arbitrary binary content from the internet—a threat surface that required explicit security engineering rather than casual implementation.

The naive approach: http.Get(url), extract the tarball, run the plugins. This approach was vulnerable to at least four attacks: (1) MITM attacks serving malicious binaries if HTTPS wasn't enforced, (2) hash-mismatch attacks where the download URL was compromised and served modified content, (3) zip-slip attacks where a malicious tarball contained paths like ../../.ssh/authorized_keys that would escape the intended extraction directory, and (4) symlink attacks where an entry in the tarball was a symlink pointing outside the target directory.

The implemented downloader addressed all four:

HTTPS enforcement: The download URL was validated to use https:// before the request was made. HTTP URLs were rejected outright.

SHA-256 verification: The pinned checksum for each supported CNI release version was embedded in the Maestro binary as a compile-time constant. After download, the tarball's SHA-256 was computed and compared to the pinned value before any extraction. A mismatch returned an error and deleted the downloaded file. This meant that a compromised CDN, a compromised DNS entry, or a MITM performing SSL stripping could not serve a malicious binary—not because Maestro checked the TLS certificate (Go's http.Client did that by default), but because even a valid TLS connection to a compromised server could not produce content that matched the pinned hash.

Path sanitization: Each entry extracted from the tarball was checked for ../ path components and for absolute paths (/-prefixed). Any entry containing either was skipped with a warning rather than extracted. The extraction path was constructed using filepath.Join rather than string concatenation. Join cleans the result lexically, which means a ../ component is resolved and can land outside the target directory, so Join alone is not a defense: the explicit check on the entry name, made before anything was written, was what kept extraction inside the target. A test case with a tarball containing ../../../etc/cron.daily/pwned verified that this entry was skipped and no file was written outside the target directory.

Symlink rejection: Tarball entries with type tar.TypeSymlink were rejected entirely during extraction. CNI plugin binaries were plain executable files; there was no legitimate reason for the release tarball to contain symlinks, and symlinks were a well-documented attack vector in malicious archives (a symlink pointing to /dev/null could redirect subsequent writes, a symlink pointing to /etc/passwd could exfiltrate file reads). The rejection was explicit and logged at debug level so that if a future CNI release legitimately included symlinks, the rejection would be visible.

The test suite for the CNI downloader was a comprehensive attack matrix: tarballs with ../ path entries, tarballs with absolute paths, tarballs with symlinks pointing to sensitive locations, servers returning wrong checksums, servers returning HTTP instead of HTTPS, servers returning 404 mid-download, servers closing the connection mid-transfer, servers returning a valid response with a wrong Content-Length header causing truncation. Each scenario had a test that verified the downloader rejected it correctly, without writing any content to the target directory and without leaving temporary files behind. The temporary file cleanup was tested by verifying the os.TempDir state before and after failed downloads.

This test matrix was not hypothetical paranoia. Supply chain attacks on package ecosystems, such as typosquatted modules and the compromises that periodically affected npm packages, demonstrated the real-world risk of download-and-execute patterns in tooling. They were the same class of attack applied to different ecosystems. Maestro was downloading binaries that would be executed with all of the user's namespace capabilities. Getting the security model wrong would mean that maestro run on a compromised network could result in an attacker binary running in the user's session disguised as a CNI plugin.

The decision to pin SHA-256 hashes at compile time—rather than fetching a signature file from the same server or trusting TLS alone—was a defense-in-depth choice: it ensured that even a future compromise of the CNI release infrastructure could not retroactively backdoor Maestro installations that had already downloaded binaries. The pinned hashes changed only when Maestro itself was updated; updating Maestro required rebuilding from source or downloading a new Maestro binary verified by the same mechanisms.


Chapter 3: The Great Deadlock

The first attempt to run a networked container—to invoke beam.Attach during container creation, start the pasta process for network connectivity, and then run the container—produced the most confounding failure of Phase 1:

The maestro run command hung. Not silently, as the early Silent Hang had: this hang produced output. The network namespace was created, pasta started, the CNI ADD completed, the OCI bundle was assembled, the crun process was invoked, and then everything stopped, consuming negligible CPU, making no progress. The container was alive by every measure: maestro list showed it in the creating state, ps aux showed crun running, /proc/<crun-pid>/status showed it in S (sleeping) state. But nothing was happening.

strace -p <crun-pid> showed crun blocked in read() on a file descriptor. lsof showed the file descriptor was the stdout pipe of the container init process. The container init process was itself blocked in a write() call. Nothing was reading from the pipe that the container init process was writing to.

The deadlock was architectural. It emerged from the intersection of Go's goroutine-based pipe management, the OCI create/start lifecycle split, and the Go exec.Cmd API's hidden design choices.

When exec.Cmd has a non-nil Stdout field that is an io.Writer other than an *os.File (a bytes.Buffer or strings.Builder, for example; os.Stdout is itself an *os.File and does not take this path), the os/exec package creates an OS pipe for that file descriptor and starts a goroutine to copy data from the pipe's read end to the io.Writer. The running command, in this case crun, writes to the pipe's write end. The goroutine reads from the pipe's read end and copies to the io.Writer. The goroutine ends only when it sees EOF, which requires every copy of the write end to be closed, including copies inherited by the command's descendants. exec.Cmd.Wait() does not return until the goroutine has finished.

This worked correctly for simple run-style workflows where create and start were called in a single step. But the OCI lifecycle was a two-step process: create left the container in the created state, and start was a subsequent command that began execution of the user process. After create completed, the init process was parked at that point (alive, holding all its file descriptors open) waiting for start. Among those descriptors was the write end of the stdout pipe, inherited through crun. On the Go side, the goroutine that was copying stdout was waiting for the pipe to be closed or for the init process to write data. Neither happened, because the init process was parked, waiting for the start signal that would never come.

When start was subsequently called, it also needed to interact with the container via crun. The new crun invocation for start wrote to its own stdout, which was a different pipe. But the copying goroutine for the original create invocation was still in flight, holding the read end of the first pipe. The combined state, two exec.Cmd invocations managing pipes to the same crun and container init process complex, was one that os/exec was not designed to handle, and it produced the hang.

The deadlock was a specific instance of a general class of problem: Go's exec.Cmd API was designed around the assumption that a subprocess was a complete, atomic, single-invocation unit—you start it, it runs, you wait for it, it exits. The OCI runtime lifecycle broke that assumption by treating create and start as separate, sequential operations on a shared underlying process. The Go API was being asked to manage state across two Cmd instances that shared actual kernel resources (file descriptors, pipes), and it wasn't designed for that scenario.

The fix was mechanical sympathy: instead of passing an io.Writer to exec.Cmd.Stdout, pass an *os.File directly.

// Before: caused the deadlock
logBuf := &strings.Builder{}
cmd.Stdout = logBuf // io.Writer: os/exec creates a pipe and a copying goroutine

// After: correct
logFile, err := os.OpenFile(logPath, os.O_CREATE|os.O_WRONLY|os.O_APPEND, 0o600)
if err != nil {
	return fmt.Errorf("open container log: %w", err)
}
defer logFile.Close()
cmd.Stdout = logFile // *os.File: the fd is handed to the child directly, no goroutine

When exec.Cmd.Stdout is an *os.File, Go does not create an intermediary pipe. It passes the underlying file descriptor to os/exec's platform-specific process setup code, which passes it directly to the child process via fork()/exec()'s file descriptor inheritance. The child process (crun) writes directly to the file. There is no goroutine managing a pipe, no goroutine to block on pipe closure, no shared state between the create and start invocations. Each crun invocation received its own clean set of file descriptors pointing to the log file. The OCI lifecycle proceeded correctly.

The os/exec documentation covers this distinction in a few sentences: when Stdout is an *os.File, the process output is connected directly to that file; otherwise a separate goroutine reads from the process over a pipe, and Wait does not complete until that goroutine reaches EOF. The wording reads like an implementation detail; the correctness implication for a split lifecycle, that a long-lived descendant holding the pipe open keeps Wait from returning, was not spelled out. This was the kind of API subtlety that only revealed itself in unusual usage patterns like the OCI create/start split.

The lesson was simple to state, less simple to internalize: when interfacing Go's high-level subprocess management API with low-level process lifecycle protocols (like the OCI runtime spec's create/start separation), always use the lowest-level API—*os.File—to avoid hidden goroutine state that the high-level wrapper introduces. When in doubt, trace the file descriptor flow from the parent process to the child and verify there are no intermediary buffers in the path.


Chapter 4: The EINVAL Paradox and the EPERM Barrier

With the deadlock resolved, the networking integration proceeded to two further kernel-level errors that blocked correct operation. These errors appeared with no preceding warning, their messages disconnected from any obvious cause, their resolution requiring a deep understanding of how Linux namespace operations interacted with each other across process boundaries.

EINVAL on namespace entry: When todash attempted to enter the container's user namespace before configuring its network interface, the kernel returned EINVAL from the setns(2) call. EINVAL from setns was documented as covering several cases, among them a file descriptor that does not refer to a namespace of the requested type and a multithreaded caller trying to join a user namespace. The second condition was the culprit. A Go program is always multithreaded: the runtime starts several OS threads before main runs, and runtime.LockOSThread() pins a goroutine to one thread without making the process single-threaded.

Linux imposes a strict rule: a multithreaded process may not change its user namespace via setns. A transition into a user namespace has to happen in a single-threaded process, which in practice means a freshly forked child that joins the namespace before exec, the way nsenter and crun do it, rather than an existing multithreaded process transitioning in place. The kernel adds further conditions: the caller needs CAP_SYS_ADMIN in the target user namespace and cannot re-enter the user namespace it is already in. The rationale was security: a process's capabilities are defined relative to its user namespace, and the kernel keeps the rules for changing that relationship deliberately narrow to avoid privilege escalation paths.

The Maestro process running todash.enterNamespace was a multithreaded Go process, already inside the user namespace created for the rootless execution context. Calling setns from it to enter the container's user namespace, a child of the Maestro user namespace, violated this rule no matter which thread made the call. EINVAL was the kernel saying "you cannot enter a door you cannot reach from here."

The solution was to not attempt to enter the container's user namespace from the Maestro parent process. Instead: the network namespace configuration that required user namespace context was delegated to the container init process itself (which started inside the user namespace by crun's design) or was performed from the host's initial user namespace by the network namespace holder process, which was started before any user namespace was entered.

EPERM on mount propagation: When prim attempted to set up the rootfs bind mount with the MS_BIND | MS_REC | MS_SHARED flags, the kernel returned EPERM. The MS_SHARED flag requested that the bind mount be propagated as a shared mount—meaning that mounts performed inside the container's mount namespace would automatically propagate back to the host mount namespace, and vice versa. This was the correct behavior for some use cases (nested containers that needed to share mount events), but it required CAP_SYS_ADMIN because "shared" propagation affected the host mount namespace, which was outside the user namespace's authority.

The solution was MS_SLAVE propagation instead of MS_SHARED. A slave mount received propagation events from its master (host mounts propagated into the container's view), but its own events did not propagate back up. This was the right model for an isolated container: a container should see new host mounts if the host creates them, but should not be able to affect the host's mount namespace by creating mounts inside itself. MS_SLAVE required only the capabilities available inside the user namespace—no CAP_SYS_ADMIN outside.

Both the EINVAL fix and the EPERM fix were non-obvious. Neither was documented in the Go standard library, neither was mentioned in the OCI Runtime Spec, and neither appeared in a generic Linux tutorial. They came from reading the kernel source, from comparing Maestro's behavior to Podman's (which had solved the same problems), and from the systematic discipline of recording the exact kernel error code, looking it up in the setns(2) and mount(2) man pages, and reading the "ERRORS" section carefully enough to identify which condition was being triggered.

This was the archaeology mode of systems programming: when the error is opaque, go to the primary source. For Linux system calls, the primary source was the kernel documentation and eventually the kernel source itself. The man pages described the rules; the kernel source described the enforcement. When the man page was ambiguous, the kernel source was not.


Traditional container networking relies on CAP_NET_ADMIN to create virtual Ethernet pairs, configure bridge networks, and set up IP masquerade through iptables. Rootless containers don't have these capabilities outside their user namespace. To provide internet access from a rootless container, one of two approaches is typically used:

slirp4netns was the original rootless networking tool, developed within the rootless-containers project and adopted by rootless Podman and Docker. It used libslirp, the userspace TCP/IP stack that originated in QEMU as SLiRP, connected to the container's network namespace via a tap interface. SLiRP emulated IPv4 TCP/UDP connectivity, translating outbound packets into socket calls on the host. The limitation: it processed each packet in userspace with a context switch overhead, making it notably slower than kernel-native networking, and it required a specific activation sequence.

pasta (Pack A Subtle Tap Abstraction) is the modern successor, developed by Stefano Brivio at Red Hat as the network-namespace counterpart of passt. Like slirp4netns, pasta connects to the container through a tap interface inside the namespace, but it does not run a full emulated TCP/IP stack: it translates between Layer-2 frames on the tap device and native Layer-4 sockets (TCP, UDP, ICMP echo) on the host, so each container connection maps onto an ordinary host socket. The translation is much more efficient than full stack emulation and supports IPv6 natively. pasta also handles port forwarding directly, removes the need for a separate iptables DNAT rule, and supports more complex networking topologies.

Maestro's mejis.go used pasta as the default with slirp4netns as a fallback. The detection logic checked PATH for pasta, then pasta-netns, then slirp4netns, using whichever was found first. The command-line argument construction differed between the two tools, but both were wrapped behind the same mejis.Start() interface so the rest of the networking code didn't need to know which tool was in use.


Chapter 5: The Nginx Smoke Test War

The final frontier of Phase 1 was the nginx smoke test: maestro run --detach -p 8080:80 nginx:latest, followed by curl -s http://localhost:8080 returning the nginx welcome page HTML. This was the integration test that proved the complete end-to-end stack: image pull, image storage, rootfs mounting, spec generation, namespace configuration, pasta networking, port forwarding, and HTTP traffic through the full userspace networking stack.

The nginx smoke test took more debugging time than any other single test in Phase 1. It was a multi-front war, with five simultaneous problem categories that had to be resolved in sequence before the test could pass. Each front was a real battlefield with real casualties: hours lost to strace output that showed symptoms without causes, to docker comparison testing that showed "Docker can do it, why can't we?", to kernel documentation pages that explained the rule but not the exception that applied in the exact configuration Maestro was using.

5.1 — The Detached Execution Problem

When run with --detach, maestro run needed to continue container operation after the CLI process exited. The initial implementation used the container ID to look up the container state from waystation on each operation. But beam.Attach—the function that started the pasta process and wired up the network namespace—was called from within the gan.CreateContainer flow, with the pasta process stored as a field on the Beam object in memory.

When the CLI process exited, the in-memory Beam object disappeared with it, and with it the only record of the pasta process. The pasta process had been started as an ordinary child via exec.Cmd with no independent persistence mechanism (it was not placed in its own session or otherwise detached from the CLI). Go sends no signal to its children when it exits, but a child left in the parent's session stays exposed to whatever is delivered to that session, such as SIGHUP on terminal hangup, and pasta terminated. The container's network namespace became disconnected: technically present in the kernel (the mount at /run/user/1000/maestro/ns/<container-id>/net still existed), but with no routing, no masquerade, and no translation from container-space traffic to host-space socket calls. curl http://localhost:8080 returned Connection refused because no process was listening: pasta had died, the port forward pasta had established was gone, and nginx's HTTP traffic had nowhere to go.

The fix required two changes: persisting the pasta process PID in waystation as part of the container's network state, and implementing a pasta.IsAlive(pid) check that the network subsystem ran before each operation that required network connectivity. If pasta was dead, beam.Reattach would restart it, re-establishing the port forwarding and routing. If pasta was alive, the operation proceeded normally.

The waystation record for a container's network state grew to include: the pasta PID, the network namespace path, the port mappings configured, the IP address assigned to the container (by pasta's --config-net), and a timestamp of when the pasta process was last verified alive. This state was sufficient to reconstruct the network connection across CLI invocations and to diagnose networking failures in maestro system check-style diagnostics.

5.2 — The Pasta Namespace Sequencing Bug

The pasta process needed to run attached to the container's user namespace (so it had the correct privilege context for the network namespace operations) but in the host's network namespace (so it could make outbound connections on behalf of the container). The sequencing of namespace entries was critical: enter the wrong namespace in the wrong order, and pasta would either have insufficient privileges for the network operations or would be operating in the wrong network context.

The nsenter invocation to start pasta was initially:

nsenter --user=/proc/<container-init-pid>/ns/user \
        --net=/proc/<container-init-pid>/ns/net \
        pasta --config-net --tcp-ports 8080:80 ...

This had two problems. First: <container-init-pid> was the PID as seen from inside the container's PID namespace—the container's PID 1. From the host's perspective, the container's init process had a different PID (the host PID, as shown in ps aux). Using the container-namespace PID in a host-side /proc lookup produced either "no such file or directory" (if the container-namespace PID was not a valid host PID) or—perniciously, because the kernel recycles PIDs—looked up a completely different host process that happened to have been assigned that PID value.

Second: pasta needed to enter the container's user namespace to get the correct UID/GID context, but remain in the host's network namespace to be able to make outbound connections. The --net flag in the initial invocation was entering the container's network namespace—exactly the opposite of what was needed.
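The PID confusion in the first problem can be made visible from the host: on Linux 4.1 and later, /proc/<pid>/status carries an NSpid line listing the process's PID in each nested PID namespace, outermost first. The parser below is an illustrative sketch (parseNSpid is a hypothetical helper, not part of Maestro); main reads the status of the current process.

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"strconv"
	"strings"
)

// parseNSpid extracts the NSpid line from the text of /proc/<pid>/status.
// The first value is the PID as seen by the procfs mount being read;
// the last is the PID inside the innermost PID namespace.
func parseNSpid(status string) ([]int, error) {
	sc := bufio.NewScanner(strings.NewReader(status))
	for sc.Scan() {
		line := sc.Text()
		if !strings.HasPrefix(line, "NSpid:") {
			continue
		}
		var pids []int
		for _, field := range strings.Fields(strings.TrimPrefix(line, "NSpid:")) {
			n, err := strconv.Atoi(field)
			if err != nil {
				return nil, fmt.Errorf("bad NSpid field %q: %w", field, err)
			}
			pids = append(pids, n)
		}
		if len(pids) == 0 {
			return nil, fmt.Errorf("empty NSpid line")
		}
		return pids, nil
	}
	return nil, fmt.Errorf("no NSpid line found")
}

func main() {
	data, err := os.ReadFile("/proc/self/status")
	if err != nil {
		fmt.Println("cannot read status:", err)
		return
	}
	pids, err := parseNSpid(string(data))
	fmt.Println(pids, err)
}
```

For a container's init process inspected from the host, the line has two entries, the host PID followed by 1, which is exactly the pair that must not be confused when building a /proc path.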

The correct invocation used the network namespace holder process—a small dedicated process whose sole purpose was to hold the container's network namespace file descriptor open and whose PID was managed entirely in the host's PID namespace:

# holder-pid is recorded in waystation when the network namespace is created
nsenter --user=/proc/<holder-pid>/ns/user \
        pasta \
        --config-net \
        --tcp-ports 8080:80 \
        --map-gw \
        -T none -U none \
        --netns /proc/<holder-pid>/ns/net

In this invocation: --user=/proc/<holder-pid>/ns/user entered the container's user namespace (via the holder, whose PID was a valid host PID). The final argument /proc/<holder-pid>/ns/net told pasta which network namespace to configure—it was a file path argument to pasta, not an nsenter flag, meaning pasta itself would call setns to enter the container's network namespace after verifying it had the correct user namespace context. The host's network namespace was inherited from the nsenter invocation (which only specified --user, not --net).

This difference—nsenter --net vs pasta's own setns call to the net namespace path—was subtle and underdocumented. The pasta manual page described the file path argument as the "target network namespace," suggesting pasta handled the setns internally with appropriate privilege escalation. The nsenter approach placed pasta directly inside the wrong namespace before it could perform its own namespace management. The holder-process approach put pasta in the right user context with the right host network baseline, then let pasta do its own namespace entry. The behavior difference was dramatic: silent routing failure vs. correct port forwarding.

5.3 — The Port Forwarding Direction

The initial port forwarding argument to pasta was --tcp-ns 8080:80, which in pasta's argument syntax specified "create a listener in the namespace on port 8080 and forward to host port 80"—the wrong direction entirely. Container port forwarding meant "the host listens on port 8080, and traffic is forwarded to port 80 inside the container."

pasta's argument for the correct direction was --tcp-fwd 8080:80: "forward host port 8080 to container port 80." The --tcp-ns and --tcp-fwd flags were pasta's distinction between "listening in the namespace" and "listening on the host." Without reading the pasta changelog and source (the manpage was incomplete on this point), the distinction was easy to get backwards.

The test for this was trivial once the flag was correct: curl http://localhost:8080 returned the nginx welcome page. When the flag was wrong, curl http://localhost:8080 returned Connection refused from the host side (because nothing was listening on the host on port 8080), while inside the container, port 80 was listening. The asymmetry—nginx alive inside, no connection possible from outside—was the diagnostic clue that led to examining the direction semantics of the pasta flags.

5.4 — The rootfs Permissions Split

nginx inside the container ran as UID 101. The rootfs directory itself—created by prim as the target of the OverlayFS mount or the VFS copy—had permissions 0700 and was owned by UID 0 inside the container (the mapped identity of the process that ran prim). Inside the container's user namespace, UID 0 could enter the directory. But after nginx called setgid(101) and setuid(101) to drop to its configured non-root user, UID 101 could not access the rootfs root directory to perform any filesystem operations: reading /etc/nginx/nginx.conf, creating /run/nginx.pid, or accessing any of the dozens of files and directories nginx's startup sequence touched.

The symptom was opaque: nginx would start, print its startup messages to stderr, and then immediately exit with code 1. The error in nginx's own log was stat("/etc/nginx/nginx.conf") fail (13: Permission denied). The "13" was EACCES—access denied. Why would nginx be denied access to its own configuration file, which was correctly owned by root inside the container?

The answer required tracing the permission check path: to resolve a path, the kernel walks the directory tree from the root to the target file, checking search (execute) permission on each component. To access /etc/nginx/nginx.conf, UID 101 needed execute permission on /etc (0755, fine) and on /etc/nginx (0755, fine). But the walk starts one step earlier, at the container's / itself, which is the rootfs root directory: the merged overlay mount point that prim had created. That directory was 0700 and owned by UID 0, so UID 101 had no execute permission on it. The very first check failed, and the permissions on everything beneath it never came into play.

The fix was to create the rootfs directory with 0755 permissions (rwxr-xr-x), giving all users read and execute permission on the rootfs root. This did not grant any security-sensitive capability: for non-owners, 0755 allowed only listing and traversal (the read and execute bits), not modification (the write bit remained owner-only). The files and directories within the rootfs retained their original permissions from the image layers, which preserved the intended access control model for the containerized application.
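The traversal rule can be sketched in a few lines of Go. This is a simplified model, not Maestro's code: canTraverse is a hypothetical helper that looks only at the classic mode bits and ignores ACLs, capabilities, and supplementary groups.

```go
package main

import (
	"fmt"
	"io/fs"
)

// canTraverse reports whether a process with the given uid/gid may search
// (traverse) a directory, using only the classic owner/group/other mode bits.
// Hypothetical helper for illustration; the kernel's real check also
// considers capabilities, ACLs, and supplementary groups.
func canTraverse(mode fs.FileMode, ownerUID, ownerGID, uid, gid uint32) bool {
	perm := mode.Perm()
	switch {
	case uid == ownerUID:
		return perm&0o100 != 0 // owner execute bit
	case gid == ownerGID:
		return perm&0o010 != 0 // group execute bit
	default:
		return perm&0o001 != 0 // other execute bit
	}
}

func main() {
	// The rootfs directory is owned by UID 0 / GID 0 inside the container;
	// nginx runs as UID 101 / GID 101 after dropping privileges.
	fmt.Println("0700:", canTraverse(0o700, 0, 0, 101, 101))
	fmt.Println("0755:", canTraverse(0o755, 0, 0, 101, 101))
}
```

With 0700 the "other" execute bit is clear, so UID 101 is stopped at the first path component; with 0755 it is set and the walk continues into /etc.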

5.5 — The FUSE Zombie and the Foreground Fix

On the development machine where Phase 1 was developed (a Linux machine with kernel 5.15), rootless kernel overlayfs was not available: mounting overlayfs inside a user namespace requires kernel 5.11 or later, and although 5.15 met that requirement, the distro build on this machine had the feature disabled. prim correctly fell back to fuse-overlayfs. But fuse-overlayfs introduced its own class of problems.

The initial prim implementation launched fuse-overlayfs as a daemon: calling cmd.Start() without cmd.Wait(), letting the process run in the background. This was the standard pattern for daemons in most contexts. But fuse-overlayfs in daemon mode daemonized itself—it forked a child, the child became the actual FUSE daemon, and the parent (the one prim had started) exited immediately. The cmd.Wait() that prim eventually called completed immediately, because the original process had exited. But the child FUSE daemon was now a child of init (PID 1) rather than of prim, completely outside prim's lifecycle management.

The problem manifested in several ways. First: the FUSE mount wasn't ready when prim returned. The forked daemon needed a few milliseconds to initialize the FUSE mount point, and prim was continuing with rootfs operations before the mount was up, producing spurious "transport endpoint not connected" errors. Second: when prim attempted to unmount the FUSE filesystem during container cleanup, the FUSE daemon was still running (reparented to init), and umount blocked waiting for it to release its file descriptors, eventually timing out with EBUSY.

The fix was two-part. First: run fuse-overlayfs in foreground mode (-f flag), which prevented it from daemonizing. The FUSE process remained a direct child of the Maestro process, manageable via *os.Process. Second: use mountinfo polling to wait for the FUSE mount to appear before returning from prim.Mount():

// Wait for FUSE mount to be listed in /proc/self/mountinfo
func waitForMount(target string, timeout time.Duration) error {
    deadline := time.Now().Add(timeout)
    for time.Now().Before(deadline) {
        if isMounted, _ := checkMountInfo(target); isMounted {
            return nil
        }
        time.Sleep(50 * time.Millisecond)
    }
    return fmt.Errorf("timed out waiting for mount at %s", target)
}

/proc/self/mountinfo was the authoritative source of mount state for the current mount namespace. When fuse-overlayfs had successfully mounted, an entry for the target path would appear in mountinfo. The polling loop, checking every 50ms for up to 5 seconds, was sufficient for all tested configurations without introducing significant latency. The cleanup path used process.Signal(syscall.SIGTERM) followed by process.Wait(), ensuring the FUSE daemon exited cleanly; the Wait call collected its exit status, so no zombie was left behind.

The FUSE process's stderr was drained by a goroutine to a string builder. After the process exited, the stderr contents were filtered for the known lazytime warning before being included in the container log. The filter was explicit and documented:

// fuse-overlayfs emits this warning on kernels without lazytime FUSE support.
// The mount still works correctly; the warning is informational and would
// confuse users who see it in container logs.
const fuseWarningLazytime = "WARNING: FUSE lazytime not supported"

The comment was the important part. A future reader could see exactly why the filter existed, what the warning meant, and when it would appear—allowing them to make an informed decision if they ever needed to change the filtering behavior.
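The filtering step itself is not shown. One way it could work, sketched under the assumption that the filter is line-based and matches on the constant above, is this; filterFuseStderr is a hypothetical name and the second log line in main is invented for the demonstration.

```go
package main

import (
	"fmt"
	"strings"
)

// Copied from the constant quoted above.
const fuseWarningLazytime = "WARNING: FUSE lazytime not supported"

// filterFuseStderr drops lines containing the known lazytime warning and
// returns everything else unchanged, so genuine errors still reach the log.
func filterFuseStderr(stderr string) string {
	var kept []string
	for _, line := range strings.Split(stderr, "\n") {
		if strings.Contains(line, fuseWarningLazytime) {
			continue
		}
		kept = append(kept, line)
	}
	return strings.Join(kept, "\n")
}

func main() {
	// The second line is a made-up error used only to show what survives.
	in := fuseWarningLazytime + "\nexample: some other fuse-overlayfs message\n"
	fmt.Print(filterFuseStderr(in))
}
```

Matching on a single named constant keeps the filter narrow: any stderr line that is not the known warning passes through untouched.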


Chapter 6: The Sacred Smoke Tests

After all five fronts of the nginx war were resolved, the smoke tests were formalized. Three test scripts were written—each one a complete, automated verification of a specific capability:

scripts/smoke-test-alpine-echo.sh: The baseline container execution test.

./bin/maestro run --rm alpine:latest echo "smoke-test-ok"

Expected output: exactly smoke-test-ok and exit code 0. Verifies: image pull, rootfs mount, container creation, process execution, output capture, container cleanup.

scripts/smoke-test-alpine-volume-mount.sh: The volume mounting test.

echo "test-content" > /tmp/test-input.txt
./bin/maestro run --rm -v /tmp/test-input.txt:/data/input.txt:ro alpine:latest cat /data/input.txt

Expected output: test-content. Verifies: bind mount parsing, mount specification in the OCI spec, file accessibility inside the container from the host filesystem.

scripts/smoke-test-nginx-welcome.sh: The full integration test.

./bin/maestro run --detach --name smoke-nginx -p 8080:80 nginx:latest
sleep 2  # give pasta time to configure the network interface
RESPONSE=$(curl -s --max-time 5 http://localhost:8080)
./bin/maestro stop smoke-nginx
./bin/maestro remove smoke-nginx
echo "$RESPONSE" | grep -q "Welcome to nginx"

Expected: the nginx welcome page HTML contains "Welcome to nginx". Verifies: detached mode, port forwarding, pasta networking, nginx startup with UID 101, HTTP traffic end-to-end.

Each script included pre-cleanup: removing any container named smoke-nginx if it existed from a previous failed run. This made the tests idempotent—runnable multiple times in the same environment without manual cleanup between runs. Idempotency was a requirement for CI use, where the environment might be reused across runs.

[Engineering Sidebar: The Importance of Smoke Testing]

Smoke tests are a category of integration tests that verify end-to-end functionality of the critical paths through a system without exhaustively testing every edge case. The name comes from hardware testing: when you power on a new circuit board for the first time, the first check is "does it smoke?" If no smoke, proceed to more detailed tests. If smoke, return to the drawing board.

In container runtime development, the end-to-end path is vastly more complex than any single component. A container runtime is the integration of a registry client, a content-addressable storage engine, a filesystem driver, an OCI spec generator, a runtime binary (crun), network namespace management, userspace networking (pasta), port forwarding, process lifecycle management, and state persistence. Each component might be individually correct—all unit tests passing, 100% coverage—while the integration between components contains subtle behavioral mismatches that only appear when the full path is exercised.

The three smoke tests in Maestro covered three distinct integration paths: process execution (testing the core create-start-wait-cleanup loop), filesystem integration (testing the volume mount path from CLI argument to kernel bind mount), and full networking (testing the network namespace, pasta, CNI, and port forwarding chain). Passing all three meant the system was minimally functional as a container runtime. Failing any one meant a regression in a user-visible capability, regardless of the state of the unit test suite.


Chapter 7: The maestro system check Command

Part of the stabilization work was building a preflight diagnostic that could tell users why Maestro wasn't working in their environment. The maestro system check command grew into a comprehensive validator:

$ maestro system check

  Maestro System Check
  ──────────────────────────────────────────────────
  ✓ Kernel version: 6.6.30 (rootless overlayfs supported)
  ✓ User namespaces: supported (max_user_namespaces = 14618)
  ✓ cgroup v2: available at /sys/fs/cgroup
  ✓ Subordinate UIDs: 1000:100000:65536 (/etc/subuid)
  ✓ Subordinate GIDs: 1000:100000:65536 (/etc/subgid)
  ✓ crun: /usr/bin/crun (version 1.19)
  ✓ pasta: /usr/bin/pasta (version 2024.07.01.8)
  ✓ fuse-overlayfs: /usr/bin/fuse-overlayfs (version 1.14)
  ✓ newuidmap: /usr/bin/newuidmap (suid)
  ✓ newgidmap: /usr/bin/newgidmap (suid)
  ──────────────────────────────────────────────────
  All checks passed. Maestro is ready.

Each check in this list had a corresponding failure message with remediation instructions. A missing /etc/subuid entry produced:

  ✗ Subordinate UIDs: no entry found for user 'rodrigo' in /etc/subuid
    Fix: sudo usermod --add-subuids 100000-165535 rodrigo
         sudo sysctl -w user.max_user_namespaces=14618

The system check command was not technically required for any container operation—a user could run maestro run without ever running system check. But in practice, it was the first command any new user ran when something didn't work. Its quality directly determined the user experience of "Maestro doesn't work"—either frustrating dead-ends or clear paths to resolution. The investment in comprehensive diagnostics was an investment in user trust: a tool that diagnosed its own problems was a tool that respected the user's time.


Chapter 8: The P0 Stabilization Sprint

Between the completion of the networking integration and the final release, a dedicated sprint addressed a backlog of P0-category issues—problems that a user might encounter in normal usage that produced confusing or incorrect behavior. The P0 categorization was strict: only issues that affected a core user-facing operation in the standard usage scenario qualified. Performance improvements and feature additions were deferred. Complete list, complete fixes, ship.

The P0 issues resolved in this sprint included:

Container stop reliability: The initial maestro stop implementation sent SIGTERM and immediately returned, without waiting for the process to exit or retrying with a stronger signal. In the happy case (a well-behaved process that exited promptly on SIGTERM) this was fine. In the nginx case, nginx caught SIGTERM and shut down in an orderly way, which took up to 10 seconds. If maestro stop returned immediately, the user would see the container in "running" state for up to 10 seconds after the stop command returned, which was confusing and incorrect. Worse: if the process didn't respond to SIGTERM (a hung process, or one that caught SIGTERM and ignored it), maestro stop would return success while the container process continued running indefinitely. The fix added a configurable timeout (default 10 seconds) with SIGKILL escalation: send SIGTERM, poll for exit via /proc/<pid>/status, send SIGKILL if the process is still alive after the timeout, then wait for the process to actually exit.

Log file cleanup: Container log files were not being cleaned up when a container was removed with maestro remove. On systems that ran many ephemeral containers—CI pipelines that each ran a container to execute a test suite—log accumulation would grow unbounded. A test suite running 1,000 containers in a week would accumulate 1,000 log files in ~/.local/share/maestro/logs/, each potentially megabytes. The fix: maestro remove deleted the log file as part of container cleanup, unless --keep-logs was explicitly specified. The --keep-logs flag was provided for debugging scenarios where the log was needed after removal.

Inspect output format: The initial maestro inspect returned a Go struct serialized with %+v formatting—technically readable but completely impractical for scripting, piping to jq, or comparison against expected values. The fix added JSON output as the default format (encoding/json.MarshalIndent with two-space indent), with a --format flag that accepted Go template strings for field selection. maestro inspect --format '{{.IPAddress}}' returned just the container's IP address, enabling scripting patterns like curl http://$(maestro inspect --format '{{.IPAddress}}' my-container):80.

Port binding conflict detection: If port 8080 was already in use on the host and the user tried to run a container with -p 8080:80, pasta would start, attempt to bind port 8080, fail with a system error about the bind call, and exit with a non-zero status. The error message from pasta was a low-level socket error, not a user-friendly description. The fix added a pre-flight check in the port forwarding setup code: before starting pasta, attempt net.Listen("tcp", ":8080") and immediately close it. If the listen failed, return: port 8080 is already in use on this host; try a different port with '-p <host-port>:80'. The error message was actionable—it told the user exactly what to do—rather than a system error string that required interpretation.

Volume mount permission handling: The initial volume mount implementation passed bind mount paths directly from the user's command line to the OCI spec without checking whether the source path existed or was accessible. A typo in the path (-v /tmp/datta:/data) resulted in the OCI runtime returning a confusing error at container start time, rather than a clear error at the maestro run command level. The fix added a pre-flight stat check on all volume source paths before constructing the OCI spec, returning: volume source '/tmp/datta' does not exist or is not accessible.

Each of these five fixes was small in implementation scope but large in user experience impact. The user experience of a container runtime is not primarily determined by whether it can run a container—modern developers assume that will work. It's determined by what happens when something goes wrong: whether the error message is actionable, whether the state is clean, whether the next step is clear. The P0 sprint invested specifically in these failure-mode experiences.


Chapter 9: The Final Quality Accounting

The last step before tagging the release was the complete quality accounting: running every check in the CI pipeline on the full codebase and confirming every target was met.

$ make ci-local
fmt     ✓  (gofmt, goimports: 0 issues)
lint    ✓  (golangci-lint, 57 rules: 0 issues)
test    ✓  (go test -race ./...: all passed)
coverage✓  (all packages: 100% or documented ignore)
build   ✓  (CGO_ENABLED=0 static binary: ok)
smoke   ✓  (alpine-echo, volume-mount, nginx-welcome: all passed)

All checks passed.

The word "all" in All checks passed was carrying weight that the casual reader might miss. It was not "all unit tests passed" or "no lint errors found." It was every check in the pipeline—including the race detector run, including the smoke tests that exercised the full integrated system, including coverage thresholds that would fail if any line lacked either a test or a documented reason for its exclusion.

The test line with -race was the most expensive: running all unit tests under Go's race detector made the suite several times slower, but it caught data races that only appeared under concurrent execution. In a codebase with file locking (Khef), goroutines managing pasta processes, concurrent container operations reading and writing the waystation state, and the parallel HTTP mock servers in the shardik test suite, the race detector was not optional: it was the check that tested the concurrency model under real concurrent execution rather than trusting code that merely appeared to work.

The -race flag instrumented memory accesses in the test binary with shadow memory checks: reads and writes were tagged with the goroutine that performed them, and any access to the same memory from two goroutines, at least one of them a write, without an intervening synchronization event (mutex, channel, atomic) triggered a race report. The overhead was significant: the Go documentation cites roughly 2-20x slower execution and 5-10x more memory for a typical program. But the guarantee was real within its limits: the detector reports only races that actually occur during a run, so a clean -race pass meant no data race had been observed on any path the test suite exercised.

Zero races detected. Zero lint issues. Zero coverage gaps without documentation. Zero smoke test failures.

The discipline had held from the first commit to the last. The accounting was complete.


Chapter 10: git tag v0.1.4

The release ceremony was quiet. It happened, as all the meaningful moments in this journey had happened, in a terminal.

$ git log --oneline -10
3a7f1c2 docs: update CHANGELOG for v0.1.4
8d9ef01 fix: clean container log files on remove
5c3ab47 fix: detect port conflicts before pasta start
2f8194c feat: pasta --config-net port forward direction
1a9d4e7 fix: fuse-overlayfs foreground mode, reap zombie
7b3c910 fix: rootfs 0700→0755 top-level permissions
4d2a156 fix: setgroups=deny before newgidmap sequencing
e6f8023 feat: beam.Attach with holder NSenter approach
f1b2d34 fix: exec.Cmd *os.File not io.Writer for OCI create
9c8a7e1 feat: todash LockOSThread for network ns entry

$ git tag -a v0.1.4 -m "Phase 1 MVP: The Gunslinger Stands
  
  Full OCI container lifecycle, rootless-first, daemonless.
  
  - Image pull from OCI compliant registries (shardik + drawing + maturin)
  - Rootless container execution via crun and user namespace
  - Kernel overlayfs and fuse-overlayfs storage drivers
  - Rootless networking via pasta with port forwarding
  - Complete smoke tests: alpine echo, volume mount, nginx HTTP
  - 100% test coverage on all packages
  - Zero lint issues (57 golangci-lint rules)
  - Static binary, CGO_ENABLED=0
  
  The Tower is not yet reached. But the Gunslinger stands and the Beam holds."

$ git push origin v0.1.4

The tag was created on 2026-04-01. Not an April Fool—a timestamp. Timestamps don't lie. Code doesn't lie. Smoke tests don't lie. v0.1.4 was real because it passed the tests that defined what "real" meant for this project.

The tag message was also documentation. It was the permanent, commit-tree-level record of what the release represented, written for the same future audience that the JOURNEY.md and CHANGELOG and narrative served: someone who needed to understand, at a glance, what v0.1.4 was and why it mattered. "The Gunslinger stands and the Beam holds" was not marketing language—it was the project's naming philosophy applied to the release itself. The Tower was not yet reached; honesty about what wasn't done was part of the documentation. The Beam holds; confidence about what was done was the other part.

v0.1.4 was not a complete container runtime. It did not have image build capability, compose-file parsing, a remote API, image signing, vulnerability scanning, or networking between multiple containers. All of those were Phase 2 and beyond—each one a companion waiting to be drawn through a beach door, each one a new set of problems to solve, each one requiring the same discipline that Phase 1 had established.

What v0.1.4 had was a foundation. A foundation that had been:

  • Architecturally honest: Every package boundary reflected an actual domain boundary. shardik knew nothing about container execution; gan knew nothing about registry protocol. The interfaces between packages were narrow and typed.

  • Coverage-complete: Every line of code was either tested or documented as genuinely unreachable. Not "mostly tested." Not "tested except the sad paths." Every line.

  • Dependency-injected: Every dependency on the external world (filesystems, clocks, process execution, OS calls) was injectable, making the code testable without filesystem state, without real processes, without timing dependencies.

  • Lint-zero: Every golangci-lint rule in the 57-rule configuration passed. No cognitive complexity overruns, no function length violations, no error checks skipped, no unused variables, no deprecated API calls, no security vulnerabilities flagged by govulncheck.

  • Smoke-tested: The three smoke tests ran cleanly. A human user could run each script and observe the described output. Not "our unit tests say it works." A request to port 8080 returned the nginx welcome page.

This was the foundation. Phase 2 would build on it, and Phase 3 would build on Phase 2. The discipline established in Phase 1—test everything, cover everything, inject everything, document the battles—was the gift that Phase 1 left to all future phases. Not the code itself, which would be modified and extended, but the culture: the expectation that quality was not a feature to add later but a property to maintain from the first commit.


Chapter 11: Looking to the Forest — The Drawing of the Three

Phase 2 was visible on the horizon like the Dark Tower itself: enormous, complex, its specific shape partly unclear but its general outline known from the roadmap that had been written before a single line of Phase 1 code was committed.

Phase 2 was called "The Drawing of the Three"—a reference to the second Dark Tower volume, in which Roland draws three companions through beach doors, completing his ka-tet. In Maestro's metaphor, the three companions were:

First companion: The full compose lifecycle—maestro compose up/down/ps/logs, reading a subset of Docker Compose syntax and translating it into multiple coordinated Maestro containers with shared networks, volume mounts, environment variable injection, and dependency ordering. This was not a small undertaking. Compose had hundreds of keys, dozens of edge cases, and a community of users with strong opinions about exact behavioral compatibility. Maestro's compose implementation would support a documented subset—the 80% that covered the real-world use cases of development environments and simple production deployments.

Second companion: The remote API—an HTTP API (subset of Docker API, OCI compatible) that allowed external tools to communicate with Maestro. This was needed for IDE integrations, CI tools, and any system that expected a container daemon rather than a CLI tool. Phase 2 would add an optional daemon mode where Maestro ran as a background process accepting API calls, without sacrificing the zero-daemon default for users who didn't need it. The API design had already been sketched in the openspec/ directory during the planning phase—a complete OpenAPI specification for every operation Maestro would expose, which was both documentation and a contract that the implementation would be held to.

Third companion: The image build system—a maestro build command that consumed a Dockerfile (or Containerfile) and produced an OCI image without requiring Docker or Buildah. Container build was a full domain with layers, caching, context transfer, .dockerignore processing, multi-stage builds, and security considerations around untrusted Dockerfiles. Phase 2's build system would target the common cases—single-stage builds, basic FROM/RUN/COPY/EXPOSE/CMD instructions—with more complex cases deferred to Phase 3.

These three companions would complete the core platform: a container runtime that could pull, build, run, compose, and expose an API—covering the use cases that drove 90% of container demand in development and production.

But the forest was dense and the companions were not gentle. Compose parsing required a YAML schema that avoided arbitrary code execution from malicious compose files—a security constraint that ruled out the naive approach of using Go's yaml.Unmarshal with interface{} and trusting the result. The remote API required authentication even in the single-user case, because a listening TCP socket on the local machine was accessible to any process on that machine, and a container runtime API that could start arbitrary processes was a significant privilege escalation vector if unprotected. The build system required understanding RUN instruction isolation—how to safely execute apt-get install during image build without contaminating the build environment, without requiring root, and without leaking state between build stages.

Phase 1 had answered none of these questions. Phase 1 had established the answers to the earlier, simpler questions—how to run an existing image, how to connect it to the network—that had to be answered before the Phase 2 questions became well-defined. This was the nature of iterative system building: each phase made the next phase's questions visible by answering the current phase's questions. You couldn't see the compose parsing problem until you had container networking. You couldn't see the build caching problem until you had image storage. The Tower revealed itself one floor at a time.

The Gunslinger stood at the forest's edge with three things he hadn't had when he started the desert crossing: working guns (the binary), a ka-tet principle (the discipline), and knowledge of the forest's structure from the roadmap. He didn't know every tree. He knew the direction. That was enough to begin.


Chapter 12: The Gunslinger Stands

The Dark Tower series ends (and then continues, and then ends again) with Roland reaching the Tower. He climbs to the door at its top and finds himself back in the desert where the first page began, about to begin the journey again, with one difference: this time, he has with him an object he didn't have the first time, an object that will change the journey. The implication is that the cycle is not endless repetition: it is iterative improvement, each cycle leaving Roland slightly more prepared, slightly more wise, slightly closer to the answer that the Tower holds.

Container runtimes are the same. v0.1.4 was not a conclusion. It was a restart with accumulated knowledge.

The months of Phase 1 had produced more than a binary. They had produced a methodology: how to design package boundaries that made testing real rather than ceremonial, how to measure coverage without gaming the metric, how to inject dependencies without over-engineering into abstraction for its own sake, how to name things in ways that carried meaning forward rather than becoming semantic cruft. They had produced the make ci-local wheel—a local CI process that could be trusted because it was identical to the remote CI process, running in the same mode, with the same linter configuration, the same test flags, the same coverage thresholds. No "it passes locally" excuses. Either it passed locally and remotely, or it didn't pass.

They had produced documentation. Not incidentally, as an afterthought, but as a first-class deliverable. The JOURNEY.md log recorded each milestone as it was achieved—the war, the specific problems, the sequence of discoveries. The CHANGELOG recorded the user-visible changes at each version. The openspec change documents recorded the architectural decisions and their rationale at each significant evolution. The design document recorded the system architecture and the reasoning behind each structural choice.

And now, this narrative: a longer form, a different key, a record of the war from the inside. Not just what was built, but how it felt to build it. The Silent Hang, experienced in real time, was a technical problem with a technical solution. Narrated afterward, it became a lesson: that the OCI Runtime Specification was not documentation for an API but a specification for a complete virtual operating environment, and that every item in the default mount list was essential, not ornamental. The lesson, documented, survived beyond the immediate debug session.

Documentation as a form of respect—for the future contributor who would open the repository late at night debugging a problem in the networking stack and need to understand why setns was called with runtime.LockOSThread() first, and what the Todash concurrency bug had been. That future contributor might be a stranger. It might be the same engineer who wrote the code, twelve months later with the specific context long eroded from memory. Either way, the documentation made the conversation possible across time.

The Phase 1 codebase, released as v0.1.4, was ready to be that foundation. Every package was tested. Every boundary was intentional. Every name carried meaning. Every decision was documented. The next contributor—whether that was the same person continuing Phase 2 or someone new arriving at the repository for the first time—would find a codebase that explained itself, that had earned its own confidence through the discipline of testing and coverage and lint compliance, and that was prepared to be extended without being torn apart.

The Gunslinger had his guns back.

He stood at the forest edge. The Tower was still on the horizon—distant, inevitable, real. Behind him, the crossed desert: Phase 1, complete. Ahead, the forest: Phase 2, beginning. The wind from the north smelled of iron and change, the way wind always smelled before something new began.

Roland checked his guns one last time. Full cylinders. Clean barrel. Ready.

The work had been real. The tests were real. The binary was real. The containers ran, the network namespace connected, the port forwarding worked, the nginx welcome page loaded in a browser pointed at localhost:8080. These were not abstractions. They were observable, reproducible, documented facts about a system that existed where it had not existed before. v0.1.4 was not just a git tag; it was the crystallization of a hundred design decisions, a thousand tests, and a methodology that could be extended indefinitely without losing its structural integrity.

That was the achievement. Not the binary itself—binaries were ephemeral, always to be superseded—but the method. The discipline embodied in make ci-local. The culture of 100% coverage with documented exceptions. The naming philosophy that made packages readable across months. The injection pattern that made tests real rather than mocked facades. These were portable, transferable, extendable. They would survive Phase 2, Phase 3, and the versions beyond.

The journey forward would be harder. Phase 2 was always harder than Phase 1, because Phase 2 built on Phase 1's foundation while expanding scope in three new directions simultaneously. But the foundation was solid—tested, typed, documented, linted, covered. It would not crack under Phase 2's weight. It had been built for weight.

Day one of Phase 2 was tomorrow.

Long days and pleasant nights, dear reader. May your namespaces stay clean and your pasta process never die.


Afterword: Technical Inventory of Phase 1

For the reader who wants the full accounting—names, packages, milestones, decisions—the complete technical inventory of Maestro Phase 1:

Packages Built

| Package | Dark Tower Name | Function |
| --- | --- | --- |
| shardik | The Great Bear Guardian | OCI registry client, Docker Hub and generic authentication |
| maturin | The Ancient Turtle | Content-addressable image storage, SHA-256 integrity |
| drawing | The Act of Drawing | Image pull orchestration |
| gan | The Creator | Container lifecycle: create, start, stop, remove, inspect |
| eld | The Blood of Arthur Eld | OCI runtime abstraction: crun, runc, youki |
| prim | The Primordial Ocean | Storage driver: overlayfs, fuse-overlayfs, VFS |
| specgen | | OCI Runtime Spec generator |
| beam | The Great Beam | Container networking subsystem |
| todash | The Todash Darkness | Network namespace creation and entry |
| guardian | The Beam Guardian | CNI plugin lifecycle, downloader |
| doorway | The Magical Door | Port mapping parser and state |
| mejis | The Mejis Grasslands | Rootless networking: pasta/slirp4netns |
| white | The White (purity) | Security configuration: capabilities, seccomp, setgroups |
| waystation | The Way Station | Container and image state persistence |
| tower | The Dark Tower | Configuration system |

Milestones Completed

| Milestone | Name | Achievement |
| --- | --- | --- |
| 1.1 | Tower Rises | Scaffold: CLI, CI, linter, waystation, tower config |
| 1.2 | The Drawing | Image pull: shardik + maturin + drawing, 100% coverage |
| 1.3 | Gan Creates | Container execution: gan + eld + prim + specgen |
| 1.4 | Beam Connects | Container networking: beam + todash + guardian + doorway |
| 1.5 | Calla Stands | Rootless completion: mejis + white + subuid/subgid |
| v0.1.4 | Phase 1 MVP | Release tag: 2026-04-01 |

End of Part III.


End of the Phase 1 Narrative Trilogy.

