# Design Decisions

Source: `openspec/changes/rock64-ab-image/design.md`

This chapter documents the key architectural decisions made during the design of AtomicNix, including the rationale, alternatives considered, and known trade-offs.
## Context
AtomicNix is a greenfield project targeting thousands of Rock64 (RK3328, aarch64) devices deployed remotely as network gateways. The devices have 16 GB eMMC storage and must comply with EN18031 security requirements. The previous Debian-based system had a ~3% update failure rate from power loss and partial writes, bricking devices in the field.
## Decision 1: RAUC over SWUpdate
Choice: RAUC for A/B slot management.
Rationale: RAUC has native U-Boot integration, well-documented slot configuration, and a straightforward NixOS module. SWUpdate offers more flexibility (scripted handlers, delta updates) but adds complexity that isn’t needed for the current use case.
Trade-off: RAUC’s update model is image-based (full slot writes), which means no delta updates. A full rootfs write (~300 MB) takes longer than a delta, but is simpler and more reliable.
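For illustration, the slot description RAUC needs is small. The sketch below ships a minimal `system.conf` via `environment.etc`; the compatible string and partition device paths are assumptions, not the image's actual values.

```nix
# Sketch of a minimal RAUC system.conf for a dual-slot rootfs.
# Device paths and the compatible string are illustrative assumptions.
environment.etc."rauc/system.conf".text = ''
  [system]
  compatible=rock64
  bootloader=uboot

  [slot.rootfs.0]
  device=/dev/mmcblk0p4
  type=raw
  bootname=A

  [slot.rootfs.1]
  device=/dev/mmcblk0p5
  type=raw
  bootname=B
'';
```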
## Decision 2: Squashfs rootfs
Choice: Read-only squashfs root filesystem with OverlayFS (tmpfs upper layer).
Rationale: Squashfs eliminates runtime drift – every boot starts from a known-good state. It compresses well (zstd,
1 MB blocks), fitting the NixOS closure into the 1 GB slot with room to spare. A single OverlayFS (squashfs lower +
tmpfs upper) set up in the initrd provides a unified writable root, which is required for systemd’s mount namespace
sandboxing (PrivateTmp, ProtectHome, etc.) to work correctly. Writable state lives on /data (f2fs).
Trade-off: Any runtime state not explicitly persisted to /data is lost on reboot. This is intentional for an
appliance but requires careful placement of writable directories.
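A minimal sketch of the resulting mounts, assuming hypothetical staging mount points prepared by the initrd:

```nix
# Illustrative only: the real initrd plumbing is more involved.
fileSystems."/" = {
  device = "overlay";
  fsType = "overlay";
  options = [
    "lowerdir=/sysroot/lower"    # read-only squashfs (assumed mount point)
    "upperdir=/sysroot/rw/upper" # tmpfs-backed writable layer
    "workdir=/sysroot/rw/work"
  ];
};

fileSystems."/data" = {
  device = "/dev/mmcblk0p6";     # assumed partition index
  fsType = "f2fs";
};
```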
## Decision 3: Per-slot boot partitions
Choice: Each A/B slot has its own boot partition (vfat) containing the kernel, initrd, DTB, and boot script.
Rationale: Pairing boot and rootfs in the same slot ensures they are always consistent. If kernel and rootfs were in different slot pairs, a failed update could leave mismatched versions.
Alternative considered: Single shared boot partition with both kernels. Rejected because it creates a single point of failure and complicates the U-Boot boot script.
## Decision 4: eMMC partition layout
Choice: Fixed layout: 16 MB raw U-Boot area, two 128 MB boot partitions (A/B), two 1 GB rootfs partitions (A/B), and the remaining space for /data.
Rationale: 128 MB per boot slot provides ample space for the kernel (~25 MB compressed), initrd, DTB, and boot
script. 1 GB per rootfs slot gives 2-3x headroom over the current squashfs size (~300-400 MB). The /data partition
(~13.3 GB) holds containers, logs, and configuration.
Risk: If the NixOS closure grows beyond 1 GB, the rootfs slot size must be increased, which reduces /data space
and requires re-provisioning all devices.
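Expressed as data, the layout looks roughly like this (purely illustrative; it assumes a hypothetical image builder that consumes such a list):

```nix
# Hypothetical partition map matching the sizes above.
partitions = [
  { label = "uboot";  size = "16M";  content = "raw U-Boot"; }
  { label = "boot_a"; size = "128M"; content = "vfat";       }
  { label = "boot_b"; size = "128M"; content = "vfat";       }
  { label = "root_a"; size = "1G";   content = "squashfs";   }
  { label = "root_b"; size = "1G";   content = "squashfs";   }
  { label = "data";   size = "rest"; content = "f2fs";       }
];
```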
## Decision 5: U-Boot from nixpkgs
Choice: Use `pkgs.ubootRock64` from nixpkgs rather than a custom U-Boot build.
Rationale: The nixpkgs U-Boot package is tested, reproducible, and tracks upstream releases. Customization happens through kernel configuration rather than U-Boot patches, keeping the build simple.
Trade-off: Limited to the U-Boot version and configuration in nixpkgs. The current version (2025.10) lacks
`setexpr`, requiring a manual if/elif chain for the boot counter decrement.
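For illustration, without `setexpr` the decrement becomes an explicit chain in the boot script source; a sketch, with the counter variable name assumed:

```nix
# Fragment of the generated boot script text. U-Boot's hush shell has
# no arithmetic available here, so each counter value is matched explicitly.
bootScriptFragment = ''
  if test "''${bootcount}" = "3"; then setenv bootcount 2;
  elif test "''${bootcount}" = "2"; then setenv bootcount 1;
  elif test "''${bootcount}" = "1"; then setenv bootcount 0;
  fi
  saveenv
'';
```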
## Decision 6: Watchdog strategy
Choice: Defer enforcement of the systemd hardware watchdog while keeping the 30 s runtime / 10 min reboot timeouts as the target settings.
Rationale: Rock64 boot reliability validation is not complete. The target values remain documented, but the current
release leaves `systemd.settings.Manager = { }` empty to avoid watchdog-triggered reset loops during development.
Integration: Once enabled, watchdog reboots feed directly into the boot-count rollback path.
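Once enabled, the target settings would look roughly like this (values from this decision):

```nix
# Target watchdog settings once boot reliability is validated.
systemd.settings.Manager = {
  RuntimeWatchdogSec = "30s";   # systemd pets the hardware watchdog at runtime
  RebootWatchdogSec = "10min";  # covers a hung reboot path
};
```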
## Decision 7: Local health-check (no phone-home)
Choice: `os-verification.service` runs local checks only. No external server is contacted for update confirmation.
Rationale: The device must be self-sufficient. If the WAN is down after an update, the device should still be able to commit the slot (or roll back) based on local service health. Phoning home would create a dependency on network availability during the critical confirmation window.
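A minimal sketch of such a local check; the unit names in the list are illustrative assumptions:

```nix
# Sketch: verify required units are up, then commit the booted slot.
systemd.services.os-verification = {
  wantedBy = [ "multi-user.target" ];
  after = [ "multi-user.target" ];
  serviceConfig.Type = "oneshot";
  script = ''
    for unit in systemd-networkd.service sshd.service; do
      systemctl is-active --quiet "$unit" || exit 1
    done
    rauc status mark-good   # confirm this slot; otherwise rollback proceeds
  '';
};
```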
## Decision 8: Nixstasis-hosted remote management
Choice: Move remote web management out of the device image and treat Nixstasis as the primary control plane.
Rationale: The Nixstasis client already establishes reverse tunnels and receives short-lived SSH credentials from the server. Hosting remote web management and the auth layer in Nixstasis removes first-boot registry pulls, reduces device complexity, and keeps the device focused on local gateway and update responsibilities.
Trade-off: Remote management now depends on successful Nixstasis enrollment and tunnel establishment. Local recovery falls back to SSH rather than an on-device HTTPS UI.
## Decision 9: OpenVPN in rootfs
Choice: Include OpenVPN in the root filesystem (not as a container).
Rationale: OpenVPN provides a recovery tunnel for remote management. If it ran as a container and the container runtime failed, there would be no remote access. Including it in the rootfs ensures it survives container-layer failures.
## Decision 10: Network isolation (no IP forwarding)
Choice: Disable IP forwarding at the kernel level. The nftables FORWARD chain drops all packets.
Rationale: EN18031 requires a hard network boundary. LAN devices must not be able to reach the internet. WAN application or VPN ports are opened only by provisioned firewall state. Packet forwarding between WAN and LAN stays disabled.
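A sketch of both enforcement points in NixOS terms (the real ruleset also carries per-interface input rules not shown here):

```nix
# Kernel-level: forwarding disabled outright.
boot.kernel.sysctl."net.ipv4.ip_forward" = 0;
boot.kernel.sysctl."net.ipv6.conf.all.forwarding" = 0;

# Firewall-level: the FORWARD hook drops everything regardless.
networking.nftables = {
  enable = true;
  ruleset = ''
    table inet filter {
      chain forward {
        type filter hook forward priority 0; policy drop;
      }
    }
  '';
};
```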
## Decision 11: NIC naming via .link files
Choice: Use systemd `.link` files for deterministic NIC naming rather than udev rules.
Rationale: `.link` files are the native systemd-networkd mechanism and are processed earlier in boot than udev
rules. They match on stable platform paths (e.g., `platform-ff540000.ethernet` for the onboard GMAC), ensuring `eth0` is
always the onboard Ethernet regardless of USB enumeration order.
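A sketch of the pinning for the onboard GMAC, using the standard `systemd.network.links` options:

```nix
# Rename the onboard Ethernet deterministically, regardless of
# USB NIC enumeration order.
systemd.network.links."10-onboard" = {
  matchConfig.Path = "platform-ff540000.ethernet";
  linkConfig.Name = "eth0";
};
```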
## Decision 12: nftables firewall
Choice: nftables with per-interface rules, replacing iptables.
Rationale: nftables is the modern Linux firewall framework with better performance and a cleaner rule syntax. The
NixOS `networking.nftables` module provides native integration.
## Decision 13: hawkBit-ready architecture
Choice: Design the update system to be swappable between polling and hawkBit push models.
Rationale: The initial deployment uses simple HTTP polling (`os-upgrade.service`). As the fleet scales, migration to
hawkBit can provide centralized update management, rollout policies, and device inventory. The `os-upgrade.useHawkbit`
option currently reserves this path and installs the package, but AtomicNix does not configure an operational hawkBit
service yet.
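A sketch of the polling cadence as a systemd timer; the intervals are illustrative assumptions:

```nix
# Periodic update poll; the randomized delay spreads load across the fleet.
systemd.timers.os-upgrade = {
  wantedBy = [ "timers.target" ];
  timerConfig = {
    OnBootSec = "15min";
    OnUnitActiveSec = "6h";
    RandomizedDelaySec = "30min";
  };
};
```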
## Decision 14: QEMU testing target
Choice: Provide a `rock64-qemu` NixOS configuration that shares the full service stack with the hardware target but
uses virtio devices and a file-based RAUC backend.
Rationale: Hardware testing is slow and requires physical devices. QEMU testing validates all software behavior (RAUC lifecycle, firewall rules, health checks) in CI-friendly VMs. The custom RAUC backend simulates U-Boot’s slot selection using files.
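Structurally this amounts to one shared module list with a swappable hardware layer, roughly as below (the module paths and `mkSystem` helper are hypothetical):

```nix
# Both targets import the same service stack; only the hardware
# module differs (eMMC + U-Boot vs. virtio + file-backed slots).
nixosConfigurations = {
  rock64      = mkSystem [ ./modules/stack.nix ./hw/rock64.nix ];
  rock64-qemu = mkSystem [ ./modules/stack.nix ./hw/qemu.nix   ];
};
```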
## Decision 15: EN18031 authentication
Choice: No default passwords, locked local root/admin passwords, SSH key-only access, serial break-glass recovery, and Nixstasis-hosted remote management.
Rationale: The base image does not host the web management/authentication stack. SSH key-only access and locked passwords prevent brute-force attacks on the device, while Nixstasis handles remote management credentials outside the device image.
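The corresponding NixOS settings are conventional; a sketch:

```nix
users.mutableUsers = false;
users.users.root.hashedPassword = "!";    # locked: no password ever matches
services.openssh = {
  enable = true;
  settings = {
    PasswordAuthentication = false;
    KbdInteractiveAuthentication = false;
    PermitRootLogin = "prohibit-password"; # key-only root access
  };
};
```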
## Decision 16: Squashfs closure optimization
Choice: Aggressive closure size reduction through overlays, disabled features, and stripped dependencies.
Techniques applied:
- `crun` without CRIU (removes python3, saving ~102 MB)
- Disabled: documentation, man pages, fonts, XDG, sudo, bash completion
- Emptied `defaultPackages` and `fsPackages`
- Disabled: bcache, kexec, LVM
Result: Approximately 27% reduction in closure size compared to a default NixOS system with the same services.
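A few of these knobs in NixOS form (a representative sketch, not the exhaustive list; assumes a module fragment with `lib` in scope):

```nix
documentation.enable = false;       # docs, man pages, nixos manual
documentation.man.enable = false;
fonts.fontconfig.enable = false;
xdg.autostart.enable = false;       # XDG integration off
xdg.mime.enable = false;
security.sudo.enable = false;
programs.bash.completion.enable = false;
environment.defaultPackages = lib.mkForce [ ];
```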
## Decision 17: Two-tier runtime logging model
Choice: Use tmpfs-first journald during runtime for host and container log ingress, then drain it through an
rsyslog RAM queue that appends buffered logs to /data/logs.
Rationale: Making the full journal always persistent would increase steady-state eMMC wear. The selected design keeps
runtime logging memory-first, caps journal memory use, routes Podman logs through the same path, and still retains
broader diagnostics durably on /data/logs in larger sequential batches instead of many small writes.
Trade-off: This is a bounded-loss durability model rather than an always-durable one. Sudden power loss can still drop the newest in-memory journal or rsyslog queue entries, but routine runtime writes remain much friendlier to eMMC than fully persistent journal storage.
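A sketch of the two tiers; sizes, paths, and queue parameters are illustrative:

```nix
# Tier 1: journald stays in RAM with a bounded footprint.
services.journald.extraConfig = ''
  Storage=volatile
  RuntimeMaxUse=64M
'';

# Tier 2: rsyslog drains the journal through a RAM queue and writes
# to /data/logs in larger batches rather than many small writes.
services.rsyslogd = {
  enable = true;
  extraConfig = ''
    module(load="imjournal")
    main_queue(queue.type="LinkedList" queue.size="10000")
    action(type="omfile" file="/data/logs/system.log"
           asyncWriting="on" flushInterval="60")
  '';
};
```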
## Risks and Trade-offs
| Risk | Mitigation |
|---|---|
| eMMC wear from frequent writes | /data uses f2fs (wear-leveling aware); squashfs slots are written only during updates |
| U-Boot env corruption | Single-copy environment storage (an accepted simplification); corruption is handled through normal recovery and reprovisioning flows |
| 1 GB rootfs slot too small | Current closure is ~300-400 MB; aggressive optimization keeps headroom |
| Missing or empty health-required list | first-boot.service commits only when RAUC is enabled; os-verification uses gateway health checks alone unless /data/config/health-required.json names additional required units |
| Provisioned application failure | OpenVPN in rootfs and SSH key-only access provide alternate recovery paths |
| No delta updates | Full-image updates are ~300 MB; acceptable on broadband WAN connections |
| No automatic WAN SSH | Deliberate security constraint; manual flag file required |