Planned Features
Project Overview
AtomixOS is a secure, reproducible operating system for single-board computers, built on
NixOS with atomic A/B OTA updates, automatic rollback, and a container-based application
deployment model. The system uses a read-only squashfs rootfs and operator-provisioned
Quadlet containers on a persistent /data partition.
Goals
- Ship a complete, reproducible embedded gateway firmware with zero default credentials
- Provide atomic, rollback-safe over-the-air updates for thousands of remote devices
- Allow operators to provision application containers, networks, and volumes via a single config.toml without touching the base image
- Support EN18031 compliance for network isolation, authentication, and audit
- Support optional Nixstasis-based remote management through enrollment and tunnels
- Deliver a working reference stack (Caddy + AuthCrunch + Cockpit-ws) demonstrating OIDC-authenticated device management through config.toml
Non-Goals
- Desktop or server NixOS distribution
- Multi-architecture support beyond aarch64 (Rock64 RK3328)
- Container orchestration (Kubernetes, Swarm) – Quadlet is the runtime
- Delta OTA updates (full image writes are the current model)
- On-device web management UI in the base image (remote management can be provided through optional Nixstasis integration)
- General-purpose firewall/router functionality (no IP forwarding, ever)
Global Constraints
- 16 GB eMMC with fixed A/B partition layout; rootfs slot is 1 GB max
- Squashfs root is read-only; all mutable state lives on /data (f2fs)
- EN18031: no default credentials, no IP forwarding, key-only SSH
- Provisioned containers must go through the Quadlet safety boundary (rootful=host network, rootless=pasta with loopback publish rewrites)
- config.toml is the single operator input; schema changes must not break existing configs
- RAUC bundles are signed; only CA-signed updates are accepted
- Hardware watchdog enforcement is deferred until boot-reliability validation completes
Cross-Cutting Decisions
- POST /api/config is the programmatic provisioning endpoint; same validation as the web console
- Fresh first-boot POST /api/config is intentionally tokenless for programmatic provisioning; the bootstrap token is a Boot UI CSRF control for /apply, not operator authentication
- Provisioned re-apply requires SSH signature authentication; /api/validate also requires SSH authentication
- Bootstrap exposure is WAN/LAN before initial provisioning and LAN-only after successful provisioning; runtime socket rebinding must use /run/systemd/system drop-ins because the rootfs is read-only
- quadlet-runtime.json tracks all rendered units (containers, networks, volumes) with mode (rootful/rootless) for sync-quadlet
- Network and volume Quadlet units are always rootful
- ${CONFIG_DIR} and ${FILES_DIR} tokens in Quadlet values are substituted at render time to /data/config and /data/config/files respectively
- Bundle imports support files/ directory for operator payload files
- Re-apply uses authentication, not a reset token
- Full /data wipe is separate from config re-apply
- WAN TCP 8080 is reserved for bootstrap exposure and cannot be configured as a provisioned WAN inbound rule
- The repository development RAUC CA is an explicit development convenience only; production fail-closed keyring enforcement remains planned
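The render-time token substitution listed among the decisions above can be sketched in a few lines. This is an illustrative assumption, not the renderer shipped in atomixos-provision: the function name substitute_tokens and the TOKENS table are hypothetical, and only the two token names and their target paths come from this document.

```python
# Hypothetical sketch of render-time token substitution.
# Only the token names and target paths are taken from this document.
TOKENS = {
    "${CONFIG_DIR}": "/data/config",
    "${FILES_DIR}": "/data/config/files",
}


def substitute_tokens(value: str) -> str:
    """Replace the supported tokens inside a single Quadlet value."""
    for token, path in TOKENS.items():
        value = value.replace(token, path)
    return value
```

A volume mount written as ${FILES_DIR}/Caddyfile in config.toml would, under this sketch, render as /data/config/files/Caddyfile. Values without tokens pass through unchanged.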
Open Questions
- Cockpit-podman host integration: cockpit-podman must be installed on the host (not in the cockpit-ws container) and communicates via cockpit-bridge. On AtomixOS the rootfs is read-only squashfs, so cockpit-podman would need to be in the NixOS closure. This means the base image must include it, which crosses the “no on-device web management” non-goal boundary. Alternative: treat cockpit-podman as an optional NixOS module that operators can enable.
- hawkBit integration: useHawkbit option exists but no operational service is configured. Needs server configuration, credentials, and verification tests before promotion.
- Nixstasis client: Enrollment, tunnel lifecycle, and credential rotation are documented but not implemented.
- USB WiFi: Kernel WiFi/Bluetooth stacks are disabled. Hardware selection needed before enablement.
- Active watchdog enforcement: Deferred pending Rock64 boot-reliability validation.
- Additional [network] properties: Evaluate adding dns_servers, dns_search_domains, default_gateway, and interfaces to the [network] section for operator-controlled DNS, default route, and NIC configuration. These keys are not currently consumed but may be needed for multi-NIC or custom DNS setups.
- User shell configuration: Allow operators to set shell = "zsh" or shell = "bash" per user in [users.<name>]. Currently admin users default to /bin/zsh and system accounts to /bin/sh, with no config override.
- Additional [activation] options: Evaluate adding activation controls beyond required, such as timeout_seconds for max wait/check windows, rollback_on_failure for whether to restore previous config, restart for an explicit ordered service restart list, settle_seconds before checking health, allow_degraded for services allowed to fail without rollback, and strategy = "rollback" | "keep-failed" | "manual-confirm".
Resolved Questions
- Cockpit-ws authentication boundary: Resolved by placing Cockpit behind Caddy/AuthCrunch and running cockpit-ws with --local-session. Caddy is the only public authentication and authorization boundary; /cockpit/* is restricted to authp/admin.
- Provisioning API foundation: Resolved by replacing the monolithic first-boot provisioner with the atomixos-provision Python package, Litestar API service, SSH signature authentication, single-flight apply jobs, live OpenAPI schema, crash-safe config promotion, activation health checks, and rollback handling. Future changes should build on the same validate, render, promote, activate, and rollback pipeline instead of adding parallel mutation paths.
- Bootstrap API and UI auth split: Resolved by keeping programmatic first-boot /api/config unauthenticated while requiring the Boot UI bootstrap token for browser form submission. After provisioning, unauthenticated mutation routes are unavailable and re-apply requires SSH signatures.
- Bootstrap exposure lifecycle: Resolved by keeping WAN bootstrap exposure only until initial provisioning completes, then rebinding the bootstrap socket to LAN through runtime systemd drop-ins and preserving WAN exposure while an initial promotion marker is pending.
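The auth split resolved above can be summarized as a small decision table. The sketch below is a model for discussion, not the service's routing code: the Auth enum and required_auth function are hypothetical names, and it assumes that /api/validate requires SSH signatures in both states and that the Boot UI /apply route is not served after provisioning.

```python
from enum import Enum


class Auth(Enum):
    NONE = "none"                        # tokenless programmatic first boot
    BOOTSTRAP_TOKEN = "bootstrap-token"  # Boot UI CSRF control, not operator auth
    SSH_SIGNATURE = "ssh-signature"      # provisioned re-apply and validation
    UNAVAILABLE = "unavailable"          # route is not served in this state


def required_auth(route: str, provisioned: bool) -> Auth:
    """Return the control a request must satisfy (hypothetical model)."""
    if route == "/api/validate":
        # Assumption: validation requires SSH signatures in both states.
        return Auth.SSH_SIGNATURE
    if route == "/api/config":
        return Auth.SSH_SIGNATURE if provisioned else Auth.NONE
    if route == "/apply":
        # Assumption: the Boot UI form disappears once provisioning completes.
        return Auth.UNAVAILABLE if provisioned else Auth.BOOTSTRAP_TOKEN
    raise ValueError(f"unknown route: {route}")
```

Keeping the table in one function makes it easy to assert in tests that no unauthenticated mutation path exists once provisioned is true.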
Feature Map
caddy-authcrunch-cockpit-tutorial
- Status: completed
- Overview: Provides a comprehensive tutorial section in the documentation with a fully working config.toml bundle deploying Caddy with the AuthCrunch plugin for Microsoft Entra OIDC authentication, JWT token generation with OIDC group-to-role mapping, and Cockpit-ws for container management. The tutorial demonstrates the full power of the config.toml provisioning system including containers, networks, volumes, and bundle files.
- Requirements:
- Working config.toml with all required sections (users, network, health, containers)
- AuthCrunch container (ghcr.io/authcrunch/authcrunch) as rootful with host networking
- Caddyfile configuring Microsoft Entra OIDC provider, authentication portal, and authorization policies
- OIDC group mapping to local roles: authp/admin (sudoless admin) and authp/user (generic user) based on Entra security group membership
- JWT token generation with configurable lifetime and signing key
- Cockpit-ws container (quay.io/fedora/fedora) for device/container management, built from a custom Containerfile that adds Cockpit management modules
- Caddy-gated Cockpit local session: Caddy restricts /cockpit/* to authp/admin, and cockpit-ws runs --local-session behind the proxy, which eliminates double authentication
- Quadlet .build support for building custom container images from Containerfiles
- Podman module integration so operators can manage provisioned pods from Cockpit
- Quadlet network definition for inter-container communication
- Quadlet volume definition for persistent Caddy state
- Bundle files/ directory with Caddyfile and cockpit.conf
- Clear documentation of Azure App Registration prerequisites
- Clear documentation of how to swap the Caddyfile identity provider block for Google or another OIDC provider
- Clear documentation of the authentication flow and role-based access
- Constraints:
- Must use only config.toml features that exist today or are added as part of this feature (containers, networks, volumes, builds, bundle files, ${CONFIG_DIR}/${FILES_DIR} tokens)
- Caddy must be rootful (needs host network for ports 80/443)
- Cockpit-ws uses --local-session behind Caddy/AuthCrunch (no double auth)
- Must not require changes to the AtomixOS base image or schema beyond .build support
- Tutorial values (tenant ID, client ID, domain) must use obvious placeholders
- Non-goals:
- Modifying the AtomixOS base image to include Cockpit or cockpit-podman
- Production-hardening the example (certificate pinning, secret rotation, HA)
- SAML providers (tutorial focuses on OIDC)
- Success criteria:
- An operator can copy the tutorial config, substitute their Azure/domain values, flash a device, and have a working OIDC-authenticated Caddy + Cockpit stack
- The tutorial config passes first-boot-provision validate
- Role mapping is demonstrated: Entra group A gets admin, group B gets user
- The tutorial clearly explains the powerful host socket mounts used by the admin Cockpit container
- Risks and tradeoffs:
- Cockpit local-session risk: Cockpit does not perform a second login. Caddy must remain the only public entry point and /cockpit/* must remain admin-only.
- AuthCrunch version churn: AuthCrunch/caddy-security evolves rapidly; Caddyfile syntax may change between versions.
- Entra group claim configuration: Requires Azure portal configuration (Token Configuration > Add groups claim) that is outside AtomixOS control.
- Cockpit package drift: Container-installed Cockpit modules may not match host service versions exactly; native host packaging can be added later if needed.
- Dependencies:
- Network and volume Quadlet support (completed: 85ec53c)
- Bundle file support with ${FILES_DIR} token substitution (completed)
- Container, network, volume rendering and sync (completed)
- Quadlet .build support (completed)
- Suggested validation:
- first-boot-provision validate on the tutorial config.toml
- NixOS VM test importing the tutorial bundle and verifying rendered Quadlet files
- Manual verification with a real Entra tenant (cannot be automated)
- Delivered in: docs/src/tutorials/oidc-device-management.md and example/caddy-oidc/
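The group-to-role mapping that the tutorial demonstrates is configured in the Caddyfile, not in Python, but its logic can be modeled briefly. The following is a hedged sketch: the group identifiers are obvious placeholders, and GROUP_TO_ROLE, roles_for, and may_access_cockpit are hypothetical names that do not correspond to any AuthCrunch API. Only the role names authp/admin and authp/user and the admin-only /cockpit/* rule come from this document.

```python
# Placeholder group identifiers; real values come from the Entra tenant.
GROUP_TO_ROLE = {
    "<ADMIN_GROUP_OBJECT_ID>": "authp/admin",
    "<USER_GROUP_OBJECT_ID>": "authp/user",
}


def roles_for(groups: list[str]) -> set[str]:
    """Map the groups claim from the identity provider to local roles."""
    return {GROUP_TO_ROLE[g] for g in groups if g in GROUP_TO_ROLE}


def may_access_cockpit(roles: set[str]) -> bool:
    """/cockpit/* is restricted to authp/admin."""
    return "authp/admin" in roles
```

A user whose token carries only the user group would receive authp/user and be refused at /cockpit/*. Unknown groups contribute no roles.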
nixstasis-client
- Status: planned
- Overview: Implement the Nixstasis enrollment client that registers the device with the Nixstasis management server, establishes reverse tunnels, and manages short-lived SSH credentials.
- Requirements:
- Device identifies itself via eth0 MAC address
- Server checks MAC against approved inventory
- Approved devices receive and persist a registration key on /data
- Client establishes reverse tunnel for remote SSH sessions
- Credential rotation for the registration key
- Constraints:
- Must survive container-layer failures (lives in rootfs, not a container)
- Must work with key-only SSH authentication model
- Must not require default credentials
- Non-goals:
- Hosting web management UI on the device
- Fleet orchestration logic (server-side concern)
- Success criteria:
- Device enrolls with Nixstasis server using MAC-based eligibility
- Registration key persists across reboots and updates
- Reverse tunnel enables remote SSH access
- NixOS VM test covers enrollment and tunnel lifecycle
- Risks and tradeoffs:
- Depends on Nixstasis server API being stable and documented
- Tunnel reliability on unstable WAN connections
- Dependencies: None (can start independently)
- Suggested validation:
- NixOS VM test with mock Nixstasis server
- Integration test with real Nixstasis instance
- Suggested first workflow command: /start-feature nixstasis-client
hawkbit-updates
- Status: planned
- Overview: Configure the rauc-hawkbit-updater service for server-push OTA updates, replacing the simple HTTP polling model for fleet-scale deployments.
- Requirements:
- Define hawkBit server configuration and credential provisioning
- Create systemd unit for rauc-hawkbit-updater
- Integrate with existing RAUC slot management
- Add config.toml support for hawkBit server URL and credentials
- Constraints:
- Must coexist with polling mode (operator chooses one)
- Must not break existing os-upgrade.service behavior
- Credentials must not be embedded in the base image
- Non-goals:
- Running a hawkBit server (server-side concern)
- Delta updates
- Success criteria:
- Device registers with hawkBit server and receives push updates
- RAUC install and slot management work identically to polling mode
- NixOS VM test with mock hawkBit server
- Risks and tradeoffs:
- hawkBit server availability becomes a deployment dependency
- Additional credential management complexity
- Dependencies: None
- Suggested validation: NixOS VM test with mock hawkBit DDI API
- Suggested first workflow command: /start-feature hawkbit-updates
rauc-production-keyring-policy
- Status: planned
- Overview: Make RAUC production images fail closed unless a production keyring is configured, while keeping development and test images explicit about using the repository development CA.
- Requirements:
- Default production behavior must require atomixos.rauc.keyringCert
- Development/test images must explicitly opt into the repository development CA
- VM tests must set the development opt-in where needed
- Documentation must show production and development keyring examples
- Constraints:
- Must not break local VM development workflows
- Must preserve RAUC signed-bundle verification
- Must keep release image configuration auditable from Nix options
- Non-goals:
- Replacing RAUC
- Managing production CA issuance or rotation server-side
- Success criteria:
- A release image without keyringCert fails evaluation or build
- Development images continue to build only with an explicit dev-keyring opt-in
- Docs clearly state that the repository dev CA is never acceptable for production OTA
- Risks and tradeoffs:
- Existing ad hoc test images may need option updates
- Operators need a documented CA provisioning workflow before release builds
- Dependencies: RAUC module options from provisioning API service foundation
- Suggested validation: Nix evaluation tests for both fail-closed and dev opt-in modes
- Suggested first workflow command: /start-feature rauc-production-keyring-policy
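The fail-closed rule for this feature would be enforced in Nix evaluation, but the decision itself is small enough to model in Python. This is a sketch under assumptions: resolve_keyring, the allow_dev_ca flag, and the DEV_CA placeholder are hypothetical and stand in for whatever option names the module ends up using. Only keyringCert and the requirement for an explicit development opt-in come from this document.

```python
from typing import Optional

# Placeholder for the repository development CA; not a real path.
DEV_CA = "<repository-development-ca>"


def resolve_keyring(keyring_cert: Optional[str], allow_dev_ca: bool = False) -> str:
    """Pick the RAUC keyring, failing closed when nothing is configured."""
    if keyring_cert:
        return keyring_cert
    if allow_dev_ca:
        # Explicit development or test opt-in only.
        return DEV_CA
    raise ValueError("keyringCert is required unless the dev CA is explicitly enabled")
```

Under this model a release build with no keyring raises, and a development build succeeds only when the opt-in flag is set.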
provisioning-api-privilege-separation
- Status: planned
- Overview: Split the network-facing provisioning API process from privileged host mutation helpers. The web process should run unprivileged and call a narrow, auditable helper for config promotion, service activation, firewall changes, and socket rebinding.
- Requirements:
- Run the Litestar/uvicorn service as an unprivileged user
- Define a minimal privileged helper interface for apply/recover/activate actions
- Preserve single-flight apply semantics and job progress reporting
- Preserve first-boot bootstrap behavior and SSH-signed reapply behavior
- Ensure helper inputs are validated and scoped to /data/config
- Constraints:
- Must work with read-only rootfs and mutable /data
- Must avoid adding DB, Redis, or heavyweight IPC dependencies
- Must not regress first-boot operator workflow
- Non-goals:
- Full multi-tenant authorization model
- Remote fleet orchestration
- Success criteria:
- Compromise of the HTTP process does not directly grant root shell or arbitrary filesystem mutation
- Apply/recover/rollback paths still pass existing Python and Nix VM tests
- Systemd hardening is documented and enforced in the service unit
- Risks and tradeoffs:
- Helper boundary adds implementation and test complexity
- Progress reporting may need a simple IPC contract
- Dependencies: Provisioning API foundation
- Suggested validation: VM test proving unprivileged service can provision via helper
- Suggested first workflow command: /start-feature provisioning-api-privilege-separation
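One way the privileged helper could scope its inputs to /data/config is a lexical path check before any filesystem access. The sketch below is an assumption about how such a check might look; scoped_path is a hypothetical name, and a real helper would also need to account for symlinks, which this purely lexical version does not resolve.

```python
from pathlib import PurePosixPath

CONFIG_ROOT = PurePosixPath("/data/config")


def scoped_path(relative: str) -> PurePosixPath:
    """Join a caller-supplied relative path onto CONFIG_ROOT or raise.

    Lexical check only: absolute paths and any '..' component are rejected.
    Symlinks are not resolved here.
    """
    candidate = PurePosixPath(relative)
    if candidate.is_absolute() or ".." in candidate.parts:
        raise ValueError(f"path escapes {CONFIG_ROOT}: {relative!r}")
    return CONFIG_ROOT / candidate
```

Rejecting bad input in the helper, rather than trusting the web process, keeps the boundary meaningful even if the HTTP process is compromised.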
provisioning-api-live-schema-contract
- Status: planned
- Overview: Treat the live OpenAPI schema exposed by the provisioning service as a supported client contract, not incidental framework output.
- Requirements:
- Keep API routes documented with accurate request bodies, headers, responses, and error shapes
- Exclude Boot UI/static routes from the API schema unless deliberately documented
- Add tests that assert schema coverage for new API endpoints
- Preserve operation IDs and domain tags for client generation
- Constraints:
- Live schema exposure is intentional for online clients
- Must not expose inaccurate write-only implementation routes
- Must keep schema generation dependency-light
- Non-goals:
- Replacing config.toml as the canonical import/export artifact
- Adding OAuth/JWT solely for docs access
- Success criteria:
- Generated clients can submit config, poll jobs, validate config, and handle errors using the live schema
- CI fails when a new API route lacks schema assertions
- Risks and tradeoffs:
- Litestar defaults may need explicit overrides for raw binary endpoints
- Schema tests add maintenance cost but prevent client drift
- Dependencies: Provisioning API foundation
- Suggested validation: Python tests against /schema/openapi.json
- Suggested first workflow command: /start-feature provisioning-api-live-schema-contract
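A schema coverage assertion of the kind this feature calls for could compare an expected route table against the paths object of the OpenAPI document. The helper below is a minimal sketch: undocumented_routes is a hypothetical name, and the expected table would have to be maintained by the test suite. It relies only on the standard OpenAPI layout, where paths maps each path to its HTTP methods.

```python
def undocumented_routes(schema: dict, expected: dict[str, set[str]]) -> list[str]:
    """Return 'METHOD /path' entries that are missing from an OpenAPI document."""
    paths = schema.get("paths", {})
    missing = []
    for path, methods in sorted(expected.items()):
        documented = {method.lower() for method in paths.get(path, {})}
        for method in sorted(methods):
            if method.lower() not in documented:
                missing.append(f"{method.upper()} {path}")
    return missing
```

A test could fetch /schema/openapi.json, call this helper with the routes the API is expected to expose, and fail when the returned list is not empty.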
typed-partial-provisioning-api
- Status: planned
- Overview: Add typed partial configuration endpoints for common operations while preserving config.toml and bundles as the canonical import/export/backup format. Partial changes must always produce a full desired state and reuse the existing validate, render, promote, activate, and rollback pipeline.
- Requirements:
- Add typed endpoints for users, network/LAN settings, container services, volumes, and firewall inbound rules in priority order
- Load current desired state, apply the typed patch, validate the full result, render a candidate, promote atomically, activate, and roll back on failure
- Return async jobs with progress just like full config submission
- Preserve config export/backup semantics after partial changes
- Constraints:
- Must not mutate derived files directly under /data/config
- Must not introduce a database or divergent state store
- Must keep full config import behavior authoritative
- Non-goals:
- Arbitrary JSON patch over internal rendered state
- Fleet-level orchestration
- Success criteria:
- Partial updates and full config imports converge on the same on-disk desired state
- Failed partial updates roll back identically to failed full imports
- Live OpenAPI accurately documents each typed endpoint
- Risks and tradeoffs:
- More API surface increases schema and validation maintenance
- Some edits may require restart ordering or health semantics not yet modeled
- Dependencies: Provisioning API foundation, live schema contract
- Suggested validation: Python tests for typed patch-to-full-state conversion plus VM tests for at least one user and one container partial update
- Suggested first workflow command: /start-feature typed-partial-provisioning-api
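The patch-to-full-state step required by this feature can be illustrated with plain dictionaries. This sketch assumes the desired state is available as a nested mapping and that full-config validation is a callable that raises on failure. apply_typed_patch and the field names used in the usage note are hypothetical, not part of the current schema.

```python
import copy
from typing import Callable


def apply_typed_patch(
    desired: dict,
    section: str,
    name: str,
    patch: dict,
    validate: Callable[[dict], None],
) -> dict:
    """Build a full candidate state from the current state plus a typed patch.

    The current state is never mutated; the full candidate is validated
    before it is returned for render, promote, and activate.
    """
    candidate = copy.deepcopy(desired)
    entries = candidate.setdefault(section, {})
    entries[name] = {**entries.get(name, {}), **patch}
    validate(candidate)
    return candidate
```

For example, patching section "users", entry "alice", with a single changed field would return a complete candidate in which every other user and section is unchanged, so the same pipeline used for full imports can take over from there.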
boot-ui-htmx
- Status: planned
- Overview: Redesign the first-boot Boot UI as a small server-rendered HTMX interface while preserving the current upload/paste provisioning flow and bootstrap CSRF token controls.
- Requirements:
- Keep first-boot UI available only before provisioning completes
- Preserve upload and paste config paths
- Show async job progress using the returned job URL
- Reuse server-rendered fragments; no SPA/Vite dependency
- Maintain Host/Origin/Referer protections and bootstrap token checks
- Constraints:
- Must fit embedded rootfs constraints
- Must not add a separate frontend build pipeline unless justified
- Must not introduce unauthenticated post-provision mutation paths
- Non-goals:
- Full on-device management UI
- Replacing programmatic /api/config
- Success criteria:
- Operator can provision from desktop and mobile browsers
- UI reflects validation/apply progress and final forwarding URL
- UI tests cover first-boot only exposure and CSRF failure paths
- Risks and tradeoffs:
- More UI affordances increase bootstrap attack surface if not carefully scoped
- HTMX fragments must stay aligned with API/job behavior
- Dependencies: Provisioning API foundation
- Suggested validation: Python route tests and manual browser test in VM
- Suggested first workflow command: /start-feature boot-ui-htmx
watchdog-enforcement
- Status: deferred
- Overview: Enable hardware watchdog enforcement with RuntimeWatchdogSec=30s and RebootWatchdogSec=10min on Rock64.
- Requirements:
- Complete Rock64 boot-reliability validation
- Enable systemd manager watchdog settings
- Verify watchdog-triggered reboots feed into boot-count rollback
- Constraints:
- Must not cause false-positive reboot loops during normal operation
- Must be validated on physical hardware before enabling
- Non-goals: Software-only watchdog
- Success criteria:
- Watchdog reboots device within 30s of systemd hang
- 3 consecutive watchdog reboots trigger automatic slot rollback
- No false triggers during normal 72-hour soak test
- Risks and tradeoffs:
- Aggressive timeout may cause false triggers on slow boots
- Cannot be fully validated in QEMU
- Dependencies: Physical hardware availability for soak testing
- Suggested validation: 72-hour soak test on physical Rock64
- Suggested first workflow command: /start-feature watchdog-enforcement
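The success criterion that three consecutive watchdog reboots trigger a slot rollback can be modeled as a small state transition. This is a conceptual sketch only: the real counter lives in the bootloader and RAUC integration, and next_state, the slot labels, and the constant name are hypothetical.

```python
MAX_FAILED_BOOTS = 3  # from the success criteria above


def next_state(slot: str, failed_boots: int, boot_ok: bool) -> tuple[str, int]:
    """Return (active slot, consecutive failed boots) after one boot attempt."""
    if boot_ok:
        return slot, 0  # a healthy boot resets the counter
    failed_boots += 1
    if failed_boots >= MAX_FAILED_BOOTS:
        other = "B" if slot == "A" else "A"
        return other, 0  # roll back to the other slot
    return slot, failed_boots
```

Starting from slot A with a zero counter, two watchdog reboots leave the device on A, and the third switches it to B. A healthy boot in between resets the count, which is why the criterion says consecutive.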
usb-wifi
- Status: deferred
- Overview: Enable WiFi support for selected USB WiFi hardware.
- Requirements:
- Select supported USB WiFi chipset and firmware
- Enable kernel WiFi and Bluetooth stacks
- Add WiFi NIC to systemd .link naming
- Define WiFi role (WAN backup? LAN extension?)
- Constraints:
- Must not increase rootfs closure beyond 1 GB slot limit
- Firmware must be redistributable
- Non-goals: Access point mode (initially)
- Success criteria: WiFi interface comes up and connects to configured network
- Risks and tradeoffs:
- Firmware blob licensing and size
- WiFi reliability on embedded hardware
- Unclear network role
- Dependencies: Hardware selection
- Suggested validation: Hardware test with selected adapter
- Suggested first workflow command: /start-feature usb-wifi
config-reapply-improvements
- Status: planned
- Overview: Harden the existing config re-apply path (POST /api/config on the always-running bootstrap server) with authentication, atomic replacement, and rollback-on-failure. The basic re-apply mechanism already works: any POST overwrites /data/config and triggers quadlet-sync to restart services.
- Requirements:
- Authentication guard on the re-apply endpoint (not a reset token)
- Atomic replacement of /data/config (write to temp, swap on success)
- Validate new config before replacing old config
- Rollback to previous config if new config’s services fail to start
- Constraints:
- Must not touch /data outside of /data/config
- Must not break the existing unguarded first-provision flow on fresh devices
- Authentication mechanism must work on LAN-local without external dependencies
- Non-goals:
- Full /data wipe (separate operation)
- Partial config updates (always full replacement)
- Changing the existing provisioning flow for fresh devices
- Success criteria:
- Unauthenticated POST to /api/config is rejected on an already-provisioned device
- Authenticated POST atomically replaces config and restarts services
- Crash during replacement leaves previous config intact
- Failed service startup triggers automatic rollback to previous config
- Risks and tradeoffs:
- Container state (volumes, data) may be inconsistent after rollback
- Service downtime during re-apply is unavoidable
- Authentication mechanism choice affects operational complexity
- Dependencies: None (existing mechanism works; this is hardening)
- Suggested validation: NixOS VM test with sequential config imports, crash simulation, and rollback verification
- Suggested first workflow command: /start-feature config-reapply-improvements
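The "write to temp, swap on success" requirement can be illustrated for a single file with the standard library. This is a simplified sketch rather than the promotion logic itself: atomic_write is a hypothetical helper, and it covers one file, whereas replacing a whole directory such as /data/config needs an additional strategy (for example a staged directory plus rename) that is not shown here.

```python
import os
import tempfile


def atomic_write(path: str, data: bytes) -> None:
    """Replace path with data so readers see either the old or the new content."""
    directory = os.path.dirname(path) or "."
    # The temp file must be on the same filesystem for the rename to be atomic.
    fd, tmp = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as handle:
            handle.write(data)
            handle.flush()
            os.fsync(handle.fileno())  # make the data durable before the swap
        os.replace(tmp, path)  # atomic rename on POSIX filesystems
    except BaseException:
        if os.path.exists(tmp):
            os.unlink(tmp)
        raise
```

If the process crashes before os.replace, the previous file is untouched and only a stray temp file remains, which matches the criterion that a crash during replacement leaves the previous config intact.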