Config Reapply Improvements
Summary
Harden the existing config.toml re-apply path and formalize the configuration contract. The feature keeps first-time
provisioning local and unblocked, but requires authenticated, validated, atomic replacement for already-provisioned
devices. It also moves OS/device settings into explicit top-level sections and nests all container-related Quadlet config
under [containers].
Project Plan Source
This feature is seeded from docs/src/planned-features.md entry config-reapply-improvements, plus the feature request
to restructure config.toml around [users], [network], and [containers], and to introduce an official schema.
Goals
- Reject unauthenticated POST /api/config requests on already-provisioned devices.
- Validate config.toml against an official schema before any persistent state is replaced.
- Replace /data/config atomically enough that crashes do not leave partially imported state.
- Roll back to the previous config when service activation fails after re-apply.
- Reserve top-level config.toml sections for OS/device configuration.
- Move all container, network, volume, and build Quadlet configuration under [containers].
- Introduce structured top-level [users] and [network] sections.
- Manage local users declared under [users.<name>].
- Preserve the fresh-flash provisioning path from /boot, USB, and the bootstrap UI.
Non-Goals
- Full /data wipe or factory reset behavior.
- Partial config updates; re-apply remains a full replacement operation.
- Changing the A/B update model or RAUC slot confirmation semantics beyond checking re-applied services.
- Adding remote fleet management or Nixstasis integration.
- Making config.toml a general-purpose Linux distribution configuration format.
Current Behavior
scripts/first-boot-provision.py owns config parsing, import, bootstrap UI, and POST /api/config. The current format
uses top-level [admin], [firewall], [activation], optional [lan], optional [os_upgrade], and top-level Quadlet
tables such as [container.<name>], [network.<name>], [volume.<name>], and [build.<name>].
The existing import path writes derived state under /data/config, including:
- config.toml
- admin-signers
- ssh-authorized-keys/admin
- firewall-inbound.json
- lan-settings.json
- os-upgrade.json
- quadlet/quadlet-runtime.json
The base image currently sets users.mutableUsers = false and declares only fixed service users such as appsvc.
OpenSSH already reads authorized keys from /data/config/ssh-authorized-keys/%u, but arbitrary config-declared
users do not exist unless a runtime apply step materializes them.
The planned feature states that basic re-apply already works by accepting a POST, overwriting /data/config, and running
Quadlet sync. This feature narrows that behavior into a safer state machine.
Config Contract
Top-Level Sections
Top-level sections are reserved for OS/device configuration:
- [users]
- [network]
- [activation]
- [os_upgrade]
- [containers]
The new schema rejects the prior top-level [admin], [firewall], [lan], [container], [volume], and [build] tables, as well
as [network.<name>] tables used as Quadlet networks. The schema uses version = 2 to mark this intentionally breaking
config shape. AtomixOS is still unreleased and in design/testing, so this feature does not need a compatibility or
migration path for earlier test configs. Existing examples and docs must be updated in the same unit of work.
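The top-level check described above could look roughly like the following. This is a minimal sketch, not the repository's validator: the function and constant names are hypothetical, and it assumes version is a top-level integer key. Legacy [network.<name>] Quadlet tables are assumed to be caught later by strict key validation of the new [network] section, so they are not detected here.

```python
# Hypothetical names; only the section names and version = 2 come from this document.
ALLOWED_TOP_LEVEL = {"version", "users", "network", "activation", "os_upgrade", "containers"}
LEGACY_TOP_LEVEL = {"admin", "firewall", "lan", "container", "volume", "build"}


def check_top_level(config: dict) -> list[str]:
    """Return error strings for a parsed config.toml; an empty list means OK."""
    errors = []
    if config.get("version") != 2:
        errors.append("version must be 2")
    for key in sorted(config):
        if key in LEGACY_TOP_LEVEL:
            # Rejected outright: no silent acceptance and no migration.
            errors.append(f"{key}: legacy top-level section is not supported in version 2")
        elif key not in ALLOWED_TOP_LEVEL:
            errors.append(f"{key}: unknown top-level key")
    return errors
```

Returning a list rather than raising on the first problem lets the bootstrap API report every rejected section in a single response.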
Users
[users] contains named local users. The implementation must manage declared users, including creating or updating local
accounts and their SSH authorized keys from the config.
Example:
[users]
[users.admin]
isAdmin = true
ssh_key = "ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQCt5v7m8X9Zl5n"
[users.guest]
isAdmin = false
ssh_key = ""
Rules:
- isAdmin defaults to false.
- ssh_key defaults to an empty string.
- At least one admin user with a non-empty SSH public key is required before first boot can complete.
- Empty SSH keys are ignored, not written as authorized key lines.
- Admin users are members of wheel and podman (when those groups exist); non-admin users are not.
- Removed users from a re-applied config are disabled or locked rather than silently retaining access.
- Usernames must be validated against a narrow safe pattern and must not collide with protected system users or unmanaged existing local accounts.
- The existing password-locked, key-only SSH model remains mandatory for all managed users.
Because the root filesystem is an immutable squashfs with an ephemeral overlay, managed users must be derived from
persisted config on every boot or re-apply. The import path should write normalized user state under /data/config, and
a dedicated runtime apply step should materialize those users and groups before SSH access is expected. The apply step
must preserve fixed system users such as root and appsvc; operator accounts, including an optional admin username,
come from [users.<name>] config.
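The user rules above can be sketched as a normalization step that the import path might run before writing user state under /data/config. Everything named here is an assumption for illustration: the username pattern, the output shape, and the function name are not defined by this document, and the check for unmanaged existing local accounts is omitted.

```python
import re

# Assumed "narrow safe pattern"; the real pattern is not specified in this document.
USERNAME_RE = re.compile(r"[a-z][a-z0-9_-]{0,30}")
# Fixed system users named in this document; a real list would likely be longer.
PROTECTED_USERS = {"root", "appsvc"}


def normalize_users(users: dict) -> dict:
    """Apply defaults and rules to a parsed [users] table (types assumed pre-validated)."""
    normalized = {}
    for name, entry in users.items():
        if not USERNAME_RE.fullmatch(name):
            raise ValueError(f"users.{name}: invalid username")
        if name in PROTECTED_USERS:
            raise ValueError(f"users.{name}: collides with a protected system user")
        is_admin = bool(entry.get("isAdmin", False))   # isAdmin defaults to false
        key = entry.get("ssh_key", "").strip()         # ssh_key defaults to ""
        normalized[name] = {
            "isAdmin": is_admin,
            # Empty keys are dropped, never written as authorized key lines.
            "authorized_keys": [key] if key else [],
            # The apply step would still skip groups that do not exist.
            "groups": ["wheel", "podman"] if is_admin else [],
        }
    if not any(u["isAdmin"] and u["authorized_keys"] for u in normalized.values()):
        raise ValueError("users: at least one admin user with a non-empty ssh_key is required")
    return normalized
```

Keeping normalization separate from account creation means the same normalized state can be replayed on every boot by the runtime apply step.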
Network
[network] contains device networking, DNS, dnsmasq, and firewall configuration.
The implemented schema covers the LAN gateway and firewall controls this feature wires into runtime services:
- dnsmasq enablement and dnsmasq LAN configuration.
- upstream NTP servers for chrony, defaulting to Cloudflare NTP.
- Firewall rules equivalent to the current provisioned firewall model.
DNS servers, DNS search domains, arbitrary interface configuration, and default gateway configuration are deferred until runtime support is implemented.
The default network behavior remains the current LAN gateway design:
- eth0 is WAN.
- eth1 is LAN.
- dnsmasq is enabled by default.
- LAN gateway defaults to 172.20.30.1/24.
- DHCP serves the existing 172.20.30.10 through 172.20.30.254 range unless overridden.
- DHCP options 3, 6, and 42 point at the gateway IP.
- DNS remains gateway-local by default.
- NTP is served to LAN clients by chrony.
- IP forwarding remains disabled.
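As a rough illustration of how these defaults and overrides might be resolved, the sketch below merges operator overrides onto the default LAN values and derives the three DHCP options from the gateway address. The key names (gateway, dhcp_start, dhcp_end) and the function name are hypothetical; only the default values come from the list above.

```python
import ipaddress

# Default values from this document; the key names are assumptions.
LAN_DEFAULTS = {
    "gateway": "172.20.30.1/24",
    "dhcp_start": "172.20.30.10",
    "dhcp_end": "172.20.30.254",
}


def resolve_lan(overrides: dict) -> dict:
    """Merge overrides onto the defaults and check the DHCP range against the subnet."""
    lan = {**LAN_DEFAULTS, **overrides}
    iface = ipaddress.ip_interface(lan["gateway"])
    start = ipaddress.ip_address(lan["dhcp_start"])
    end = ipaddress.ip_address(lan["dhcp_end"])
    if start not in iface.network or end not in iface.network:
        raise ValueError("network: DHCP range must fall inside the LAN subnet")
    if start > end:
        raise ValueError("network: DHCP range start must not exceed its end")
    gateway_ip = str(iface.ip)
    # Options 3 (router), 6 (DNS), and 42 (NTP) all point at the gateway.
    lan["dhcp_options"] = {3: gateway_ip, 6: gateway_ip, 42: gateway_ip}
    return lan
```

Calling resolve_lan({}) yields the current LAN gateway behavior, which is what the [network] defaults tests would need to assert.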
Containers
[containers] is the only top-level section for operator-provisioned Quadlet config. It contains nested sections for
container units and supporting units.
The canonical structure should be:
[containers.container.example]
privileged = false
[containers.container.example.Container]
Image = "docker.io/library/nginx:latest"
[containers.network.app]
[containers.network.app.Network]
Subnet = "10.89.0.0/24"
[containers.volume.data]
[containers.volume.data.Volume]
Driver = "local"
[containers.build.custom]
[containers.build.custom.Build]
File = "${FILES_DIR}/Containerfile"
ImageTag = "localhost/custom:latest"
Rules:
- Container units continue to use the existing rootful/rootless safety boundary.
- Network and volume Quadlet units remain rootful.
- ${CONFIG_DIR} and ${FILES_DIR} substitution behavior remains unchanged.
- quadlet-runtime.json remains the authoritative runtime metadata for sync.
Official Schema
The repository already has schemas/config.schema.json and a small in-repo schema validator in
scripts/first-boot-provision.py. This feature must replace that schema with the new canonical config.toml contract and
keep the in-repo validator unless implementation proves it cannot express a required rule. The schema should produce clear
path-specific errors such as network.interfaces.eth1.address must be a CIDR string.
Schema requirements:
- Validate allowed and required keys.
- Validate types, defaults, enums, and port ranges.
- Validate cross-field constraints, such as DHCP range matching the LAN subnet.
- Validate that required service names reference rendered Quadlet units.
- Validate that at least one admin SSH key exists.
- Reject legacy top-level config sections rather than silently accepting or migrating them.
- Be usable by first-boot-provision validate and by the bootstrap API before persistent writes.
Avoid adding a third-party schema dependency unless the in-repo validator cannot support a required rule within a small, auditable implementation.
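To show what path-specific errors from a small, auditable validator might look like, here is a minimal sketch. It is not the existing in-repo validator and does not implement JSON Schema: the schema dictionary format, the type names (table, cidr, port), and the function names are all assumptions made for this example.

```python
import ipaddress


def _join(path: str, key: str) -> str:
    return f"{path}.{key}" if path else key


def _is_cidr(value) -> bool:
    if not isinstance(value, str) or "/" not in value:
        return False
    try:
        ipaddress.ip_interface(value)
    except ValueError:
        return False
    return True


def validate(value, schema: dict, path: str = "") -> list[str]:
    """Walk the config and schema together, collecting errors keyed by dotted path."""
    kind = schema["type"]
    if kind == "table":
        if not isinstance(value, dict):
            return [f"{path or 'config'} must be a table"]
        errors = []
        props = schema.get("properties", {})
        for key in sorted(set(value) - set(props)):
            errors.append(f"{_join(path, key)} is not an allowed key")
        for key in schema.get("required", []):
            if key not in value:
                errors.append(f"{_join(path, key)} is required")
        for key, sub in props.items():
            if key in value:
                errors.extend(validate(value[key], sub, _join(path, key)))
        return errors
    if kind == "cidr":
        return [] if _is_cidr(value) else [f"{path} must be a CIDR string"]
    if kind == "port":
        ok = isinstance(value, int) and not isinstance(value, bool) and 1 <= value <= 65535
        return [] if ok else [f"{path} must be an integer port between 1 and 65535"]
    raise ValueError(f"unknown schema type: {kind}")
```

Cross-field rules, such as the DHCP range matching the LAN subnet, do not fit this per-key walk and would need a separate pass over the validated config.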
Reapply Flow
Fresh provisioning remains unauthenticated because the device has no prior operator credential. Re-apply on an already-provisioned device must require authentication before accepting config bytes.
Proposed flow:
- Receive config.toml or a supported config bundle.
- If /data/config/config.toml already exists, require LAN-local authentication.
- Unpack and validate the candidate config in a temporary directory outside active /data/config state.
- Render all derived state into a candidate config directory.
- Snapshot or rename the previous /data/config into a rollback location.
- Atomically promote the candidate directory into /data/config.
- Apply LAN, firewall, and Quadlet sync using the same services as boot.
- Confirm required services become healthy.
- Delete or age out rollback state only after successful apply.
- Restore the previous config and re-apply it if activation fails.
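The promote, confirm, and rollback steps of this flow can be sketched with plain directory renames. This is an illustration under assumptions, not the planned implementation: the function names are hypothetical, the candidate and rollback directories are assumed to sit on the same filesystem as the active directory so each rename is atomic, and fsync calls are omitted. Two renames still leave a short window with no active directory, which crash recovery would have to cover.

```python
import os
import shutil
from pathlib import Path


def promote(candidate: Path, active: Path, rollback: Path) -> None:
    """Move the active config aside, then move the rendered candidate into place."""
    if rollback.exists():
        shutil.rmtree(rollback)       # stale rollback state from an earlier re-apply
    os.rename(active, rollback)       # previous config is kept whole for rollback
    os.rename(candidate, active)


def restore(active: Path, rollback: Path) -> None:
    """Put the previous config back and discard the failed one."""
    failed = active.with_name(active.name + ".failed")
    if failed.exists():
        shutil.rmtree(failed)
    os.rename(active, failed)
    os.rename(rollback, active)
    shutil.rmtree(failed)


def reapply(candidate: Path, active: Path, rollback: Path, activate) -> bool:
    """Promote the candidate; roll back if activate() reports unhealthy services."""
    promote(candidate, active, rollback)
    if activate():
        shutil.rmtree(rollback)       # rollback state is deleted only after success
        return True
    restore(active, rollback)
    activate()                        # re-apply the previous config
    return False
```

Here activate stands in for the LAN, firewall, and Quadlet sync plus health confirmation; it is passed in as a callable so the promotion logic can be tested without touching real services.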
Authentication uses an SSH-key challenge-response with an existing admin SSH key. The device issues a nonce for a short validity window, and the operator signs a request-bound message containing the nonce, target path, and SHA-256 digest of the raw request body. The device verifies the signature against active admin signer keys before accepting or processing request content. This keeps re-apply LAN-local, avoids default credentials, and reuses the existing key-only operator trust model.
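A minimal sketch of the request-bound message and nonce handling follows. The message layout, nonce lifetime, and all names are assumptions; the document only states that the signed message contains the nonce, target path, and SHA-256 digest of the raw body. Signature verification is injected as a callable because this document does not say how signatures are checked; OpenSSH's ssh-keygen -Y verify is one possible backend.

```python
import hashlib
import secrets
import time

NONCE_TTL_SECONDS = 60  # assumed "short validity window"


class NonceStore:
    """Issues single-use nonces that expire after a fixed time-to-live."""

    def __init__(self, ttl: float = NONCE_TTL_SECONDS, clock=time.monotonic):
        self._ttl = ttl
        self._clock = clock
        self._issued = {}

    def issue(self) -> str:
        nonce = secrets.token_hex(16)
        self._issued[nonce] = self._clock() + self._ttl
        return nonce

    def consume(self, nonce: str) -> bool:
        # pop() makes every nonce single-use, which blocks replayed requests.
        expires = self._issued.pop(nonce, None)
        return expires is not None and self._clock() <= expires


def signing_message(nonce: str, path: str, body: bytes) -> bytes:
    """Bind the signature to one nonce, one target path, and one exact body."""
    digest = hashlib.sha256(body).hexdigest()
    return f"{nonce}\n{path}\n{digest}\n".encode("ascii")


def authorize(store: NonceStore, nonce: str, path: str, body: bytes,
              signature: bytes, verify) -> bool:
    """Return True only for a fresh nonce and a signature accepted by verify()."""
    if not store.consume(nonce):
        return False
    # The body is only hashed here; it is not parsed until this returns True.
    return bool(verify(signing_message(nonce, path, body), signature))
```

Hashing the raw body instead of parsing it keeps the TOML parser out of reach of unauthenticated input, which matches the requirement to verify before processing request content.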
Failure Handling
- Invalid TOML or schema errors return a non-2xx response and leave active config untouched.
- Failed candidate rendering leaves active config untouched.
- Failed authentication returns a non-2xx response before parsing or processing request content.
- Crash before promotion leaves active config untouched.
- Crash after promotion but before confirmation must be recoverable on next boot or next apply by detecting incomplete re-apply state.
- Failed service activation restores previous config and reports the failed services.
- Rollback must not delete container volumes or arbitrary /data content.
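One way to detect incomplete re-apply state on the next boot is a marker file written before promotion and removed after confirmation. The sketch below is only an illustration of that idea: the marker file, its location outside /data/config, the action names, and the policy of restoring rather than retrying activation are all assumptions, not decisions recorded in this document.

```python
from pathlib import Path


def recovery_action(rollback: Path, marker: Path) -> str:
    """Decide what a boot-time check should do about an interrupted re-apply."""
    if not marker.exists():
        return "none"            # no re-apply was in flight
    if rollback.exists():
        return "restore"         # promoted (or mid-promotion) but never confirmed
    return "clear-marker"        # rollback state is already gone; the marker is stale
```

Because the decision only inspects the rollback directory and the marker, it never touches container volumes or other /data content.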
Documentation Impact
Likely affected pages:
- docs/src/provisioning.md
- docs/src/provisioning/lan-range.md
- docs/src/data-flow.md
- docs/src/runtime-boundaries.md
- docs/src/tutorials/oidc-device-management.md
- docs/src/specs/lan-gateway.md
- docs/src/specs/update-confirmation.md
- docs/src/code-reference/scripts.md
- docs/src/code-reference/modules.md
- docs/src/features/caddy-authcrunch-cockpit-tutorial/design.md
- schemas/config.schema.json
- modules/base.nix
- modules/first-boot.nix
- example/caddy-oidc/config.toml
Validation Plan
- Unit tests for schema validation, defaults, and path-specific error messages.
- Unit tests for legacy top-level config tables being rejected.
- Tests for [users] admin key extraction and empty-key handling.
- Tests for managed user creation/update/disable behavior.
- Tests for managed users being re-materialized from /data/config after reboot.
- Tests for [network] defaults matching current LAN gateway behavior.
- Tests for SSH-key challenge-response authentication success and failure paths.
- Tests for candidate config rendering without touching active /data/config.
- VM test for authenticated re-apply success.
- VM test for unauthenticated re-apply rejection on an already-provisioned device.
- VM test for invalid config preserving previous state.
- VM test for activation failure rolling back to previous config.
Risks
- Restructuring config.toml intentionally breaks earlier test configs; examples and docs must be updated with the code.
- Runtime user management conflicts with the current users.mutableUsers = false posture unless implemented as an explicit apply service that safely materializes /data/config user state on each boot.
- Authentication design can become too complex for local recovery if it depends on external services.
- Atomic directory replacement on /data must be implemented carefully on f2fs.
- Service rollback can restore config files but cannot guarantee application-level container data consistency.
- Adding a third-party schema dependency may increase image closure size.