Feature: durable-journald-logs
Overview
Why
- The current root filesystem uses a tmpfs-backed overlay, so host logs disappear on reboot and power loss. That makes boot failures, failed updates, watchdog resets, and rollback events hard to reconstruct after the device recovers.
- At the same time, the image intentionally favors tmpfs-backed runtime state to reduce eMMC write amplification. The logging design needs to preserve that wear benefit while still keeping the most important forensic breadcrumbs durable.
What Changes
- Keep general host journald tmpfs-first during runtime as the log ingress point, without allowing journald itself to write routine logs directly to persistent media
- Add an rsyslog RAM queue behind volatile journald so general host and container logs are collected in memory and written to /data/logs in large, infrequent, sequential batches rather than in many small writes
- Flush the in-memory buffered log queue to /data/logs during orderly shutdown so the last clean shutdown captures the latest buffered host diagnostics
- Align Podman container logging with the same journald-plus-rsyslog buffering policy so routine application logs also remain memory-first during runtime while following the same large-batch persistent append path
- Document retention, durability guarantees, and the boundary between durable host forensics and buffered general host/application logs
Capabilities
Modified Capabilities
durable-journald-logs: Tmpfs-first host journald feeding an in-memory rsyslog batch queue, with large sequential appends to /data/logs, orderly shutdown flush, and Podman logging aligned to the same buffering model
Impact
- Affected code: boot/update services, host logging configuration in the base system, and the /data log export path
- Affected docs: partition layout, boot/update architecture, and operational debugging guidance
- Operational impact: Devices keep normal host and application logging memory-first during runtime and append broader diagnostics to /data/logs in large, sequential batches plus orderly shutdown flushes
This change explicitly resolves the broader tmpfs-first journald plus /data durable
logging path as a RAM-queued rsyslog batch-append path to /data/logs, rather
than as timer-driven journal checkpoints.
Design
Context
The Rock64 image runs from a read-only squashfs with a tmpfs-backed overlay for mutable root state. That keeps the runtime root clean on every boot and minimizes steady-state eMMC wear, but it also means host logs are currently volatile unless they are copied elsewhere.
The failures this project cares about most are lifecycle failures: boot regressions, failed update confirmation, watchdog resets, rollback decisions, networking bring-up issues, and first-boot provisioning problems. Those events need post-reboot forensic visibility, especially when a device recovers only after a power cycle or slot rollback.
The device already has two different writable surfaces with different roles:
- /boot is mounted from the active slot’s FAT boot partition and is available as part of the boot/update path
- /data is the main persistent mutable partition for provisioned state and application data
Applications are expected to run in Podman, with container storage already
rooted on /data. That means this change does not need to invent a general
persistent application-data model. The problem here is host-platform forensics
under tight write-wear constraints.
Goals / Non-Goals
Goals:
- Preserve a bounded record of critical host lifecycle events across reboot, rollback, and as many power-loss cases as practical
- Keep normal host logging memory-first so the image retains the write-reduction benefits of its tmpfs-backed runtime model
- Reserve a fixed forensic budget on each boot slot with deterministic retention behavior
- Make the durability guarantees explicit: critical mirrored events are durable-first, general runtime logs are not
Non-Goals:
- Shipping a remote log collection pipeline or external log forwarding
- Making the full host journal power-loss durable
- Defining long-term application log retention policy beyond establishing the boundary with Podman logging
- Changing the overall partition layout or introducing a dedicated log partition
Decisions
1. Use a three-tier logging model
Choice: Split logging into three tiers with different durability and wear profiles.
Tier 0: Slot-local forensic black box on /boot
Tier 1: Volatile host journal in memory
Tier 2: RAM-queued batched log appends to /data/logs
Tier 0 exists for the small set of events that must survive reboot and should
be made as power-loss resistant as practical. Tier 1 keeps normal host logging
in RAM so the device does not continuously write routine logs to eMMC. Tier 2
captures the bounded /data export path for richer general diagnostics,
including Podman log traffic that is routed through journald and then queued in
RAM before being appended to /data/logs in large sequential batches.
Alternatives considered:
- Single persistent journald store: simpler conceptually, but undermines the memory-first wear model by making all host logging durable.
- No distinction between forensic and buffered logs: blurs the durability boundary and makes scope creep likely.
2. Reserve 28 MiB per boot slot for Tier 0 forensic storage
Choice: Dedicate up to 28 MiB of each 128 MiB boot partition to bounded forensic storage.
Rationale: The current kernel/initrd/DTB payload is well below the full
boot partition size, and the user explicitly wants a slot-local forensic
reserve that survives with the slot. 28 MiB is large enough for a substantial
lifecycle event history while still preserving generous headroom for future
kernel and initrd growth.
This storage should behave like a black box, not a general-purpose filesystem for arbitrary logs. A fixed budget makes retention deterministic and prevents forensic artifacts from crowding out boot assets.
Alternatives considered:
- Store Tier 0 only on /data: simpler long-term store, but loses the advantage of slot-local, early-available forensic state.
- Use a smaller budget: safer for boot growth but less useful for field debugging.
- Use a larger budget: possible, but starts trading too much future boot payload headroom for logs.
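As a quick check of the budget arithmetic, the Python below derives the Tier 0 reserve and the remaining boot payload headroom from the figures in this spec: a 128 MiB partition and seven 4 MiB segments. The 200-byte average record size is an assumption for illustration, not a measured value. The resulting record count is only an order-of-magnitude estimate.

```python
# Tier 0 budget arithmetic. Partition size, segment count, and segment size
# come from the spec; AVG_RECORD_BYTES is an illustrative assumption.
MIB = 1024 * 1024

BOOT_PARTITION_BYTES = 128 * MIB
SEGMENT_COUNT = 7
SEGMENT_BYTES = 4 * MIB
AVG_RECORD_BYTES = 200  # assumption: typical single-line key/value record

tier0_budget = SEGMENT_COUNT * SEGMENT_BYTES
# Space left for kernel/initrd/DTB, before FAT filesystem overhead.
boot_headroom = BOOT_PARTITION_BYTES - tier0_budget
records_per_segment = SEGMENT_BYTES // AVG_RECORD_BYTES
records_retained = SEGMENT_COUNT * records_per_segment

print(f"Tier 0 budget: {tier0_budget // MIB} MiB")
print(f"Boot payload headroom: {boot_headroom // MIB} MiB (before FAT overhead)")
print(f"About {records_retained} records retained at {AVG_RECORD_BYTES} B each")
```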
3. Runtime durability should come from buffered /data/logs appends
Choice: Keep journald volatile and use rsyslog with an in-memory queue
to append larger batches to /data/logs.
Rationale: The active runtime goal is to preserve the eMMC-wear benefits of
tmpfs-first logging while still making broader host and container diagnostics
available after clean operation and orderly shutdown. Buffered appends to
/data/logs provide a simpler path than a separate slot-local Tier 0 store and
match the logging model now used by the runtime services.
Alternatives considered:
- Make journald persistent directly: simpler pipeline, but increases write amplification during steady-state operation.
- Keep a separate slot-local forensic ring: more durable for a narrow set of events, but adds another logging path and extra boot-partition complexity.
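4. Encode Tier 0 records as single-line key/value records with boot-scoped ordering
Choice: Write each Tier 0 record as a single line of key/value pairs with the required fields boot_id, seq, ts, slot, stage, and event, with optional fields such as:
result, target_slot, reason, version, device, service, attempt, detail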
The boot_id + seq pairing is intentional:
- boot_id groups events by boot session for cleaner forensic reading
- seq provides strict ordering even when timestamps are coarse, identical, or corrected later by NTP
This is stronger than timestamps alone and simpler than a single global persistent sequence spanning all boots.
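The following sketch shows why boot_id plus seq orders records more reliably than timestamps. The Python below ranks each boot_id by the position where it first appears in a segment scan, then sorts by seq, and never consults ts. It makes three assumptions that do not come from the implementation:

- records are already parsed into dicts
- the scan visits segments oldest-first
- the helper name order_records is hypothetical

```python
from typing import Dict, List


def order_records(records: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Order Tier 0 records by boot session, then by per-boot sequence.

    Boot sessions are ranked by first appearance in scan order (assumed to be
    oldest segment first). ts is deliberately ignored because it may be
    coarse, duplicated, or stepped by NTP.
    """
    boot_rank: Dict[str, int] = {}
    for rec in records:
        boot_rank.setdefault(rec["boot_id"], len(boot_rank))
    # seq is compared numerically so that "10" sorts after "2".
    return sorted(records, key=lambda r: (boot_rank[r["boot_id"]], int(r["seq"])))
```

Two events that share a timestamp, or that straddle a clock step, still come out in write order. The cost is that the rank of whole boot sessions depends on the scan order being correct.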
Alternatives considered:
- JSON lines: more machine-friendly, but more verbose and more awkward to emit robustly from shell-heavy boot paths.
- Binary records: more deterministic, but far less debuggable in the field.
- Timestamp-only ordering: too weak for early boot and near-simultaneous events.
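Below is a minimal Python sketch of the single-line key/value format. It assumes:

- records are space-separated key=value pairs;
- values containing spaces get shell-style quoting via shlex.quote and shlex.split from the standard library, so records stay easy to emit from shell-heavy boot paths;
- the quoting rule, field order, and helper names are illustrative rather than the implemented format.

A torn final line is reported as None instead of raising, in line with the torn-line tolerance listed under risks. The same applies to an unbalanced quote or a missing required field.

```python
import shlex
from typing import Dict, Mapping, Optional

REQUIRED = ("boot_id", "seq", "ts", "slot", "stage", "event")


def format_record(fields: Mapping[str, object]) -> str:
    """Render one Tier 0 record as a single newline-terminated line."""
    missing = [k for k in REQUIRED if k not in fields]
    if missing:
        raise ValueError(f"missing required fields: {missing}")
    # Required fields first, in a fixed order, then optional fields as given.
    keys = list(REQUIRED) + [k for k in fields if k not in REQUIRED]
    parts = []
    for key in keys:
        value = str(fields[key])
        if "\n" in value:
            raise ValueError("values must not contain newlines")
        parts.append(f"{key}={shlex.quote(value)}")
    return " ".join(parts) + "\n"


def parse_record(line: str) -> Optional[Dict[str, str]]:
    """Parse one line; return None for torn or malformed lines."""
    if not line.endswith("\n"):
        return None  # torn final line: the write never completed
    try:
        tokens = shlex.split(line)
    except ValueError:
        return None  # unbalanced quote
    fields: Dict[str, str] = {}
    for tok in tokens:
        key, sep, value = tok.partition("=")
        if not sep:
            return None
        fields[key] = value
    if any(k not in fields for k in REQUIRED):
        return None
    return fields
```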
5. Mirror only critical lifecycle events into Tier 0
Choice: Limit Tier 0 to a narrow event taxonomy instead of trying to mirror the full journal.
Rationale: The goal is post-failure reconstruction, not durable storage for all log chatter. A smaller event vocabulary keeps write volume low and makes the black-box log more useful during triage.
The initial event taxonomy should include these stage names:
initrd, boot, firstboot, rauc, verify, rollback, watchdog, shutdown
The initial event set should include:
- initrd and boot progression markers
- active slot and rootfs selection markers
- /data mount success or failure
- first-boot start and completion
- RAUC install start, success, and failure
- update-confirmation start, success, and failure
- rollback detection
- watchdog-related reset markers or inferred reboot cause markers
- orderly shutdown flush begin and end
Representative event names include:
boot-start, lowerdev-selected, rootfs-mount-ok, rootfs-mount-failed, userspace-start, data-mount-ok, data-mount-failed, boot-complete, start, complete, failed, install-start, install-complete, install-failed, mark-good-start, mark-good-complete, mark-good-failed, detected, slot-fallback, boot-attempt-exhausted, armed, reboot-inferred, flush-begin, flush-end, reboot-requested, poweroff-requested
Alternatives considered:
- Mirror the whole journal: too write-heavy and defeats the purpose of volatile-first logging.
- Log only RAUC events: too narrow; boot and watchdog failures would still be opaque.
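The narrow taxonomy can be enforced with an allowlist at the point where lifecycle hooks emit events. The stage names in the sketch below are the ones listed above, verbatim. The pairing of individual event names with stages is an assumption made for illustration, because the spec lists representative event names but does not map them to stages. should_mirror is a hypothetical helper name.

```python
from typing import Dict, FrozenSet

# Stage names come from the spec. The stage -> event pairing is an
# illustrative assumption; the spec only lists representative event names.
TIER0_EVENTS: Dict[str, FrozenSet[str]] = {
    "initrd": frozenset({"boot-start", "lowerdev-selected",
                         "rootfs-mount-ok", "rootfs-mount-failed"}),
    "boot": frozenset({"userspace-start", "data-mount-ok",
                       "data-mount-failed", "boot-complete"}),
    "firstboot": frozenset({"start", "complete", "failed"}),
    "rauc": frozenset({"install-start", "install-complete", "install-failed"}),
    "verify": frozenset({"mark-good-start", "mark-good-complete",
                         "mark-good-failed", "failed"}),
    "rollback": frozenset({"detected", "slot-fallback", "boot-attempt-exhausted"}),
    "watchdog": frozenset({"armed", "reboot-inferred"}),
    "shutdown": frozenset({"flush-begin", "flush-end",
                           "reboot-requested", "poweroff-requested"}),
}


def should_mirror(stage: str, event: str) -> bool:
    """Return True only for allowlisted (stage, event) pairs; deny by default."""
    return event in TIER0_EVENTS.get(stage, frozenset())
```

Default-deny keeps new, unreviewed event names out of the forensic budget until someone adds them deliberately.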
6. Tier 0 events should be written with durable-first semantics
Choice: Treat each Tier 0 write as an immediate durability event and flush it explicitly.
Rationale: The whole point of Tier 0 is surviving the cases where Tier 1 volatile logs disappear. Each critical event should therefore be written and flushed in a way that minimizes exposure to power loss.
This does not create a theoretical guarantee against every possible corruption mode, but it does create the strongest practical durability semantics in the current storage model.
Alternatives considered:
- Batch writes for efficiency: lower write overhead, but directly weakens the power-loss guarantee.
- Rely on periodic journal export only: leaves exactly the most important events vulnerable.
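In Python terms, a durable-first append follows four steps:

1. Open the file with O_APPEND.
2. Write the whole record.
3. fsync the file.
4. The first time the file is created, also fsync the parent directory so its directory entry reaches the medium.

This is a sketch that assumes the boot partition is already mounted read-write at the given path. append_durable is a hypothetical name. Even with both fsync calls, FAT has no journal, which is why the risk section still asks for tolerance of a torn final line.

```python
import os


def append_durable(path: str, record: str) -> None:
    """Append one complete record and force it to stable storage."""
    data = record.encode("utf-8")
    created = not os.path.exists(path)
    fd = os.open(path, os.O_WRONLY | os.O_APPEND | os.O_CREAT, 0o644)
    try:
        view = memoryview(data)
        while view:  # loop in case the kernel accepts a partial write
            written = os.write(fd, view)
            view = view[written:]
        os.fsync(fd)  # flush file data and metadata before returning
    finally:
        os.close(fd)
    if created:
        # A newly created file also needs its directory entry persisted.
        dir_fd = os.open(os.path.dirname(os.path.abspath(path)), os.O_RDONLY)
        try:
            os.fsync(dir_fd)
        finally:
            os.close(dir_fd)
```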
7. Keep Tier 1 journald tmpfs-first with bounded loss
Choice: Keep normal host journald storage in tmpfs during runtime, bound
its runtime usage with an explicit cap, and treat it as the ingestion point for
an in-memory rsyslog queue rather than as the persistent log store.
Rationale: This preserves the eMMC-wear benefits of the tmpfs-backed system while reducing the blast radius of abrupt power loss for general diagnostics. Tier 0 still carries the always-durable lifecycle breadcrumbs, while Tier 1 provides the live message stream without allowing journald itself to emit many small persistent writes.
Alternatives considered:
- Fully persistent journald on /data: simpler durability story, but keeps routine host logging write-heavy all the time.
- Purely volatile journald forever: preserves wear benefits, but discards too much general diagnostic history on power loss and reboot.
- No journald cap: risks memory pressure from noisy services.
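One way to keep the cap explicit is a build-time check on the journald drop-in. The sketch below uses configparser to assert two settings:

- Storage=volatile, which is journald's RAM-only mode under /run/log/journal
- a RuntimeMaxUse cap

Both are real journald.conf options. The check itself and the sample 32M value are assumptions. How rsyslog ingests the stream (imjournal or ForwardToSyslog) is deliberately left out.

```python
import configparser
from typing import List


def check_journald_dropin(text: str) -> List[str]:
    """Return a list of problems found in a journald.conf drop-in."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # preserve key case as written (default lowercases)
    parser.read_string(text)
    if not parser.has_section("Journal"):
        return ["missing [Journal] section"]
    journal = parser["Journal"]
    problems = []
    if journal.get("Storage") != "volatile":
        problems.append("Storage must be volatile to keep journald in /run")
    if not journal.get("RuntimeMaxUse"):
        problems.append("RuntimeMaxUse must set an explicit runtime cap")
    return problems
```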
8. Use rsyslog as the Tier 2 RAM queue and batch writer
Choice: Introduce rsyslog behind volatile journald and configure it with
an in-memory queue that appends buffered log data to /data/logs in large,
infrequent, sequential writes.
Rationale: The point of the durable host log path is not merely to keep
logs in RAM longer. It is to transform many small writes into much larger,
more sequential writes that are friendlier to eMMC wear characteristics.
rsyslog provides a mature queueing model for this that journald alone does
not expose as clearly.
The batch queue should remain RAM-backed during normal operation. The
persistent path should be append-oriented, size-bounded, and rotated in larger
chunks under /data/logs. Orderly shutdown should flush queued log data so the
latest clean shutdown preserves the most recent buffered diagnostics.
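The batching policy itself can be modeled in a few lines. The Python below is a toy model of the behavior this decision asks rsyslog to provide; it is not rsyslog's implementation or configuration:

- messages accumulate in RAM;
- they are appended in one write once a byte threshold or a maximum age is reached;
- flush() stands in for the orderly-shutdown drain.

The class name, threshold, and interval are hypothetical tuning values, which the spec leaves open.

```python
import time
from typing import Callable, List


class BatchAppender:
    """Toy model: buffer log lines in RAM and append them in large batches."""

    def __init__(self, path: str, max_bytes: int = 1 << 20,
                 max_age_s: float = 300.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.path = path
        self.max_bytes = max_bytes
        self.max_age_s = max_age_s
        self.clock = clock
        self.buf: List[str] = []
        self.buf_bytes = 0
        self.oldest = 0.0
        self.writes = 0  # number of persistent append operations

    def log(self, line: str) -> None:
        if not self.buf:
            self.oldest = self.clock()
        self.buf.append(line)
        self.buf_bytes += len(line.encode("utf-8"))
        # Age is only checked when a message arrives; a real queue would
        # also run a timer so an idle buffer still drains.
        if (self.buf_bytes >= self.max_bytes
                or self.clock() - self.oldest >= self.max_age_s):
            self.flush()

    def flush(self) -> None:
        """Append everything buffered in one sequential write."""
        if not self.buf:
            return
        with open(self.path, "a", encoding="utf-8") as f:
            f.write("".join(self.buf))
        self.writes += 1
        self.buf.clear()
        self.buf_bytes = 0
```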
This change does not introduce log2ram; the system already runs on a
tmpfs-backed overlay, so adding another general /var/log RAM shim would be
redundant complexity for this design.
Alternatives considered:
- Timer-driven journal checkpoints: better than fully persistent journald, but still less explicit about batching policy and sequential write behavior.
- Direct continuous journald persistence: simpler, but worse for write amplification.
- log2ram plus ad hoc file syncing: redundant with the existing overlay model and less targeted than an explicit queued logging layer.
9. Align Podman logging with the buffered journald policy
Choice: Set Podman’s container log_driver to journald so routine
application stdout and stderr follow the same tmpfs-first journald ingestion,
RAM-queued rsyslog buffering, large sequential /data/logs append path, and
shutdown-flush behavior as host logs.
Rationale: Applications run in Podman and their durable state already lives
on /data, but application stdout/stderr retention is a different question
from host lifecycle forensics. Pinning the log driver avoids drift in Podman
defaults and keeps application log behavior aligned with the explored logging
boundary instead of creating a separate file-backed log path with different
durability semantics.
Alternatives considered:
- Make app logs part of Tier 0: too broad and too write-heavy.
- Ignore app logs entirely: leaves an important design boundary undocumented.
Tier 1 / Tier 2 Resolution
The recovered explore session had most firmly settled on the Tier 0 /boot
forensic model. The broader journald-to-/data path was clearly part of the
intended architecture, but some details were still not fully pinned down at explore time.
This change resolves the main Tier 2 open question explicitly:
- use volatile journald as the entry point
- use an in-memory rsyslog queue as the batching layer
- append to /data/logs in large sequential writes
- keep shutdown flush as a secondary durability improvement
The remaining implementation-time decisions are narrower:
- exact queue sizing and dequeue batch thresholds
- exact rotation and retention policy under /data/logs
- exact journald filtering and rate-limiting thresholds
- whether /data should also gain mount options such as noatime
Risks / Trade-offs
- FAT boot storage is not a perfect forensic medium -> Mitigate by keeping the Tier 0 format simple, bounded, append-oriented, and tolerant of a torn final line.
- Immediate durable writes still create some wear -> Acceptable because Tier 0 is intentionally tiny and event-limited.
- Slot-local boot logs may not follow the active slot after rollback -> This is partly a feature, because each slot preserves its own recent history; docs should make that mental model clear.
- Application log volume could pressure the buffered journal path -> Mitigate by pinning Podman to journald, keeping the host runtime cap explicit, and bounding the rsyslog RAM queue plus the /data/logs rotation budget.
- An extra logging daemon increases moving parts -> Acceptable because it provides explicit queueing and batching behavior that directly serves the eMMC longevity goal.
- Metadata corruption could obscure the active segment -> Keep metadata minimal and recoverable by scanning segment files if needed.
Post-Review Hardening
The initial implementation satisfied the core change goals, but a later review found a small set of durability and correctness gaps that were fixed before closing validation.
- The active-slot forensic mount helper now verifies an actual mount via findmnt instead of assuming that directory existence implies durable boot storage. This prevents Tier 0 writes from silently falling back to tmpfs.
- The initrd forensic helper now fails explicitly on missing boot-device or mount prerequisites instead of silently succeeding. This keeps early lifecycle markers aligned with the change’s durable-first intent.
- The update-confirmation path now logs mark-good-complete only on real success, and logs mark-good-failed plus verify failed on failure, in both the first-boot fallback and post-health-check confirmation paths.
- RAUC status parsing was corrected to read the keyed slot structure returned by rauc status --output-format=json, avoiding false “already good” or incorrect current-version decisions.
- Routine polling and “no update” style upgrade chatter was removed from Tier 0 so the durable forensic budget stays focused on high-value lifecycle evidence.
- Regression coverage was extended to include mount-selection behavior and an explicit negative mark-good confirmation path in rauc-confirm, including the test harness steps needed to avoid stale cached RAUC state across phases.
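The findmnt-based check guards against a subtle failure: the mount-point directory exists in the tmpfs-backed root whether or not the boot partition is actually mounted. A Python equivalent reads the /proc/self/mounts table and requires an exact mount-point match with the expected filesystem type. This sketch makes three assumptions:

- the helper names
- the vfat expectation
- passing the table in as text, for testability

The real helper uses findmnt.

```python
import re
from typing import Optional


def _unescape(field: str) -> str:
    # /proc/self/mounts escapes space, tab, newline and backslash as octal.
    return re.sub(r"\\([0-7]{3})", lambda m: chr(int(m.group(1), 8)), field)


def mounted_fstype(mount_point: str, mounts_text: str) -> Optional[str]:
    """Return the fstype mounted exactly at mount_point, or None."""
    fstype = None
    for line in mounts_text.splitlines():
        parts = line.split()
        if len(parts) < 3:
            continue
        if _unescape(parts[1]) == mount_point:
            fstype = parts[2]  # last match wins: it is the visible mount
    return fstype


def forensic_store_ready(mount_point: str, mounts_text: str,
                         expected_fstype: str = "vfat") -> bool:
    """True only if a real filesystem of the expected type is mounted there."""
    return mounted_fstype(mount_point, mounts_text) == expected_fstype
```

On a device, the caller would pass the contents of /proc/self/mounts. A directory that exists only in the overlay then yields False instead of silently accepting tmpfs.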
Migration Plan
Existing devices pick up the new configuration on the next deployed image. The boot partitions gain a reserved forensic directory within the existing slot budget, and host lifecycle services begin mirroring critical events into that bounded store.
Rollback is straightforward: remove the Tier 0 writer and return to the prior general journald policy. No data migration is required for Tier 0 because the forensic store is bounded, slot-local, and self-contained.
Final Scope Notes
- The explored design did not choose always-persistent journald on /data. Instead it chose tmpfs-first journald feeding a RAM-queued rsyslog batch writer to /data/logs, plus orderly shutdown flush, while relying on the bounded Tier 0 /boot recorder for the critical always-durable lifecycle evidence.
- Tier 0 remains the power-loss-first forensic layer.
- Tier 1 remains memory-first during runtime.
- Tier 2 trades some immediate durability for much lower write amplification by appending buffered logs to /data/logs in larger sequential writes.
- Podman logging is pinned to journald so container log traffic follows the same buffered host logging path.
- Queue sizing, rate limits, and /data/logs retention remain tunable implementation details rather than core architecture changes.
Requirements
durable-journald-logs
ADDED Requirements
Requirement: Host journald is tmpfs-first during runtime
The system SHALL configure host journald to keep general runtime logs in volatile storage during normal runtime so routine host logging remains memory-first rather than continuously writing to persistent media.
Scenario: Runtime host logs stay memory-first
- WHEN the device writes a non-critical host journal entry during normal runtime
- THEN that entry is written into the volatile runtime journal rather than
directly to persistent journal storage on
/data
Requirement: Runtime journal usage is explicitly bounded
The system SHALL apply an explicit runtime journal size cap so memory-first logging does not grow without bound.
Scenario: Runtime journal stays within the configured cap
- WHEN runtime journal usage reaches the configured storage cap
- THEN journald rotates or removes older runtime journal data before exceeding that cap
Requirement: General logs are written to /data/logs in large sequential batches
The system SHALL use a RAM-queued batching layer behind volatile journald so
general host log data is appended to persistent storage under /data/logs in
large, infrequent, sequential writes rather than in many small direct writes.
Scenario: Buffered host logs are appended during runtime buffering flushes
- WHEN the device continues normal runtime logging and the buffering layer reaches its configured write threshold or flush interval
- THEN buffered general host journal data is appended to persistent storage
under /data/logs in a large sequential write
Requirement: Buffered general logs are flushed to /data/logs on orderly shutdown
The system SHALL flush the current buffered general log state to persistent
storage under /data/logs during orderly shutdown so the latest clean shutdown
retains the most recent buffered host diagnostics.
Scenario: Orderly shutdown persists buffered host logs
- WHEN the device performs an orderly reboot or poweroff
- THEN the buffered general log queue is flushed to persistent storage
under /data/logs before shutdown completes
Requirement: Container logs follow the same buffered journald boundary
The system SHALL configure Podman to use the journald log driver so routine
container stdout and stderr are recorded through journald instead of file-based
container logs, and SHALL keep those logs inside the same tmpfs-first,
RAM-queued, batched-append, and shutdown-flushed pipeline as other non-Tier 0
logs.
Scenario: Container logs are sent to journald
- WHEN a container writes to stdout or stderr during normal operation
- THEN that log traffic is emitted through journald and follows the same
buffered runtime retention policy and batched /data/logs append path as other
non-Tier 0 logs
forensic-log-durability
ADDED Requirements
Requirement: Critical lifecycle events are mirrored to slot-local forensic storage
The system SHALL mirror critical host lifecycle events into a bounded forensic store on the active boot slot so they remain available after reboot, slot rollback, and as many power-loss scenarios as practical. The initrd portion of this path remains incomplete until the early-boot persistence design is revised to avoid fragile direct boot-partition mounts during normal initrd execution.
Scenario: Critical boot event is retained after reboot
- WHEN the device records a critical boot or update lifecycle event and then reboots
- THEN that event remains available from the slot-local forensic store after the reboot
Scenario: Failed update leaves forensic evidence
- WHEN an update attempt fails and the device later rolls back to a previous slot
- THEN the affected slot retains its recent mirrored lifecycle events for forensic inspection
Requirement: Slot-local forensic storage is strictly bounded
The system SHALL cap slot-local forensic storage at 28 MiB per boot slot.
The system SHALL represent that budget as seven 4 MiB segment files plus
minimal metadata, and SHALL rotate or overwrite the oldest forensic records
when that limit is reached.
Scenario: Forensic store reaches capacity
- WHEN new mirrored lifecycle events would exceed the 28 MiB storage budget on a boot slot
- THEN the system retains newer events and removes or overwrites the oldest retained forensic records within that slot
Scenario: Segment rollover preserves bounded retention
- WHEN the active 4 MiB segment fills during normal operation
- THEN the system advances to the next segment, reuses the oldest segment
when necessary, and continues writing without exceeding the 28 MiB slot budget
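The rollover rule can be sketched as a ring of seven fixed-size segment files, with a one-line metadata file recording the active index. This sketch makes four assumptions:

- the file names
- the metadata layout
- the choice to roll over before a record would cross the segment boundary
- the class name SegmentRing, which is hypothetical

Durability calls (fsync) are omitted to keep the rotation logic visible.

```python
import os

SEGMENT_COUNT = 7
SEGMENT_BYTES = 4 * 1024 * 1024


class SegmentRing:
    """Append records into seven rotating segments within a fixed budget."""

    def __init__(self, root: str, segment_bytes: int = SEGMENT_BYTES) -> None:
        self.root = root
        self.segment_bytes = segment_bytes
        self.meta = os.path.join(root, "active")  # hypothetical metadata file
        os.makedirs(root, exist_ok=True)
        try:
            with open(self.meta) as f:
                self.active = int(f.read().strip()) % SEGMENT_COUNT
        except (FileNotFoundError, ValueError):
            self.active = 0  # missing or corrupt metadata: start at segment 0

    def segment_path(self, index: int) -> str:
        return os.path.join(self.root, f"segment-{index}.log")

    def _size(self, index: int) -> int:
        try:
            return os.path.getsize(self.segment_path(index))
        except FileNotFoundError:
            return 0

    def append(self, record: bytes) -> None:
        if len(record) > self.segment_bytes:
            raise ValueError("record larger than a segment")
        if self._size(self.active) + len(record) > self.segment_bytes:
            # Advance to the next segment in the ring (the oldest one once the
            # ring has wrapped) and truncate it before reuse.
            self.active = (self.active + 1) % SEGMENT_COUNT
            open(self.segment_path(self.active), "wb").close()
            with open(self.meta, "w") as f:
                f.write(f"{self.active}\n")
        with open(self.segment_path(self.active), "ab") as f:
            f.write(record)

    def total_bytes(self) -> int:
        return sum(self._size(i) for i in range(SEGMENT_COUNT))
```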
Requirement: Tier 0 records use boot-scoped ordering
The system SHALL encode each Tier 0 forensic record as a single-line key/value
record. Each record SHALL include boot_id, seq, ts, slot, stage, and
event. The system SHALL reset seq at the start of each new boot_id.
Scenario: Events within a boot are strictly ordered
- WHEN multiple Tier 0 events are written during the same boot session
- THEN their seq values increase monotonically within that boot_id
Scenario: New boot starts a new forensic sequence
- WHEN the device reboots into a new boot session on the same slot
- THEN the device writes records with a new boot_id and restarts seq from the beginning for that boot session
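Resetting seq per boot_id needs a small amount of per-boot state that several short-lived writers can share. Below is a minimal sketch that assumes the kernel's /proc/sys/kernel/random/boot_id serves as the boot identifier, and that a hypothetical counter file on tmpfs (for example under /run) holds the state. An flock serializes concurrent writers. The counter path, its "boot_id seq" layout, and the function names are assumptions of this sketch.

```python
import fcntl
import os


def read_boot_id(path: str = "/proc/sys/kernel/random/boot_id") -> str:
    """Read the kernel's per-boot identifier (Linux)."""
    with open(path) as f:
        return f.read().strip()


def next_seq(state_path: str, boot_id: str) -> int:
    """Return the next seq for boot_id, restarting at 1 for a new boot."""
    fd = os.open(state_path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX)  # serialize concurrent writers
        raw = os.read(fd, 256).decode("ascii", "replace").split()
        if len(raw) == 2 and raw[0] == boot_id and raw[1].isdigit():
            seq = int(raw[1]) + 1
        else:
            seq = 1  # new boot session (or unreadable state): restart
        os.lseek(fd, 0, os.SEEK_SET)
        os.ftruncate(fd, 0)
        os.write(fd, f"{boot_id} {seq}\n".encode("ascii"))
        return seq
    finally:
        os.close(fd)  # closing the descriptor also releases the flock
```

Keeping the counter on tmpfs means a reboot discards it anyway. Storing boot_id alongside the counter still makes the reset explicit if the state file ever survives unexpectedly.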
Requirement: Tier 0 event scope is limited to high-value lifecycle records
The system SHALL limit slot-local forensic storage to high-value lifecycle
events. Allowed Tier 0 stages SHALL include initrd, boot, firstboot,
rauc, verify, rollback, watchdog, and shutdown. Tier 0 events SHALL
cover boot progression, slot selection, /data mount outcome, update
lifecycle events, update-confirmation outcome, rollback detection,
watchdog-related reset markers, orderly shutdown flush markers, and managed
reboot or poweroff request markers where those flows are part of the system.
The initrd stage specifically requires redesign before this requirement can be
considered complete.
Scenario: Noisy routine logs are excluded from Tier 0
- WHEN ordinary service or application log traffic is emitted during normal runtime
- THEN that traffic is not mirrored wholesale into the slot-local forensic store
Scenario: Failed slot keeps its own forensic history
- WHEN the device boots into an updated slot, fails, and later rolls back to the previous slot
- THEN the failed slot retains its own recent Tier 0 forensic records on its boot partition for later inspection
Requirement: General host logging remains memory-first outside Tier 0
The system SHALL keep general host journald logging memory-first during runtime and SHALL reserve the slot-local forensic store for critical lifecycle evidence rather than for general-purpose log persistence.
Scenario: Runtime host logs are not automatically durable
- WHEN a non-critical host log entry is written only to the general runtime journal
- THEN that entry is not guaranteed to survive an abrupt power loss
Scenario: Tier 0 remains focused on critical evidence
- WHEN routine host or application log traffic is emitted during normal operation
- THEN that traffic is handled through the general volatile-journald plus RAM-queued batch logging path rather than being mirrored wholesale into the slot-local forensic store
Source Metadata
schema: spec-driven
created: 2026-04-25
Source
Converted from openspec/changes/durable-journald-logs/ during the OpenSpec-to-feature-spec migration.