Console Claude Code sandbox drift
URL: https://mkdocs.justinsforge.com/memory/handoffs/console-sandbox-drift-2026-05-04/
The problem
Claude Code sessions on Console inherit different mount namespaces depending on how they were launched. This causes intermittent EROFS / read-only failures that look like fleet bugs but are actually sandbox drift.
Today's evidence
| Time | Session | What worked / failed |
|---|---|---|
| 11:49 | original CLI session | Successfully wrote /etc/systemd/system/forge-reap-orphan-claudes.{service,timer} and ran daemon-reload. |
| 11:25 | coordinator bot brain | Failed spawn_remote_session with EROFS on /tmp/tmux-1000/. Root cause: systemd unit ProtectSystem=strict + no /tmp in ReadWritePaths. Fixed via drop-in. |
| 12:00+ | resume-cfdc85c8 (this session) | Cannot write /etc/systemd/system/. Mount namespace mnt:[4026532315]. cat /proc/self/mountinfo confirms /etc is bind-mounted ro within this namespace. Host /etc is fine. |
| 12:00+ | this session | systemctl --user fails with "No medium found": the user D-Bus session bus is inaccessible from this namespace. |
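The mountinfo check in the table can be automated. The following is a minimal sketch, not the tooling actually used in this session: it parses mountinfo text and reports whether the mount covering a path carries the `ro` option. The function names and the `SAMPLE` records are made up for illustration, and escaped characters in mount points (such as `\040` for a space) are not handled.

```python
import os

# Illustrative mountinfo records (invented, not captured from Console).
SAMPLE = """\
22 1 8:1 / / rw,relatime shared:1 - ext4 /dev/sda1 rw
310 22 8:1 /etc /etc ro,relatime shared:1 - ext4 /dev/sda1 rw
311 22 0:25 / /tmp rw,nosuid - tmpfs tmpfs rw
"""


def mount_options_for(path, mountinfo_text):
    """Return the per-mount options of the mount covering `path`,
    or None if no record matches."""
    path = os.path.normpath(path)
    best_point, best_opts = None, None
    for line in mountinfo_text.splitlines():
        fields = line.split()
        if len(fields) < 6:
            continue  # not a mountinfo record
        # Fields: id, parent id, major:minor, root, mount point, options, ...
        point, opts = fields[4], set(fields[5].split(","))
        covers = point == "/" or path == point or path.startswith(point + "/")
        # Longest mount point wins; on a tie the later record wins,
        # because later mounts are stacked on top of earlier ones.
        if covers and (best_point is None or len(point) >= len(best_point)):
            best_point, best_opts = point, opts
    return best_opts


def is_read_only(path, mountinfo_text):
    """True if the mount covering `path` has the 'ro' option."""
    opts = mount_options_for(path, mountinfo_text)
    return opts is not None and "ro" in opts
```

On a live session, pass the contents of /proc/self/mountinfo as `mountinfo_text` and call `is_read_only("/etc/systemd/system", ...)`.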
Working theory
The Claude Code binary (or its /resume skill, or the spawn / remote-bridge wrapper) sets up its own mount namespace before exec'ing the child. Different entry points use different sandbox profiles:
- CLI direct (claude from a normal shell): looser, can write /etc.
- /resume <uuid>: tighter, /etc bind-mounted ro, user dbus blocked.
- Coordinator bot's brain (running under systemd ProtectSystem=strict): inherits the unit-level sandbox.
- /spawn produced sessions: TBD, possibly looser than /resume.
What to investigate
- Where is the mount namespace created?
    - cat /proc/<pid>/mountinfo for various claude PIDs, diff them.
    - Strace claude startup (strace -e mount,unshare -f claude --version) to see if/when it calls unshare(CLONE_NEWNS) or mount().
    - Check ~/.claude/plugins/ and .claude/settings.json for any sandbox: config.
- Is the /resume path uniquely tighter?
    - Spawn a fresh session via claude directly vs claude --resume <uuid>, compare mountinfo.
- Can we make it consistent?
    - If a settings flag controls it (e.g. sandbox.profile: "loose" | "strict"), set it consistently.
    - If it's hard-coded per entry point, may need a wrapper script that re-execs in a known-good namespace.
- How does today's installed reaper see /etc?
    - The reaper runs as the justinwieb user via system-level systemd, so it has the host's mountinfo, not Claude's. That's why earlier writes worked from inside Claude (the original looser namespace) and the reaper itself runs cleanly. The drift only matters for Claude-driven writes.
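Before diffing mountinfo line by line, it is quicker to see which claude PIDs share a mount namespace at all. The sketch below groups PIDs by the inode in /proc/<pid>/ns/mnt. It is an illustration under assumptions: the helper names are hypothetical, and the caller supplies the PID list (for example from pgrep).

```python
import os
import re
from collections import defaultdict

_NS_LINK = re.compile(r"^mnt:\[(\d+)\]$")


def parse_mnt_ns(link_target):
    """Extract the namespace inode from a link target like 'mnt:[4026532315]'."""
    match = _NS_LINK.match(link_target)
    if match is None:
        raise ValueError(f"not a mount-namespace link: {link_target!r}")
    return int(match.group(1))


def mnt_ns_of(pid):
    """Mount-namespace inode of `pid`. Raises an OSError subclass if the
    process has exited or we lack permission to inspect it."""
    return parse_mnt_ns(os.readlink(f"/proc/{pid}/ns/mnt"))


def group_by_mnt_ns(pids, ns_lookup=mnt_ns_of):
    """Map namespace inode -> sorted list of PIDs, skipping unreadable PIDs."""
    groups = defaultdict(list)
    for pid in pids:
        try:
            groups[ns_lookup(pid)].append(pid)
        except OSError:
            continue  # process exited or belongs to another user
    return {ns: sorted(members) for ns, members in groups.items()}
```

More than one key in the result means the sessions are split across namespaces. The `ns_lookup` parameter exists only so the grouping can be exercised without touching /proc.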
Root cause (2026-05-04, 13:45 CT)
The drift is systemd-side, not Claude Code-side. claude's sandbox.enabled:false setting is irrelevant.
Per-PID mount-namespace audit:
| Service | cgroup | mnt ns | /etc state |
|---|---|---|---|
| forge-tmux-anchor.service (home-base) | forge-tmux-anchor.service | 4026531841 (host) | rw |
| forge-remote-bridge.service (all /spawn, /resume, bot-driven sessions) | forge-remote-bridge.service | 4026532315 (private) | ro |
Why: the bridge unit sets ProtectSystem=strict + ReadWritePaths=.... The drop-in loosen-spawn.conf lowered it to ProtectSystem=full, but full still mounts /etc read-only (per the systemd docs, ProtectSystem=yes makes /usr/ and the boot loader directories read-only, and full adds /etc/). Any non-empty ReadWritePaths= also forces a private mount namespace regardless of the ProtectSystem= value. Tmux daemonizes inside that namespace; child claudes inherit it; the namespace lives until the last process in it exits.
The anchor unit has none of these directives, so its tmux runs in the host namespace and its claudes can write /etc/systemd/system/.
ProtectSystem=full was theatrical: the bridge runs claude with --dangerously-skip-permissions and full sudo anyway.
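The reasoning above can be turned into a rough audit of unit text. The sketch below is an assumption-laden illustration, not a systemd parser: it tracks only the three directives named in this note (other directives, such as PrivateTmp=, also set up a mount namespace), it ignores section headers, and it treats an empty scalar assignment as "no". The sample unit texts are reconstructed from the descriptions in this note, and the ReadWritePaths= value is a placeholder because the real one is elided.

```python
OFF_VALUES = {"no", "false", "off", "0"}


def namespace_triggers(unit_texts):
    """Given unit-file texts in load order (main unit first, then drop-ins),
    return the effective directives that still force a private mount namespace."""
    scalars = {"ProtectSystem": "no", "ProtectHome": "no"}
    rw_paths = []
    for text in unit_texts:
        for raw in text.splitlines():
            line = raw.strip()
            if not line or line.startswith(("#", ";", "[")) or "=" not in line:
                continue  # blank line, comment, section header, or junk
            key, value = (part.strip() for part in line.split("=", 1))
            if key in scalars:
                # Later assignments override earlier ones.
                # Assumption: an empty value means "no".
                scalars[key] = value or "no"
            elif key == "ReadWritePaths":
                if value:
                    rw_paths = rw_paths + value.split()  # entries accumulate
                else:
                    rw_paths = []  # empty assignment resets the list
    triggers = {k: v for k, v in scalars.items() if v.lower() not in OFF_VALUES}
    if rw_paths:
        triggers["ReadWritePaths"] = " ".join(rw_paths)
    return triggers


# Placeholder path: the real ReadWritePaths= value is not given in this note.
BRIDGE = "[Service]\nProtectSystem=strict\nReadWritePaths=/home/justinwieb\n"
OLD_DROPIN = "[Service]\nProtectSystem=full\n"
NEW_DROPIN = (
    "[Service]\nProtectSystem=no\nProtectHome=no\n"
    "ReadWritePaths=\nNoNewPrivileges=false\n"
)
```

With these samples, `namespace_triggers([BRIDGE, OLD_DROPIN])` still reports both ProtectSystem and ReadWritePaths, while `namespace_triggers([BRIDGE, NEW_DROPIN])` reports nothing, which matches why the first drop-in did not help.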
Fix
Corrected drop-in staged at forge/infra/systemd/forge-remote-bridge.service.d/loosen-spawn.conf (sets ProtectSystem=no, ProtectHome=no, ReadWritePaths= empty, NoNewPrivileges=false).
Apply from an unrestricted shell (regular SSH terminal on Console, not any current Claude session — they're all in the trapped namespace and can't write /etc):
sudo install -m 0644 \
/home/justinwieb/forge/infra/systemd/forge-remote-bridge.service.d/loosen-spawn.conf \
/etc/systemd/system/forge-remote-bridge.service.d/loosen-spawn.conf
sudo systemctl daemon-reload
sudo systemctl restart forge-remote-bridge.service
# Kill all stale tmux servers so new ones start in the host namespace:
tmux ls 2>/dev/null | awk -F: '{print $1}' | xargs -I{} -r tmux kill-session -t {} || true
pkill -u justinwieb tmux # last resort if servers stick
# Verify on next-spawned claude:
readlink /proc/self/ns/mnt # should equal the output of: sudo readlink /proc/1/ns/mnt (root is needed to read PID 1's namespace link)
Also do the canary-timer steps from "What needs to happen next" #1 in that same shell.
- forge_canary.py is built and tested PASS (3/3 checks). Located at forge/scripts/forge_canary.py.
- Canary user-systemd unit files staged at ~/.config/systemd/user/forge-canary.{service,timer} (NOT yet enabled).
- Sandbox drift blocks Justin's current session from running systemctl --user enable --now forge-canary.timer.
What needs to happen next
- Justin runs from an unrestricted shell (regular SSH terminal, not Claude Code):
- Investigate sandbox drift root cause (this handoff's main purpose).
- If a settings fix exists: apply it, document in reference_claude_code_sandbox.md, add an eval check that fails if mount namespace looks restrictive.
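One possible shape for that eval check, sketched under assumptions: rather than comparing namespace inodes (reading /proc/1/ns/mnt needs root), it asks the kernel whether the paths Claude needs to write are read-only from the current process's view, using `os.statvfs`. The function names and the default path list are hypothetical, and this is not part of forge_canary.py as described here.

```python
import os


def flags_read_only(f_flag):
    """True if a statvfs f_flag value has the read-only bit set."""
    return bool(f_flag & os.ST_RDONLY)


def restrictive_paths(paths=("/etc/systemd/system",)):
    """Return the subset of `paths` that this process sees as read-only.
    Raises FileNotFoundError if a path does not exist."""
    return [p for p in paths if flags_read_only(os.statvfs(p).f_flag)]


def check_mount_namespace(paths=("/etc/systemd/system",)):
    """Fail loudly if any path is read-only in this mount namespace."""
    blocked = restrictive_paths(paths)
    if blocked:
        raise RuntimeError(f"read-only in this mount namespace: {blocked}")
```

Calling `check_mount_namespace()` at the start of an eval run would raise inside a session spawned under the restrictive bridge unit and pass silently in the host namespace, assuming the read-only bind mount is reflected in the statvfs flags.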
Reference files
- forge/scripts/forge_canary.py
- forge/scripts/forge_reap_orphan_claudes.py (v2)
- ~/.config/systemd/user/forge-canary.{service,timer}
- /etc/systemd/system/forge-reap-orphan-claudes.{service,timer}
- /etc/systemd/system/forge-lifeos-coordinator.service.d/override.conf (drop-in adding /tmp)
- forge/memory/daily/2026-05-04.md for full chronological context