
Spawn-session cgroup isolation

forge_spawn_session.sh and the /resume skill wrap tmux new-session in systemd-run --user --scope --quiet --unit=forge-spawn-<name> --collect. This moves the spawned tmux server (and every claude inside it) out of the caller's service cgroup into a transient scope owned by the user's systemd manager, under user.slice.
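A minimal sketch of the wrap, assuming a bash helper. The function name spawn_session, its NAME/SOCKET arguments, and a per-session tmux -L socket in the primary path are hypothetical; the real forge_spawn_session.sh may differ. The systemd-run flags are the ones listed above. Probing the user manager with systemctl --user show-environment is one assumed way to detect the fallback case.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of the systemd-run wrap; not the real forge_spawn_session.sh.

# spawn_session NAME SOCKET [COMMAND...]
# Starts a detached tmux session inside a transient user scope so the tmux
# server (and everything it launches) leaves the caller's service cgroup.
spawn_session() {
  local name=$1 socket=$2
  shift 2

  # Minimal environments (cron, systemd-run --pipe) may not set this; the
  # user bus socket lives at $XDG_RUNTIME_DIR/bus.
  export XDG_RUNTIME_DIR="${XDG_RUNTIME_DIR:-/run/user/$(id -u)}"

  local -a tmux_cmd=(tmux -L "$socket" new-session -d -s "$name" "$@")

  # Use the scope only if systemd-run exists and the user manager answers.
  if command -v systemd-run >/dev/null 2>&1 &&
     systemctl --user show-environment >/dev/null 2>&1; then
    # --collect lets systemd garbage-collect the unit even if it fails.
    systemd-run --user --scope --quiet \
      --unit="forge-spawn-${name}" --collect \
      "${tmux_cmd[@]}"
  else
    # Legacy fallback: the server stays in the caller's cgroup and will die
    # when that service restarts.
    echo "spawn_session: user bus unavailable, falling back to bare tmux" >&2
    "${tmux_cmd[@]}"
  fi
}
```

One caveat worth keeping in mind: the scope only captures the tmux server if this call is what starts it. If a server is already running on SOCKET, tmux adds the new session to that existing server, which stays in whatever cgroup it was started in. That is why this sketch assumes a dedicated socket per session.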

Why

Without the wrap, every spawned tmux server inherits the cgroup of the service that launched it. With systemd's default KillMode=control-group, systemctl restart <bot> sends SIGTERM to every process in that cgroup and kills every Remote Control session in one shot. On 2026-05-04, a bridge restart for an unrelated sandbox-drift fix wiped 9 active sessions.
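To see what a restart would take down, list the processes sitting in the bot's cgroup. A minimal sketch, assuming the cgroup v2 unified hierarchy; unit_pids and the CGROUP_ROOT override are hypothetical names introduced here, while ControlGroup is a standard systemd unit property.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the PIDs that sit directly in a unit's cgroup,
# i.e. what `systemctl restart UNIT` signals under KillMode=control-group.
# Assumes cgroup v2. CGROUP_ROOT is a hypothetical test-only override
# (default /sys/fs/cgroup).
unit_pids() {
  local unit=$1 cg
  cg=$(systemctl show "$unit" -p ControlGroup --value) || return 1
  if [[ -z $cg ]]; then
    echo "unit_pids: $unit has no cgroup (not running?)" >&2
    return 1
  fi
  # cgroup.procs lists only this cgroup's direct members, not child cgroups.
  cat "${CGROUP_ROOT:-/sys/fs/cgroup}${cg}/cgroup.procs"
}
```

For a readable listing: ps -o pid=,args= -p "$(unit_pids forge-remote-bridge.service | paste -sd, -)". Without the wrap, spawned tmux servers and their claudes would show up here; with it, they should not.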

Required state

  • loginctl enable-linger justinwieb, so the user's systemd manager (and the scopes it owns under user.slice) keeps running after logout. Linger is currently enabled; audit with loginctl show-user justinwieb | grep Linger.
  • Bot units that call the spawn script have KillMode=mixed drop-ins:
      • /etc/systemd/system/forge-remote-bridge.service.d/killmode-mixed.conf
      • /etc/systemd/system/forge-lifeos-coordinator.service.d/killmode-mixed.conf
  • Any future bot service that calls forge_spawn_session.sh needs the same drop-in on its unit. Check with systemctl show <unit> -p KillMode --value (expect mixed).
  • XDG_RUNTIME_DIR is /run/user/1000 for this user (UID 1000). The spawn script and /resume skill set it explicitly so calls from minimal-environment contexts (cron, systemd-run --pipe) can still reach the user bus at $XDG_RUNTIME_DIR/bus.
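The drop-in and the audit steps above can be scripted. A minimal sketch: the file name and location match the list, but install_killmode_dropin, audit_spawn_state, and the SYSTEMD_SYSTEM_DIR override are hypothetical names, not existing tooling.

```bash
#!/usr/bin/env bash
# Hypothetical helpers for the required state. SYSTEMD_SYSTEM_DIR is a
# test-only override (default /etc/systemd/system).

# install_killmode_dropin UNIT: write <dir>/UNIT.d/killmode-mixed.conf.
# Run as root, then `systemctl daemon-reload`.
install_killmode_dropin() {
  local dir="${SYSTEMD_SYSTEM_DIR:-/etc/systemd/system}/$1.d"
  mkdir -p "$dir"
  printf '[Service]\nKillMode=mixed\n' >"$dir/killmode-mixed.conf"
}

# audit_spawn_state USER UNIT...: non-zero exit if linger is off for USER
# or any UNIT does not report KillMode=mixed.
audit_spawn_state() {
  local user=$1 unit mode rc=0
  shift
  if [[ $(loginctl show-user "$user" -p Linger --value 2>/dev/null) != yes ]]; then
    echo "FAIL: linger not enabled for $user" >&2
    rc=1
  fi
  for unit in "$@"; do
    mode=$(systemctl show "$unit" -p KillMode --value)
    if [[ $mode != mixed ]]; then
      echo "FAIL: $unit has KillMode=$mode (want mixed)" >&2
      rc=1
    fi
  done
  return "$rc"
}
```

Example: install_killmode_dropin forge-remote-bridge.service && systemctl daemon-reload, then audit_spawn_state justinwieb forge-remote-bridge.service forge-lifeos-coordinator.service.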

Side effect: memory accounting

The memory cap set in the bridge's memory-cap.conf no longer covers spawned claudes, because they now live under user.slice. This is the intended tradeoff: the 2026-05-03 OOM and manual reboot were driven by accumulated spawn memory inflating the bridge cgroup. The bot itself is still capped; runaway spawned claudes are now bounded only by user.slice, which is effectively uncapped on a single-user box. That is acceptable for now.
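To keep an eye on this tradeoff, compare the capped bot with user.slice. A minimal sketch: mem_report is a hypothetical helper, and which of MemoryHigh or MemoryMax memory-cap.conf actually sets is not stated here, so both are printed. Values are bytes; an unset cap shows as infinity, and MemoryCurrent may show [not set] if memory accounting is off.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print current usage and caps for each unit or slice.
# MemoryCurrent, MemoryHigh and MemoryMax are standard systemd properties.
mem_report() {
  local unit
  for unit in "$@"; do
    printf '%-32s current=%s high=%s max=%s\n' "$unit" \
      "$(systemctl show "$unit" -p MemoryCurrent --value)" \
      "$(systemctl show "$unit" -p MemoryHigh --value)" \
      "$(systemctl show "$unit" -p MemoryMax --value)"
  done
}
```

Example: mem_report forge-remote-bridge.service user.slice.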

Verification

# After a bridge restart, spawned sessions should still be alive:
systemctl restart forge-remote-bridge.service
sleep 2
tmux ls   # expect every spawn-* / resume-* session still listed
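tmux ls only proves the sessions survived this particular restart. A stricter check is to confirm each tmux server actually lives in a forge-spawn-*.scope. A minimal sketch, assuming cgroup v2 and that you know the tmux socket names in use; server_cgroup, check_sockets, and the PROC_ROOT override are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical check: confirm each tmux server runs in a forge-spawn-*.scope
# rather than a bot's service cgroup. PROC_ROOT (default /proc) is a
# test-only override.

# server_cgroup SOCKET: print the cgroup v2 path of that socket's tmux server.
server_cgroup() {
  local pid
  pid=$(tmux -L "$1" display-message -p '#{pid}') || return 1
  # On cgroup v2, /proc/PID/cgroup is a single "0::/path" line.
  cut -d: -f3- "${PROC_ROOT:-/proc}/$pid/cgroup"
}

# check_sockets SOCKET...: report each server; non-zero if any is missing
# or not inside a forge-spawn scope.
check_sockets() {
  local socket cg rc=0
  for socket in "$@"; do
    if ! cg=$(server_cgroup "$socket"); then
      echo "$socket: no server"
      rc=1
      continue
    fi
    case $cg in
      */forge-spawn-*.scope) echo "$socket: OK ($cg)" ;;
      *) echo "$socket: NOT isolated ($cg)"; rc=1 ;;
    esac
  done
  return "$rc"
}
```

A session created through the fallback path would report NOT isolated here, which is a quick way to spot sessions that will not survive the next restart.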

Fallback

If systemd-run is unavailable or the user bus is unreachable, the spawn script and /resume skill fall back to bare tmux -L <socket> new-session (legacy behavior). Spawned sessions in that fallback path WILL die on parent service restart.

[Claude Code]