Google Drive Mount Cleanup, 2026-05-01¶

Status: Done. Console-side rclone removed, NFS pass-through in place, docs updated.

Trigger¶

Boot briefing showed RECOVERY FAILED: /mnt/pve/workspace/Google-Drive still unreachable from forge_monitor_mount_watchdog.sh at 07:20 CDT after Finn's overnight reboots. Investigation revealed an architectural smell: Drive was mounted twice, once on Finn (canonical) and a second redundant rclone overlay on Console at the NFS pass-through path. The Apr 27 journal showed 14,000+ restart loops on Console's rclone unit, confirming chronic instability.

Decision¶

Single source of truth: Finn. Console accesses Drive via NFS pass-through plus the API-direct forge_gdrive_* scripts. No local FUSE on Console.

Changes Made¶

Action	Where	Detail
Killed rclone process	Console	PID 2088, `systemctl kill --signal=SIGKILL`
Stopped + disabled service	Console	`systemctl stop/disable rclone-gdrive.service`
Removed unit file	Console	`/etc/systemd/system/rclone-gdrive.service` deleted, `daemon-reload`
Force-unmounted FUSE	Console	`fusermount -uz` + `umount -l`
Verified NFS pass-through	Console	Write+read+delete cycle on `/mnt/workspace/Google-Drive/.forge-nfs-test` succeeded
Confirmed readdir limitation	Console	`ls /mnt/workspace/Google-Drive/` returns empty; documented as expected FUSE-over-NFS behavior
Updated `system-map/google-drive.md`	Forge	New architecture table + decision tree
Updated `reference_storage_policy.md`	Memory	Console column reflects NFS pass-through caveat
Updated `reference_console_state.md`	Memory	Added "No local rclone-gdrive" note + sudo-via-ssh-loopback sandbox tip

What Still Works¶

/save-to-drive (uses forge_gdoc_to_drive.sh which is API-direct since Phase 4.7, no mount needed)
forge_gdrive_search.py, forge_gdrive_read.py, forge_gdrive_write.py, forge_gdrive_move.py (all API-direct)
Direct file writes/reads from Console at /mnt/workspace/Google-Drive/<known-path> via NFS
Watchdog (forge_monitor_mount_watchdog.sh) still pings Finn's mount every 10 min and auto-recovers

What Doesn't Work, By Design¶

ls /mnt/workspace/Google-Drive/ from Console (returns empty). Use forge_gdrive_search.py instead.
Traversing into subdirs from Console via the NFS path (e.g. cat /mnt/workspace/Google-Drive/Business/foo.docx may return "No such file"). Use forge_gdrive_read.py "Business/foo.docx" instead.

Finn Unit Hardening (added 08:19 same day)¶

Root cause of the post-reboot wedges turned out to be: network-online.target activates immediately on Finn because systemd-networkd-wait-online is disabled (Proxmox uses ifupdown/networking.service, not networkd). So the unit's existing After=network-online.target was a no-op. rclone would launch before DNS / TCP / OAuth were actually ready, fail to mount cleanly, then wedge.

Hardened unit (/etc/systemd/system/rclone-gdrive.service on Finn, backup at .bak.2026-05-01):

Change	Why
`ExecStartPre=until timeout 8 rclone about gdrive: do sleep 5; done`	Real connectivity probe: forces DNS + TCP + OAuth + Drive API roundtrip before mounting. Replaces the lying `network-online.target`.
`After=network-online.target nss-lookup.target` + matching `Wants=`	Belt-and-suspenders ordering, even though precheck is the real gate.
`ExecStop=/bin/fusermount -uz` (was `-u`)	Lazy unmount. Was causing stop-timeouts when FUSE wedged ("Device or resource busy" in journal).
`TimeoutStartSec=180`	Bounds the precheck loop. If Drive API is dead 3+ minutes at boot, fail and let watchdog page.
`TimeoutStopSec=15`	Don't hang shutdown indefinitely.
`Restart=on-failure`, `RestartSec=15` (was 10)	Slightly less aggressive retry.
`StartLimitIntervalSec=600`, `StartLimitBurst=5`	Caps to 5 starts per 10 min. Prevents the Apr 27 14k-restart-loop pattern. After 5 fails, systemd gives up and the watchdog (10-min cron) takes over.

Verified: restarted live, ExecStartPre exited 0, mount populated (48 entries), 16s start time (precheck doing real work).

Open Threads¶

Old log file /var/log/rclone-gdrive.log on Console (599K, last write 07:49) is now orphaned. Logrotate will eventually trim; safe to leave or rm manually.
Notification cleanup. History log entries at 01:40 and 07:20 CDT remain in notifications/history.log for audit.
Validate the hardening across an actual reboot. No way to know for sure until Finn next power-cycles. Watch for clean rclone-gdrive start in journal post-boot.

Verification Snapshot¶

Finn: rclone-gdrive.service active (running), PID 1257, 14h uptime, ls works.
Console: no rclone process, no unit file, NFS mount intact, write+read+delete OK, ls empty.
Watchdog: last run "all monitored mounts healthy" at 07:52 (after auto-recovery).

[Claude Code]