Google Drive Mount Cleanup, 2026-05-01
Status: Done. Console-side rclone removed, NFS pass-through in place, docs updated.
Trigger
Boot briefing showed RECOVERY FAILED: /mnt/pve/workspace/Google-Drive still unreachable from forge_monitor_mount_watchdog.sh at 07:20 CDT after Finn's overnight reboots. Investigation revealed an architectural smell: Drive was mounted twice, once on Finn (canonical) and a second time as a redundant rclone overlay on Console at the NFS pass-through path. The Apr 27 journal showed 14,000+ restarts of Console's rclone unit, confirming chronic instability.
Decision
Single source of truth: Finn. Console accesses Drive via NFS pass-through plus the API-direct forge_gdrive_* scripts. No local FUSE on Console.
Changes Made
| Action | Where | Detail |
|---|---|---|
| Killed rclone process | Console | PID 2088, systemctl kill --signal=SIGKILL |
| Stopped + disabled service | Console | systemctl stop/disable rclone-gdrive.service |
| Removed unit file | Console | /etc/systemd/system/rclone-gdrive.service deleted, daemon-reload |
| Force-unmounted FUSE | Console | fusermount -uz + umount -l |
| Verified NFS pass-through | Console | Write+read+delete cycle on /mnt/workspace/Google-Drive/.forge-nfs-test succeeded |
| Confirmed readdir limitation | Console | ls /mnt/workspace/Google-Drive/ returns empty; documented as expected FUSE-over-NFS behavior |
| Updated system-map/google-drive.md | Forge | New architecture table + decision tree |
| Updated reference_storage_policy.md | Memory | Console column reflects NFS pass-through caveat |
| Updated reference_console_state.md | Memory | Added "No local rclone-gdrive" note + sudo-via-ssh-loopback sandbox tip |
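
The write+read+delete verification in the table can be repeated with a small helper along these lines. This is a minimal sketch, not the command that was actually run: the function name `nfs_roundtrip_check` is hypothetical, and only the marker file name `.forge-nfs-test` comes from the table. It deliberately avoids listing the directory, since `ls` on this path is expected to return empty.

```shell
# Hypothetical helper: write, read back, and delete a marker file in a directory.
# Returns 0 only if all three steps succeed. Runs in a subshell so that
# its variables and `exit` calls do not leak into the caller.
nfs_roundtrip_check() (
    dir=$1
    marker="$dir/.forge-nfs-test"
    token="probe-$$-$(date +%s)"

    # Write: fails if the directory is missing or not writable.
    printf '%s\n' "$token" > "$marker" || exit 1

    # Read: the content must match what was just written.
    if [ "$(cat "$marker")" != "$token" ]; then
        rm -f "$marker"
        exit 1
    fi

    # Delete: the function's status is the status of this rm.
    rm -f "$marker"
)
```

Usage, assuming the Console path from the table: `nfs_roundtrip_check /mnt/workspace/Google-Drive && echo "NFS pass-through OK"`.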
What Still Works
- /save-to-drive (uses forge_gdoc_to_drive.sh, which is API-direct since Phase 4.7, no mount needed)
- forge_gdrive_search.py, forge_gdrive_read.py, forge_gdrive_write.py, forge_gdrive_move.py (all API-direct)
- Direct file writes/reads from Console at /mnt/workspace/Google-Drive/<known-path> via NFS
- Watchdog (forge_monitor_mount_watchdog.sh) still pings Finn's mount every 10 min and auto-recovers
What Doesn't Work, By Design
- ls /mnt/workspace/Google-Drive/ from Console (returns empty). Use forge_gdrive_search.py instead.
- Traversing into subdirs from Console via the NFS path (e.g. cat /mnt/workspace/Google-Drive/Business/foo.docx may return "No such file"). Use forge_gdrive_read.py "Business/foo.docx" instead.
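
When a script only has the NFS-style path in hand, the Drive-relative form expected by the API-direct tools can be derived by stripping the mount prefix. The sketch below is an illustration only: `DRIVE_MOUNT` and `drive_rel_path` are hypothetical names, and it assumes forge_gdrive_read.py takes a Drive-relative path as its first argument, as in the example above.

```shell
# Hypothetical: Console-side NFS pass-through path for Drive.
DRIVE_MOUNT=/mnt/workspace/Google-Drive

# Print the Drive-relative part of a path under $DRIVE_MOUNT.
# Returns 1 (and prints nothing) for paths outside the mount
# or for the mount root itself.
drive_rel_path() {
    case $1 in
        "$DRIVE_MOUNT"/?*) printf '%s\n' "${1#"$DRIVE_MOUNT"/}" ;;
        *) return 1 ;;
    esac
}
```

Usage: `forge_gdrive_read.py "$(drive_rel_path /mnt/workspace/Google-Drive/Business/foo.docx)"` passes `Business/foo.docx` to the script.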
Finn Unit Hardening (added 08:19 same day)
Root cause of the post-reboot wedges turned out to be: network-online.target activates immediately on Finn because systemd-networkd-wait-online is disabled (Proxmox uses ifupdown/networking.service, not networkd). So the unit's existing After=network-online.target was a no-op. rclone would launch before DNS / TCP / OAuth were actually ready, fail to mount cleanly, then wedge.
Hardened unit (/etc/systemd/system/rclone-gdrive.service on Finn, backup at .bak.2026-05-01):
| Change | Why |
|---|---|
| ExecStartPre=until timeout 8 rclone about gdrive: do sleep 5; done | Real connectivity probe: forces DNS + TCP + OAuth + Drive API roundtrip before mounting. Replaces the lying network-online.target. |
| After=network-online.target nss-lookup.target + matching Wants= | Belt-and-suspenders ordering, even though precheck is the real gate. |
| ExecStop=/bin/fusermount -uz (was -u) | Lazy unmount. The previous non-lazy -u was causing stop-timeouts when FUSE wedged ("Device or resource busy" in journal). |
| TimeoutStartSec=180 | Bounds the precheck loop. If Drive API is dead 3+ minutes at boot, fail and let watchdog page. |
| TimeoutStopSec=15 | Don't hang shutdown indefinitely. |
| Restart=on-failure, RestartSec=15 (was 10) | Slightly less aggressive retry. |
| StartLimitIntervalSec=600, StartLimitBurst=5 | Caps to 5 starts per 10 min. Prevents the Apr 27 14k-restart-loop pattern. After 5 fails, systemd gives up and the watchdog (10-min cron) takes over. |
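
Put together, the directives in the table might sit in the unit roughly as printed by the sketch below. This is an illustration, not a copy of the file on Finn. Three things are assumptions: the `/bin/sh -c '...'` wrapper (systemd does not interpret shell loops itself, so the loop needs a shell), the section placement (the StartLimit* settings belong in `[Unit]` on current systemd), and the mountpoint passed to fusermount. The function name `print_hardened_fragment` is hypothetical, and the existing ExecStart line is intentionally left out.

```shell
# Print an illustrative unit fragment to stdout; nothing is written to /etc.
# ASSUMPTIONS: the /bin/sh -c wrapper, the [Unit]/[Service] placement, and the
# fusermount mountpoint argument are not taken from the real unit file.
print_hardened_fragment() {
    cat <<'EOF'
[Unit]
After=network-online.target nss-lookup.target
Wants=network-online.target nss-lookup.target
# Start rate limiting is configured in [Unit], not [Service].
StartLimitIntervalSec=600
StartLimitBurst=5

[Service]
# The probe loop runs under /bin/sh because systemd has no loop syntax.
ExecStartPre=/bin/sh -c 'until timeout 8 rclone about gdrive:; do sleep 5; done'
# ExecStart= (existing rclone mount command, unchanged and not shown here)
ExecStop=/bin/fusermount -uz /mnt/pve/workspace/Google-Drive
TimeoutStartSec=180
TimeoutStopSec=15
Restart=on-failure
RestartSec=15
EOF
}
```

Running `print_hardened_fragment` next to `systemctl cat rclone-gdrive.service` on Finn gives a quick side-by-side check of the live unit against the table.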
Verified: restarted live, ExecStartPre exited 0, mount populated (48 entries), 16s start time (precheck doing real work).
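
The precheck behavior can be approximated outside systemd with a bounded retry helper, which is handy for testing the probe by hand. This is a sketch under stated assumptions: `wait_until_ok` is a hypothetical name, and the deadline only imitates TimeoutStartSec (systemd kills the start job at the timeout, whereas this loop checks the clock between attempts).

```shell
# Hypothetical helper: retry a probe command until it succeeds or a deadline passes.
# usage: wait_until_ok DEADLINE_SECS SLEEP_SECS command [args...]
# Returns 0 as soon as the command succeeds, 1 once the deadline is reached.
wait_until_ok() (
    deadline=$1
    pause=$2
    shift 2
    start=$(date +%s)

    until "$@"; do
        now=$(date +%s)
        # Give up once the elapsed time reaches the deadline.
        if [ $((now - start)) -ge "$deadline" ]; then
            exit 1
        fi
        sleep "$pause"
    done
)
```

Usage, mirroring the values in the table: `wait_until_ok 180 5 timeout 8 rclone about gdrive:` retries the Drive API probe every 5 seconds for up to about 3 minutes.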
Open Threads
- Old log file /var/log/rclone-gdrive.log on Console (599K, last write 07:49) is now orphaned. Logrotate will eventually trim; safe to leave or rm manually.
- Notification cleanup. History log entries at 01:40 and 07:20 CDT remain in notifications/history.log for audit.
- Validate the hardening across an actual reboot. No way to know for sure until Finn next power-cycles. Watch for clean rclone-gdrive start in journal post-boot.
Verification Snapshot
Finn: rclone-gdrive.service active (running), PID 1257, 14h uptime, ls works.
Console: no rclone process, no unit file, NFS mount intact, write+read+delete OK, ls empty.
Watchdog: last run "all monitored mounts healthy" at 07:52 (after auto-recovery).
[Claude Code]