Google Drive Mount Cleanup, 2026-05-01

Status: Done. Console-side rclone removed, NFS pass-through in place, docs updated.

Trigger

The boot briefing showed RECOVERY FAILED: forge_monitor_mount_watchdog.sh reported /mnt/pve/workspace/Google-Drive as still unreachable at 07:20 CDT, after Finn's overnight reboots. Investigation revealed an architectural smell: Drive was mounted twice, once on Finn (canonical) and again as a redundant rclone overlay on Console, at the NFS pass-through path. The Apr 27 journal recorded more than 14,000 restarts of Console's rclone unit, confirming chronic instability.

Decision

Single source of truth: Finn. Console accesses Drive via NFS pass-through plus the API-direct forge_gdrive_* scripts. No local FUSE on Console.

Changes Made

| Action | Where | Detail |
|---|---|---|
| Killed rclone process | Console | PID 2088, systemctl kill --signal=SIGKILL |
| Stopped + disabled service | Console | systemctl stop/disable rclone-gdrive.service |
| Removed unit file | Console | /etc/systemd/system/rclone-gdrive.service deleted, daemon-reload |
| Force-unmounted FUSE | Console | fusermount -uz + umount -l |
| Verified NFS pass-through | Console | Write+read+delete cycle on /mnt/workspace/Google-Drive/.forge-nfs-test succeeded |
| Confirmed readdir limitation | Console | ls /mnt/workspace/Google-Drive/ returns empty; documented as expected FUSE-over-NFS behavior |
| Updated system-map/google-drive.md | Forge | New architecture table + decision tree |
| Updated reference_storage_policy.md | Memory | Console column reflects NFS pass-through caveat |
| Updated reference_console_state.md | Memory | Added "No local rclone-gdrive" note + sudo-via-ssh-loopback sandbox tip |
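
The write+read+delete check listed above can be sketched as a small shell function. This is a minimal illustration, not the script that was actually run: the function name nfs_roundtrip, the token format, and the return codes are assumptions; only the probe file name .forge-nfs-test comes from the table.

```shell
#!/usr/bin/env bash
# Sketch of a write+read+delete probe for a mounted directory.
# Hypothetical helper: name, token format, and return codes are illustrative.

nfs_roundtrip() {
    local dir="$1"
    local probe="$dir/.forge-nfs-test"
    local token="probe-$$-$(date +%s)"
    local readback

    # Write: fails if the mount is read-only or unreachable.
    printf '%s\n' "$token" > "$probe" || return 1

    # Read: the content must match what was just written.
    readback="$(cat "$probe")" || { rm -f "$probe"; return 2; }
    if [ "$readback" != "$token" ]; then
        rm -f "$probe"
        return 2
    fi

    # Delete: the probe file must not linger afterwards.
    rm -f "$probe" || return 3
    [ ! -e "$probe" ] || return 3
    return 0
}
```

Called as nfs_roundtrip /mnt/workspace/Google-Drive, it returns 0 only when all three steps succeed, so a non-zero code indicates which step failed (1 write, 2 read, 3 delete).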

What Still Works

  • /save-to-drive (uses forge_gdoc_to_drive.sh which is API-direct since Phase 4.7, no mount needed)
  • forge_gdrive_search.py, forge_gdrive_read.py, forge_gdrive_write.py, forge_gdrive_move.py (all API-direct)
  • Direct file writes/reads from Console at /mnt/workspace/Google-Drive/<known-path> via NFS
  • Watchdog (forge_monitor_mount_watchdog.sh) still pings Finn's mount every 10 min and auto-recovers
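
For illustration, the kind of liveness check a mount watchdog performs might look like the sketch below. It is not the contents of forge_monitor_mount_watchdog.sh, which are not shown here: the function name mount_healthy, the 5-second default, and the rule that an empty listing counts as unhealthy are all assumptions. The empty-listing rule only makes sense on Finn, where ls is expected to return entries.

```shell
#!/usr/bin/env bash
# Hypothetical mount liveness check; not the real watchdog script.

mount_healthy() {
    local path="$1"
    local limit="${2:-5}"   # seconds before the listing is considered hung
    local entries

    # timeout guards against a wedged FUSE mount that would block ls.
    entries="$(timeout "$limit" ls -A "$path" 2>/dev/null)" || return 1

    # Assumption: on the host that owns the mount, an empty listing
    # means the mount is not really populated.
    [ -n "$entries" ]
}
```

A cron-driven caller could run mount_healthy /mnt/pve/workspace/Google-Drive and trigger recovery on a non-zero exit. A process stuck in uninterruptible I/O may not respond to the signal timeout sends, so a check like this is a heuristic, not a guarantee.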

What Doesn't Work, By Design

  • ls /mnt/workspace/Google-Drive/ from Console (returns empty). Use forge_gdrive_search.py instead.
  • Traversing into subdirs from Console via the NFS path (e.g. cat /mnt/workspace/Google-Drive/Business/foo.docx may return "No such file"). Use forge_gdrive_read.py "Business/foo.docx" instead.
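
One way to hide this limitation from callers is a wrapper that tries the NFS path first and falls back to the API-direct reader. The sketch below is hypothetical: drive_cat, DRIVE_NFS_ROOT, and DRIVE_READ_CMD are names invented here, and it assumes forge_gdrive_read.py is on PATH and prints the file content to stdout when given a Drive-relative path, which the notes above do not confirm.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: NFS fast path with an API-direct fallback.
# Assumption: the reader command prints file content to stdout.

DRIVE_NFS_ROOT="${DRIVE_NFS_ROOT:-/mnt/workspace/Google-Drive}"
DRIVE_READ_CMD="${DRIVE_READ_CMD:-forge_gdrive_read.py}"

drive_cat() {
    local rel="$1"
    local nfs_path="$DRIVE_NFS_ROOT/$rel"

    # Fast path: the file is directly readable through the NFS pass-through.
    if [ -f "$nfs_path" ] && [ -r "$nfs_path" ]; then
        cat "$nfs_path"
        return
    fi

    # Fallback: ask the API-direct reader for the same relative path.
    "$DRIVE_READ_CMD" "$rel"
}
```

With this in place, drive_cat "Business/foo.docx" behaves the same whether or not the NFS lookup succeeds, at the cost of an extra stat on the fast path.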

Finn Unit Hardening (added 08:19 same day)

The root cause of the post-reboot wedges: network-online.target is reached immediately on Finn because systemd-networkd-wait-online is disabled (Proxmox uses ifupdown/networking.service, not networkd). The unit's existing After=network-online.target was therefore a no-op. rclone launched before DNS, TCP, and OAuth were actually ready, failed to mount cleanly, then wedged.

Hardened unit (/etc/systemd/system/rclone-gdrive.service on Finn, backup at .bak.2026-05-01):

| Change | Why |
|---|---|
| ExecStartPre=/bin/sh -c 'until timeout 8 rclone about gdrive:; do sleep 5; done' | Real connectivity probe: forces a DNS + TCP + OAuth + Drive API round trip before mounting. Replaces the unreliable network-online.target gate. Shown wrapped in /bin/sh -c because systemd does not interpret shell loops itself. |
| After=network-online.target nss-lookup.target + matching Wants= | Belt-and-suspenders ordering, even though the precheck is the real gate. |
| ExecStop=/bin/fusermount -uz (was -u) | Lazy unmount. The plain -u was causing stop timeouts when FUSE wedged ("Device or resource busy" in the journal). |
| TimeoutStartSec=180 | Bounds the precheck loop. If the Drive API is dead for 3+ minutes at boot, fail and let the watchdog page. |
| TimeoutStopSec=15 | Don't hang shutdown indefinitely. |
| Restart=on-failure, RestartSec=15 (was 10) | Slightly less aggressive retry. |
| StartLimitIntervalSec=600, StartLimitBurst=5 | Caps the unit at 5 starts per 10 min. Prevents the Apr 27 14k-restart-loop pattern. After 5 failed starts, systemd gives up and the watchdog (10-min cron) takes over. |

Verified: restarted live, ExecStartPre exited 0, mount populated (48 entries), 16s start time (precheck doing real work).
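
The precheck pattern can also be written as a reusable function that bounds itself instead of relying on TimeoutStartSec. This is a variant sketch, not the unit's actual ExecStartPre: the name wait_for_probe and its argument order are assumptions, and the real unit uses an unbounded until loop with systemd enforcing the limit.

```shell
#!/usr/bin/env bash
# Hypothetical bounded retry helper: runs a probe command until it
# succeeds or the attempt budget is spent.

wait_for_probe() {
    local max_tries="$1"
    local delay="$2"
    shift 2
    local attempt=1

    while [ "$attempt" -le "$max_tries" ]; do
        # "$@" is the probe command, e.g. a timeout-wrapped API call.
        if "$@"; then
            return 0
        fi
        # Sleep only between attempts, not after the last one.
        if [ "$attempt" -lt "$max_tries" ]; then
            sleep "$delay"
        fi
        attempt=$((attempt + 1))
    done
    return 1
}
```

For example, wait_for_probe 20 5 timeout 8 rclone about gdrive: would retry for up to 20 attempts. Keeping the probe as a trailing command makes the same helper usable for other gates, such as a DNS lookup.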

Open Threads

  1. Old log file /var/log/rclone-gdrive.log on Console (599K, last write 07:49) is now orphaned. Logrotate will eventually trim; safe to leave or rm manually.
  2. Notification cleanup. History log entries at 01:40 and 07:20 CDT remain in notifications/history.log for audit.
  3. Validate the hardening across an actual reboot. No way to know for sure until Finn next power-cycles. Watch for clean rclone-gdrive start in journal post-boot.
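
For the post-reboot check in item 3, a small journal summary can make a restart loop or a start-limit hit obvious at a glance. The sketch below is hypothetical: summarize_unit_journal is a name invented here, and the two matched strings are the messages systemd commonly logs for automatic restarts and for start-limit hits, whose exact wording can vary between systemd versions.

```shell
#!/usr/bin/env bash
# Hypothetical journal summarizer: reads journal text on stdin and counts
# automatic restarts and start-limit hits.

summarize_unit_journal() {
    local restarts=0
    local limit_hits=0
    local line

    while IFS= read -r line; do
        case "$line" in
            *"Scheduled restart job"*)
                restarts=$((restarts + 1)) ;;
            *"Start request repeated too quickly"*)
                limit_hits=$((limit_hits + 1)) ;;
        esac
    done

    printf 'restarts=%d start_limit_hits=%d\n' "$restarts" "$limit_hits"
}
```

After Finn's next boot, journalctl -u rclone-gdrive.service -b --no-pager | summarize_unit_journal should ideally report zero for both counters.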

Verification Snapshot

Finn: rclone-gdrive.service active (running), PID 1257, 14h uptime, ls works.
Console: no rclone process, no unit file, NFS mount intact, write+read+delete OK, ls empty.
Watchdog: last run "all monitored mounts healthy" at 07:52 (after auto-recovery).

[Claude Code]