# Pure Phoenix Phase 4.2, Bot Redesign (sub-design pass)

This is the design handoff that gates Phase 4.2 execution. Phase 4.2 in the master plan was originally scoped as a rename-and-port; on 2026-04-28 Justin signaled that the bot fleet needs a full redo. This document captures the current inventory, a recommended architecture, and seven open decisions that need Justin's sign-off before any code changes.

Plan: `~/.claude/plans/yes-lets-go-into-pure-phoenix.md`, Section 4.2. Doctrine: `FORGE-DOCTRINE.md`, Sections 3, 5, 6, 7.

## Current State Inventory

### Bot identities and endpoints (5 active)

| Bot | Role | Service | Code | Token at |
|---|---|---|---|---|
| `@jw_inbox_bot` | low-friction capture, voice + text | `forge-telegram-inbox.service` | `forge_telegram_inbox_bot.py` (407 lines) | `~/.forge-secrets/telegram-inbox.env` |
| `@Ava_JForgeBot` | lifeos coordinator, deep synthesis | `forge-telegram-ava.service` | `forge_telegram_lifeos_coordinator_bot.py` (207 lines) | `~/.forge-secrets/telegram-ava.env` |
| (inbox webhook) | iOS Shortcut endpoint, port 7400 | `forge-inbox-webhook.service` | `forge_telegram_inbox_webhook.py` (172 lines) | (uses inbox token) |
| `@Manager_JForgeBot` | push-only output for `notify.sh` | n/a (push only) | n/a | `~/.forge-secrets/telegram-manager.env` |
| `@jw_updates_bot` | push-only output for heartbeat + morning-brief | n/a (push only) | n/a | `~/.forge-secrets/telegram-updates.env` |

### Brain layer (Sonnet 4.6)

| Brain | LOC | Model | Tools | Notes |
|---|---|---|---|---|
| `forge_telegram_inbox_brain.py` | 1120 | claude-sonnet-4-6 via `claude -p` | 28 (full surface) | The mothership. Notion (16 DBs), Calendar, Gmail (personal + business), wellness, knowledge, habits, scheduled nudges, push, spawn_worker, drafts. |
| `forge_telegram_lifeos_coordinator_brain.py` | 235 | claude-sonnet-4-6 via `claude -p` | imports `inbox_brain.TOOLS_JSON`, identical 28 | Same tool surface, different system prompt that emphasizes synthesis and spawn_worker for heavy work. |

The functional difference between the two brains today is prompt-only: both have access to the identical tool registry, so the "lifeos vs inbox" split is a tone choice, not a capability choice.

### Supporting infrastructure

| Component | Role |
|---|---|
| `forge_telegram_transcribe.py` | faster-whisper base.en CPU int8, voice-to-text |
| `forge_telegram_push.sh` | unified push helper, takes a `chat\|updates\|inbox\|ava\|manager` selector |
| `forge_telegram_nudge_fire.py` | every-minute cron, fires scheduled nudges as push messages to the updates bot |
| `data/inbox-context.jsonl` | rolling conversation context for the inbox brain |
| `data/ava-context.jsonl` | rolling conversation context for the lifeos brain |

### Doctrine compliance issues today

| Section | Violation | Severity |
|---|---|---|
| 3 (no persona names for bots) | `@Ava_JForgeBot`, `@Manager_JForgeBot` | known, scheduled for retirement here |
| 5 (inbox is low-friction capture only; downstream workers do heavy lifting) | The inbox brain has `spawn_worker`, full Gmail tools, and Notion CRUD on 16 DBs. It is NOT low-friction; it does heavy lifting. | high, scope creep |
| 7 (per-bot API usage tracking) | zero tracking exists for any of the 5 bot surfaces | unimplemented since day one |

## Recommended Architecture

### One brain module, two personas

Collapse `inbox_brain` and `lifeos_brain` into a single `forge_telegram_brain.py` parameterized by persona:

```python
def handle(text: str, *, persona: str, prior_messages: list[dict] | None = None) -> str:
    """persona in {'capture', 'coordinator'}"""
    ...
```

Persona selects:

- system prompt template
- tool subset (capture: 8 tools; coordinator: 28+)
- default done behavior (capture: `done: true` after an action; coordinator: multi-pass allowed)
- context log file (`data/capture-context.jsonl` vs `data/coordinator-context.jsonl`)
- per-call cost budget guard
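A minimal sketch of the per-persona settings listed above, expressed as one lookup table and assuming a frozen dataclass fits `forge_telegram_brain.py`. `PersonaConfig`, `PERSONAS`, `persona_config`, the prompt placeholders, the tool identifier `transcribe_voice`, and the per-call dollar budgets are all hypothetical. Only the persona names, the context-log paths, and the capture tool list come from this document.

```python
from __future__ import annotations

from dataclasses import dataclass
from pathlib import Path

# Capture tool names from the tool-split table; "transcribe_voice" is a
# hypothetical identifier for the "voice transcribe" row.
CAPTURE_TOOLS = frozenset({
    "save_to_inbox", "create_task", "save_knowledge", "schedule_nudge",
    "log_habit", "wellness_now", "create_calendar_event", "transcribe_voice",
})


@dataclass(frozen=True)
class PersonaConfig:
    system_prompt: str
    tool_names: frozenset[str] | None  # None means the full tool registry
    multi_pass: bool                   # False: finish with done:true after one action
    context_log: Path
    max_usd_per_call: float            # per-call budget guard (hypothetical values)


PERSONAS: dict[str, PersonaConfig] = {
    "capture": PersonaConfig(
        system_prompt="<capture prompt template>",  # placeholder
        tool_names=CAPTURE_TOOLS,
        multi_pass=False,
        context_log=Path("data/capture-context.jsonl"),
        max_usd_per_call=0.05,
    ),
    "coordinator": PersonaConfig(
        system_prompt="<coordinator prompt template>",  # placeholder
        tool_names=None,
        multi_pass=True,
        context_log=Path("data/coordinator-context.jsonl"),
        max_usd_per_call=0.50,
    ),
}


def persona_config(persona: str) -> PersonaConfig:
    """Return a persona's settings, failing loudly on an unknown name."""
    try:
        return PERSONAS[persona]
    except KeyError:
        raise ValueError(
            f"unknown persona {persona!r}; expected one of {sorted(PERSONAS)}"
        ) from None
```

A new persona such as "alert-triage" would then be one more `PERSONAS` entry rather than a new brain module.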

Each bot's polling layer becomes thin:

```python
# forge_telegram_capture_bot.py
import forge_telegram_brain as brain

reply = brain.handle(text, persona="capture")  # text: the incoming message body
```

Eliminates ~200 lines of duplication. Future personas (e.g. "alert-triage", "morning-recap") drop in without new brains.

### Capture vs coordinator tool split (per Section 5)

| Tool | capture | coordinator |
|---|---|---|
| save_to_inbox | yes | yes |
| create_task (quick) | yes | yes |
| save_knowledge (quick) | yes | yes |
| schedule_nudge | yes | yes |
| log_habit | yes | yes |
| wellness_now (read-only) | yes | yes |
| create_calendar_event (quick add) | yes | yes |
| voice transcribe | yes | yes |
| query_notion | NO (capture is write-only) | yes |
| update_task | NO | yes |
| list_calendar_events | NO | yes |
| update_calendar_event | NO | yes |
| delete_calendar_event | NO | yes |
| Gmail tools (search, read, draft, archive, label, etc.) | NO | yes |
| email_to_task | NO | yes |
| spawn_worker | NO | yes |
| push_updates | NO | yes |

Capture stays a fire-hose for "throw thoughts in"; coordinator handles "go do something with these thoughts." Aligns with Section 5.
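The split can be enforced at two points. The first is when building the tool list the model sees. The second is when a tool call comes back, so a prompt slip cannot reach `spawn_worker` from capture. This sketch assumes `TOOLS_JSON` is a list of dicts with a `"name"` key; its real shape is not described here. `tools_for`, `check_tool_call`, `ToolNotAllowed`, and `transcribe_voice` are hypothetical names.

```python
from __future__ import annotations

# Capture column of the table above; "transcribe_voice" is a hypothetical
# identifier for the "voice transcribe" row.
CAPTURE_TOOLS = frozenset({
    "save_to_inbox", "create_task", "save_knowledge", "schedule_nudge",
    "log_habit", "wellness_now", "create_calendar_event", "transcribe_voice",
})


class ToolNotAllowed(Exception):
    """Raised when a persona requests a tool outside its subset."""


def tools_for(persona: str, registry: list[dict]) -> list[dict]:
    """Tool definitions a persona may see (assumes each entry has a "name" key)."""
    if persona == "coordinator":
        return list(registry)
    if persona == "capture":
        return [tool for tool in registry if tool["name"] in CAPTURE_TOOLS]
    raise ValueError(f"unknown persona {persona!r}")


def check_tool_call(persona: str, tool_name: str) -> None:
    """Second gate at dispatch time, in case the model names a hidden tool anyway."""
    if persona == "coordinator":
        return
    if persona == "capture":
        if tool_name not in CAPTURE_TOOLS:
            raise ToolNotAllowed(f"capture persona may not call {tool_name!r}")
        return
    raise ValueError(f"unknown persona {persona!r}")
```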

### Cost discipline (Section 7)

New shared module `forge_telegram_brain_metrics.py`:

- Wraps every `claude -p` call and records timestamp, persona, persona-call-id, prompt token estimate, response token estimate, `latency_ms`, and a success bool.
- Writes JSONL at `forge/data/telegram-cost/<persona>-YYYY-MM.jsonl`.
- A daily cron at 04:30 aggregates to `summary.json` (per-persona daily/weekly totals plus cost estimates using Sonnet 4.6 pricing).
- Optional notify when daily spend exceeds a threshold; default $5/day per persona, configurable in `eval.json`.
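This is a sketch of the wrapper described above. It assumes the brain shells out to `claude -p "<prompt>"` and reads stdout. `metered_call`, `claude_p`, the `runner` hook (which lets tests substitute a fake), and the record field names are illustrative, not an existing API. The record is written in a `finally` block so failed calls are logged too.

```python
from __future__ import annotations

import json
import subprocess
import time
import uuid
from datetime import datetime, timezone
from pathlib import Path
from typing import Callable

Runner = Callable[[str], str]


def claude_p(prompt: str) -> str:
    """Default runner: `claude -p <prompt>`, returning stdout.

    Assumes the prompt fits in a single command-line argument."""
    result = subprocess.run(
        ["claude", "-p", prompt], capture_output=True, text=True, check=True
    )
    return result.stdout


def metered_call(prompt: str, *, persona: str, log_dir: Path,
                 runner: Runner = claude_p) -> str:
    """Run one brain call and append a metrics record, even when the call fails."""
    started = time.monotonic()
    response, success = "", False
    try:
        response = runner(prompt)
        success = True
        return response
    finally:
        now = datetime.now(timezone.utc)
        record = {
            "ts": now.isoformat(timespec="seconds"),
            "persona": persona,
            "call_id": uuid.uuid4().hex,
            # char/4 heuristic, rounded up
            "prompt_tokens_est": (len(prompt) + 3) // 4,
            "response_tokens_est": (len(response) + 3) // 4,
            "latency_ms": int((time.monotonic() - started) * 1000),
            "success": success,
        }
        log_dir.mkdir(parents=True, exist_ok=True)
        with (log_dir / f"{persona}-{now:%Y-%m}.jsonl").open("a", encoding="utf-8") as fh:
            fh.write(json.dumps(record) + "\n")
```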

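Token counting: estimate tokens as character count / 4, applied to the prompt for input tokens and to the response for output tokens. This is a rough average for English prose, not a guaranteed upper bound; code and non-English text can tokenize less efficiently. It stands in until we wire the Anthropic SDK for exact counts (Phase 4.2b).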
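The heuristic is small enough to pin down in code. This sketch assumes the caller supplies prices in USD per million tokens. The document does not record Sonnet 4.6 rates, so any numbers passed in are placeholders. `estimate_tokens` and `estimate_cost_usd` are hypothetical names.

```python
from __future__ import annotations


def estimate_tokens(text: str) -> int:
    """~4 characters per token, rounded up so any non-empty text counts as >= 1."""
    return (len(text) + 3) // 4


def estimate_cost_usd(prompt: str, response: str, *,
                      usd_per_mtok_in: float, usd_per_mtok_out: float) -> float:
    """Estimated spend for one call; prices are USD per million tokens, caller-supplied."""
    tokens_in = estimate_tokens(prompt)
    tokens_out = estimate_tokens(response)
    return (tokens_in * usd_per_mtok_in + tokens_out * usd_per_mtok_out) / 1_000_000
```

Keeping prices as parameters means the daily summary can be recomputed if the price table changes, since only character counts are stored.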

### New bot identities (Apple-Dictation friendly + doctrine-compliant)

Doctrine Section 3 rules: no persona names for bots; Apple-Dictation friendly; cycle and destroy retired names. Long compound names like `forge_inbox_capture_bot` are doctrine-compliant, but Apple Dictation stumbles on them.

| Old | New | Function |
|---|---|---|
| `@jw_inbox_bot` | `@forge_capture_bot` | capture persona (write-only Notion + calendar quick-add + voice) |
| `@Ava_JForgeBot` | `@forge_assist_bot` | coordinator persona (full tool surface, deep synthesis) |
| `@Manager_JForgeBot` | `@forge_alert_bot` | push-only (`notify.sh` output) |
| `@jw_updates_bot` | merge into `@forge_alert_bot` | retire as a separate identity; alert handles both |

Result: 3 bot identities, down from 4 (the inbox webhook remains a separate endpoint). Each name uses one short, common English word after `forge_`, and Apple Dictation handles "forge capture bot," "forge assist bot," and "forge alert bot" cleanly.

If Justin wants to keep updates separate (e.g. so the wellness-flavored daily push doesn't pollute the alert priority queue), `@jw_updates_bot` can be replaced by a fourth identity, `@forge_status_bot`. Recommendation: do NOT split; one alert bot keeps the surface narrow.

### Service topology

| Service | Replaces | Runs |
|---|---|---|
| `forge-capture-bot.service` | `forge-telegram-inbox.service` | long-poller for `@forge_capture_bot` |
| `forge-assist-bot.service` | `forge-telegram-ava.service` | long-poller for `@forge_assist_bot` |
| `forge-inbox-webhook.service` | unchanged name (port 7400) | iOS Shortcut endpoint, calls capture persona |
| (no service for alert) | manager + updates (retired) | push-only, no listener needed |

Same footprint as today (two long-pollers plus the webhook); what changes is that the service names are now doctrine-compliant.

### Token strategy

Each new bot gets a fresh BotFather token. Old tokens revoked. Old bot identities deleted via @BotFather as the final step. Cost: Telegram chat history with the old bots is not migrated. Justin loses the visible chat log from @jw_inbox_bot and @Ava_JForgeBot. The forge inbox-context.jsonl and ava-context.jsonl log files survive (forge-side memory continuity).

Alternative: rotate tokens within existing identities (rename via @BotFather). Keeps chat history. Loses doctrine cleanliness. Recommend fresh-tokens path; doctrine Section 3 explicitly says "cycle and destroy old names."

## Seven Open Decisions (need sign-off)

| # | Decision | Recommendation | Cost of choosing differently |
|---|---|---|---|
| 1 | Functional split: keep capture + coordinator as two bots, or consolidate into one with mode commands? | Keep two bots. Visual UX matters; inbox-as-low-friction is doctrine. | Consolidate = one less service, one less token, but you lose the visual separation in Telegram between "capture" and "act". |
| 2 | Brain unification: one shared brain module with a `persona` param vs two distinct brain modules? | One module, two personas. Drops ~200 lines of duplication. | Two brains let you diverge prompts/tools faster but pay a maintenance tax. |
| 3 | Tool surface for capture: shrink to capture-only (8 tools), or keep the full surface? | Shrink to capture-only. Aligns with Section 5. | Full surface lets you act fast from Telegram on anything but breaks the "downstream workers do heavy lifting" principle. |
| 4 | Token strategy: fresh BotFather tokens (lose chat history) or rotate within existing identities? | Fresh tokens. Doctrine "cycle and destroy" wins; chat history of voice notes isn't load-bearing. | Rotate keeps chat history but feels like rebrand-not-redo and conflicts with Section 3. |
| 5 | Bot names: long doctrine-strict (`@forge_inbox_capture_bot`) vs short Apple-Dictation-friendly (`@forge_capture_bot`)? | Short. Both are doctrine-compliant; Apple Dictation matters daily. | Long names are unambiguous, but you'll fight your phone every time you say them. |
| 6 | Updates bot fate: merge into the alert bot or keep separate as `@forge_status_bot`? | Merge. Two push-only bots is unnecessary surface. | Keep separate if you want morning-brief / heartbeat in a different chat thread from critical alerts. |
| 7 | Cost discipline scope for v1: ship with conservative char-count token estimates (cheap, fast), or wait to wire the Anthropic SDK for exact token counts? | Ship char-count. Phase 4.2b can add exact counts. | Wait for SDK = no cost data for weeks; ship now = cost data within hours, accuracy ±20%. |

## Execution Plan (after sign-off)

Wave 4.2.A: greenfield brain

1. `forge_telegram_brain.py` with persona-aware `handle()`. Migrate inbox brain logic in.
2. `forge_telegram_brain_metrics.py` shared cost wrapper.
3. `forge/data/telegram-cost/` directory + daily cron.
4. Unit tests against transcripts (replay `inbox-context.jsonl` with the new brain).

Wave 4.2.B: new bot identities

5. Justin creates new bots via @BotFather and drops tokens at `~/.forge-secrets/telegram-{capture,assist,alert}.env`.
6. Update the `forge_telegram_push.sh` selector mapping.
7. Update `notify.sh` to point at `@forge_alert_bot`.

Wave 4.2.C: new pollers

8. `forge_telegram_capture_bot.py` (thin polling wrapper, calls `brain.handle(text, persona="capture")`).
9. `forge_telegram_assist_bot.py` (same shape, `persona="coordinator"`).
10. `forge_telegram_inbox_webhook.py` updated to call the capture persona.
11. New systemd units `forge-capture-bot.service` and `forge-assist-bot.service`; `daemon-reload` + start.

Wave 4.2.D: cutover

12. Stop + disable old services (`forge-telegram-inbox`, `forge-telegram-ava`).
13. Migrate any active context (`inbox-context.jsonl` to `capture-context.jsonl`, etc.).
14. Justin deletes old bots via @BotFather (irreversible; final-confirm step).
15. Update eval harness whitelists: remove `forge-telegram-{ava,inbox}.service` from `forge_eval_check_service_names.sh`; remove `Ava_JForgeBot` / `Manager_JForgeBot` references from `forge_eval_check_persona_code.sh` if any.
16. Update the `MEMORY.md` telegram section + system-map `fleet.md` + the `CLAUDE.md` system mental model.
17. Smoke test: a voice note to the capture bot lands in Notion Inbox; a text question to the assist bot returns a coordinated answer; `notify.sh warning ...` arrives at the alert bot.

Wave 4.2.E: post-soak

18. After one week of clean operation, close the `LESSONS.md` persona-name violation; tighten eval check severity for service-name and persona-code from warning/error to fatal.

## Risks

- iOS Shortcut breakage. The Shortcut hits `https://inbox.justinsforge.com/...`, which proxies to the inbox webhook on port 7400. If we change the webhook path or bot wiring, the Shortcut breaks. Plan: keep the webhook URL stable and change only the brain it calls.
- Voice transcription latency. faster-whisper on CPU takes 2-5s for a 30s clip. The capture bot returns an ack ("got it") immediately, then runs the brain asynchronously. The coordinator can wait inline, since the user is in conversation mode.
- Cost surprise. No tracking today means no baseline, so the first day of cost-discipline tracking might show the bots spending more than expected. Set a generous initial threshold ($10/day per persona, rather than the $5 default) and tighten after seeing real numbers.
- Telegram rate limits. Telegram's bot guidance is roughly 30 messages/sec per bot across chats, but about 1 message/sec within a single chat. Forge mostly talks to one chat per bot, so the per-chat limit is the one that applies. It is still well above normal forge usage, though a burst of simultaneous pushes could briefly draw HTTP 429 responses.
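The voice-latency mitigation (acknowledge first, think later) could look like this for the capture bot. This sketch rests on a few assumptions. `send` and `brain` are hypothetical callables standing in for the Telegram reply and for transcription plus `brain.handle(..., persona="capture")`. Using a single worker thread is a design choice that keeps captures in arrival order. `on_capture_message` is an illustrative name.

```python
from __future__ import annotations

from concurrent.futures import Future, ThreadPoolExecutor
from typing import Callable

# A single worker processes captures one at a time, in arrival order.
_capture_pool = ThreadPoolExecutor(max_workers=1)


def on_capture_message(text: str, *, send: Callable[[str], None],
                       brain: Callable[[str], str]) -> Future:
    """Ack immediately, then run the slow path (transcribe + brain) off-thread."""
    send("got it")

    def work() -> None:
        try:
            send(brain(text))
        except Exception as exc:  # report instead of failing silently in the pool
            send(f"capture failed: {exc}")

    return _capture_pool.submit(work)
```

The coordinator would skip the pool and call the brain inline, matching the conversation-mode note above.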

## Decision Trigger

When Justin signs off on the seven decisions above, this design becomes execution-ready. The execution waves (4.2.A through 4.2.E) then become a fresh sub-handoff, `pure-phoenix-phase-4-2-execution-2026-XX-XX.md`, covering the actual code changes.

## Sign-off Locked 2026-04-28T22:55

| # | Decision | Locked answer |
|---|---|---|
| 1 | Functional split | Three bot identities: capture (long-poll), coordinator (long-poll), and one merged push bot (combines today's `@Manager_JForgeBot` + `@jw_updates_bot`) |
| 2 | Brain unification | One shared `forge_telegram_brain.py` module with a `persona` param |
| 3 | Capture tool surface | Shrink to capture-only (8 tools); grow capture-specific tools later if friction surfaces |
| 4 | Token strategy | Fresh BotFather tokens; retire old bots after cutover |
| 5 | Bot names | LONG doctrine-strict (overrides Apple-Dictation pragmatism). Final names: `@forge_inbox_capture_bot`, `@forge_lifeos_coordinator_bot`, `@forge_notify_outbound_bot` |
| 6 | Updates bot fate | Merged into `@forge_notify_outbound_bot` (drops the separate `@jw_updates_bot` identity) |
| 7 | Cost discipline scope | System-wide quota observability (NOT just bots). See the expanded scope below. |

### Expanded scope: system-wide claude quota observability

Justin reframed Q7: the goal is to understand "what actions, automations, chats, development is costing me in my Pro Max quota." That's a forge-wide observability concern, not a bot-only metric. Phase 4.2 ships the metrics MODULE; full instrumentation is Phase 4.7 below.

#### Existing telemetry surfaces

| Surface | What it has | Loaded via |
|---|---|---|
| `~/.claude/stats-cache.json` | Daily token counts per model, message counts, session counts, tool-call counts. Native Claude Code statistics. | populated by Claude Code on session end |
| `scripts/prompt-counter.sh` | Per-session prompt counter, fires on every `UserPromptSubmit` hook. Currently only a checkpoint reminder; could record per-prompt metadata. | `UserPromptSubmit` hook |
| individual forge scripts | nothing today | n/a |

#### Forge scripts that call `claude -p` (9 confirmed)

| Script | Frequency |
|---|---|
| `forge_telegram_inbox_brain.py` (becomes the capture persona) | per Telegram message |
| `forge_telegram_lifeos_coordinator_brain.py` (becomes the coordinator persona) | per Telegram message |
| `forge_dispatcher.sh` | per pipe-mode worker task |
| `forge_memory_auto_capture.py` | per session end (Stop hook) |
| `forge_memory_auto_dream.py` | nightly cron 04:00 |
| `morning-brief.py` | daily cron 07:00 (12:00 UTC) |
| `forge_wellness_daily_summary.py` | daily cron 03:00 |
| `heartbeat.py` | midday + evening + night-cap (3x/day) |
| `forge_tmux_anchor_session.sh` | once per boot (low frequency) |

#### Phase 4.2 scope (the metrics module)

Build `forge/scripts/forge_quota_tracker.py` with a single function:

```python
def record(invoker: str, model: str, prompt_chars: int, response_chars: int,
           latency_ms: int, success: bool, extra: dict | None = None) -> None:
    """Append one record to forge/data/claude-quota/<YYYY-MM>.jsonl."""
```

The new bot brains use it as part of Phase 4.2. An hourly + daily aggregator runs from cron and writes a summary to `forge/data/claude-quota/summary.json`.
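This sketches the aggregator pass over the monthly JSONL files. It assumes records shaped like the `record()` signature (`ts`, `invoker`, `prompt_chars`, `response_chars`, `success`). The call-count alert threshold is a hypothetical stand-in for the dollar threshold discussed earlier; converting to dollars needs a price table this document does not pin down. `summarize` and `write_summary` are illustrative names.

```python
from __future__ import annotations

import json
from collections import defaultdict
from pathlib import Path


def _empty_bucket() -> dict[str, int]:
    return {"calls": 0, "failures": 0, "prompt_chars": 0, "response_chars": 0}


def summarize(quota_dir: Path, *, max_calls_per_day: int = 500) -> dict:
    """Roll monthly JSONL files up into per-day, per-invoker totals plus alerts."""
    days: dict[str, dict[str, dict[str, int]]] = defaultdict(
        lambda: defaultdict(_empty_bucket))
    for path in sorted(quota_dir.glob("*.jsonl")):
        for line in path.read_text(encoding="utf-8").splitlines():
            if not line.strip():
                continue
            row = json.loads(line)
            bucket = days[row["ts"][:10]][row["invoker"]]  # UTC date prefix of ts
            bucket["calls"] += 1
            bucket["failures"] += 0 if row["success"] else 1
            bucket["prompt_chars"] += row["prompt_chars"]
            bucket["response_chars"] += row["response_chars"]
    alerts = [
        f"{day} {invoker}: {bucket['calls']} calls"
        for day, invokers in sorted(days.items())
        for invoker, bucket in sorted(invokers.items())
        if bucket["calls"] > max_calls_per_day
    ]
    return {"days": {day: dict(inv) for day, inv in days.items()}, "alerts": alerts}


def write_summary(quota_dir: Path) -> Path:
    """Write summary.json next to the JSONL files (the *.jsonl glob skips it)."""
    out = quota_dir / "summary.json"
    out.write_text(json.dumps(summarize(quota_dir), indent=2, sort_keys=True),
                   encoding="utf-8")
    return out
```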

Surface in the existing /recall index? No: the JSONL grows daily and would dominate the embeddings. It stays as raw data behind a CLI.

#### Phase 4.7 (NEW, system-wide quota observability)

Sub-handoff `pure-phoenix-phase-4-7-quota-observability-2026-XX-XX.md` covers:

- Instrument the other 7 forge scripts (one-line `forge_quota_tracker.record(...)` per `claude -p` call site).
- Daily aggregator that MERGES `~/.claude/stats-cache.json` (interactive sessions) + `forge/data/claude-quota/*.jsonl` (forge-script invocations) into one unified summary.
- Optional weekly digest pushed to `@forge_notify_outbound_bot` showing "this week your Pro Max quota was X% spent on bots, Y% on automations, Z% on interactive sessions."
- Auto-back-off behavior: if the call rate spikes above a threshold, bots queue messages or shrink prompt context. Not in scope for 4.2.
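The weekly digest reduces to bucketing invokers into categories and taking percentages. This sketch assumes per-invoker token totals have already been read from the forge JSONL. It also assumes the interactive total is parsed from `~/.claude/stats-cache.json` elsewhere, since that file's schema is not described here. The invoker names in `CATEGORY` are hypothetical labels for the nine scripts listed above.

```python
from __future__ import annotations

from collections import Counter

# Hypothetical invoker -> category map for the scripts in the table above.
CATEGORY = {
    "capture": "bots",
    "coordinator": "bots",
    "forge_dispatcher": "automations",
    "forge_memory_auto_capture": "automations",
    "forge_memory_auto_dream": "automations",
    "morning-brief": "automations",
    "forge_wellness_daily_summary": "automations",
    "heartbeat": "automations",
    "forge_tmux_anchor_session": "automations",
}


def weekly_shares(script_tokens: dict[str, int], interactive_tokens: int) -> dict[str, float]:
    """Percent of the week's tokens per category; unknown invokers land in "other"."""
    totals: Counter[str] = Counter()
    for invoker, tokens in script_tokens.items():
        totals[CATEGORY.get(invoker, "other")] += tokens
    totals["interactive"] += interactive_tokens
    grand = sum(totals.values())
    if grand == 0:
        return {}
    return {cat: round(100 * n / grand, 1) for cat, n in totals.items()}


def digest_line(shares: dict[str, float]) -> str:
    return (f"This week: {shares.get('bots', 0)}% bots, "
            f"{shares.get('automations', 0)}% automations, "
            f"{shares.get('interactive', 0)}% interactive sessions.")
```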

Phase 4.7 is ENABLED by Phase 4.2 (the metrics module is the foundation). Recommend executing them in series: 4.2 ships the new bot fleet + tracker module; 4.7 ships the system-wide instrumentation + aggregator.

## Updated Execution Plan (locked, three bots, long names)

Wave 4.2.A: greenfield brain + tracker

1. `forge_telegram_brain.py` with persona-aware `handle()`. Migrate inbox brain logic in.
2. `forge_quota_tracker.py` shared metrics module.
3. `forge/data/claude-quota/` directory + daily aggregator cron at 04:30.
4. Replay test: feed `data/inbox-context.jsonl` snippets through the new brain and verify the same tool calls.

Wave 4.2.B: new bot identities

5. Justin creates new bots via @BotFather:
   - `@forge_inbox_capture_bot`
   - `@forge_lifeos_coordinator_bot`
   - `@forge_notify_outbound_bot`
6. Justin drops tokens at:
   - `~/.forge-secrets/telegram-inbox-capture.env`
   - `~/.forge-secrets/telegram-lifeos-coordinator.env`
   - `~/.forge-secrets/telegram-notify-outbound.env`
7. Update the `forge_telegram_push.sh` selector mapping to the new env files.
8. Update `forge_notify.sh` to point at `@forge_notify_outbound_bot` (its token replaces the current Manager bot wiring).
9. Update `morning-brief.py`, `heartbeat.py`, and `forge_telegram_nudge_fire.py` to push to `@forge_notify_outbound_bot` (they currently push to `@jw_updates_bot`).
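Step 7's selector remap could keep the legacy selectors working as aliases during cutover. This Python mirror of the mapping is a sketch only; the real helper is `forge_telegram_push.sh`. Several parts are assumptions: the new selector names (`capture`, `coordinator`, `notify`), the alias choices, and the `KEY=VALUE` env-file format. The existing `chat` selector is deliberately left unmapped because its target is not described here.

```python
from __future__ import annotations

from pathlib import Path

SECRETS = Path.home() / ".forge-secrets"

# New env files from step 6; legacy selectors alias onto them (suggested mapping).
SELECTOR_ENV = {
    "capture": "telegram-inbox-capture.env",
    "coordinator": "telegram-lifeos-coordinator.env",
    "notify": "telegram-notify-outbound.env",
    # legacy aliases
    "inbox": "telegram-inbox-capture.env",
    "ava": "telegram-lifeos-coordinator.env",
    "manager": "telegram-notify-outbound.env",
    "updates": "telegram-notify-outbound.env",
}


def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines; skips blanks/comments, strips quotes and `export`."""
    env: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        key = key.strip()
        if key.startswith("export "):
            key = key[len("export "):].strip()
        env[key] = value.strip().strip("'\"")
    return env


def env_file_for(selector: str) -> Path:
    """Resolve a push selector to its secrets file, rejecting unknown selectors."""
    try:
        return SECRETS / SELECTOR_ENV[selector]
    except KeyError:
        raise ValueError(f"unknown push selector {selector!r}") from None
```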

Wave 4.2.C: new pollers + webhook

10. `forge_telegram_inbox_capture_bot.py` (thin polling wrapper, calls `brain.handle(text, persona="capture")`).
11. `forge_telegram_lifeos_coordinator_bot.py` (same shape, `persona="coordinator"`).
12. `forge_telegram_inbox_capture_webhook.py` (renamed from `inbox_webhook`, now points at the capture persona).
13. New systemd units:
    - `forge-inbox-capture.service` (replaces `forge-telegram-inbox.service`)
    - `forge-lifeos-coordinator.service` (replaces `forge-telegram-ava.service`)
    - `forge-inbox-capture-webhook.service` (renamed from `forge-inbox-webhook.service`; the iOS Shortcut endpoint stays on port 7400 to keep Justin's Shortcut working)
14. `sudo systemctl daemon-reload` + start.
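Steps 10 and 11 describe thin polling wrappers. This sketches one using only the standard library and the Bot API's `getUpdates` long poll and `sendMessage` methods. `handle_updates` and `poll_forever` are hypothetical names. The sketch handles text only; voice notes would go through transcription first. Wiring would look like `poll_forever(token, lambda t: brain.handle(t, persona="capture"))`.

```python
from __future__ import annotations

import json
import urllib.parse
import urllib.request
from typing import Callable, Iterable


def handle_updates(updates: Iterable[dict], offset: int,
                   reply: Callable[[int, str], None],
                   brain: Callable[[str], str]) -> int:
    """Process one getUpdates batch and return the offset for the next poll.

    Telegram confirms an update once getUpdates is called with an offset above
    its update_id. The offset advances only after an update is handled, so a
    crash mid-batch re-delivers messages rather than dropping them."""
    for update in updates:
        msg = update.get("message") or {}
        text = msg.get("text")
        if text:
            reply(msg["chat"]["id"], brain(text))
        offset = update["update_id"] + 1
    return offset


def poll_forever(token: str, brain: Callable[[str], str]) -> None:
    api = f"https://api.telegram.org/bot{token}"

    def reply(chat_id: int, text: str) -> None:
        data = urllib.parse.urlencode({"chat_id": chat_id, "text": text}).encode()
        urllib.request.urlopen(f"{api}/sendMessage", data=data, timeout=30).close()

    offset = 0
    while True:
        # Long poll: Telegram holds the request up to 50s waiting for new updates.
        url = f"{api}/getUpdates?timeout=50&offset={offset}"
        with urllib.request.urlopen(url, timeout=60) as resp:
            updates = json.load(resp)["result"]
        offset = handle_updates(updates, offset, reply, brain)
```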

Wave 4.2.D: cutover

15. Stop + disable old services (`forge-telegram-inbox`, `forge-telegram-ava`, `forge-inbox-webhook`).
16. Migrate context: `inbox-context.jsonl` -> `capture-context.jsonl`, `ava-context.jsonl` -> `coordinator-context.jsonl`.
17. Justin deletes old bots via @BotFather: `@jw_inbox_bot`, `@Ava_JForgeBot`, `@Manager_JForgeBot`, `@jw_updates_bot` (irreversible, final-confirm step).
18. Update eval harness whitelists in `forge_eval_check_service_names.sh` and `forge_eval_check_persona_code.sh`. After 4.2 ships, those whitelists are EMPTY.
19. Update the `MEMORY.md` telegram section + system-map `fleet.md` + the `CLAUDE.md` system mental model.
20. Smoke tests:
    - voice note to `@forge_inbox_capture_bot` lands in Notion Inbox
    - text question to `@forge_lifeos_coordinator_bot` returns a coordinated answer
    - `forge_notify.sh warning ...` arrives at `@forge_notify_outbound_bot`
    - morning-brief pushes to `@forge_notify_outbound_bot`
    - quota tracker records calls in `forge/data/claude-quota/2026-04.jsonl`
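Step 16 can be made safe to re-run. This sketch assumes both logs are newline-delimited JSON in the same `data/` directory. Because the new services start in step 14, a new log may already exist. The old log is therefore prepended to it so earlier turns stay first, and the originals are kept for rollback. A second run is a no-op. `migrate_context` is a hypothetical helper; run it with the new pollers stopped to avoid racing their writes.

```python
from __future__ import annotations

from pathlib import Path

# Old -> new context logs, per step 16.
MIGRATIONS = {
    "inbox-context.jsonl": "capture-context.jsonl",
    "ava-context.jsonl": "coordinator-context.jsonl",
}


def migrate_context(data_dir: Path) -> list[str]:
    """Prepend each old log to its new log; keep the old file; safe to re-run."""
    done = []
    for old_name, new_name in MIGRATIONS.items():
        old, new = data_dir / old_name, data_dir / new_name
        if not old.exists():
            continue
        old_text = old.read_text(encoding="utf-8")
        if old_text and not old_text.endswith("\n"):
            old_text += "\n"  # keep one JSON object per line after concatenation
        new_text = new.read_text(encoding="utf-8") if new.exists() else ""
        if new_text.startswith(old_text):
            continue  # already migrated (or the old log is empty)
        new.write_text(old_text + new_text, encoding="utf-8")
        done.append(f"{old_name} -> {new_name}")
    return done
```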

Wave 4.2.E: post-soak

21. After one week clean, close the `LESSONS.md` persona-name violation.
22. Tighten eval check severity: `naming-taxonomy-services` from warning to error; the `no-persona-names-in-code` allowlist removes Ava/Manager.
23. Trigger the Phase 4.7 sub-handoff for system-wide quota instrumentation.

[Claude Code, Pure Phoenix Phase 4.2 design pass; sign-off 2026-04-28T22:55]