Handoff: /recall v2, three-layer retrieval

URL: https://mkdocs.justinsforge.com/memory/handoffs/recall-v2-three-layer-retrieval-2026-05-03/

Date: 2026-05-03
Author: [Claude Code]
Status: design only, not implemented

Why

Today's /recall (impl: forge/scripts/forge_search_index.py, fastembed BGE-small + sqlite-vec) returns full chunks immediately on every query. Multi-query research sessions (5-10 /recall calls) burn 30-80k tokens of context, yet only ~20% of what is returned ends up being useful.

The 2026-05-03 deep-research report (memory/research/superpowers-context-mode-claudemem-vs-forge-stack-2026-05-03.md) found that thedotmack/claude-mem solves this with a 3-layer retrieval API and reports ~10x token savings on multi-result queries. The design is worth porting; the tool itself is not (see the report for the skip rationale).

The 3-layer pattern

| Layer | Tool | Returns | Token cost per result |
|---|---|---|---|
| 1, search | `/recall <query>` | Compact index: ID, title, date, 1-line summary, score | ~50-100 tokens |
| 2, timeline | `/recall --timeline <id>` | N entries chronologically before+after the ID | ~150-300 tokens |
| 3, details | `/recall --get <id>[,id2,...]` | Full chunk text for explicit IDs | ~500-1500 tokens |

Layer 1 is the default. Claude (or Justin) decides which IDs are worth pulling at full fidelity, then asks for them. Most queries never need Layer 3.
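
To make the trade-off concrete, here is a rough back-of-envelope sketch in Python. It uses the midpoints of the per-result ranges in the table above; the session shape (8 queries, top 10 results, 2 IDs fetched per query) and the function name `session_cost` are illustrative assumptions, not measurements from forge.

```python
# Per-result token ranges copied from the table above.
LAYER1_TOKENS = (50, 100)      # compact index row
LAYER3_TOKENS = (500, 1500)    # full chunk text


def midpoint(token_range: tuple[int, int]) -> int:
    """Integer midpoint of a (low, high) range."""
    low, high = token_range
    return (low + high) // 2


def session_cost(queries: int, top_k: int, fetched_per_query: int) -> tuple[int, int]:
    """Return (old_tokens, v2_tokens) for one research session.

    old: every result comes back as a full chunk.
    v2:  every result costs one index row, plus full text for the fetched IDs.
    """
    results = queries * top_k
    old_tokens = results * midpoint(LAYER3_TOKENS)
    v2_tokens = (
        results * midpoint(LAYER1_TOKENS)
        + queries * fetched_per_query * midpoint(LAYER3_TOKENS)
    )
    return old_tokens, v2_tokens


# Hypothetical session: 8 queries, top 10 each, 2 IDs pulled at layer 3 per query.
old, v2 = session_cost(queries=8, top_k=10, fetched_per_query=2)
print(old, v2, round(old / v2, 1))  # 80000 22000 3.6
```

Under these assumed numbers the saving is about 3.6x when two chunks are fetched per query and about 13x when none are, so the real ratio depends on how often layer 3 is actually needed.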

Concrete API sketch

```
# Default: compact index, top 10
/recall how does the dispatcher route workers
# →
# [a3f2] reference_task_queue.md         2026-04-15  | dispatcher v2 pipe-mode...        score=0.91
# [b8c1] handoff/dispatcher-rewrite      2026-04-12  | switched from polling to pipe...  score=0.87
# [c4d9] feedback_subscription_over_api  2026-03-28  | decision tree includes /spawn...  score=0.71
# ...

# Pull full content for one or more IDs
/recall --get a3f2,b8c1
# → full chunk text for both

# See chronological context around an ID (e.g., 5 entries before/after)
/recall --timeline c4d9 --window 5
# → c4d9 plus 5 nearest-by-date entries from the same source file

# JSON for piping (unchanged)
/recall --json how does the dispatcher route
```
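
The timeline call is the least obvious of the three, so here is a minimal Python sketch of one way the window could be selected. It assumes the window means up to N same-file entries on each side of the target, ordered by date; the `Entry` type and the `timeline` function are hypothetical and do not exist in forge_search_index.py today.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    """Hypothetical index row; field names are assumptions for this sketch."""
    chunk_id: str      # short ID as shown in the compact index
    file_path: str
    date: str          # ISO date, so string order is chronological order


def timeline(entries: list[Entry], chunk_id: str, window: int) -> list[Entry]:
    """Return the target entry plus up to `window` same-file entries on each side."""
    target = next((e for e in entries if e.chunk_id == chunk_id), None)
    if target is None:
        raise KeyError(f"unknown chunk id: {chunk_id}")
    # Restrict to the target's source file, then order chronologically.
    same_file = sorted(
        (e for e in entries if e.file_path == target.file_path),
        key=lambda e: e.date,
    )
    pos = same_file.index(target)
    # Clamp the left edge at 0; slicing already clamps the right edge.
    return same_file[max(0, pos - window): pos + window + 1]
```

If the intended meaning is instead "the N nearest entries in total", only the final slice would change.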

Implementation outline

1. Add a stable ID to every chunk in the sqlite-vec index. Chunks are currently addressed by (file_path, chunk_offset); they need short opaque IDs: a 4-char hex prefix of sha1(file_path + offset), collision-checked at index time and stored in a new chunk_id column. Four hex characters give only 65,536 possible values, so collisions become likely after a few hundred chunks and the index-time check needs a defined fallback.
2. Add a summary column populated at index time. Cheapest: the first 80 chars of the chunk, ellipsized at a word boundary. Better: a 1-line semantic summary via Sonnet at index time (one-shot, batch). Decision: start with the cheap version; upgrade to Sonnet summaries only if they materially improve the layer-1 UX.
3. Refactor forge_search_index.py to support three modes:
   - `--mode index` (default): compact-format output, no chunk text
   - `--mode get --ids a3f2,b8c1`: chunk text for explicit IDs
   - `--mode timeline --id c4d9 --window N`: ordered chunks from the same file, +/- N around the ID
4. Update the /recall skill at forge/.claude/skills/recall/SKILL.md with the new flag set and a decision flowchart for Claude.
5. Backward compat: the old `/recall <query>` form must still work; only the output shape changes, from full chunks to the compact index. Keep --full as an opt-in escape hatch for "give me everything inline like the old behavior."
6. Tests: smoke tests that --mode get returns content identical to the old default, that compact-index summaries are <100 chars each, and that the timeline window returns the right neighbors.
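
The first two steps above (stable IDs and cheap summaries) can be sketched in a few lines of Python. This is a sketch under stated assumptions, not the indexer's actual code: the function names are hypothetical, the NUL separator between path and offset is an addition to avoid ambiguous concatenation, and lengthening the prefix on collision is only one possible fallback.

```python
import hashlib


def chunk_id(file_path: str, offset: int, length: int = 4) -> str:
    """Hex prefix of sha1 over the chunk address."""
    # Assumption: a NUL separator keeps ("a1", 2) and ("a", 12) distinct.
    digest = hashlib.sha1(f"{file_path}\x00{offset}".encode("utf-8")).hexdigest()
    return digest[:length]


def assign_ids(addresses: list[tuple[str, int]]) -> dict[tuple[str, int], str]:
    """Give every (file_path, offset) a unique ID, lengthening the prefix on collision."""
    taken: set[str] = set()
    ids: dict[tuple[str, int], str] = {}
    for file_path, offset in addresses:
        if (file_path, offset) in ids:      # duplicate address: already assigned
            continue
        length = 4
        candidate = chunk_id(file_path, offset, length)
        while candidate in taken:           # collision: take one more hex char
            length += 1
            candidate = chunk_id(file_path, offset, length)
        taken.add(candidate)
        ids[(file_path, offset)] = candidate
    return ids


def cheap_summary(text: str, limit: int = 80) -> str:
    """First `limit` chars of the chunk on one line, cut at a word boundary."""
    flat = " ".join(text.split())           # collapse newlines and whitespace runs
    if len(flat) <= limit:
        return flat
    cut = flat[:limit]
    if flat[limit] != " " and " " in cut:   # landed mid-word: back up to last space
        cut = cut[: cut.rindex(" ")]
    return cut.rstrip() + "..."
```

With this fallback the ID a chunk receives depends on indexing order, so the indexer would need to walk files in a deterministic order for IDs to stay the same across re-indexes.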

Migration plan

1. Bump the index schema version and run a full re-index on first run after deploy (~10 min for the current corpus, per reference_semantic_search.md).
2. Land behind the FORGE_RECALL_V2=1 env var for one week of dogfooding before flipping the default.
3. Log token counts of layer 1 vs the old behavior to data/recall-bench/ to validate the ~10x claim against forge's own corpus.
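
The env-var gate and the bench logging could look roughly like the Python sketch below. Only the FORGE_RECALL_V2 variable and the data/recall-bench/ directory come from the plan above; the function names, the token-counts.jsonl file name, the record fields, and the 4-characters-per-token estimate are all assumptions made for illustration.

```python
import json
import os
import time
from pathlib import Path

BENCH_DIR = Path("data/recall-bench")   # directory named in the migration plan


def recall_v2_enabled(env=os.environ) -> bool:
    """True only when FORGE_RECALL_V2 is set to exactly "1"."""
    return env.get("FORGE_RECALL_V2") == "1"


def rough_tokens(text: str) -> int:
    """Crude estimate (assumption: ~4 characters per token), not a real tokenizer."""
    return max(1, len(text) // 4)


def log_comparison(query: str, old_output: str, v2_output: str,
                   bench_dir: Path = BENCH_DIR) -> dict:
    """Append one JSON line comparing old full-chunk output with layer-1 output."""
    record = {
        "ts": time.time(),
        "query": query,
        "old_tokens": rough_tokens(old_output),
        "v2_tokens": rough_tokens(v2_output),
    }
    bench_dir.mkdir(parents=True, exist_ok=True)
    # Hypothetical file name; one JSON object per line keeps appends cheap.
    with open(bench_dir / "token-counts.jsonl", "a", encoding="utf-8") as fh:
        fh.write(json.dumps(record) + "\n")
    return record
```

A character-based estimate is enough to compare the two output shapes against each other; swapping in a real tokenizer would only matter if absolute counts are needed.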

Out of scope for v2

- Cross-encoder reranking (yoloshii/ClawMem pattern from the research report). Worth considering for v3.
- Sonnet-generated summaries beyond the cheap first-80-chars version. Wait for usage data.
- Replacing fastembed BGE-small. Still the right embedding for this corpus.

Effort estimate

- Schema + indexer changes: 1-2 hours
- CLI refactor: 1 hour
- Skill rewrite: 30 min
- Bench harness + dogfood week: passive
- Total active build: ~3 hours (2.5-3.5 hours across the three active items)

Pickup checklist

1. Read the research report: memory/research/superpowers-context-mode-claudemem-vs-forge-stack-2026-05-03.md (sections 3 + Recommendations + Synthesis)
2. Read the current /recall: forge/scripts/forge_search_index.py, forge/.claude/skills/recall/SKILL.md, forge/memory/general/reference_semantic_search.md
3. Run /feature-plan recall v2 three-layer retrieval to get the spec approved before writing any code

DeepResearch report, source for this design
Semantic search, current /recall internals
feature-plan skill, the gate this work should go through