Handoff: /recall v2, three-layer retrieval
URL: https://mkdocs.justinsforge.com/memory/handoffs/recall-v2-three-layer-retrieval-2026-05-03/
Date: 2026-05-03 Author: [Claude Code] Status: design only, not implemented
Why
Today's /recall (impl: forge/scripts/forge_search_index.py, fastembed BGE-small + sqlite-vec) returns full chunks immediately on every query. Multi-query research sessions (5-10 /recall calls) burn 30-80k tokens of context to surface ~20% of what's returned.
The 2026-05-03 deep-research report (memory/research/superpowers-context-mode-claudemem-vs-forge-stack-2026-05-03.md) found that thedotmack/claude-mem solves this with a 3-layer retrieval API and reports ~10x token savings on multi-result queries. Worth porting the design (not the tool, see report for skip rationale).
The 3-layer pattern
| Layer | Tool | Returns | Token cost per result |
|---|---|---|---|
| 1, search | `/recall <query>` | Compact index: ID, title, date, 1-line summary, score | ~50-100 tokens |
| 2, timeline | `/recall --timeline <id>` | N entries chronologically before+after the ID | ~150-300 tokens |
| 3, details | `/recall --get <id>[,id2,...]` | Full chunk text for explicit IDs | ~500-1500 tokens |
Layer 1 is the default. Claude (or Justin) decides which IDs are worth pulling at full fidelity, then asks for them. Most queries never need Layer 3.
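
To make the per-result costs in the table concrete, here is a rough back-of-envelope sketch in Python. The per-layer ranges come straight from the table; the session shape (5 queries, 10 results each) is taken from the "5-10 /recall calls" and "top 10" figures in this note, and the number of chunks pulled at full fidelity is a hypothetical input. The function names are illustrative only, not part of `forge_search_index.py`.

```python
# Per-result token ranges taken from the table above: (low, high).
LAYER_COST = {
    "search": (50, 100),
    "timeline": (150, 300),
    "details": (500, 1500),
}


def old_cost(results_per_query, queries):
    """Old behavior: every result comes back as a full chunk."""
    d_lo, d_hi = LAYER_COST["details"]
    n = results_per_query * queries
    return (n * d_lo, n * d_hi)


def session_cost(results_per_query, queries, details_pulled):
    """Three-layer behavior: every query returns a compact index, and only
    `details_pulled` chunks in total are fetched at full fidelity."""
    s_lo, s_hi = LAYER_COST["search"]
    d_lo, d_hi = LAYER_COST["details"]
    indexed = results_per_query * queries
    return (indexed * s_lo + details_pulled * d_lo,
            indexed * s_hi + details_pulled * d_hi)


if __name__ == "__main__":
    print(old_cost(10, 5))         # (25000, 75000)
    print(session_cost(10, 5, 0))  # (2500, 5000)
    print(session_cost(10, 5, 5))  # (5000, 12500)
```

Under these assumptions the old behavior lands in the same range as the 30-80k figure quoted above. The savings are about 10x when no Layer 3 pulls happen and shrink as more chunks are fetched in full, which is why the bench step in the migration plan matters.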
Concrete API sketch
```
# Default: compact index, top 10
/recall how does the dispatcher route workers
# →
# [a3f2] reference_task_queue.md 2026-04-15 | dispatcher v2 pipe-mode... score=0.91
# [b8c1] handoff/dispatcher-rewrite 2026-04-12 | switched from polling to pipe... score=0.87
# [c4d9] feedback_subscription_over_api 2026-03-28 | decision tree includes /spawn... score=0.71
# ...

# Pull full content for one or more IDs
/recall --get a3f2,b8c1
# → full chunk text for both

# See chronological context around an ID (e.g., 5 entries before/after)
/recall --timeline c4d9 --window 5
# → c4d9 plus 5 nearest-by-date entries from the same source file

# JSON for piping (unchanged)
/recall --json how does the dispatcher route
```
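
The `--timeline` call is the least obvious of the three, so a minimal Python sketch of the neighbor selection may help. It assumes one reading of the design: neighbors come from the same source file, ordered by chunk offset, with up to `window` entries on each side of the target. The `Chunk` type and the `timeline` function are hypothetical names for illustration; the real index rows live in sqlite-vec and may be ordered by date instead.

```python
from typing import NamedTuple


class Chunk(NamedTuple):
    chunk_id: str      # short opaque ID, e.g. "c4d9"
    file_path: str
    chunk_offset: int  # position of the chunk within its file


def timeline(chunks, target_id, window):
    """Return the target chunk plus up to `window` neighbors on each side.

    Assumption: neighbors are taken from the same file only and are
    ordered by chunk_offset.
    """
    target = next((c for c in chunks if c.chunk_id == target_id), None)
    if target is None:
        raise KeyError(f"unknown chunk id: {target_id}")
    siblings = sorted(
        (c for c in chunks if c.file_path == target.file_path),
        key=lambda c: c.chunk_offset,
    )
    pos = siblings.index(target)
    start = max(0, pos - window)  # clamp at the start of the file
    return siblings[start:pos + window + 1]
```

Near the start or end of a file the window is simply clipped, so the caller may get fewer than `2 * window + 1` entries.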
Implementation outline
- Add a stable ID to every chunk in the sqlite-vec index. Currently chunks are addressed by `(file_path, chunk_offset)`; need short opaque IDs (4-char hex of sha1(file_path + offset), collision-checked at index time). Stored in a new `chunk_id` column.
- Add a `summary` column populated at index time. Cheapest: first 80 chars of the chunk, ellipsized at word boundary. Better: 1-line semantic summary via Sonnet at index time (one-shot, batch). Decision: start with the cheap version; upgrade to Sonnet summaries only if they materially improve the layer-1 UX.
- Refactor `forge_search_index.py` to support three modes:
  - `--mode index` (default): compact-format output, no chunk text
  - `--mode get --ids a3f2,b8c1`: chunk text for explicit IDs
  - `--mode timeline --id c4d9 --window N`: ordered chunks from same file +/- N
- Update the `/recall` skill at `forge/.claude/skills/recall/SKILL.md` with the new flag set + decision flowchart for Claude.
- Backward compat: old `/recall <query>` form must still work; just changes the output shape from full chunks to compact index. Keep `--full` as an opt-in escape hatch for "give me everything inline like the old behavior."
- Tests: smoke test that `--mode get` returns identical content to old default, that compact-index summaries are <100 chars each, that timeline window returns the right neighbors.
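
The first two items (short chunk IDs and the cheap summary) can be sketched in a few lines of Python. This is a sketch under stated assumptions, not the indexer's actual code: the `:` separator between path and offset, the policy of lengthening the hex prefix on a collision, and the function names `make_chunk_id` and `cheap_summary` are all hypothetical. The design itself only specifies "4-char hex of sha1(file_path + offset), collision-checked at index time" and "first 80 chars of the chunk, ellipsized at word boundary".

```python
import hashlib


def make_chunk_id(file_path, offset, taken, length=4):
    """Return a short hex ID derived from sha1 of the path and offset.

    `taken` is the set of IDs already in the index. On a collision the
    prefix grows by one hex digit until it is unique (an assumed policy;
    the design only says "collision-checked").
    """
    digest = hashlib.sha1(f"{file_path}:{offset}".encode("utf-8")).hexdigest()
    while length <= len(digest):
        candidate = digest[:length]
        if candidate not in taken:
            taken.add(candidate)
            return candidate
        length += 1
    raise ValueError(f"no free ID for {file_path}:{offset}")


def cheap_summary(text, limit=80):
    """First `limit` chars of the chunk, cut back to a word boundary."""
    flat = " ".join(text.split())  # collapse newlines and runs of spaces
    if len(flat) <= limit:
        return flat
    cut = flat[:limit].rsplit(" ", 1)[0]
    return cut + "..."
```

Four hex digits give only 65,536 possible IDs, so collisions become likely well before the index is that large; whatever policy is chosen, it should be deterministic so IDs stay stable across re-indexing. The summary stays under the 100-character limit named in the test item because it is at most 80 characters plus the ellipsis.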
Migration plan
- Bump index schema version, full re-index on first run after deploy (~10 min for current corpus per `reference_semantic_search.md`).
- Land behind `FORGE_RECALL_V2=1` env var for one week of dogfooding before flipping default.
- Log token counts of layer-1 vs old behavior to `data/recall-bench/` to validate the ~10x claim against forge's own corpus.
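
The flag gate and the bench log could look roughly like the Python sketch below. Only the `FORGE_RECALL_V2=1` variable and the `data/recall-bench/` directory come from this note. The `tokens.jsonl` file name, the record fields, and the 4-characters-per-token estimate are assumptions made for illustration; a real tokenizer count would be more accurate.

```python
import json
import os
import time
from pathlib import Path


def recall_v2_enabled(env=None):
    """True when FORGE_RECALL_V2 is set to "1"."""
    env = os.environ if env is None else env
    return env.get("FORGE_RECALL_V2") == "1"


def rough_tokens(text):
    """Crude estimate: about 4 characters per token (an assumption)."""
    return max(1, len(text) // 4) if text else 0


def log_bench(query, compact_output, full_output, bench_dir="data/recall-bench"):
    """Append one JSON line comparing layer-1 output with the old output."""
    record = {
        "ts": time.time(),
        "query": query,
        "layer1_tokens": rough_tokens(compact_output),
        "old_tokens": rough_tokens(full_output),
    }
    out_dir = Path(bench_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    # Hypothetical file name; one record per line keeps appends cheap.
    with open(out_dir / "tokens.jsonl", "a", encoding="utf-8") as fh:
        fh.write(json.dumps(record) + "\n")
    return record
```

Summing `old_tokens` and `layer1_tokens` over the dogfooding week gives a direct ratio to compare against the ~10x claim.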
Out of scope for v2
- Cross-encoder reranking (yoloshii/ClawMem pattern from research report). Worth considering for v3.
- Sonnet-generated summaries beyond the cheap first-line version. Wait for usage data.
- Replacing fastembed BGE-small. Still the right embedding for this corpus.
Effort estimate
- Schema + indexer changes: 1-2 hours
- CLI refactor: 1 hour
- Skill rewrite: 30 min
- Bench harness + dogfood week: passive
- Total active build: ~3 hours
Pickup checklist
- Read research report: `memory/research/superpowers-context-mode-claudemem-vs-forge-stack-2026-05-03.md` (sections 3 + Recommendations + Synthesis)
- Read current /recall: `forge/scripts/forge_search_index.py`, `forge/.claude/skills/recall/SKILL.md`, `forge/memory/general/reference_semantic_search.md`
- Run `/feature-plan recall v2 three-layer retrieval` to get spec approved before any code
Related
- DeepResearch report, source for this design
- Semantic search, current /recall internals
- feature-plan skill, the gate this work should go through