Handoff: /recall v2, three-layer retrieval

URL: https://mkdocs.justinsforge.com/memory/handoffs/recall-v2-three-layer-retrieval-2026-05-03/

Date: 2026-05-03
Author: [Claude Code]
Status: design only, not implemented

Why

Today's /recall (impl: forge/scripts/forge_search_index.py, fastembed BGE-small + sqlite-vec) returns full chunks immediately on every query. Multi-query research sessions (5-10 /recall calls) burn 30-80k tokens of context, of which only ~20% turns out to be useful.

The 2026-05-03 deep-research report (memory/research/superpowers-context-mode-claudemem-vs-forge-stack-2026-05-03.md) found that thedotmack/claude-mem solves this with a 3-layer retrieval API and reports ~10x token savings on multi-result queries. The design is worth porting; the tool itself is not (see the report for the skip rationale).

The 3-layer pattern

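| Layer        | Tool                         | Returns                                                    | Token cost per result |
|--------------|------------------------------|------------------------------------------------------------|-----------------------|
| 1 (search)   | /recall <query>              | Compact index: ID, title, date, 1-line summary, score      | ~50-100 tokens        |
| 2 (timeline) | /recall --timeline <id>      | N entries before and after the ID, in chronological order | ~150-300 tokens       |
| 3 (details)  | /recall --get <id>[,id2,...] | Full chunk text for explicit IDs                           | ~500-1500 tokens      |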

Layer 1 is the default. Claude (or Justin) scans the compact index, decides which IDs are worth pulling at full fidelity, and asks for only those. Most queries should never need Layer 3.
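As a back-of-envelope check on the ~10x figure, the sketch below plugs the per-result ranges from the layer table into a top-10 query. The function names are illustrative, and the ranges are the table's own estimates, not measurements.

```python
# Per-result token ranges taken from the layer table (estimates, not measurements).
INDEX_TOKENS = (50, 100)     # Layer 1: one compact index line
DETAIL_TOKENS = (500, 1500)  # Layer 3: one full chunk


def query_cost(n_results: int, n_expanded: int) -> tuple[int, int]:
    """(low, high) tokens for a v2 query: Layer 1 lists every hit, Layer 3 expands n_expanded."""
    low = n_results * INDEX_TOKENS[0] + n_expanded * DETAIL_TOKENS[0]
    high = n_results * INDEX_TOKENS[1] + n_expanded * DETAIL_TOKENS[1]
    return low, high


def old_cost(n_results: int) -> tuple[int, int]:
    """(low, high) tokens for today's behavior: every hit returned as a full chunk."""
    return n_results * DETAIL_TOKENS[0], n_results * DETAIL_TOKENS[1]


if __name__ == "__main__":
    print("old, top 10:          ", old_cost(10))
    print("v2, index only:       ", query_cost(10, 0))
    print("v2, index + 2 details:", query_cost(10, 2))
```

With these inputs, returning all ten hits in full costs 5,000-15,000 tokens, the Layer 1 index alone costs 500-1,000 (10-15x less), and expanding two hits costs 1,500-4,000 (roughly 3-4x less). The ~10x figure therefore holds only while few hits get expanded, which the data/recall-bench/ logging in the migration plan is meant to check.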

Concrete API sketch

# Default: compact index, top 10
/recall how does the dispatcher route workers
# →
# [a3f2] reference_task_queue.md         2026-04-15  | dispatcher v2 pipe-mode...        score=0.91
# [b8c1] handoff/dispatcher-rewrite      2026-04-12  | switched from polling to pipe...  score=0.87
# [c4d9] feedback_subscription_over_api  2026-03-28  | decision tree includes /spawn...  score=0.71
# ...

# Pull full content for one or more IDs
/recall --get a3f2,b8c1
# → full chunk text for both

# See chronological context around an ID (e.g., 5 entries before/after)
/recall --timeline c4d9 --window 5
# → c4d9 plus up to 5 entries on each side of it, in chronological order, from the same source file

# JSON for piping (unchanged)
/recall --json how does the dispatcher route
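The Layer 1 rows in the sketch above pack an ID, source, date, summary, and score into one line. A minimal Python formatter for that shape could look like this; the Hit dataclass, its field names, and the column widths are assumptions inferred from the example output, not the existing forge_search_index.py code.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    """One Layer 1 search result (hypothetical shape)."""
    chunk_id: str  # short opaque ID, e.g. "a3f2"
    source: str    # file or handoff name
    date: str      # ISO date string
    summary: str   # 1-line summary stored at index time
    score: float   # similarity score from the vector search


def format_index_line(hit: Hit, source_width: int = 30, summary_width: int = 32) -> str:
    """Render one compact row; overlong fields are truncated so columns stay aligned."""
    source = hit.source[:source_width].ljust(source_width)
    summary = hit.summary
    if len(summary) > summary_width:
        summary = summary[: summary_width - 3] + "..."
    return f"[{hit.chunk_id}] {source} {hit.date}  | {summary.ljust(summary_width)}  score={hit.score:.2f}"


def format_index(hits: list[Hit], top_k: int = 10) -> str:
    """Default /recall output: the top_k hits by descending score, one row each."""
    ranked = sorted(hits, key=lambda h: h.score, reverse=True)[:top_k]
    return "\n".join(format_index_line(h) for h in ranked)


if __name__ == "__main__":
    hits = [
        Hit("b8c1", "handoff/dispatcher-rewrite", "2026-04-12", "switched from polling to pipe...", 0.87),
        Hit("a3f2", "reference_task_queue.md", "2026-04-15", "dispatcher v2 pipe-mode...", 0.91),
    ]
    print(format_index(hits))
```

No chunk text appears anywhere in the output, which is what keeps each row inside the ~50-100 token budget.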

Implementation outline

  1. Add a stable ID to every chunk in the sqlite-vec index. Chunks are currently addressed by (file_path, chunk_offset); v2 needs short opaque IDs: the first 4 hex chars of sha1(file_path + offset), stored in a new chunk_id column. Four hex chars give only 65,536 values, so collisions become likely once the index holds a few hundred chunks; the indexer must check for them at index time and resolve them (for example, by lengthening the colliding ID).
  2. Add a summary column populated at index time. Cheapest option: the first 80 chars of the chunk, ellipsized at a word boundary. Better: a 1-line semantic summary generated by Sonnet at index time (one-shot, batched). Decision: start with the cheap version and upgrade to Sonnet summaries only if they materially improve the Layer 1 UX.
  3. Refactor forge_search_index.py to support three modes:
       ◦ --mode index (default): compact-format output, no chunk text
       ◦ --mode get --ids a3f2,b8c1: chunk text for explicit IDs
       ◦ --mode timeline --id c4d9 --window N: ordered chunks from the same file, +/- N around the ID
  4. Update the /recall skill at forge/.claude/skills/recall/SKILL.md with the new flag set and a decision flowchart for Claude.
  5. Backward compat: the old /recall <query> form must keep working; only its output shape changes, from full chunks to the compact index. Keep --full as an opt-in escape hatch for "give me everything inline, like the old behavior."
  6. Tests: smoke-test that --mode get returns content identical to the old default, that compact-index summaries are under 100 chars each, and that the timeline window returns the right neighbors.
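The chunk ID, cheap summary, and get/timeline modes above can be sketched together against plain sqlite3. Everything here is hypothetical: a chunks table with chunk_id, file_path, chunk_offset, summary, and text columns (the embedding column and sqlite-vec virtual table are left out), collisions resolved by lengthening the hex prefix, and the timeline taken as neighbors by chunk_offset within the same file, as in the --mode timeline description.

```python
import hashlib
import sqlite3

# Hypothetical v2 table; the embedding column / sqlite-vec virtual table is omitted.
SCHEMA = """
CREATE TABLE chunks (
    chunk_id     TEXT PRIMARY KEY,
    file_path    TEXT NOT NULL,
    chunk_offset INTEGER NOT NULL,
    summary      TEXT NOT NULL,
    text         TEXT NOT NULL
)
"""


def make_chunk_id(file_path: str, offset: int, taken: set[str], length: int = 4) -> str:
    """Short hex prefix of sha1(file_path, offset), lengthened until it is unused."""
    # The NUL separator keeps ("a1", 23) and ("a12", 3) from hashing identically.
    digest = hashlib.sha1(f"{file_path}\0{offset}".encode("utf-8")).hexdigest()
    while digest[:length] in taken:
        length += 1
        if length > len(digest):
            raise ValueError(f"cannot assign a unique ID to {file_path}@{offset}")
    taken.add(digest[:length])
    return digest[:length]


def cheap_summary(text: str, limit: int = 80) -> str:
    """Cheap summary: the first `limit` chars on one line, cut at a word boundary."""
    flat = " ".join(text.split())  # collapse newlines and runs of whitespace
    if len(flat) <= limit:
        return flat
    cut = flat[:limit]
    if flat[limit] != " ":  # the cut landed mid-word: drop the partial word
        cut = cut.rsplit(" ", 1)[0]
    return cut.rstrip() + "..."


def index_file(conn: sqlite3.Connection, file_path: str,
               chunks: list[tuple[int, str]], taken: set[str]) -> None:
    """Insert one file's chunks; `chunks` holds (offset, text) pairs."""
    for offset, text in chunks:
        conn.execute("INSERT INTO chunks VALUES (?, ?, ?, ?, ?)",
                     (make_chunk_id(file_path, offset, taken), file_path, offset,
                      cheap_summary(text), text))


def get_chunks(conn: sqlite3.Connection, ids: list[str]) -> list[tuple[str, str]]:
    """--mode get: full chunk text for explicit IDs, in the order requested."""
    if not ids:
        return []
    placeholders = ",".join("?" * len(ids))
    found = dict(conn.execute(
        f"SELECT chunk_id, text FROM chunks WHERE chunk_id IN ({placeholders})", ids))
    return [(i, found[i]) for i in ids if i in found]


def timeline(conn: sqlite3.Connection, chunk_id: str, window: int) -> list[tuple[str, str]]:
    """--mode timeline: the chunk plus up to `window` neighbors per side, same file."""
    anchor = conn.execute(
        "SELECT file_path, chunk_offset FROM chunks WHERE chunk_id = ?", (chunk_id,)).fetchone()
    if anchor is None:
        return []
    file_path, offset = anchor
    before = conn.execute(
        "SELECT chunk_id, summary FROM chunks WHERE file_path = ? AND chunk_offset < ? "
        "ORDER BY chunk_offset DESC LIMIT ?", (file_path, offset, window)).fetchall()
    after = conn.execute(
        "SELECT chunk_id, summary FROM chunks WHERE file_path = ? AND chunk_offset >= ? "
        "ORDER BY chunk_offset LIMIT ?", (file_path, offset, window + 1)).fetchall()
    return before[::-1] + after  # file order; `after` starts with the anchor itself
```

Because a lengthened ID depends on which chunk claimed the short prefix first, the indexer should walk files in a deterministic order (for example, sorted paths) so that a full re-index reproduces the same IDs.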

Migration plan

  • Bump the index schema version and run a full re-index on the first run after deploy (~10 min for the current corpus, per reference_semantic_search.md).
  • Land behind the FORGE_RECALL_V2=1 env var for one week of dogfooding before flipping the default.
  • Log token counts of layer-1 vs old behavior to data/recall-bench/ to validate the ~10x claim against forge's own corpus.
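A minimal sketch of the three migration bullets in Python, assuming the index is a single SQLite database: SQLite's PRAGMA user_version keeps an integer in the database header, which is a natural home for the schema version. SCHEMA_VERSION, needs_reindex, mark_reindexed, recall_v2_enabled, and log_bench are hypothetical names, and the one-JSONL-file-per-day layout is an assumption; only the env var and the data/recall-bench/ directory come from this plan.

```python
import json
import os
import sqlite3
from datetime import date
from pathlib import Path

SCHEMA_VERSION = 2  # hypothetical: the version that adds chunk_id and summary


def needs_reindex(conn: sqlite3.Connection) -> bool:
    """True when the index file was built under a different schema version."""
    (current,) = conn.execute("PRAGMA user_version").fetchone()
    return current != SCHEMA_VERSION


def mark_reindexed(conn: sqlite3.Connection) -> None:
    """Record the schema version once a full re-index has finished."""
    # PRAGMA statements don't take bound parameters, so the int is formatted in.
    conn.execute(f"PRAGMA user_version = {int(SCHEMA_VERSION)}")
    conn.commit()


def recall_v2_enabled(env=os.environ) -> bool:
    """Dogfooding gate: v2 output only when FORGE_RECALL_V2 is exactly "1"."""
    return env.get("FORGE_RECALL_V2") == "1"


def log_bench(query: str, layer1_tokens: int, old_tokens: int,
              bench_dir: str = "data/recall-bench") -> Path:
    """Append one layer-1 vs old-behavior comparison as a JSON line.

    Token counting is left to the caller; this only records the numbers.
    """
    out_dir = Path(bench_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    out = out_dir / f"{date.today().isoformat()}.jsonl"
    record = {
        "query": query,
        "layer1_tokens": layer1_tokens,
        "old_tokens": old_tokens,
        "ratio": old_tokens / layer1_tokens if layer1_tokens else None,
    }
    with out.open("a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")
    return out
```

On startup the indexer would call needs_reindex(), rebuild if it returns True, and only then call mark_reindexed(), so an interrupted re-index is retried on the next run.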

Out of scope for v2

  • Cross-encoder reranking (yoloshii/ClawMem pattern from research report). Worth considering for v3.
  • Sonnet-generated summaries beyond the cheap first-80-chars version. Wait for usage data.
  • Replacing fastembed BGE-small. Still the right embedding for this corpus.

Effort estimate

  • Schema + indexer changes: 1-2 hours
  • CLI refactor: 1 hour
  • Skill rewrite: 30 min
  • Bench harness + dogfood week: passive
  • Total active build: ~3 hours

Pickup checklist

  • Read the research report: memory/research/superpowers-context-mode-claudemem-vs-forge-stack-2026-05-03.md (section 3, Recommendations, and Synthesis)
  • Read the current /recall: forge/scripts/forge_search_index.py, forge/.claude/skills/recall/SKILL.md, forge/memory/general/reference_semantic_search.md
  • Run /feature-plan recall v2 three-layer retrieval to get the spec approved before writing any code