Skip to content

Recall v2 Scope Investigation

URL: https://mkdocs.justinsforge.com/memory/investigations/recall-v2-scope-2026-05-04/

Date: 2026-05-04 Author: [Claude Code] Source handoff: recall-v2-three-layer-retrieval-2026-05-03

Current State

Implementation is the v1 single-shot design described in the handoff. Nothing from the v2 three-layer plan has landed.

Component Path LOC Status
Indexer forge/scripts/forge_search_index.py 263 v1
Query CLI forge/scripts/forge_search_query.py 102 v1
Skill forge/.claude/skills/recall/SKILL.md n/a v1, supports <query>, -k, --json only
Index DB forge/data/search/index.db n/a sqlite + sqlite-vec, BGE-small 384-dim

Reference doc memory/general/reference_semantic_search.md cites stale paths (scripts/search/index.py, scripts/search/query.py); actual files are forge_search_*.py flat in scripts/. Reference doc is 10 days old and needs an update either way.

Index DB schema (verified live)

files(path PK, hash, mtime, indexed_at)
chunks(id PK AUTOINCREMENT, path FK, heading, chunk_idx, body)
chunk_vecs vec0(embedding float[384])  -- sqlite-vec virtual

No chunk_id (stable opaque), no summary column. Chunks are addressed by autoincrement id + (path, chunk_idx) only.

Gap Analysis vs v2 Design

v2 Requirement Present? Notes
Stable opaque chunk IDs (4-hex) NO Need new chunk_id column + sha1(path+chunk_idx)[:4] with collision check
Per-chunk summary column NO Add column; cheap version = first 80 chars to word boundary
--mode index compact output NO Default today is full chunk body
--mode get --ids ... NO Net new
--mode timeline --id --window N NO Net new; needs ordering by chunk_idx within file (already there) and date for cross-file (mtime / parsed date in path)
--full escape hatch NO Equivalent to today's default; trivial flag
FORGE_RECALL_V2=1 gate NO Net new
Schema version bump + auto-rebuild on upgrade NO Indexer has no schema_version table today
Token-count bench harness data/recall-bench/ NO Net new

Bottom line: zero v2 work has been started. v1 is healthy and correct, just expensive in tokens.

Riskiest Unknown

Timeline semantics across files. Layer-2 timeline is well-defined inside one file (chunk_idx ordering). Across files "5 entries before/after" requires a global temporal axis. Options:

  1. Use file mtime, fast but bad for handoffs/daily logs whose semantic date is in the filename, not mtime.
  2. Parse YYYY-MM-DD from filename (handoffs, daily, investigations all carry it), fall back to mtime.
  3. Per-file timeline only (collapse cross-file ambiguity by scoping window to same file).

Handoff doc reads as if (3) is intended ("from the same source file"), which sidesteps the risk. Confirm with Justin before coding; if cross-file timeline is wanted, option 2 is the right call but doubles indexer work. Default assumption: same-file only, matches handoff.

Secondary risk: full re-index time. Reference doc claims 2-3 min for ~100 files / 1500 chunks; handoff claims ~10 min. Real corpus is bigger now. Worth measuring before committing to a "rebuild on first run" deploy story; if it's >5 min, gate behind explicit --rebuild instead of auto.

Build Sequence

  1. Schema migration in indexer (~45 min)
  2. Add chunk_id TEXT UNIQUE, summary TEXT to chunks.
  3. Add meta(key PK, value) table; write schema_version=2.
  4. On startup, if missing or v1 → drop chunks + chunk_vecs, full re-embed.
  5. Indexer chunk-ID + summary generation (~30 min)
  6. chunk_id = sha1(f"{path}:{chunk_idx}").hexdigest()[:4]; on collision extend to 5, 6, ... chars.
  7. summary = body[:80] truncated at last whitespace, ellipsized.
  8. Query CLI three-mode refactor (~60 min)
  9. Default mode: compact lines [id] path date | summary score=0.NN.
  10. --get a3f2,b8c1: lookup by chunk_id, print full body (current default behavior).
  11. --timeline <id> --window N: select same-path chunks ordered by chunk_idx, return id ± N.
  12. --full: alias for current default behavior on a query.
  13. Keep --json, -k.
  14. Skill rewrite (~20 min)
  15. Update forge/.claude/skills/recall/SKILL.md with flag set + decision flowchart (when to use each layer).
  16. Bench harness (~30 min)
  17. scripts/forge_recall_bench.py runs N canned queries v1 vs v2, logs token counts to data/recall-bench/<date>.jsonl.
  18. Docs + reference update (~15 min)
  19. Refresh memory/general/reference_semantic_search.md (also fix the stale scripts/search/... paths while there).
  20. Dogfood week behind FORGE_RECALL_V2=1, then flip default.

Effort Estimate

Phase Hours
Schema + indexer 1.25
CLI refactor 1.0
Skill rewrite 0.33
Bench harness 0.5
Reference doc fix 0.25
Total active ~3.3

Aligns with handoff's ~3h estimate. Dogfood week is passive.

Recommendation

Run /feature-plan recall v2 three-layer retrieval per handoff pickup checklist before writing code, primarily to lock the timeline-scope decision (same-file only vs cross-file) and confirm the cheap summary is the v2.0 target (Sonnet summaries deferred). Everything else is mechanical.