Recall v2 Scope Investigation¶

URL: https://mkdocs.justinsforge.com/memory/investigations/recall-v2-scope-2026-05-04/

Date: 2026-05-04 Author: [Claude Code] Source handoff: recall-v2-three-layer-retrieval-2026-05-03

Current State¶

Implementation is the v1 single-shot design described in the handoff. Nothing from the v2 three-layer plan has landed.

Component	Path	LOC	Status
Indexer	`forge/scripts/forge_search_index.py`	263	v1
Query CLI	`forge/scripts/forge_search_query.py`	102	v1
Skill	`forge/.claude/skills/recall/SKILL.md`	n/a	v1, supports `<query>`, `-k`, `--json` only
Index DB	`forge/data/search/index.db`	n/a	sqlite + sqlite-vec, BGE-small 384-dim

Reference doc memory/general/reference_semantic_search.md cites stale paths (scripts/search/index.py, scripts/search/query.py); actual files are forge_search_*.py flat in scripts/. Reference doc is 10 days old and needs an update either way.

Index DB schema (verified live)¶

files(path PK, hash, mtime, indexed_at)
chunks(id PK AUTOINCREMENT, path FK, heading, chunk_idx, body)
chunk_vecs vec0(embedding float[384])  -- sqlite-vec virtual

No chunk_id (stable opaque), no summary column. Chunks are addressed by autoincrement id + (path, chunk_idx) only.

Gap Analysis vs v2 Design¶

v2 Requirement	Present?	Notes
Stable opaque chunk IDs (4-hex)	NO	Need new `chunk_id` column + sha1(path+chunk_idx)[:4] with collision check
Per-chunk `summary` column	NO	Add column; cheap version = first 80 chars to word boundary
`--mode index` compact output	NO	Default today is full chunk body
`--mode get --ids ...`	NO	Net new
`--mode timeline --id --window N`	NO	Net new; needs ordering by chunk_idx within file (already there) and date for cross-file (mtime / parsed date in path)
`--full` escape hatch	NO	Equivalent to today's default; trivial flag
`FORGE_RECALL_V2=1` gate	NO	Net new
Schema version bump + auto-rebuild on upgrade	NO	Indexer has no schema_version table today
Token-count bench harness `data/recall-bench/`	NO	Net new

Bottom line: zero v2 work has been started. v1 is healthy and correct, just expensive in tokens.

Riskiest Unknown¶

Timeline semantics across files. Layer-2 timeline is well-defined inside one file (chunk_idx ordering). Across files "5 entries before/after" requires a global temporal axis. Options:

Use file mtime, fast but bad for handoffs/daily logs whose semantic date is in the filename, not mtime.
Parse YYYY-MM-DD from filename (handoffs, daily, investigations all carry it), fall back to mtime.
Per-file timeline only (collapse cross-file ambiguity by scoping window to same file).

Handoff doc reads as if (3) is intended ("from the same source file"), which sidesteps the risk. Confirm with Justin before coding; if cross-file timeline is wanted, option 2 is the right call but doubles indexer work. Default assumption: same-file only, matches handoff.

Secondary risk: full re-index time. Reference doc claims 2-3 min for ~100 files / 1500 chunks; handoff claims ~10 min. Real corpus is bigger now. Worth measuring before committing to a "rebuild on first run" deploy story; if it's >5 min, gate behind explicit --rebuild instead of auto.

Build Sequence¶

Schema migration in indexer (~45 min)
Add chunk_id TEXT UNIQUE, summary TEXT to chunks.
Add meta(key PK, value) table; write schema_version=2.
On startup, if missing or v1 → drop chunks + chunk_vecs, full re-embed.
Indexer chunk-ID + summary generation (~30 min)
chunk_id = sha1(f"{path}:{chunk_idx}").hexdigest()[:4]; on collision extend to 5, 6, ... chars.
summary = body[:80] truncated at last whitespace, ellipsized.
Query CLI three-mode refactor (~60 min)
Default mode: compact lines [id] path date | summary score=0.NN.
--get a3f2,b8c1: lookup by chunk_id, print full body (current default behavior).
--timeline <id> --window N: select same-path chunks ordered by chunk_idx, return id ± N.
--full: alias for current default behavior on a query.
Keep --json, -k.
Skill rewrite (~20 min)
Update forge/.claude/skills/recall/SKILL.md with flag set + decision flowchart (when to use each layer).
Bench harness (~30 min)
scripts/forge_recall_bench.py runs N canned queries v1 vs v2, logs token counts to data/recall-bench/<date>.jsonl.
Docs + reference update (~15 min)
Refresh memory/general/reference_semantic_search.md (also fix the stale scripts/search/... paths while there).
Dogfood week behind FORGE_RECALL_V2=1, then flip default.

Effort Estimate¶

Phase	Hours
Schema + indexer	1.25
CLI refactor	1.0
Skill rewrite	0.33
Bench harness	0.5
Reference doc fix	0.25
Total active	~3.3

Aligns with handoff's ~3h estimate. Dogfood week is passive.

Recommendation¶

Run /feature-plan recall v2 three-layer retrieval per handoff pickup checklist before writing code, primarily to lock the timeline-scope decision (same-file only vs cross-file) and confirm the cheap summary is the v2.0 target (Sonnet summaries deferred). Everything else is mechanical.