
Research: Superpowers vs Context Mode vs ClaudeMem vs forge stack

URL: https://mkdocs.justinsforge.com/memory/research/superpowers-context-mode-claudemem-vs-forge-stack-2026-05-03/

Date: 2026-05-03 Depth: deep Model: sonnet

TL;DR

  • Superpowers is a methodology plugin, not an infra tool. Its brainstorm-plan-subagent-TDD pipeline duplicates patterns Justin already enforces via doctrine (robust-over-quick, /spawn, eval harness) but adds enforcement gates and git-worktree discipline that forge currently lacks for greenfield feature work. Steal-design-only.
  • Context Mode is genuinely new capability. Forge has no tool-output compression layer. Hook-intercepted sandbox execution that prevents raw git log / WebFetch / large file reads from inflating context is a net-new win for long coordinator-bot dev sessions. Install.
  • ClaudeMem: thedotmack/claude-mem (71k stars, npm claude-mem v12.5) is the canonical implementation. It runs a persistent background Bun worker, an optional Chroma vector DB, and 5 lifecycle hooks to capture, compress, and re-inject session observations. Forge already has a better-designed equivalent (auto-memory + auto-dream + /recall), so skip the install. The 3-layer retrieval API and "search index first, details by ID" pattern are design pieces worth stealing.
  • None of the three tools conflicts with forge security rules or poses a data-exfiltration risk; all three are local-first.

Findings

1. Superpowers

Repo: obra/superpowers [1], available on the official Anthropic plugin marketplace and on obra/superpowers-marketplace [2]. Stars: 177k [1]. Version: 5.0.7. Install:

/plugin install superpowers@claude-plugins-official
or
/plugin marketplace add obra/superpowers-marketplace
/plugin install superpowers@superpowers-marketplace

Mechanism. Superpowers is a pure-skills plugin with a single SessionStart hook. The hook injects the using-superpowers skill content (~5.4k bytes) as <EXTREMELY_IMPORTANT> context and registers a Skill tool that lazy-loads any of 14 SKILL.md files on demand [3]. Skills are not loaded at startup; only the using-superpowers bootstrap text is [3]. The model then decides which skill to invoke based on the bootstrap's decision flowchart.
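The lazy-load pattern described above can be sketched as follows. This is a minimal illustration, not Superpowers' actual code: the `<root>/<skill-name>/SKILL.md` layout and the names `SkillRegistry`, `bootstrap_context`, and `load` are assumptions made for the example.

```python
from pathlib import Path


class SkillRegistry:
    """Illustrative lazy loader: only the bootstrap skill is read eagerly."""

    def __init__(self, root: Path, bootstrap: str = "using-superpowers"):
        self.root = root
        self.bootstrap = bootstrap
        self.loaded = {}  # skill name -> SKILL.md text read so far

    def _path(self, name: str) -> Path:
        # Assumed layout: <root>/<skill-name>/SKILL.md
        return self.root / name / "SKILL.md"

    def available(self) -> list:
        """List skill names without reading any file bodies."""
        return sorted(p.parent.name for p in self.root.glob("*/SKILL.md"))

    def bootstrap_context(self) -> str:
        """Text injected at session start: the bootstrap skill only."""
        body = self.load(self.bootstrap)
        return f"<EXTREMELY_IMPORTANT>\n{body}\n</EXTREMELY_IMPORTANT>"

    def load(self, name: str) -> str:
        """Read a skill on first use and cache it for the rest of the session."""
        if name not in self.loaded:
            self.loaded[name] = self._path(name).read_text(encoding="utf-8")
        return self.loaded[name]
```

The point of the design is that startup cost is fixed by the bootstrap file alone; every other skill costs context only in sessions that actually call `load` for it.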

The core workflow is:

  1. brainstorming skill: Socratic design loop, blocks all code until user approves a spec saved to docs/superpowers/specs/ [4].
  2. writing-plans skill: breaks spec into 2-5 min bite-sized tasks with exact file paths, TDD steps, commit points [5].
  3. subagent-driven-development or executing-plans skill: dispatches fresh subagents per task with two-stage review (spec compliance, then code quality) [6].
  4. test-driven-development skill: RED-GREEN-REFACTOR cycle, hard gate against writing production code before a failing test exists [7].
  5. verification-before-completion, requesting-code-review, finishing-a-development-branch wrap the cycle.

Token cost reality. Only the using-superpowers bootstrap (~5.4k bytes, ~1.4k tokens) is injected at SessionStart [3]. All 14 skills total ~108k bytes (~27-36k tokens), but they are lazy-loaded by the Skill tool call only when triggered. In practice: brainstorm-only sessions incur ~2 additional skill reads; full feature cycles (brainstorm + plan + subagent + TDD) accumulate 5-8 lazy reads totaling 15-25k tokens of additional context, spread across the life of the session.
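The byte-to-token figures above can be reproduced with simple arithmetic. The bytes-per-token ratios below (about 4 for plain prose, about 3 for denser markdown) are assumed rules of thumb, not measured tokenizer output.

```python
def bytes_to_tokens(n_bytes: int, bytes_per_token: float) -> int:
    """Rough token estimate; the bytes-per-token ratio is an assumed heuristic."""
    return round(n_bytes / bytes_per_token)


BOOTSTRAP_BYTES = 5_400      # using-superpowers bootstrap, from the text above
ALL_SKILLS_BYTES = 108_000   # all 14 SKILL.md files combined

# Assumed ratios: ~4 bytes/token for plain prose, ~3 for denser markdown.
bootstrap_tokens = bytes_to_tokens(BOOTSTRAP_BYTES, 4.0)
all_skills_low = bytes_to_tokens(ALL_SKILLS_BYTES, 4.0)
all_skills_high = bytes_to_tokens(ALL_SKILLS_BYTES, 3.0)

print(bootstrap_tokens)                  # 1350
print(all_skills_low, all_skills_high)   # 27000 36000
```

For an exact figure, count tokens in a real first session turn rather than relying on these ratios.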

What it enforces that forge doctrine does not have wired-in. Superpowers has hard-gate blocks ("Do NOT invoke any implementation skill until you have presented a design and the user has approved it" [4]). Forge's feedback_robust_over_quick.md is a text instruction, not a pre-code gate. The using-git-worktrees skill automatically creates an isolated branch and worktree before any implementation [8]. Forge has forge_worktree.sh but no automatic trigger.
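A pre-code gate of this kind could be wired into forge as a PreToolUse hook. The sketch below assumes the hook receives a JSON event with `tool_name` and `tool_input.file_path`, and that exit code 2 blocks the tool call, which is how Claude Code's hook documentation describes it; verify against the current docs before relying on it. The `Status: approved` marker, the set of gated tools, and the function names are hypothetical.

```python
import json
import sys
from pathlib import Path

GATED_TOOLS = {"Write", "Edit"}       # tools treated as "implementation"
APPROVAL_MARKER = "Status: approved"  # hypothetical marker line in a spec file


def spec_approved(spec_dir: Path) -> bool:
    """True if any spec file in spec_dir carries the approval marker."""
    if not spec_dir.is_dir():
        return False
    return any(
        APPROVAL_MARKER in p.read_text(encoding="utf-8")
        for p in spec_dir.glob("*.md")
    )


def gate(event: dict, spec_dir: Path) -> tuple:
    """Return (exit_code, message) for one tool-call event."""
    tool = event.get("tool_name", "")
    target = str(event.get("tool_input", {}).get("file_path", ""))
    if tool not in GATED_TOOLS:
        return 0, ""
    # Writing the spec itself must stay possible, or approval could never happen.
    # (Crude substring check on the path; good enough for a sketch.)
    if spec_dir.as_posix() in Path(target).as_posix():
        return 0, ""
    if spec_approved(spec_dir):
        return 0, ""
    return 2, f"Blocked {tool}: no approved spec in {spec_dir}"


def main() -> int:
    """Hook entry point: read the event from stdin, report on stderr."""
    code, message = gate(json.load(sys.stdin), Path("docs/superpowers/specs"))
    if message:
        print(message, file=sys.stderr)
    return code
```

A hook script would end with `sys.exit(main())`. Unlike a text instruction, the block happens outside the model, so it cannot be skipped by a model that ignores doctrine.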

Forge overlap. The /spawn pattern [9] replaces subagent-driven-development well for non-code tasks. The eval harness [10] catches post-hoc doctrine violations. The dispatcher worker pattern covers parallel agent fanning. For pure greenfield coding feature work, Superpowers adds enforced process that forge doctrine recommends but does not gate.


2. Context Mode

Two distinct repos exist:

  • kianwoon/context-mode [11]: 4 MCP tools + 2 hooks, simpler implementation, 1 star.
  • scottconverse/context-mode [12]: full port of mksglu/context-mode, 9 MCP tools + 6 lifecycle hooks, 3-stage compression pipeline, self-learning, session continuity; 1 star, but updated 2026-05-03 and notably more mature.

The question's framing (sandbox interception, Bash/WebFetch/MCP, local SQLite, /contextmode:ctx-stats) maps to the scottconverse variant. Both share the same upstream architecture (mksglu/context-mode, Elastic License 2.0 [12]).

Install (scottconverse, recommended):

npx --yes --package=github:scottconverse/context-mode context-mode
or in Claude Code:
/plugin marketplace add scottconverse/context-mode
/plugin install context-mode@scottconverse-context-mode
Requires Node.js >= 18. Verify with /context-mode:ctx-doctor.

Install (kianwoon, simpler):

claude plugin add kianwoon/context-mode
Requires Node.js 22+.

Mechanism. Context Mode registers 6 lifecycle hooks [12]:

  • PreToolUse: intercepts 18 patterns (Bash git-log/diff/test, WebFetch, Read on large files, curl/wget, build tools) and redirects to sandboxed ctx_execute / ctx_batch_execute / ctx_fetch_and_index. Safe invocations (piped, bounded) pass through unchanged [12].
  • PostToolUse: captures all tool events into per-session SQLite.
  • PreCompact: saves session snapshot before context compaction.
  • SessionStart + UserPromptSubmit: inject routing block and session guide each turn.
  • SubagentStop: cleanup.
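The PreToolUse routing decision for Bash commands can be sketched as a small classifier. The patterns below are a hypothetical subset chosen for illustration; the plugin's actual 18 rules are not reproduced here, and `route_bash` is an invented name. WebFetch and large-file Read interception are not covered.

```python
import re

# Hypothetical subset of heavy-output patterns, not the plugin's real rule table.
HEAVY_PATTERNS = [
    re.compile(r"^\s*git\s+(log|diff)\b"),
    re.compile(r"^\s*(pytest|jest|cargo\s+test|npm\s+test)\b"),
    re.compile(r"^\s*(curl|wget)\b"),
]

# A command counts as "bounded" if it is piped onward or explicitly limited.
BOUNDED = re.compile(r"\||(^|\s)(-n\s*\d+|--max-count(=|\s+)\d+|-\d+)(\s|$)")


def route_bash(command: str) -> str:
    """Return 'sandbox' for unbounded heavy commands, else 'passthrough'."""
    heavy = any(p.search(command) for p in HEAVY_PATTERNS)
    if heavy and not BOUNDED.search(command):
        return "sandbox"
    return "passthrough"
```

With these rules, `git log` and `pytest tests/` are routed to the sandbox, while `git log -n 5`, `git log | head -20`, and `ls -la` pass through unchanged.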

SQLite event tracking schema. The knowledge base uses SQLite FTS5 with two virtual tables: a Porter Stemmer FTS5 table (BM25, title fields weighted 5x) and a Trigram FTS5 table (substring matching). Results merge via Reciprocal Rank Fusion (K=60) with proximity reranking. Per-session DB is context-mode-{pid}.db in tmpdir, deleted on session end. TTL eviction at 60 minutes per entry [12].
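Reciprocal Rank Fusion itself is simple enough to show directly. The sketch below implements the standard formula, score(d) = sum of 1 / (K + rank), with K=60 as stated above. The two input lists stand in for the Porter-stemmed and trigram result orders; the document IDs are made up, and the proximity reranking step is omitted.

```python
from collections import defaultdict


def rrf_merge(rankings: list, k: int = 60) -> list:
    """Fuse ranked ID lists: score(d) = sum over lists of 1 / (k + rank)."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by fused score, breaking ties by ID for a stable result.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


porter_hits = ["doc_a", "doc_b", "doc_c"]   # e.g. BM25 order from the stemmed table
trigram_hits = ["doc_b", "doc_d", "doc_a"]  # e.g. order from the trigram table

for doc_id, score in rrf_merge([porter_hits, trigram_hits]):
    print(doc_id, round(score, 5))
```

Here doc_b ranks first (2nd and 1st in the two lists), followed by doc_a, doc_d, and doc_c: documents found by both indexes outrank those found by only one, without needing the two scoring scales to be comparable.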

3-stage compression pipeline (scottconverse):

  1. Deterministic ANSI/terminal stripping.
  2. Pattern-based tool-output compression: 10 formatters for jest, pytest, git log, cargo, npm install etc. Passing tests and progress bars collapse; failures are preserved verbatim.
  3. Session-aware relevance filtering: keeps content related to current work [12].
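The first two stages can be illustrated with a minimal sketch. `strip_ansi` and `compress_pytest` are hypothetical names, the pytest-style sample is simplified, and stage 3 (relevance filtering) is left out because it depends on session state.

```python
import re

ANSI_ESCAPE = re.compile(r"\x1b\[[0-9;?]*[A-Za-z]")  # CSI sequences such as colors


def strip_ansi(text: str) -> str:
    """Stage 1: remove terminal color and cursor escape sequences."""
    return ANSI_ESCAPE.sub("", text)


def compress_pytest(text: str) -> str:
    """Stage 2 (one hypothetical formatter): collapse passes, keep failures verbatim."""
    kept, passed = [], 0
    for line in text.splitlines():
        if line.rstrip().endswith("PASSED"):
            passed += 1          # count it, drop the line
        else:
            kept.append(line)    # failures and tracebacks stay as-is
    if passed:
        kept.append(f"[{passed} passing tests collapsed]")
    return "\n".join(kept)


raw = (
    "tests/test_a.py::test_one \x1b[32mPASSED\x1b[0m\n"
    "tests/test_a.py::test_two \x1b[32mPASSED\x1b[0m\n"
    "tests/test_b.py::test_three \x1b[31mFAILED\x1b[0m\n"
    "E   assert 1 == 2"
)
print(compress_pytest(strip_ansi(raw)))
```

The failing test and its assertion line survive unchanged, while the two passing lines shrink to a single count.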

The self-learning loop tracks what compressed content Claude later searches for and raises retention for frequently-retrieved tool patterns [12].

/context-mode:ctx-stats output. Reports token savings per session: number of intercepted tool calls, estimated tokens that would have entered context, tokens actually returned, percentage reduction, and learner accuracy [12].

Forge has nothing equivalent. Forge has /recall for semantic search over static files, but zero tool-output compression. Every Bash: git log, large Read:, or WebFetch: call dumps raw output into context. In long coordinator-bot dev sessions (reading 15+ files, running tests, fetching docs), this inflates context by 20-60k tokens and accelerates compaction. Context Mode addresses exactly this gap [12].

Known risk. The scottconverse fork is very new (1 star, created recently as a port). The maintenance status of the upstream mksglu/context-mode project has not been verified. The kianwoon fork is simpler but lacks the compression pipeline. Under Elastic License 2.0, self-hosted commercial use is allowed as long as license notices are preserved, but offering the software as a managed service is not; personal forge use is clearly permitted.


3. ClaudeMem (claude-mem)

Canonical repo: thedotmack/claude-mem [13]. Stars: 71k. npm: claude-mem@12.5.0 (AGPL-3.0) [14]. Author: Alex Newman. Install:

npx claude-mem install
or in Claude Code:
/plugin marketplace add thedotmack/claude-mem
/plugin install claude-mem

Mechanism. ClaudeMem runs a persistent Bun worker service on port 37777 with a web viewer UI. 5 lifecycle hooks wire into Claude Code [15]:

| Hook | Call | Action |
|---|---|---|
| SessionStart | startup/clear/compact | starts worker service, then calls hook claude-code context to inject prior session summaries |
| UserPromptSubmit | every prompt | hook claude-code session-init |
| PostToolUse | every tool | hook claude-code observation (120s timeout, fires on every tool call) |
| PreToolUse (Read) | file reads | hook claude-code file-context |
| Stop | session end | hook claude-code summarize (120s timeout) |

The worker calls the Claude Agent SDK (runs on subscription quota, no separate API billing) to: (a) compress tool observations into semantic summaries, (b) extract decisions and lessons, and (c) on session end write a cross-session handoff [13].

Storage. SQLite DB for sessions, observations, summaries. Optional Chroma vector DB (via uvx chroma-mcp subprocess) for semantic search; Chroma is opt-in and adds a background MCP subprocess. Default local path: ~/.claude-mem/ [16].

Embedding model. When Chroma is enabled, embeddings are managed by chromadb's default embedding function (sentence-transformers/all-MiniLM-L6-v2, 384-dim). No custom embedding model is hardcoded in the source; the Chroma MCP subprocess handles embedding [16]. Without Chroma, search is FTS5 BM25 keyword-only.

3-layer retrieval API (MCP tools) [13]:

  1. search - compact index, ~50-100 tokens per result, returns IDs.
  2. timeline - chronological context around specific results.
  3. get_observations - full details fetched by IDs (~500-1000 tokens per result).

This "index first, details by ID" pattern delivers ~10x token savings vs fetching full results immediately [13]. The same pattern applies to the mem-search skill.
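A quick worked example shows where the ~10x figure comes from and when it shrinks. The per-result costs are midpoints of the ranges quoted above; the result counts are assumptions for illustration.

```python
def index_first_cost(n_results: int, n_opened: int,
                     index_tokens: int, detail_tokens: int) -> int:
    """Tokens used when search returns a compact index and details are fetched by ID."""
    return n_results * index_tokens + n_opened * detail_tokens


def eager_cost(n_results: int, detail_tokens: int) -> int:
    """Tokens used when every result comes back in full immediately."""
    return n_results * detail_tokens


# Midpoints of the quoted ranges: ~75 tokens per index row, ~750 per full observation.
INDEX_TOKENS, DETAIL_TOKENS = 75, 750

eager = eager_cost(10, DETAIL_TOKENS)                               # 7500
index_only = index_first_cost(10, 0, INDEX_TOKENS, DETAIL_TOKENS)   # 750
two_opened = index_first_cost(10, 2, INDEX_TOKENS, DETAIL_TOKENS)   # 2250

print(eager / index_only)              # 10.0
print(round(eager / two_opened, 1))    # 3.3
```

Under these assumptions, the 10x saving is the best case, where the index alone answers the question; opening two of ten results still cuts consumption to roughly a third.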

Token/cost profile. PostToolUse fires on every tool call with a 120s timeout; this is the largest overhead. The Stop hook spawns a Claude Agent SDK call which bills subscription quota to summarize the session. In a heavy 50-tool session, this is 2-3 Sonnet calls. The SessionStart context injection pulls the last session's summaries (~500-2000 tokens).

Auto-generated folder-level CLAUDE.md. ClaudeMem does not generate folder-level CLAUDE.md files. It generates a session handoff injected at SessionStart, not CLAUDE.md. The KimYx0207/claude-memory-3layer tool [17] does generate per-project .claude/memory/MEMORY.md with lifecycle management; that is a separate project, not the canonical claude-mem.


Synthesis

The three tools address different problems: Superpowers is a development process enforcer; Context Mode is a token budget manager; ClaudeMem is a session memory accumulator. Forge already has strong coverage of the memory accumulator problem and partial coverage of process enforcement. It has zero coverage of token budget management.

Design patterns worth importing from ClaudeMem. The 3-layer retrieval pattern (index with IDs first, timeline for chronological context, details by explicit ID) is more token-efficient than forge's current /recall design, which returns full chunks immediately. For any future /recall v2 work, adopting search-returns-IDs, get-details-by-ID would reduce context consumption on multi-query research sessions.

Design patterns worth importing from Superpowers. The hard-gate pattern (HARD-GATE block in brainstorming skill that literally says "do not proceed without approval") is more reliable than text instructions. The writing-plans format (exact file paths, complete code, verification step, commit per task) is a higher-quality briefing structure than forge's current /spawn prompts. Consider adding a canonical worker-briefing template to forge/.claude/skills/ that follows this structure.

Context Mode fills a real gap. No current forge tool prevents raw tool output from inflating context. On a 4-hour coordinator bot dev session reading 20+ files and running tests, uncompressed output could push context to compaction 2x sooner. Context Mode's hook-based interception is zero-config once installed.


Forge-stack comparison

Superpowers

| Capability | Superpowers provides | Forge equivalent | Verdict |
|---|---|---|---|
| Pre-code design gate | Hard HARD-GATE block, spec must be approved before impl | feedback_robust_over_quick.md (text instruction only) | Superpowers stronger |
| Subagent dispatch per task | subagent-driven-development skill, fresh context per task | /spawn pattern + dispatcher workers | Equivalent |
| Two-stage code review | Spec compliance reviewer + code quality reviewer subagents | Eval harness (post-commit, doctrine-only) | Superpowers stronger for code quality |
| TDD enforcement | RED-GREEN-REFACTOR hard gates, delete code written before tests | No equivalent; TDD optional | Superpowers only |
| Git worktree isolation | using-git-worktrees skill auto-creates isolated branch | forge_worktree.sh (manual invocation) | Superpowers more automatic |
| Plan documentation | docs/superpowers/plans/ with task checkboxes | /spawn prompt informal | Superpowers stronger |
| Token overhead | ~1.4k startup + 2-5k per skill lazy-loaded | N/A | Low overhead |
| Doctrine alignment | Skills override defaults, user CLAUDE.md has priority | CLAUDE.md is canonical | Compatible |

Context Mode (scottconverse variant)

| Capability | Context Mode provides | Forge equivalent | Verdict |
|---|---|---|---|
| Tool output compression | 3-stage pipeline, 10 format-aware compressors | None | Context Mode only |
| Bash interception (git log/diff, test runners) | PreToolUse hook denies/redirects 18 patterns | None | Context Mode only |
| WebFetch interception | Redirects to fetch_and_index with 24h TTL cache | None | Context Mode only |
| FTS5 knowledge base per session | BM25 + trigram + RRF fusion search over session content | /recall (cross-session, permanent) | Different scope, both useful |
| Session continuity after compaction | PreCompact snapshot + structured Session Guide rebuild | Auto-dream nightly consolidation | Context Mode handles intra-session; forge handles cross-session |
| Context savings reporting | /context-mode:ctx-stats with token and dollar estimates | None | Context Mode only |
| Self-learning retention | Feedback loop raises retention for frequently-retrieved patterns | None | Context Mode only |
| Process isolation | Subprocess with stdout cap, no filesystem restriction | Dispatcher workers (full isolation) | Different scope |

ClaudeMem

| Capability | ClaudeMem provides | Forge equivalent | Verdict |
|---|---|---|---|
| Session observation capture | PostToolUse hook, every tool, Claude Agent SDK compression | auto-memory (Stop hook, confidence-gated) | Forge more conservative (confidence gate), ClaudeMem more aggressive |
| Cross-session memory injection | SessionStart context injection, last-session summaries | auto-memory writes to MEMORY.md topic files, auto-loaded | Equivalent, different format |
| Semantic search | Optional Chroma vector search (MiniLM-L6-v2, 384-dim) | /recall (BGE-small-en-v1.5, 384-dim, sqlite-vec) | Equivalent embedding dim; forge integrated, ClaudeMem optional |
| 3-layer retrieval (index/timeline/details) | MCP tools: search, timeline, get_observations | /recall returns full chunks immediately | ClaudeMem design more token-efficient |
| Nightly consolidation | None (worker processes continuously) | auto-dream (dedup, stale prune, promote staged) | Forge stronger |
| Memory audit trail | None | LESSONS.md append-only audit, forge_memory_revert.py | Forge only |
| Memory revert | None | forge_memory_revert.py --session | Forge only |
| Sensitive content protection | <private> tag manual exclusion | Pre-redaction of secrets before Sonnet, safe_path() guard | Forge stronger |
| Persistent background service | Bun worker on port 37777 | N/A (cron-based) | Different arch; ClaudeMem higher operational complexity |
| Web viewer UI | http://localhost:37777 | None | ClaudeMem only |
| AGPL-3.0 license | Yes | N/A | Copyleft; fine for personal forge use |

Recommendations

Superpowers: Steal-design-only. Forge already covers most of the same process through doctrine plus /spawn plus eval harness, although as text instructions rather than enforced gates. Superpowers would add genuine value only for disciplined greenfield coding features, and Justin's workflow (Telegram-driven, operator-heavy, few greenfield CLI tools) doesn't match the brainstorm-plan-TDD pattern. Installing it also adds ~1.4k tokens of overhead to every session. Instead, steal the hard-gate and worker-briefing document patterns into forge's own skill system: add a forge/.claude/skills/feature-plan/SKILL.md that enforces spec-before-code and uses the writing-plans task structure.

Context Mode: Install. The scottconverse variant is recommended.

/plugin marketplace add scottconverse/context-mode
/plugin install context-mode@scottconverse-context-mode
Then verify: /context-mode:ctx-doctor

This fills a genuine gap: zero tool-output compression exists in forge today. Long coordinator-bot dev sessions and research-heavy remote-bridge sessions will benefit immediately. The hook intercept is silent and zero-config; it does not interfere with forge's existing hooks or CLAUDE.md rules. The CLAUDE.md shipped with the plugin uses a "Think in Code" directive and tool-selection rules that complement forge doctrine. Risk: the scottconverse fork is very new. If stability is a concern, kianwoon/context-mode (claude plugin add kianwoon/context-mode) is a lighter alternative with 4 tools and 2 hooks.

ClaudeMem: Skip. The persistent Bun background worker, Chroma subprocess, AGPL license complication, and PostToolUse hook firing on every single tool call (120s timeout budget) add operational complexity for a job that forge's auto-memory system already covers at lower cost. ClaudeMem's biggest risk is the PostToolUse hook competing with forge's existing PostToolUse hooks. Forge's confidence-gated auto-memory is more conservative (fewer false writes) and has audit/revert tooling ClaudeMem lacks.

Design pieces to steal from ClaudeMem into forge /recall v2:

  1. Index-first retrieval: change /recall to return a compact result index (title, date, 1-line summary, ID) in the first call, then offer --get-details <id> for full chunk text. This would cut /recall context consumption by ~70% on multi-result queries.
  2. Timeline tool: given an ID, return the N entries before and after it chronologically. Useful for "what was the context around this decision" queries.
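Both pieces can be prototyped against a plain SQLite table. The sketch below is an assumption-laden illustration, not forge's actual /recall code: the `memory` table, its columns, and the function names are hypothetical, and a `LIKE` keyword match stands in for the real semantic search.

```python
import sqlite3


def open_store() -> sqlite3.Connection:
    """In-memory stand-in for a memory store; table and column names are hypothetical."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE memory (id INTEGER PRIMARY KEY, created TEXT, "
        "title TEXT, summary TEXT, body TEXT)"
    )
    return db


def recall_index(db, term: str, limit: int = 10) -> list:
    """First call: compact rows (id, date, title, 1-line summary), no body text."""
    return db.execute(
        "SELECT id, created, title, summary FROM memory "
        "WHERE title LIKE ? OR body LIKE ? ORDER BY created DESC LIMIT ?",
        (f"%{term}%", f"%{term}%", limit),
    ).fetchall()


def get_details(db, entry_id: int):
    """Second call: full chunk text for one explicit ID, or None if unknown."""
    row = db.execute("SELECT body FROM memory WHERE id = ?", (entry_id,)).fetchone()
    return row[0] if row else None


def timeline(db, entry_id: int, n: int = 2) -> list:
    """Return up to n entries before and after entry_id, in chronological order."""
    anchor = db.execute("SELECT created FROM memory WHERE id = ?", (entry_id,)).fetchone()
    if anchor is None:
        return []
    before = db.execute(
        "SELECT id, created, title FROM memory WHERE created < ? "
        "ORDER BY created DESC LIMIT ?", (anchor[0], n)).fetchall()
    after = db.execute(
        "SELECT id, created, title FROM memory WHERE created > ? "
        "ORDER BY created ASC LIMIT ?", (anchor[0], n)).fetchall()
    return before[::-1] + after
```

The timeline query assumes sortable ISO-format timestamps and ignores entries that share the anchor's exact timestamp; a real implementation would need a tie-breaker such as the ID.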


Disagreements / open questions

  1. Context Mode maintainer. Both kianwoon/context-mode and scottconverse/context-mode are ports/derivatives of mksglu/context-mode. The upstream mksglu project is not found in the search results; its maintenance status is unverified. If the upstream is abandoned, the ports may diverge without coordination.

  2. Superpowers token overhead varies by platform. The README claims skills are lazy-loaded, but the using-superpowers SessionStart injection contains the bootstrap text unconditionally. Some community forks (jnMetaCode/superpowers-zh) inject all 14 skills at startup. The obra/superpowers canonical repo is lazy. Verify behavior after install with /context-mode:ctx-stats or by counting tokens in the first session turn.

  3. ClaudeMem version mismatch. npm shows claude-mem@12.5.0 and the GitHub repo's plugin.json also shows 12.5.0, while the package.json README badge shows 6.5.0. The npm version appears to reflect a different numbering scheme. This is unresolved; npx claude-mem install should install the latest npm-published version regardless.

  4. Context Mode license (Elastic License 2.0). ELv2 prohibits providing the software as a managed service. Personal forge use is permitted; embedding it in a commercial SaaS product would require re-evaluation.


Sources

  1. obra/superpowers GitHub repo, 177k stars, canonical source for install commands, skills list, mechanism.
  2. obra/superpowers-marketplace README, marketplace structure and plugin catalog.
  3. obra/superpowers session-start hook + using-superpowers SKILL.md, lazy-load confirmation and bootstrap injection mechanism.
  4. brainstorming SKILL.md, HARD-GATE block, spec approval flow, 9-step checklist.
  5. writing-plans SKILL.md, task structure, 2-5 min granularity, TDD steps.
  6. subagent-driven-development SKILL.md, two-stage review loop, model selection guidance.
  7. test-driven-development SKILL.md, RED-GREEN-REFACTOR hard gates, "Iron Law" delete requirement.
  8. using-superpowers SKILL.md, skill priority order, platform adaptation, Red Flags list.
  9. Forge reference_task_queue.md and /spawn pattern, forge dispatcher + spawn equivalent.
  10. Forge reference_eval_harness.md, 12-check eval harness pre-commit and nightly.
  11. kianwoon/context-mode GitHub repo, simpler 4-tool variant, README with FTS5 schema, hook table.
  12. scottconverse/context-mode GitHub repo, full README with 9-tool MCP, 18-rule routing table, 3-stage compression, SQLite schema, security model, install command.
  13. thedotmack/claude-mem GitHub repo, canonical claude-mem, 71k stars, README with 3-layer MCP workflow, hooks architecture.
  14. claude-mem on npm, v12.5.0, AGPL-3.0, install command.
  15. claude-mem plugin/hooks/hooks.json, 5 lifecycle hook registrations with exact commands and timeouts.
  16. claude-mem chroma-flowcharts.md, Chroma vector DB integration, ChromaSync write/read paths, embedding via chroma-mcp subprocess.
  17. KimYx0207/claude-memory-3layer, 3-layer JSON+MD+MD architecture, lifecycle management, git-trackable, token-efficient (~1500 tokens), one-line install - referenced for design comparison.
  18. coleam00/claude-memory-compiler, 975 stars, Karpathy LLM knowledge base architecture adapted for Claude Code sessions, no RAG design rationale (index beats cosine similarity at personal scale).
  19. severity1/claude-code-auto-memory, 140 stars, auto-managed CLAUDE.md sections via PostToolUse + Stop hook pattern.
  20. yoloshii/ClawMem, 150 stars, hybrid RAG memory (BM25 + vector + RRF + cross-encoder reranking), SAME/MAGMA/A-MEM architecture, for contrast with ClaudeMem.

Search trail

  1. GitHub API search: superpowers claude code - found obra repos
  2. obra/superpowers-marketplace README fetch - confirmed install command and plugin list
  3. obra/superpowers README fetch - core mechanism, skills list, workflow steps
  4. GitHub API search: context mode claude code plugin - found kianwoon and scottconverse
  5. GitHub API search: claudemem claude memory - found fragmentary results
  6. kianwoon/context-mode README fetch - simpler variant mechanism
  7. scottconverse/context-mode README fetch - full variant with routing table, compression, hooks
  8. obra/superpowers skills directory listing - confirmed 14 skills
  9. obra/superpowers hooks/session-start fetch - confirmed lazy-load mechanism
  10. obra/superpowers individual skill sizes - token overhead calculation
  11. GitHub API search: claude-mem memory plugin code stars - found coleam00, ClawMem
  12. npm registry: claude-mem - found thedotmack/claude-mem as canonical
  13. thedotmack/claude-mem README fetch - 71k stars, 5 hooks, 3-layer MCP, Chroma
  14. thedotmack/claude-mem plugin/hooks/hooks.json fetch - confirmed 5 hook registrations and timeouts
  15. thedotmack/claude-mem chroma-flowcharts.md fetch - Chroma vector DB architecture
  16. KimYx0207/claude-memory-3layer README fetch - 3-layer JSON+MD+MD architecture
  17. coleam00/claude-memory-compiler README + AGENTS.md fetch - Karpathy architecture, no-RAG rationale
  18. obra/superpowers using-superpowers SKILL.md + hooks.json fetch - confirmed single SessionStart hook, lazy skill loading
  19. forge auto-memory and eval harness reference files read - accurate forge-stack comparison baseline