JustinWieb-VR AI Video Pipeline, Handoff

URL: https://mkdocs.justinsforge.com/memory/handoffs/justinwieb-vr-ai-video-pipeline-2026-04-29/

Date: 2026-04-29
Owner: Future Claude Code session (or worker) that builds Phase 1, with Justin pairing on it
Parent session: Conversation on 2026-04-29 where Justin described the workflow and asked what AI can automate

The Goal

Build a forge-driven pipeline that absorbs the manual assembly labor in Justin's video workflow so that when he opens Premiere Pro, the project is already 60 to 70 percent built. Final taste decisions (B-roll placement, music selection, color polish) stay manual. Everything else (folder scaffold, transcription, rough cuts, captions, social packaging, thumbnail drafting, performance feedback) becomes automated.

Justin keeps Premiere Pro. No tool migration. The integration surface is FCP7 XML import on the Premiere side and forge scripts on the ingest side.

Success looks like: Justin drops footage in JustinWieb-VR/<project>/Footage/, walks away, comes back, opens <project>.xml in Premiere, and the rough cut is on the timeline matching his script. He refines with Text-Based Editing instead of doing the first pass blind. When he exports, social captions, thumbnails, and per-platform variants are waiting.

Justin's Current Workflow (verbatim from him, restructured)

Stage	What Justin does today
1. Script	Writes the script in Google Docs or Notion. Often uses AI assist for the initial draft.
2. Shoot	Sets up camera, shoots. Sony A7iv via OBS for desktop work, iPhone for mobile/handheld.
3. Folder scaffold	Runs a Windows PC script that creates a dated meta-project folder, e.g. `2026-04-29_Meta-Project/`. Each project gets the same subfolder structure. Reference example: `2026-04-07_Quest-Ad-3-Traitors`. Could be moved to forge / LifeOS / Telegram bot.
4. Upload footage	Drops Sony A7iv (OBS) or iPhone footage into the project's `Footage/` subfolder. Adds images, assets, and game footage to the appropriate folders.
5. Open Premiere	Creates a new Premiere project named the same as the folder. Imports footage and assets manually.
6. Trim with Text-Based Editing	First pass: uses Premiere's TBE feature to cut the talking-head footage down by deleting words from the transcript.
7. Layout	Decides per-section: full talking head, or info-overlay layout (Justin cropped above or below the frame, info graphic in the open space).
8. Music + SFX	Adds music bed, sound effects.
9. B-roll / assets	Inserts B-roll, game footage, asset overlays.
10. Captions	Generates captions, places them on the timeline.
11. Thumbnail still	Takes a still from somewhere in the edit, creates a thumbnail. Embeds a still at the end of the export so YouTube can auto-pick it for the thumbnail.
12. Export	Exports the final video.
13. Social caption	Writes / AI-assists the caption text and platform-specific copy. Saves as a note file alongside the export.
14. Mobile delivery	Connects iPhone to the server, downloads the exported video, copies the caption text.
15. Publish	Uploads to Instagram Reel, YouTube Shorts, TikTok, etc. from the iPhone.

Key constraints / preferences:

- Stays on Premiere Pro. Has a working Adobe subscription, knows the tool, uses Text-Based Editing heavily.
- Open to new tools that augment Premiere, not replace it.
- Already uses AI for scripting and social caption generation, and is comfortable with it.
- The folder scaffold script currently lives on the Windows PC; would prefer cross-device (Telegram, LifeOS, mobile).
- A7iv files are heavy; proxies would help editing performance.
- Workflow is solo; no team handoff to design around.

Current State

Phase 0 (this handoff): design and inventory. Done.

Nothing is built yet. This is a greenfield design. Justin wants to "work on this scripting soon", so the next session is a build session.

What forge already has that this pipeline can lean on:

- forge_text_sanitize for em-dash purging on any LLM-generated copy
- LLM routing infrastructure for caption / title generation
- Notion API client for pulling scripts from Notion
- /save-to-drive and the Drive subsystem for file movement
- /recall semantic search (could be repurposed for asset library indexing)
- Telegram bot fleet for mobile triggers
- Dispatcher + worker pattern for long-running jobs (transcription, proxy generation)
- Cron infrastructure for scheduled tasks (analytics pulls)

What needs building (and is not in forge yet):

- Whisper transcription wrapper (with word-level timestamps)
- Script-to-transcript fuzzy aligner
- FCP7 XML generator
- ffmpeg proxy generation pipeline
- Folder scaffold tool (/new-video)
- Asset library semantic indexer
- Thumbnail generation pipeline (frame extraction + template compositing)
- YouTube/IG analytics pullers
- Per-platform packaging generator

What AI Can Automate, the Full Landscape

Categorized by where in the workflow it slots in. Maturity column: Mature = production-ready, do this; Solid = works well, some tuning needed; Experimental = cool, not reliable yet.

Pre-production

Capability	Approach	Maturity
Idea generation	LLM watches niche trends, audience comments, channel analytics; surfaces topics weekly	Mature
Script drafting in Justin's voice	LLM fine-tuned (or few-shot prompted) on past scripts	Mature
Hook prediction	Score draft scripts against retention models	Experimental
Shot list from script	LLM parses script, generates "shoot this, this, this" + B-roll needs	Mature
Mobile teleprompter	Forge bot pushes script to phone/iPad, scrolls at his pace	Trivial

Ingest + first pass (the big bucket)

Capability	Approach	Maturity
Watch-folder ingest	Forge service watches `JustinWieb-VR/<proj>/Footage/`, fires pipeline on new files	Mature
Proxy generation	ffmpeg generates edit proxies on Finn NVMe, A7iv files become snappy in Premiere	Mature
Whisper transcription	Word-level timestamps, runs locally on Finn GPU or Console CPU	Mature
Script-to-transcript alignment	Fuzzy match Justin's written script against Whisper output, identify best takes, mark cut points	Solid (custom build)
Filler-word removal	"Um/uh/like" detected via Whisper, auto-cut. 30+ min saved per video	Mature
False-start detection	Repeated lines auto-pick the better take using audio quality + script alignment	Solid
Pause shortening	Tighten gaps to a target duration	Mature
Hook detection	Find best 3-second moment, suggest as opener	Solid
FCP7 XML generation	Write project.xml: bin structure, imports, sequence, cuts, subtitle track	Mature (well-documented format)
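
The script-to-transcript alignment row can be sketched with the standard library alone. This is a minimal sketch, not the planned aligner: it assumes the transcript is a list of `{"word", "start", "end"}` dicts (a hypothetical schema), and `align_script` and the returned cut layout are illustrative names. It uses `difflib.SequenceMatcher` to find runs of spoken words that match the script and turns each run into a time range.

```python
import re
from difflib import SequenceMatcher


def normalize(token):
    """Lowercase and strip punctuation so 'Hello,' matches 'hello'."""
    return re.sub(r"[^a-z0-9']", "", token.lower())


def align_script(script_text, words, min_words=3):
    """Return time ranges of transcript runs that match the script.

    words: list of {"word": str, "start": float, "end": float}
    (assumed schema, not a confirmed forge format).
    """
    script_tokens = [t for t in (normalize(w) for w in script_text.split()) if t]
    # Keep one token per transcript word so indices map back to timestamps.
    spoken_tokens = [normalize(w["word"]) for w in words]
    matcher = SequenceMatcher(None, script_tokens, spoken_tokens, autojunk=False)

    cuts = []
    for block in matcher.get_matching_blocks():
        if block.size < min_words:
            continue  # skips short coincidental matches and the final size-0 sentinel
        first = words[block.b]
        last = words[block.b + block.size - 1]
        cuts.append({
            "start": first["start"],
            "end": last["end"],
            "text": " ".join(script_tokens[block.a:block.a + block.size]),
        })
    return cuts
```

This only keeps stretches where the spoken words follow the script; it does not choose between repeated takes of the same line, which would need extra scoring on top (for example audio quality, as the false-start row suggests).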

Edit-time augmentation (still in Premiere)

Capability	Approach	Maturity
Auto B-roll suggestion	CLIP-embedded asset library, semantic search against script	Solid
Generate B-roll that doesn't exist	Sora 2, Veo 3, Runway Gen-4, Kling	Solid for cinematic, weak for product/gameplay
Layout cue detection	LLM reads script for "let me show you X" markers, tags those sections as info-overlay	Solid
9:16 reframe with subject tracking	Auto-crop horizontal to vertical following his face	Mature (Adobe Auto Reframe, DaVinci Smart Reframe, Captions.app)
Music selection	LLM matches script mood to track library, beat-syncs cuts	Solid (Epidemic Sound API, Mubert)
SFX placement	Auto-place whooshes, impacts at scene changes / beats	Solid
Lower-thirds + chyrons	Auto-generate when Justin names a person, product, or place	Solid
Voice enhancement	Adobe Enhance Speech, DaVinci Voice Isolation, Auphonic	Mature
Auto color match	Match shots, apply LUT, balance exposure	Mature (Lumetri Color Match)
Stabilize, denoise, upscale	Topaz Video AI does all three at studio quality, runs on GPU	Mature
Auto-blur sensitive info	License plates, screens, bystander faces	Solid (YOLO + ffmpeg or Runway)

Packaging (the multi-format cliff)

Capability	Approach	Maturity
Auto-shorts from long-form	Pull best 30-60s clips for Shorts/Reels/TikTok with captions burned in	Mature (OpusClip, Submagic, Vizard, Klap)
Multi-aspect-ratio export	One master, AI delivers 16:9, 9:16, 1:1 with reframing	Mature
Per-platform titles + descriptions + tags	LLM generates each variant from script	Mature
Thumbnail generation + A/B	Best frame, text overlays in his style, 3 variants for YouTube A/B test	Mature
YouTube chapters	Auto-generate from script section headers + transcript	Trivial
Translation + AI dubbing	Captions in 12 languages, voice-cloned dubbing	Solid (ElevenLabs, HeyGen, Captions Lipdub)
End-screen graphics	Subscribe + related-video tiles, auto-composited	Solid
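
The YouTube chapters row is mostly a formatting job once section start times are known. A minimal sketch, assuming the section list comes from the script headers matched to transcript times; `build_chapters` is a hypothetical name. The first chapter is forced to 0:00, which YouTube requires for chapters to activate.

```python
def format_timestamp(seconds):
    """Format seconds as M:SS, or H:MM:SS past the hour."""
    seconds = int(seconds)
    hours, rem = divmod(seconds, 3600)
    minutes, secs = divmod(rem, 60)
    if hours:
        return f"{hours}:{minutes:02d}:{secs:02d}"
    return f"{minutes}:{secs:02d}"


def build_chapters(sections):
    """Build the chapter block for a video description.

    sections: list of (start_seconds, title), sorted by start time.
    """
    lines = []
    for i, (start, title) in enumerate(sections):
        if i == 0:
            start = 0  # the first chapter must start at 0:00
        lines.append(f"{format_timestamp(start)} {title}")
    return "\n".join(lines)
```

YouTube also expects at least three chapters of at least 10 seconds each, so a caller would want to validate the section list before pasting the result into a description.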

Distribution

Capability	Approach	Maturity
Scheduled multi-channel upload	One trigger: YT long + YT Shorts + IG Reel + TikTok + X with platform-specific metadata	Mature (forge cron + APIs)
Cross-promotion injection	Forge identifies past videos worth mentioning, drafts the line	Solid
Comment moderation	Pre-screen, hide spam, surface replies worth time	Mature
Reply drafts	Top comments get drafted replies in his voice, approve in Telegram	Solid

Post-release feedback loop (the missing piece)

This is the long-term moat. Almost no creator does this manually.

Pull YouTube + IG analytics into forge nightly
Attribute performance to script structure, thumbnail style, hook pattern, length, CTA placement
Build a model of "what works for Justin specifically"
Feed insights back into next video's script + thumbnail + hook generation
Surface in daily brief: "top performer this week was X, here's why, here's what to do more of"
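
The simplest first version of the attribution step is a grouped average. A minimal sketch, assuming each video is a dict carrying a tagged feature (such as hook pattern) and a metric pulled from analytics; the field names and `average_by_feature` are hypothetical:

```python
from collections import defaultdict
from statistics import mean


def average_by_feature(videos, feature, metric):
    """Average a metric per feature value, best-performing value first.

    videos: list of dicts, each with the given feature and metric keys
    (assumed layout for the nightly analytics pull).
    """
    groups = defaultdict(list)
    for video in videos:
        groups[video[feature]].append(video[metric])
    ranked = [(mean(values), key, len(values)) for key, values in groups.items()]
    ranked.sort(reverse=True)
    return [{"value": key, "mean": avg, "n": n} for avg, key, n in ranked]
```

The `n` field matters: with only a handful of videos per group, a higher average is a correlation worth testing, not proof of what works.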

Files You'll Work With

File / Path	Purpose
`forge/scripts/forge_video_scaffold.py`	(new) `/new-video <name>` creates dated project folder structure
`forge/scripts/forge_video_proxy.py`	(new) ffmpeg proxy generation, watches `Footage/`, outputs to `Proxies/`
`forge/scripts/forge_video_transcribe.py`	(new) Whisper wrapper, outputs `transcript.json` (word-level) and `transcript.srt`
`forge/scripts/forge_video_align.py`	(new) Script-to-transcript fuzzy aligner, outputs `cuts.json`
`forge/scripts/forge_video_xml.py`	(new) FCP7 XML generator from `cuts.json` + media list
`forge/scripts/forge_video_thumbnail.py`	(new) Frame extraction + template compositor, outputs thumbnail variants
`forge/scripts/forge_video_social.py`	(new) Per-platform packaging: titles, descriptions, hashtags from script
`forge/scripts/forge_video_assets_index.py`	(new) Semantic indexer for `Assets/` folders across all VR projects
`forge/scripts/forge_video_analytics.py`	(new) YouTube + IG analytics puller, nightly cron
`JustinWieb-VR/CLAUDE.md`	(new) Path-scoped briefing that auto-loads when CWD descends into JustinWieb-VR
`forge/memory/general/reference_video_pipeline.md`	(new) Topic file describing the pipeline, plus a new entry in the `MEMORY.md` index
`~/.forge-secrets/youtube.env`	(new) YouTube Data API + OAuth
`~/.forge-secrets/instagram.env`	(new) IG Graph API
`~/.forge-secrets/elevenlabs.env`	(existing? verify) For voice dubbing later
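
The table lists both `transcript.json` (word-level) and `transcript.srt`; the second can be derived from the first. A minimal sketch, assuming the word list is `{"word", "start", "end"}` dicts (a hypothetical schema) and grouping a fixed number of words per cue; `words_to_srt` is an illustrative name:

```python
def srt_time(seconds):
    """Format seconds as the SRT timestamp HH:MM:SS,mmm."""
    ms = int(round(seconds * 1000))
    hours, ms = divmod(ms, 3_600_000)
    minutes, ms = divmod(ms, 60_000)
    secs, ms = divmod(ms, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def words_to_srt(words, max_words=7):
    """Group word-level timestamps into numbered SRT cues.

    words: list of {"word": str, "start": float, "end": float} (assumed schema).
    """
    cues = []
    for i in range(0, len(words), max_words):
        chunk = words[i:i + max_words]
        text = " ".join(w["word"].strip() for w in chunk)
        timing = f"{srt_time(chunk[0]['start'])} --> {srt_time(chunk[-1]['end'])}"
        cues.append(f"{len(cues) + 1}\n{timing}\n{text}\n")
    return "\n".join(cues)
```

Fixed-size chunks are the crudest possible cue split; breaking on punctuation or pauses would read better and can replace the loop without touching the timestamp formatting.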

Reference structure for project folders, mirroring 2026-04-07_Quest-Ad-3-Traitors:

JustinWieb-VR/
  YYYY-MM-DD_<project-slug>/
    script.md                 # synced from Notion or Drive
    Footage/                  # watched folder, fires pipeline on new files
    Proxies/                  # auto-generated edit proxies
    Assets/                   # graphics, logos, overlays
    Game-Footage/             # gameplay capture
    Music/                    # audio beds
    SFX/                      # sound effects
    Captions/                 # transcript.srt, transcript.json, cuts.json
    Exports/                  # final renders
    Thumbnails/               # generated variants
    social/                   # per-platform copy + clips
    project.xml               # generated, opens in Premiere
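
The tree above maps directly onto a small scaffold function. A minimal sketch of what `/new-video <name>` could do, assuming the subfolder list is exactly the one shown; `scaffold_project` and `slugify` are hypothetical names, and the starter `script.md` content is a placeholder:

```python
import re
from datetime import date
from pathlib import Path

# Subfolders taken from the reference structure above.
SUBFOLDERS = [
    "Footage", "Proxies", "Assets", "Game-Footage", "Music",
    "SFX", "Captions", "Exports", "Thumbnails", "social",
]


def slugify(name):
    """Turn 'Quest Ad 3 Traitors' into 'Quest-Ad-3-Traitors'."""
    return "-".join(re.findall(r"[A-Za-z0-9]+", name))


def scaffold_project(root, name, today=None):
    """Create YYYY-MM-DD_<project-slug>/ with the standard subfolders."""
    today = today or date.today()
    project = Path(root) / f"{today.isoformat()}_{slugify(name)}"
    for sub in SUBFOLDERS:
        (project / sub).mkdir(parents=True, exist_ok=True)
    script = project / "script.md"
    if not script.exists():  # never overwrite a script that was already synced
        script.write_text(f"# {name}\n", encoding="utf-8")
    return project
```

Because every step is idempotent, re-running the command on an existing project only fills in missing folders.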

Likely Approach

Build in phases. Each phase shippable on its own; later phases compound.

Phase 1, foundation (the build that makes the rest possible)

Goal: Justin drops footage, opens Premiere to a half-built project.

/new-video <name> creates the dated folder scaffold
Watch-folder service on Footage/ triggers the pipeline
ffmpeg proxy generation
Whisper transcription with word-level timestamps
Script aligner reads script.md, produces cuts.json (rough cut decisions)
FCP7 XML generator writes <project>.xml
Justin opens the XML in Premiere, project is built, refines with Text-Based Editing

Done when: Justin can drop A7iv footage in a folder, walk away for 10 minutes, open Premiere to a project with bins, imports, rough cuts, and a captions track.
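
One detail the watch-folder step has to handle is that large files are still being copied when they first appear. A minimal polling sketch, assuming a file is ready once its size is unchanged between two scans; the function names are hypothetical, and the extension set beyond `.mp4` is an assumption:

```python
from pathlib import Path

# .mp4 comes from the checklist; .mov and .mkv are assumed extras.
VIDEO_EXTENSIONS = {".mp4", ".mov", ".mkv"}


def scan_sizes(folder):
    """Snapshot {filename: size_in_bytes} for video files in a folder."""
    return {
        p.name: p.stat().st_size
        for p in Path(folder).iterdir()
        if p.is_file() and p.suffix.lower() in VIDEO_EXTENSIONS
    }


def ready_files(previous, current, already_processed):
    """Files whose size did not change since the last scan, not yet processed."""
    return sorted(
        name
        for name, size in current.items()
        if size > 0 and previous.get(name) == size and name not in already_processed
    )
```

A service loop would call `scan_sizes` every few seconds, pass the last two snapshots to `ready_files`, and hand each returned name to the proxy and transcription jobs. If the folder ends up on a network mount, polling like this tends to be more dependable than filesystem event notifications.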

Phase 2, edit-time augmentation

Filler word + false-start cleanup pass before XML export (saves 30+ min/video)
Hook detection, marks the strongest opener candidate in the rough cut
Asset library semantic indexer (Assets/ across all projects gets CLIP-embedded)
B-roll suggestions, surface relevant clips at script keywords as Premiere markers

Done when: the rough cut has filler words removed, hook moment marked, and B-roll suggestions on the timeline.
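
The filler-word cleanup can be expressed as a pass over word-level timestamps that returns the time ranges to keep. A minimal sketch, assuming words are `{"word", "start", "end"}` dicts (a hypothetical schema) and that the transcript actually contains the fillers; `keep_segments` and the default thresholds are illustrative:

```python
# "like" is left out on purpose: it is also a real word and needs context.
FILLERS = {"um", "uh", "erm"}


def keep_segments(words, max_gap=0.4, fillers=FILLERS):
    """Return (start, end) ranges to keep, dropping fillers and long pauses.

    Consecutive kept words separated by at most max_gap seconds are merged
    into one segment; anything longer becomes a cut.
    """
    segments = []
    for w in words:
        token = w["word"].strip().lower().strip(".,!?")
        if token in fillers:
            continue
        if segments and w["start"] - segments[-1][1] <= max_gap:
            segments[-1][1] = w["end"]  # extend the current segment
        else:
            segments.append([w["start"], w["end"]])
    return [(start, end) for start, end in segments]
```

Playing the kept segments back to back removes long pauses entirely; padding each segment by a few frames on both sides is the usual way to leave a natural breath instead.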

Phase 3, packaging

Per-platform copy generator: YT long, YT Shorts, IG Reel, TikTok, X copy from script.md
Thumbnail generator: extract best frame, composite text overlays in his style, output 3 variants
Auto-shorts: identify best 30 to 60s clip, render vertical with captions burned in
Multi-aspect-ratio render presets

Done when: export from Premiere triggers social/ folder population with all platform variants ready to upload.
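
For the multi-aspect-ratio presets, the crop geometry is simple arithmetic. A minimal sketch that computes a static center crop and returns it in ffmpeg's `crop=w:h:x:y` filter syntax; `center_crop` is a hypothetical name, and the crop does not track the subject:

```python
def center_crop(src_w, src_h, aspect_w, aspect_h):
    """Largest centered crop with the target aspect ratio, as an ffmpeg filter."""
    target_w = src_h * aspect_w // aspect_h
    if target_w <= src_w:
        # Source is wider than the target: keep full height, trim the sides.
        crop_w, crop_h = target_w, src_h
    else:
        # Source is narrower than the target: keep full width, trim top and bottom.
        crop_w, crop_h = src_w, src_w * aspect_h // aspect_w
    # Round down to even dimensions, which most video encoders expect.
    crop_w -= crop_w % 2
    crop_h -= crop_h % 2
    x = (src_w - crop_w) // 2
    y = (src_h - crop_h) // 2
    return f"crop={crop_w}:{crop_h}:{x}:{y}"
```

For example, `center_crop(1920, 1080, 1, 1)` gives `crop=1080:1080:420:0`. A fixed center crop only works when Justin stays near the middle of the frame; the subject-tracking reframe tools cover the cases where he does not.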

Phase 4, distribution + feedback

Scheduled multi-channel upload via APIs
YouTube + IG analytics nightly puller
Performance attribution model (script structure to retention)
Daily brief integration: "your top performer this week, what's working, what to try"

Done when: Justin publishes from a single forge command and sees analytics-driven feedback in his daily brief.

Don't Do

Don't switch Justin off Premiere Pro. He has the subscription, knows TBE, doesn't want a new editor. DaVinci's Python API is more elegant, but that's not a reason to migrate.
Don't try to automate taste. B-roll placement, music selection, color polish, layout decisions stay manual. AI surfaces and suggests; Justin decides.
Don't build a new caption tool when Whisper exists. One transcription source feeds everything: Premiere captions, social copy, chapters, search index.
Don't put cloud LLMs in the hot path for raw footage. Transcription runs locally. Footage doesn't leave Finn unless the user explicitly opts in.
Don't store API tokens in git. YouTube, IG, ElevenLabs, all go in ~/.forge-secrets/.
Don't build the analytics feedback loop in Phase 1. It's the highest-value piece long-term but useless without a data history. Foundation first.
Don't use em dashes anywhere in generated copy. Run all LLM output through forge_text_sanitize.
Don't auto-publish without confirmation. Multi-channel upload requires Justin's explicit go per release.
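
To make the em-dash rule concrete: the interface of `forge_text_sanitize` is not documented in this handoff, so the following is a hypothetical stand-in that only illustrates the transformation, not the existing tool. `purge_em_dashes` is an invented name.

```python
import re

EM_DASH = "\u2014"


def purge_em_dashes(text):
    """Replace each em dash, plus any spaces around it, with a comma and a space.

    Illustrative stand-in only; the real forge_text_sanitize may behave differently.
    """
    return re.sub(rf"\s*{EM_DASH}\s*", ", ", text)
```

A check such as `assert EM_DASH not in copy` before any generated text is written to `social/` would catch output that skipped the sanitizer.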

Deliverables

Phase 1 working end-to-end: drop footage, get a Premiere XML
JustinWieb-VR/CLAUDE.md with the pipeline rules and conventions
reference_video_pipeline.md topic file + MEMORY.md entry
/new-video skill for cross-device project creation
Test run on a real upcoming video; measure time saved vs. manual

Done When

Phase 1 done when:

- [ ] /new-video <name> creates the full folder scaffold
- [ ] Dropping a .mp4 in Footage/ triggers proxy + transcription within 10 min
- [ ] script.md aligns to the transcript and produces cuts.json
- [ ] project.xml opens cleanly in Premiere with bins, imports, rough cuts, captions track
- [ ] Justin's first real-world test produces a usable starting timeline
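
For the `project.xml` item, the core of an FCP7 XML (xmeml) file is a sequence with one clip item per cut. A minimal sketch covering only a single video track, with no bins, audio tracks, or captions; it has not been verified against Premiere's importer, and the function names, the integer non-NTSC frame rate, and the single shared source file are all assumptions:

```python
import xml.etree.ElementTree as ET


def add_rate(parent, fps):
    """Append a <rate> element with an integer, non-NTSC timebase."""
    rate = ET.SubElement(parent, "rate")
    ET.SubElement(rate, "timebase").text = str(fps)
    ET.SubElement(rate, "ntsc").text = "FALSE"


def build_xmeml(sequence_name, media_url, cuts, fps=30):
    """Build a one-track sequence from (start_seconds, end_seconds) source ranges."""
    root = ET.Element("xmeml", version="4")
    seq = ET.SubElement(root, "sequence", id="sequence-1")
    ET.SubElement(seq, "name").text = sequence_name
    seq_duration = ET.SubElement(seq, "duration")
    add_rate(seq, fps)
    media = ET.SubElement(seq, "media")
    track = ET.SubElement(ET.SubElement(media, "video"), "track")

    playhead = 0  # timeline position in frames
    for i, (start, end) in enumerate(cuts, start=1):
        src_in, src_out = round(start * fps), round(end * fps)
        length = src_out - src_in
        clip = ET.SubElement(track, "clipitem", id=f"clipitem-{i}")
        ET.SubElement(clip, "name").text = f"cut-{i}"
        add_rate(clip, fps)
        ET.SubElement(clip, "start").text = str(playhead)        # timeline in
        ET.SubElement(clip, "end").text = str(playhead + length)  # timeline out
        ET.SubElement(clip, "in").text = str(src_in)              # source in
        ET.SubElement(clip, "out").text = str(src_out)            # source out
        file_el = ET.SubElement(clip, "file", id="file-1")
        if i == 1:
            # Describe the file once; later clip items refer to it by id.
            ET.SubElement(file_el, "pathurl").text = media_url
        playhead += length
    seq_duration.text = str(playhead)
    return ET.tostring(root, encoding="unicode")
```

A real generator would also write the XML declaration and `<!DOCTYPE xmeml>` header, per-file media details, and linked audio. Footage shot at 29.97 fps would need a timebase of 30 with `ntsc` set to `TRUE` rather than the integer rate used here.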

Pipeline done when:

- [ ] Justin's manual labor per video drops from ~6 hours to ~2 hours
- [ ] Social packaging (titles, descriptions, hashtags, shorts) generated automatically
- [ ] Thumbnail variants drafted automatically
- [ ] Daily brief surfaces performance attribution from past releases
- [ ] Justin can kick off a project from his phone (Telegram or LifeOS), shoot, drop, edit, publish, all without touching a Windows PC script

Open Questions for Phase 1 build session

Whisper inference target: Finn GPU (if available), Console CPU, or cloud API for speed? Justin's call.
Proxy spec: ProRes Proxy or DNxHR LB? Premiere prefers DNxHR on Windows; ProRes is fine on Sol/macOS.
Notion-to-folder script sync: push (Notion webhook to forge) or pull (forge polls a tagged DB view)?
Folder location: does JustinWieb-VR/ live on Finn NFS-mounted to Sol, or local-first on Sol with Finn backup? Affects watch-folder design.
Existing Windows scaffold script: does Justin still have it? Worth porting the structure rather than reinventing.
Sample project for Phase 1 test: what's the next planned VR video? Use it as the live test case.
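
Whichever way the proxy-spec question lands, the ffmpeg invocation differs only in the video codec arguments. A minimal sketch that builds the command for either option; `proxy_command` is a hypothetical name, and the 720-pixel proxy height is an assumed default, not a decided spec:

```python
def proxy_command(src, dst, codec="prores", height=720):
    """Build an ffmpeg argument list for a ProRes Proxy or DNxHR LB proxy."""
    video_args = {
        # prores_ks profile 0 is ProRes 422 Proxy.
        "prores": ["-c:v", "prores_ks", "-profile:v", "0"],
        # DNxHR LB is 8-bit 4:2:2, so the pixel format is set explicitly.
        "dnxhr": ["-c:v", "dnxhd", "-profile:v", "dnxhr_lb", "-pix_fmt", "yuv422p"],
    }[codec]
    return [
        "ffmpeg", "-n",                   # -n: never overwrite an existing proxy
        "-i", str(src),
        "-vf", f"scale=-2:{height}",      # keep aspect ratio, force an even width
        *video_args,
        "-c:a", "pcm_s16le",              # uncompressed audio for smooth scrubbing
        str(dst),
    ]
```

The list can be passed straight to `subprocess.run(cmd, check=True)`, with a `.mov` destination for both codecs. Keeping the codec as a parameter means the open question can be settled by a test encode on real A7iv footage rather than up front.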