JustinWieb-VR AI Video Pipeline, Handoff
URL: https://mkdocs.justinsforge.com/memory/handoffs/justinwieb-vr-ai-video-pipeline-2026-04-29/
Date: 2026-04-29
Owner: Future Claude Code session (or worker) that builds Phase 1, with Justin pairing on it
Parent session: Conversation on 2026-04-29 in which Justin described the workflow and asked what AI can automate
The Goal
Build a forge-driven pipeline that absorbs the manual assembly labor in Justin's video workflow so that when he opens Premiere Pro, the project is already 60 to 70 percent built. Final taste decisions (B-roll placement, music selection, color polish) stay manual. Everything else (folder scaffold, transcription, rough cuts, captions, social packaging, thumbnail drafting, performance feedback) becomes automated.
Justin keeps Premiere Pro. No tool migration. The integration surface is FCP7 XML import on the Premiere side and forge scripts on the ingest side.
Success looks like: Justin drops footage in JustinWieb-VR/<project>/Footage/, walks away, comes back, opens <project>.xml in Premiere, and the rough cut is on the timeline matching his script. He refines with Text-Based Editing instead of doing the first pass blind. When he exports, social captions, thumbnails, and per-platform variants are waiting.
Justin's Current Workflow (verbatim from him, restructured)
| Stage | What Justin does today |
|---|---|
| 1. Script | Writes the script in Google Docs or Notion. Often uses AI assist for the initial draft. |
| 2. Shoot | Sets up camera, shoots. Sony A7iv via OBS for desktop work, iPhone for mobile/handheld. |
| 3. Folder scaffold | Runs a Windows PC script that creates a dated meta-project folder, e.g. 2026-04-29_Meta-Project/. Each project gets the same subfolder structure. Reference example: 2026-04-07_Quest-Ad-3-Traitors. Could be moved to forge / LifeOS / Telegram bot. |
| 4. Upload footage | Drops Sony A7iv (OBS) or iPhone footage into the project's Footage/ subfolder. Adds images, assets, and game footage to the appropriate folders. |
| 5. Open Premiere | Creates a new Premiere project named the same as the folder. Imports footage and assets manually. |
| 6. Trim with Text-Based Editing | First pass: uses Premiere's TBE feature to cut the talking-head footage down by deleting words from the transcript. |
| 7. Layout | Decides per-section: full talking head, or info-overlay layout (Justin cropped above or below the frame, info graphic in the open space). |
| 8. Music + SFX | Adds music bed, sound effects. |
| 9. B-roll / assets | Inserts B-roll, game footage, asset overlays. |
| 10. Captions | Generates captions, places them on the timeline. |
| 11. Thumbnail still | Takes a still from somewhere in the edit, creates a thumbnail. Embeds a still at the end of the export so YouTube can auto-pick it for the thumbnail. |
| 12. Export | Exports the final video. |
| 13. Social caption | Writes / AI-assists the caption text and platform-specific copy. Saves as a note file alongside the export. |
| 14. Mobile delivery | Connects iPhone to the server, downloads the exported video, copies the caption text. |
| 15. Publish | Uploads to Instagram Reel, YouTube Shorts, TikTok, etc. from the iPhone. |
Key constraints / preferences:
- Stays on Premiere Pro. Has a working Adobe subscription, knows the tool, uses Text-Based Editing heavily.
- Open to new tools that augment Premiere, not replace it.
- Already uses AI for scripting and social caption generation, and is comfortable with it.
- The folder scaffold script currently lives on the Windows PC; would prefer cross-device (Telegram, LifeOS, mobile).
- A7iv files are heavy; proxies would help editing performance.
- Workflow is solo; no team handoff to design around.
Current State
Phase 0 (this handoff): design and inventory. Done.
Nothing built yet. This is a greenfield design. Justin wants to "work on this scripting soon" so the next session is a build session.
What forge already has that this pipeline can lean on:
- forge_text_sanitize for em-dash purging on any LLM-generated copy
- LLM routing infrastructure for caption / title generation
- Notion API client for pulling scripts from Notion
- /save-to-drive and the Drive subsystem for file movement
- /recall semantic search (could be repurposed for asset library indexing)
- Telegram bot fleet for mobile triggers
- Dispatcher + worker pattern for long-running jobs (transcription, proxy generation)
- Cron infrastructure for scheduled tasks (analytics pulls)
What needs building (and is not in forge yet):
- Whisper transcription wrapper (with word-level timestamps)
- Script-to-transcript fuzzy aligner
- FCP7 XML generator
- ffmpeg proxy generation pipeline
- Folder scaffold tool (/new-video)
- Asset library semantic indexer
- Thumbnail generation pipeline (frame extraction + template compositing)
- YouTube/IG analytics pullers
- Per-platform packaging generator
What AI Can Automate, the Full Landscape
Categorized by where in the workflow it slots in. Maturity column: Mature = production-ready, do this; Solid = works well, some tuning needed; Experimental = cool, not reliable yet; Trivial = little to build.
Pre-production
| Capability | Approach | Maturity |
|---|---|---|
| Idea generation | LLM watches niche trends, audience comments, channel analytics; surfaces topics weekly | Mature |
| Script drafting in Justin's voice | LLM fine-tuned (or few-shot prompted) on past scripts | Mature |
| Hook prediction | Score draft scripts against retention models | Experimental |
| Shot list from script | LLM parses script, generates "shoot this, this, this" + B-roll needs | Mature |
| Mobile teleprompter | Forge bot pushes script to phone/iPad, scrolls at his pace | Trivial |
Ingest + first pass (the big bucket)
| Capability | Approach | Maturity |
|---|---|---|
| Watch-folder ingest | Forge service watches JustinWieb-VR/<proj>/Footage/, fires pipeline on new files | Mature |
| Proxy generation | ffmpeg generates edit proxies on Finn NVMe, A7iv files become snappy in Premiere | Mature |
| Whisper transcription | Word-level timestamps, runs locally on Finn GPU or Console CPU | Mature |
| Script-to-transcript alignment | Fuzzy match Justin's written script against Whisper output, identify best takes, mark cut points | Solid (custom build) |
| Filler-word removal | "Um/uh/like" detected via Whisper, auto-cut. 30+ min saved per video | Mature |
| False-start detection | Repeated lines auto-pick the better take using audio quality + script alignment | Solid |
| Pause shortening | Tighten gaps to a target duration | Mature |
| Hook detection | Find best 3-second moment, suggest as opener | Solid |
| FCP7 XML generation | Write project.xml: bin structure, imports, sequence, cuts, subtitle track | Mature (well-documented format) |
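
To make the filler-word removal and pause shortening rows concrete, here is a minimal sketch that turns word-level timestamps into a list of source ranges to keep. The `Word` type, the `FILLERS` set, and the `max_gap` threshold are assumptions for illustration, not existing forge code. "Like" is left out of the set on purpose because it is often a real word, and in this sketch pauses longer than `max_gap` are cut entirely rather than trimmed to a target duration.

```python
from dataclasses import dataclass

# Assumption: a small hand-picked filler list; tune it to Justin's speech.
FILLERS = {"um", "uh", "erm"}


@dataclass
class Word:
    text: str
    start: float  # seconds into the source clip
    end: float


def keep_segments(words, max_gap=0.4):
    """Return (start, end) source ranges to keep.

    A new range starts after every filler word and after every pause
    longer than max_gap seconds, so both are dropped from the rough cut.
    """
    segments = []
    break_next = True  # force a new range at the start and after a filler
    for w in words:
        token = w.text.strip().lower().strip(".,!?")
        if token in FILLERS:
            break_next = True
            continue
        if not break_next and w.start - segments[-1][1] <= max_gap:
            segments[-1][1] = w.end  # same phrase: extend the current range
        else:
            segments.append([w.start, w.end])
        break_next = False
    return [tuple(s) for s in segments]
```

The resulting ranges could feed the XML generator as cut points. A real pass would likely add a few frames of padding around each range so cuts do not clip consonants.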
Edit-time augmentation (still in Premiere)
| Capability | Approach | Maturity |
|---|---|---|
| Auto B-roll suggestion | CLIP-embedded asset library, semantic search against script | Solid |
| Generate B-roll that doesn't exist | Sora 2, Veo 3, Runway Gen-4, Kling | Solid for cinematic, weak for product/gameplay |
| Layout cue detection | LLM reads script for "let me show you X" markers, tags those sections as info-overlay | Solid |
| 9:16 reframe with subject tracking | Auto-crop horizontal to vertical following his face | Mature (Adobe Auto Reframe, DaVinci Smart Reframe, Captions.app) |
| Music selection | LLM matches script mood to track library, beat-syncs cuts | Solid (Epidemic Sound API, Mubert) |
| SFX placement | Auto-place whooshes, impacts at scene changes / beats | Solid |
| Lower-thirds + chyrons | Auto-generate when Justin names a person, product, or place | Solid |
| Voice enhancement | Adobe Enhance Speech, DaVinci Voice Isolation, Auphonic | Mature |
| Auto color match | Match shots, apply LUT, balance exposure | Mature (Lumetri Color Match) |
| Stabilize, denoise, upscale | Topaz Video AI does all three at studio quality, runs on GPU | Mature |
| Auto-blur sensitive info | License plates, screens, bystander faces | Solid (YOLO + ffmpeg or Runway) |
Packaging (the multi-format cliff)
| Capability | Approach | Maturity |
|---|---|---|
| Auto-shorts from long-form | Pull best 30-60s clips for Shorts/Reels/TikTok with captions burned in | Mature (OpusClip, Submagic, Vizard, Klap) |
| Multi-aspect-ratio export | One master, AI delivers 16:9, 9:16, 1:1 with reframing | Mature |
| Per-platform titles + descriptions + tags | LLM generates each variant from script | Mature |
| Thumbnail generation + A/B | Best frame, text overlays in his style, 3 variants for YouTube A/B test | Mature |
| YouTube chapters | Auto-generate from script section headers + transcript | Trivial |
| Translation + AI dubbing | Captions in 12 languages, voice-cloned dubbing | Solid (ElevenLabs, HeyGen, Captions Lipdub) |
| End-screen graphics | Subscribe + related-video tiles, auto-composited | Solid |
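
The YouTube chapters row is small enough to sketch directly. The example below assumes the section start times have already been recovered by matching script section headers against the transcript; the function names are hypothetical. It forces the first chapter to 0:00, which YouTube requires for chapters to register.

```python
def format_timestamp(seconds):
    """Format seconds as M:SS, or H:MM:SS once past an hour."""
    total = int(seconds)
    hours, rem = divmod(total, 3600)
    minutes, secs = divmod(rem, 60)
    if hours:
        return f"{hours}:{minutes:02d}:{secs:02d}"
    return f"{minutes}:{secs:02d}"


def chapters_block(sections):
    """Build the chapter lines for a video description.

    sections: list of (title, start_seconds) pairs, in any order.
    """
    lines = []
    ordered = sorted(sections, key=lambda item: item[1])
    for index, (title, start) in enumerate(ordered):
        if index == 0:
            start = 0  # the first chapter has to begin at 0:00
        lines.append(f"{format_timestamp(start)} {title}")
    return "\n".join(lines)
```

The output is plain text that can be appended to the per-platform description for the long-form upload.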
Distribution
| Capability | Approach | Maturity |
|---|---|---|
| Scheduled multi-channel upload | One trigger: YT long + YT Shorts + IG Reel + TikTok + X with platform-specific metadata | Mature (forge cron + APIs) |
| Cross-promotion injection | Forge identifies past videos worth mentioning, drafts the line | Solid |
| Comment moderation | Pre-screen, hide spam, surface replies worth time | Mature |
| Reply drafts | Top comments get drafted replies in his voice, approve in Telegram | Solid |
Post-release feedback loop (the missing piece)
This is the long-term moat. Almost no creator does this manually.
- Pull YouTube + IG analytics into forge nightly
- Attribute performance to script structure, thumbnail style, hook pattern, length, CTA placement
- Build a model of "what works for Justin specifically"
- Feed insights back into next video's script + thumbnail + hook generation
- Surface in daily brief: "top performer this week was X, here's why, here's what to do more of"
Files You'll Work With
| File / Path | Purpose |
|---|---|
| forge/scripts/forge_video_scaffold.py | (new) /new-video <name> creates dated project folder structure |
| forge/scripts/forge_video_proxy.py | (new) ffmpeg proxy generation, watches Footage/, outputs to Proxies/ |
| forge/scripts/forge_video_transcribe.py | (new) Whisper wrapper, outputs transcript.json (word-level) and transcript.srt |
| forge/scripts/forge_video_align.py | (new) Script-to-transcript fuzzy aligner, outputs cuts.json |
| forge/scripts/forge_video_xml.py | (new) FCP7 XML generator from cuts.json + media list |
| forge/scripts/forge_video_thumbnail.py | (new) Frame extraction + template compositor, outputs thumbnail variants |
| forge/scripts/forge_video_social.py | (new) Per-platform packaging: titles, descriptions, hashtags from script |
| forge/scripts/forge_video_assets_index.py | (new) Semantic indexer for Assets/ folders across all VR projects |
| forge/scripts/forge_video_analytics.py | (new) YouTube + IG analytics puller, nightly cron |
| JustinWieb-VR/CLAUDE.md | (new) Path-scoped briefing that auto-loads when CWD descends into JustinWieb-VR |
| forge/memory/general/reference_video_pipeline.md | (new) Topic file describing the pipeline + adding to MEMORY.md index |
| ~/.forge-secrets/youtube.env | (new) YouTube Data API + OAuth |
| ~/.forge-secrets/instagram.env | (new) IG Graph API |
| ~/.forge-secrets/elevenlabs.env | (existing? verify) For voice dubbing later |
Reference structure for project folders, mirroring 2026-04-07_Quest-Ad-3-Traitors:
    JustinWieb-VR/
      YYYY-MM-DD_<project-slug>/
        script.md        # synced from Notion or Drive
        Footage/         # watched folder, fires pipeline on new files
        Proxies/         # auto-generated edit proxies
        Assets/          # graphics, logos, overlays
        Game-Footage/    # gameplay capture
        Music/           # audio beds
        SFX/             # sound effects
        Captions/        # transcript.srt, transcript.json, cuts.json
        Exports/         # final renders
        Thumbnails/      # generated variants
        social/          # per-platform copy + clips
        project.xml      # generated, opens in Premiere
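
The structure above maps onto a short scaffold function. This is a sketch of what forge_video_scaffold.py might contain, not a port of the existing Windows script; `slugify`, `scaffold_project`, and the placeholder `script.md` content are assumptions. The generated project XML is left out because it is written later by the pipeline.

```python
import re
from datetime import date
from pathlib import Path

# Subfolders taken from the reference structure above.
SUBFOLDERS = [
    "Footage", "Proxies", "Assets", "Game-Footage", "Music",
    "SFX", "Captions", "Exports", "Thumbnails", "social",
]


def slugify(name):
    """Join the alphanumeric runs with hyphens, e.g. 'Quest Ad 3' -> 'Quest-Ad-3'."""
    return "-".join(re.findall(r"[A-Za-z0-9]+", name))


def scaffold_project(root, name, day=None):
    """Create <root>/YYYY-MM-DD_<slug>/ with the standard subfolders.

    Safe to re-run: existing folders and an existing script.md are left alone.
    """
    day = day or date.today()
    project = Path(root) / f"{day.isoformat()}_{slugify(name)}"
    for sub in SUBFOLDERS:
        (project / sub).mkdir(parents=True, exist_ok=True)
    script = project / "script.md"
    if not script.exists():
        # Placeholder until the Notion or Drive sync overwrites it.
        script.write_text(f"# {name}\n", encoding="utf-8")
    return project
```

Because the function only depends on a root path and a name, the same call could sit behind a Telegram command, a LifeOS action, or a CLI entry point.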
Likely Approach
Build in phases. Each phase shippable on its own; later phases compound.
Phase 1, foundation (the build that makes the rest possible)
Goal: Justin drops footage, opens Premiere to a half-built project.
- /new-video <name> creates the dated folder scaffold
- Watch-folder service on Footage/ triggers the pipeline
- ffmpeg proxy generation
- Whisper transcription with word-level timestamps
- Script aligner reads script.md, produces cuts.json (rough cut decisions)
- FCP7 XML generator writes <project>.xml
- Justin opens the XML in Premiere, project is built, refines with Text-Based Editing
Done when: Justin can drop A7iv footage in a folder, walk away for 10 minutes, open Premiere to a project with bins, imports, rough cuts, and a captions track.
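
The script aligner is the one custom piece in Phase 1, so a rough sketch may help scope it. The version below uses the standard library's `difflib.SequenceMatcher` as a stand-in for whatever fuzzy matcher the build settles on. It assumes the transcript is a list of dicts with `word`, `start`, and `end` keys (the shape Whisper's word-level output typically takes); the cut dictionary layout and `min_run` are hypothetical.

```python
import difflib
import re


def normalize(text):
    """Lowercase and split into word tokens, dropping punctuation."""
    return re.findall(r"[a-z0-9']+", text.lower())


def align_script(script_text, words, min_run=3):
    """Match the written script against spoken words and return cut ranges.

    Each cut covers a run of at least min_run consecutive words that appear
    in both the script and the transcript, in script order.
    """
    script_tokens = normalize(script_text)
    spoken = [(normalize(w["word"]) or [""])[0] for w in words]
    matcher = difflib.SequenceMatcher(a=script_tokens, b=spoken, autojunk=False)
    cuts = []
    for block in matcher.get_matching_blocks():
        if block.size < min_run:
            continue  # skips stray matches and the zero-length sentinel block
        first = words[block.b]
        last = words[block.b + block.size - 1]
        cuts.append({
            "start": first["start"],
            "end": last["end"],
            "text": " ".join(script_tokens[block.a:block.a + block.size]),
        })
    return cuts
```

Because the matcher prefers the longest contiguous run, a false start followed by a clean take tends to resolve to the clean take. It only finds exact word matches, so ad-libbed or misheard words split a sentence into several cuts; a production aligner would need tolerance for that.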
Phase 2, edit-time augmentation
- Filler word + false-start cleanup pass before XML export (saves 30+ min/video)
- Hook detection, marks the strongest opener candidate in the rough cut
- Asset library semantic indexer (Assets/ across all projects gets CLIP-embedded)
- B-roll suggestions, surface relevant clips at script keywords as Premiere markers
Done when: the rough cut has filler words removed, hook moment marked, and B-roll suggestions on the timeline.
Phase 3, packaging
- Per-platform copy generator: YT long, YT Shorts, IG Reel, TikTok, X copy from script.md
- Thumbnail generator: extract best frame, composite text overlays in his style, output 3 variants
- Auto-shorts: identify best 30 to 60s clip, render vertical with captions burned in
- Multi-aspect-ratio render presets
Done when: export from Premiere triggers social/ folder population with all platform variants ready to upload.
Phase 4, distribution + feedback
- Scheduled multi-channel upload via APIs
- YouTube + IG analytics nightly puller
- Performance attribution model (script structure to retention)
- Daily brief integration: "your top performer this week, what's working, what to try"
Done when: Justin publishes from a single forge command and sees analytics-driven feedback in his daily brief.
Don't Do
- Don't switch Justin off Premiere Pro. He has the subscription, knows TBE, doesn't want a new editor. DaVinci's Python API is more elegant, but that's not a reason to migrate.
- Don't try to automate taste. B-roll placement, music selection, color polish, layout decisions stay manual. AI surfaces and suggests; Justin decides.
- Don't build a new caption tool when Whisper exists. One transcription source feeds everything: Premiere captions, social copy, chapters, search index.
- Don't put cloud LLMs in the hot path for raw footage. Transcription runs locally. Footage doesn't leave Finn unless the user explicitly opts in.
- Don't store API tokens in git. YouTube, IG, ElevenLabs, all go in ~/.forge-secrets/.
- Don't build the analytics feedback loop in Phase 1. It's the highest-value piece long-term but useless without a data history. Foundation first.
- Don't use em dashes anywhere in generated copy. Run all LLM output through forge_text_sanitize.
- Don't auto-publish without confirmation. Multi-channel upload requires Justin's explicit go per release.
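
As an illustration of the "one transcription source feeds everything" rule above, the captions file can be derived from the same word-level transcript rather than from a second tool. This is a sketch under the assumption that transcript.json holds dicts with `word`, `start`, and `end` keys; `words_to_srt` and the `max_chars` limit are hypothetical.

```python
def srt_time(seconds):
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm."""
    ms = round(seconds * 1000)
    hours, ms = divmod(ms, 3_600_000)
    minutes, ms = divmod(ms, 60_000)
    secs, ms = divmod(ms, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def words_to_srt(words, max_chars=32):
    """Group word-level timestamps into numbered SRT cues of limited width."""
    cues, current = [], []
    for w in words:
        candidate = " ".join([c["word"].strip() for c in current] + [w["word"].strip()])
        if current and len(candidate) > max_chars:
            cues.append(current)  # the line is full: close the cue, start a new one
            current = []
        current.append(w)
    if current:
        cues.append(current)

    blocks = []
    for number, cue in enumerate(cues, start=1):
        text = " ".join(c["word"].strip() for c in cue)
        timing = f"{srt_time(cue[0]['start'])} --> {srt_time(cue[-1]['end'])}"
        blocks.append(f"{number}\n{timing}\n{text}\n")
    return "\n".join(blocks)
```

The same word list can then drive chapters, social copy, and the search index without a second transcription pass.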
Deliverables
- Phase 1 working end-to-end: drop footage, get a Premiere XML
- JustinWieb-VR/CLAUDE.md with the pipeline rules and conventions
- reference_video_pipeline.md topic file + MEMORY.md entry
- /new-video skill for cross-device project creation
- Test run on a real upcoming video; measure time saved vs. manual
Done When
Phase 1 done when:
- [ ] /new-video <name> creates the full folder scaffold
- [ ] Dropping a .mp4 in Footage/ triggers proxy + transcription within 10 min
- [ ] script.md aligns to the transcript and produces cuts.json
- [ ] project.xml opens cleanly in Premiere with bins, imports, rough cuts, captions track
- [ ] Justin's first real-world test produces a usable starting timeline
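
For the project.xml item in the checklist above, the sketch below shows the general shape of an FCP7 XML (xmeml) sequence built from cut ranges with the standard library. It is deliberately a skeleton: one video track, one source file, integer frame rate, and no bins, audio tracks, or captions track, all of which the real generator needs. The `build_xmeml` name and the cut format are assumptions, and the output should be checked against an XML exported from Premiere before relying on it.

```python
import xml.etree.ElementTree as ET


def _add_rate(parent, fps):
    """Append a <rate> element; ntsc FALSE assumes an integer frame rate."""
    rate = ET.SubElement(parent, "rate")
    ET.SubElement(rate, "timebase").text = str(fps)
    ET.SubElement(rate, "ntsc").text = "FALSE"


def build_xmeml(name, media_url, cuts, fps=30):
    """Return an xmeml document placing each cut back to back on one video track.

    cuts: list of dicts with "start" and "end" in seconds of source time.
    """
    root = ET.Element("xmeml", version="4")
    seq = ET.SubElement(root, "sequence")
    ET.SubElement(seq, "name").text = name
    duration = ET.SubElement(seq, "duration")
    _add_rate(seq, fps)
    video = ET.SubElement(ET.SubElement(seq, "media"), "video")
    track = ET.SubElement(video, "track")

    playhead = 0  # timeline position in frames
    for number, cut in enumerate(cuts, start=1):
        src_in = round(cut["start"] * fps)
        src_out = round(cut["end"] * fps)
        length = src_out - src_in
        clip = ET.SubElement(track, "clipitem", id=f"clipitem-{number}")
        ET.SubElement(clip, "name").text = f"{name} cut {number}"
        _add_rate(clip, fps)
        ET.SubElement(clip, "start").text = str(playhead)
        ET.SubElement(clip, "end").text = str(playhead + length)
        ET.SubElement(clip, "in").text = str(src_in)
        ET.SubElement(clip, "out").text = str(src_out)
        # The file is described once; later clips reference it by id.
        file_el = ET.SubElement(clip, "file", id="file-1")
        if number == 1:
            ET.SubElement(file_el, "name").text = media_url.rsplit("/", 1)[-1]
            ET.SubElement(file_el, "pathurl").text = media_url
        playhead += length
    duration.text = str(playhead)

    header = '<?xml version="1.0" encoding="UTF-8"?>\n<!DOCTYPE xmeml>\n'
    return header + ET.tostring(root, encoding="unicode")
```

A7iv footage shot at 23.976 or 29.97 fps would need the ntsc flag set to TRUE with a timebase of 24 or 30, so frame-rate detection belongs in the real generator.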
Pipeline done when:
- [ ] Justin's manual labor per video drops from ~6 hours to ~2 hours
- [ ] Social packaging (titles, descriptions, hashtags, shorts) generated automatically
- [ ] Thumbnail variants drafted automatically
- [ ] Daily brief surfaces performance attribution from past releases
- [ ] Justin can kick off a project from his phone (Telegram or LifeOS), shoot, drop, edit, publish, all without touching a Windows PC script
Open Questions for Phase 1 build session
- Whisper inference target: Finn GPU (if available), Console CPU, or cloud API for speed? Justin's call.
- Proxy spec: ProRes Proxy or DNxHR LB? Premiere prefers DNxHR on Windows; ProRes is fine on Sol/macOS.
- Notion-to-folder script sync: push (Notion webhook to forge) or pull (forge polls a tagged DB view)?
- Folder location: does JustinWieb-VR/ live on Finn NFS-mounted to Sol, or local-first on Sol with Finn backup? Affects watch-folder design.
- Existing Windows scaffold script: does Justin still have it? Worth porting the structure rather than reinventing.
- Sample project for Phase 1 test: what's the next planned VR video? Use it as the live test case.
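
Whichever way the proxy-spec question above is settled, the choice can stay a one-line preset. The sketch below only builds the ffmpeg argument list for either codec, so it can be inspected or logged before a worker runs it. The preset names, the `_proxy.mov` suffix, and the 720-pixel height are assumptions; the encoder flags are ffmpeg's `prores_ks` (profile 0 is Proxy) and `dnxhd` with the `dnxhr_lb` profile.

```python
from pathlib import Path

# Assumption: two candidate presets matching the open question above.
PROXY_PRESETS = {
    "prores_proxy": ["-c:v", "prores_ks", "-profile:v", "0", "-pix_fmt", "yuv422p10le"],
    "dnxhr_lb": ["-c:v", "dnxhd", "-profile:v", "dnxhr_lb", "-pix_fmt", "yuv422p"],
}


def proxy_command(source, proxies_dir, preset="prores_proxy", height=720):
    """Build the ffmpeg argument list for one proxy render (does not run it)."""
    source = Path(source)
    output = Path(proxies_dir) / f"{source.stem}_proxy.mov"
    return [
        "ffmpeg",
        "-n",                            # never overwrite an existing proxy
        "-i", str(source),
        "-vf", f"scale=-2:{height}",     # keep aspect ratio, force an even width
        *PROXY_PRESETS[preset],
        "-c:a", "pcm_s16le",             # uncompressed audio for editing
        str(output),
    ]
```

The returned list can be passed to `subprocess.run(cmd, check=True)` inside the worker. Premiere is strict about proxies matching the source's frame rate and audio channel layout, so that is worth verifying on the first real A7iv clip.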