
JustinWieb-VR AI Video Pipeline, Handoff

URL: https://mkdocs.justinsforge.com/memory/handoffs/justinwieb-vr-ai-video-pipeline-2026-04-29/

Date: 2026-04-29
Owner: Future Claude Code session (or worker) that builds Phase 1, with Justin pairing on it
Parent session: Conversation on 2026-04-29 where Justin described the workflow and asked what AI can automate


The Goal

Build a forge-driven pipeline that absorbs the manual assembly labor in Justin's video workflow so that when he opens Premiere Pro, the project is already 60 to 70 percent built. Final taste decisions (B-roll placement, music selection, color polish) stay manual. Everything else (folder scaffold, transcription, rough cuts, captions, social packaging, thumbnail drafting, performance feedback) becomes automated.

Justin keeps Premiere Pro. No tool migration. The integration surface is FCP7 XML import on the Premiere side and forge scripts on the ingest side.

Success looks like: Justin drops footage in JustinWieb-VR/<project>/Footage/, walks away, comes back, opens <project>.xml in Premiere, and the rough cut is on the timeline matching his script. He refines with Text-Based Editing instead of doing the first pass blind. When he exports, social captions, thumbnails, and per-platform variants are waiting.


Justin's Current Workflow (verbatim from him, restructured)

| Stage | What Justin does today |
|---|---|
| 1. Script | Writes the script in Google Docs or Notion. Often uses AI assist for the initial draft. |
| 2. Shoot | Sets up the camera and shoots: Sony A7iv via OBS for desktop work, iPhone for mobile/handheld. |
| 3. Folder scaffold | Runs a Windows PC script that creates a dated meta-project folder, e.g. 2026-04-29_Meta-Project/. Each project gets the same subfolder structure. Reference example: 2026-04-07_Quest-Ad-3-Traitors. Could be moved to forge / LifeOS / Telegram bot. |
| 4. Upload footage | Drops Sony A7iv (OBS) or iPhone footage into the project's Footage/ subfolder. Adds images, assets, and game footage to the appropriate folders. |
| 5. Open Premiere | Creates a new Premiere project named the same as the folder. Imports footage and assets manually. |
| 6. Trim with Text-Based Editing | First pass: uses Premiere's TBE feature to cut the talking-head footage down by deleting words from the transcript. |
| 7. Layout | Decides per section: full talking head, or info-overlay layout (Justin cropped above or below the frame, info graphic in the open space). |
| 8. Music + SFX | Adds the music bed and sound effects. |
| 9. B-roll / assets | Inserts B-roll, game footage, asset overlays. |
| 10. Captions | Generates captions, places them on the timeline. |
| 11. Thumbnail still | Takes a still from somewhere in the edit and creates a thumbnail. Embeds a still at the end of the export so YouTube can auto-pick it for the thumbnail. |
| 12. Export | Exports the final video. |
| 13. Social caption | Writes / AI-assists the caption text and platform-specific copy. Saves it as a note file alongside the export. |
| 14. Mobile delivery | Connects the iPhone to the server, downloads the exported video, copies the caption text. |
| 15. Publish | Uploads to Instagram Reels, YouTube Shorts, TikTok, etc. from the iPhone. |

Key constraints / preferences:

  • Stays on Premiere Pro. Has a working Adobe subscription, knows the tool, uses Text-Based Editing heavily.
  • Open to new tools that augment Premiere, not replace it.
  • Already uses AI for scripting and social caption generation, comfortable with it.
  • The folder scaffold script currently lives on the Windows PC; would prefer cross-device (Telegram, LifeOS, mobile).
  • A7iv files are heavy; proxies would help editing performance.
  • Workflow is solo; no team handoff to design around.


Current State

Phase 0 (this handoff): design and inventory. Done.

Nothing built yet. This is a greenfield design. Justin wants to "work on this scripting soon" so the next session is a build session.

What forge already has that this pipeline can lean on:

  • forge_text_sanitize for em-dash purging on any LLM-generated copy
  • LLM routing infrastructure for caption / title generation
  • Notion API client for pulling scripts from Notion
  • /save-to-drive and the Drive subsystem for file movement
  • /recall semantic search (could be repurposed for asset library indexing)
  • Telegram bot fleet for mobile triggers
  • Dispatcher + worker pattern for long-running jobs (transcription, proxy generation)
  • Cron infrastructure for scheduled tasks (analytics pulls)

What needs building (and is not in forge yet):

  • Whisper transcription wrapper (with word-level timestamps)
  • Script-to-transcript fuzzy aligner
  • FCP7 XML generator
  • ffmpeg proxy generation pipeline
  • Folder scaffold tool (/new-video)
  • Asset library semantic indexer
  • Thumbnail generation pipeline (frame extraction + template compositing)
  • YouTube/IG analytics pullers
  • Per-platform packaging generator


What AI Can Automate, the Full Landscape

Categorized by where in the workflow it slots in. Maturity column: Mature = production-ready, do this; Solid = works well, some tuning needed; Experimental = cool, not reliable yet; Trivial = simple glue work, little to no risk.

Pre-production

| Capability | Approach | Maturity |
|---|---|---|
| Idea generation | LLM watches niche trends, audience comments, channel analytics; surfaces topics weekly | Mature |
| Script drafting in Justin's voice | LLM fine-tuned (or few-shot prompted) on past scripts | Mature |
| Hook prediction | Score draft scripts against retention models | Experimental |
| Shot list from script | LLM parses script, generates "shoot this, this, this" + B-roll needs | Mature |
| Mobile teleprompter | Forge bot pushes script to phone/iPad, scrolls at his pace | Trivial |

Ingest + first pass (the big bucket)

| Capability | Approach | Maturity |
|---|---|---|
| Watch-folder ingest | Forge service watches JustinWieb-VR/<proj>/Footage/, fires pipeline on new files | Mature |
| Proxy generation | ffmpeg generates edit proxies on Finn NVMe, so A7iv files become snappy in Premiere | Mature |
| Whisper transcription | Word-level timestamps, runs locally on Finn GPU or Console CPU | Mature |
| Script-to-transcript alignment | Fuzzy-match Justin's written script against Whisper output, identify best takes, mark cut points | Solid (custom build) |
| Filler-word removal | "Um/uh/like" detected via Whisper, auto-cut. 30+ min saved per video | Mature |
| False-start detection | For repeated lines, auto-pick the better take using audio quality + script alignment | Solid |
| Pause shortening | Tighten gaps to a target duration | Mature |
| Hook detection | Find the best 3-second moment, suggest it as the opener | Solid |
| FCP7 XML generation | Write project.xml: bin structure, imports, sequence, cuts, subtitle track | Mature (well-documented format) |
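The FCP7 XML row is the piece everything else feeds. Below is a skeletal sketch of its core arithmetic, assuming a single source clip, 30 fps, and cuts given in source seconds (the cuts.json shape used here is hypothetical): each cut becomes a clipitem whose source in/out are the cut's frames and whose timeline start/end pack the cuts end to end. Bins, audio tracks, and the caption track are deliberately left out, and Premiere may expect more per-clip detail than this emits, so diff it against a real Premiere XML export before trusting it.

```python
import xml.etree.ElementTree as ET
from urllib.parse import quote

FPS = 30  # assumption: true 30p footage; 29.97 needs timebase 30 with <ntsc>TRUE</ntsc>
HEADER = '<?xml version="1.0" encoding="UTF-8"?>\n<!DOCTYPE xmeml>\n'


def sub(parent, tag, text=None, **attrs):
    """Append a child element, optionally with text and attributes."""
    el = ET.SubElement(parent, tag, attrs)
    if text is not None:
        el.text = str(text)
    return el


def add_rate(parent):
    rate = sub(parent, "rate")
    sub(rate, "timebase", FPS)
    sub(rate, "ntsc", "FALSE")


def build_xml(media_path, media_frames, cuts, name="Rough Cut"):
    """Lay cuts end to end on one video track.

    media_path: absolute POSIX path of the source clip (hypothetical layout).
    media_frames: source clip length in frames.
    cuts: [{"start": sec, "end": sec}] in source time, already in timeline order.
    """
    clip_name = media_path.rsplit("/", 1)[-1]
    root = ET.Element("xmeml", version="4")
    seq = sub(root, "sequence")
    sub(seq, "name", name)
    seq_duration = sub(seq, "duration")  # filled in once all cuts are placed
    add_rate(seq)
    track = sub(sub(sub(seq, "media"), "video"), "track")

    record = 0  # next free frame on the timeline
    for i, cut in enumerate(cuts, start=1):
        src_in, src_out = round(cut["start"] * FPS), round(cut["end"] * FPS)
        length = src_out - src_in
        clip = sub(track, "clipitem", id=f"clipitem-{i}")
        sub(clip, "name", clip_name)
        sub(clip, "duration", media_frames)
        add_rate(clip)
        sub(clip, "start", record)
        sub(clip, "end", record + length)
        sub(clip, "in", src_in)
        sub(clip, "out", src_out)
        file_el = sub(clip, "file", id="file-1")
        if i == 1:  # describe the file once; later clipitems reference it by id
            sub(file_el, "name", clip_name)
            sub(file_el, "pathurl", "file://localhost" + quote(media_path))
            sub(file_el, "duration", media_frames)
            add_rate(file_el)
        record += length

    seq_duration.text = str(record)
    return HEADER + ET.tostring(root, encoding="unicode")
```

The returned string gets written to the project's XML file and brought in through File > Import in Premiere. Frame math stays in integers after the initial rounding, so timeline positions never drift between clips.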

Edit-time augmentation (still in Premiere)

| Capability | Approach | Maturity |
|---|---|---|
| Auto B-roll suggestion | CLIP-embedded asset library, semantic search against script | Solid |
| Generate B-roll that doesn't exist | Sora 2, Veo 3, Runway Gen-4, Kling | Solid for cinematic, weak for product/gameplay |
| Layout cue detection | LLM reads script for "let me show you X" markers, tags those sections as info-overlay | Solid |
| 9:16 reframe with subject tracking | Auto-crop horizontal to vertical following his face | Mature (Adobe Auto Reframe, DaVinci Smart Reframe, Captions.app) |
| Music selection | LLM matches script mood to track library, beat-syncs cuts | Solid (Epidemic Sound API, Mubert) |
| SFX placement | Auto-place whooshes, impacts at scene changes / beats | Solid |
| Lower-thirds + chyrons | Auto-generate when Justin names a person, product, or place | Solid |
| Voice enhancement | Adobe Enhance Speech, DaVinci Voice Isolation, Auphonic | Mature |
| Auto color match | Match shots, apply LUT, balance exposure | Mature (Lumetri Color Match) |
| Stabilize, denoise, upscale | Topaz Video AI does all three at studio quality, runs on GPU | Mature |
| Auto-blur sensitive info | License plates, screens, bystander faces | Solid (YOLO + ffmpeg or Runway) |
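The ranking step behind Auto B-roll suggestion comes down to cosine similarity between a script line's embedding and each asset's embedding. A minimal sketch, assuming both come from the same CLIP-style model (computing them is out of scope here, and the toy vectors in the test are made up); `min_score` is an arbitrary starting threshold to tune.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def suggest_broll(line_vec, library, k=3, min_score=0.2):
    """Rank asset paths by similarity to one script line's embedding.

    library: {asset_path: embedding}, e.g. built by the Assets/ indexer.
    Returns at most k paths, best first, dropping anything below min_score.
    """
    scored = [(cosine(line_vec, vec), path) for path, vec in library.items()]
    scored = [(score, path) for score, path in scored if score >= min_score]
    scored.sort(reverse=True)
    return [path for _, path in scored[:k]]
```

Each returned path would become a Premiere marker at that script line rather than a clip on the timeline, which keeps placement a manual taste call.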

Packaging (the multi-format cliff)

| Capability | Approach | Maturity |
|---|---|---|
| Auto-shorts from long-form | Pull the best 30-60 s clips for Shorts/Reels/TikTok with captions burned in | Mature (OpusClip, Submagic, Vizard, Klap) |
| Multi-aspect-ratio export | One master, AI delivers 16:9, 9:16, 1:1 with reframing | Mature |
| Per-platform titles + descriptions + tags | LLM generates each variant from script | Mature |
| Thumbnail generation + A/B | Best frame, text overlays in his style, 3 variants for YouTube A/B test | Mature |
| YouTube chapters | Auto-generate from script section headers + transcript | Trivial |
| Translation + AI dubbing | Captions in 12 languages, voice-cloned dubbing | Solid (ElevenLabs, HeyGen, Captions Lipdub) |
| End-screen graphics | Subscribe + related-video tiles, auto-composited | Solid |
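YouTube chapters are marked Trivial because the only work is formatting and enforcing YouTube's chapter rules as published at the time of writing (first chapter at 0:00, at least three chapters, each at least 10 seconds; re-check before relying on them). A sketch, assuming section start times have already been found by matching script headers against the transcript:

```python
def format_ts(seconds: float) -> str:
    """95 -> '1:35', 3725 -> '1:02:05'."""
    total = int(seconds)
    h, rem = divmod(total, 3600)
    m, s = divmod(rem, 60)
    return f"{h}:{m:02d}:{s:02d}" if h else f"{m}:{s:02d}"


def youtube_chapters(sections, duration, min_len=10.0):
    """sections: [(title, start_seconds)] in timeline order.

    Returns description lines, or [] if the result would not qualify as chapters.
    """
    kept = []
    for title, start in sections:
        if not kept:
            kept.append((title, 0.0))  # first chapter must start at 0:00
        elif start - kept[-1][1] >= min_len:
            kept.append((title, start))  # too-short sections fold into the previous one
    if kept and duration - kept[-1][1] < min_len:
        kept.pop()  # the last chapter also has to run min_len seconds
    if len(kept) < 3:
        return []
    return [f"{format_ts(start)} {title}" for title, start in kept]
```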

Distribution

| Capability | Approach | Maturity |
|---|---|---|
| Scheduled multi-channel upload | One trigger: YT long + YT Shorts + IG Reel + TikTok + X with platform-specific metadata | Mature (forge cron + APIs) |
| Cross-promotion injection | Forge identifies past videos worth mentioning, drafts the line | Solid |
| Comment moderation | Pre-screen, hide spam, surface replies worth time | Mature |
| Reply drafts | Top comments get drafted replies in his voice, approved in Telegram | Solid |
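Platform-specific metadata needs a pre-flight check before anything is queued for Justin's go-ahead. A sketch, assuming the limits below (YouTube title 100 characters, Instagram caption 2,200 characters and 30 hashtags, X post 280 characters for a standard account) still hold; they are assumptions to re-verify against each platform's current documentation, and TikTok is left out rather than guessed. The em-dash check backs up forge_text_sanitize rather than replacing it.

```python
# Character limits as understood at the time of writing; re-verify each before relying on them.
LIMITS = {"youtube_title": 100, "instagram_caption": 2200, "x_post": 280}
MAX_IG_HASHTAGS = 30
EM_DASH = "\u2014"


def copy_problems(field: str, text: str) -> list:
    """Return human-readable problems with one piece of platform copy; [] means OK to queue."""
    problems = []
    limit = LIMITS.get(field)
    if limit is not None and len(text) > limit:
        # len() only approximates X, which weights URLs and some characters differently
        problems.append(f"{field}: {len(text)} chars, limit {limit}")
    if EM_DASH in text:
        problems.append(f"{field}: contains an em dash, run it through forge_text_sanitize")
    if field == "instagram_caption":
        tags = [word for word in text.split() if word.startswith("#")]
        if len(tags) > MAX_IG_HASHTAGS:
            problems.append(f"{field}: {len(tags)} hashtags, limit {MAX_IG_HASHTAGS}")
    return problems
```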

Post-release feedback loop (the missing piece)

This is the long-term moat. Almost no creator does this manually.

  • Pull YouTube + IG analytics into forge nightly
  • Attribute performance to script structure, thumbnail style, hook pattern, length, CTA placement
  • Build a model of "what works for Justin specifically"
  • Feed insights back into next video's script + thumbnail + hook generation
  • Surface in daily brief: "top performer this week was X, here's why, here's what to do more of"
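Before any real model exists, "attribute performance to script structure, thumbnail style, hook pattern..." can start as a plain group-by over past releases. A minimal sketch; the field names (`hook`, `avg_view_pct`) are hypothetical and would have to be produced by the analytics puller. A one-feature average is confounded (length and hook style tend to move together, for example), so treat it as a sanity check for the daily brief, not as the model itself.

```python
from collections import defaultdict
from statistics import mean


def attribute(videos, feature, metric="avg_view_pct"):
    """Group past videos by one feature and average a metric per group.

    Returns [(mean_metric, feature_value, n_videos)], best first. n is kept so
    single-video groups can be discounted instead of trusted.
    """
    groups = defaultdict(list)
    for video in videos:
        groups[video[feature]].append(video[metric])
    summary = [(mean(vals), value, len(vals)) for value, vals in groups.items()]
    return sorted(summary, reverse=True)
```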

Files You'll Work With

| File / Path | Purpose |
|---|---|
| forge/scripts/forge_video_scaffold.py (new) | /new-video <name> creates dated project folder structure |
| forge/scripts/forge_video_proxy.py (new) | ffmpeg proxy generation, watches Footage/, outputs to Proxies/ |
| forge/scripts/forge_video_transcribe.py (new) | Whisper wrapper, outputs transcript.json (word-level) and transcript.srt |
| forge/scripts/forge_video_align.py (new) | Script-to-transcript fuzzy aligner, outputs cuts.json |
| forge/scripts/forge_video_xml.py (new) | FCP7 XML generator from cuts.json + media list |
| forge/scripts/forge_video_thumbnail.py (new) | Frame extraction + template compositor, outputs thumbnail variants |
| forge/scripts/forge_video_social.py (new) | Per-platform packaging: titles, descriptions, hashtags from script |
| forge/scripts/forge_video_assets_index.py (new) | Semantic indexer for Assets/ folders across all VR projects |
| forge/scripts/forge_video_analytics.py (new) | YouTube + IG analytics puller, nightly cron |
| JustinWieb-VR/CLAUDE.md (new) | Path-scoped briefing that auto-loads when CWD descends into JustinWieb-VR |
| forge/memory/general/reference_video_pipeline.md (new) | Topic file describing the pipeline, plus an entry in the MEMORY.md index |
| ~/.forge-secrets/youtube.env (new) | YouTube Data API + OAuth |
| ~/.forge-secrets/instagram.env (new) | IG Graph API |
| ~/.forge-secrets/elevenlabs.env (existing? verify) | For voice dubbing later |
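forge_video_transcribe.py has two outputs, and the SRT half is pure formatting. A sketch, assuming the openai-whisper package, whose `transcribe(path, word_timestamps=True)` returns a dict with a `segments` list (each segment has `start`, `end`, `text`, and a per-word `words` list). transcript.json keeps everything, word timings included, for the aligner; transcript.srt is segment-level for Premiere captions.

```python
import json
from pathlib import Path


def srt_time(seconds: float) -> str:
    """3725.5 -> '01:02:05,500' (SRT uses a comma before milliseconds)."""
    ms = int(round(seconds * 1000))
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"


def to_srt(segments) -> str:
    """Number each Whisper segment and format it as an SRT cue."""
    cues = []
    for i, seg in enumerate(segments, start=1):
        timing = f"{srt_time(seg['start'])} --> {srt_time(seg['end'])}"
        cues.append(f"{i}\n{timing}\n{seg['text'].strip()}\n")
    return "\n".join(cues)


def write_outputs(result: dict, captions_dir: Path) -> None:
    """Persist the full result (with word timings) and a segment-level SRT."""
    captions_dir.mkdir(parents=True, exist_ok=True)
    (captions_dir / "transcript.json").write_text(json.dumps(result, indent=2), encoding="utf-8")
    (captions_dir / "transcript.srt").write_text(to_srt(result["segments"]), encoding="utf-8")
```

In the real script, `result` would come from something like `whisper.load_model("medium").transcribe(clip_path, word_timestamps=True)`; the model size is a placeholder that depends on Open Question 1.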

Reference structure for project folders, mirroring 2026-04-07_Quest-Ad-3-Traitors:

JustinWieb-VR/
  YYYY-MM-DD_<project-slug>/
    script.md                 # synced from Notion or Drive
    Footage/                  # watched folder, fires pipeline on new files
    Proxies/                  # auto-generated edit proxies
    Assets/                   # graphics, logos, overlays
    Game-Footage/             # gameplay capture
    Music/                    # audio beds
    SFX/                      # sound effects
    Captions/                 # transcript.srt, transcript.json, cuts.json
    Exports/                  # final renders
    Thumbnails/               # generated variants
    social/                   # per-platform copy + clips
    project.xml               # generated, opens in Premiere
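/new-video <name> could be little more than the following. A minimal sketch, assuming the tree above is the full subfolder list; the slug rule and the placeholder script.md are assumptions, and the real structure should be ported from Justin's Windows script if it still exists (Open Question 5). Re-running it is harmless, which matters when it is triggered from Telegram.

```python
import re
import tempfile
from datetime import date
from pathlib import Path
from typing import Optional

# Subfolders mirrored from the reference tree (2026-04-07_Quest-Ad-3-Traitors).
SUBFOLDERS = [
    "Footage", "Proxies", "Assets", "Game-Footage", "Music",
    "SFX", "Captions", "Exports", "Thumbnails", "social",
]


def slugify(name: str) -> str:
    """'Quest Ad 3: Traitors' -> 'Quest-Ad-3-Traitors'."""
    return "-".join(re.findall(r"[A-Za-z0-9]+", name))


def scaffold(root: Path, name: str, day: Optional[date] = None) -> Path:
    """Create root/YYYY-MM-DD_<slug>/ with the standard subfolders; existing ones are kept."""
    day = day or date.today()
    project = root / f"{day.isoformat()}_{slugify(name)}"
    for sub in SUBFOLDERS:
        (project / sub).mkdir(parents=True, exist_ok=True)
    script = project / "script.md"
    if not script.exists():  # placeholder until the Notion/Drive sync fills it
        script.write_text(f"# {name}\n", encoding="utf-8")
    return project


if __name__ == "__main__":
    # Demo against a throwaway directory instead of the real JustinWieb-VR root.
    with tempfile.TemporaryDirectory() as tmp:
        print(scaffold(Path(tmp), "Quest Ad 3: Traitors", date(2026, 4, 7)).name)
```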

Likely Approach

Build in phases. Each phase shippable on its own; later phases compound.

Phase 1, foundation (the build that makes the rest possible)

Goal: Justin drops footage, opens Premiere to a half-built project.

  1. /new-video <name> creates the dated folder scaffold
  2. Watch-folder service on Footage/ triggers the pipeline
  3. ffmpeg proxy generation
  4. Whisper transcription with word-level timestamps
  5. Script aligner reads script.md, produces cuts.json (rough cut decisions)
  6. FCP7 XML generator writes <project>.xml
  7. Justin opens the XML in Premiere to an already-built project and refines it with Text-Based Editing
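Step 5 is the custom piece. One way to start, swapping in Python's difflib as the fuzzy matcher: align script words against Whisper's words and keep the spoken runs that match. Because the alignment is monotonic, a retake that matches the script better usually wins over an abandoned first attempt. The cuts.json shape and `min_block` are assumptions; tolerance for mis-heard words and merging of near-adjacent runs are the tuning behind "Solid (custom build)".

```python
import difflib
import re


def tokens(text: str) -> list:
    """Lowercase word tokens, so 'Quest,' and ' quest' compare equal."""
    return re.findall(r"[a-z0-9']+", text.lower())


def align(script_text: str, words: list, min_block: int = 3) -> list:
    """Return cuts [{"start", "end"}] covering spoken runs that match the script.

    words: Whisper word dicts ({"word", "start", "end"}), in spoken order.
    """
    script = tokens(script_text)
    # One comparable token per Whisper word; rare multi-token words are joined.
    spoken = [" ".join(tokens(w["word"])) for w in words]
    matcher = difflib.SequenceMatcher(a=script, b=spoken, autojunk=False)
    cuts = []
    for block in matcher.get_matching_blocks():
        if block.size < min_block:
            continue  # short coincidental matches ("the", "and so") are not takes
        first, last = words[block.b], words[block.b + block.size - 1]
        cuts.append({"start": first["start"], "end": last["end"]})
    return cuts
```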

Done when: Justin can drop A7iv footage in a folder, walk away for 10 minutes, open Premiere to a project with bins, imports, rough cuts, and a captions track.

Phase 2, edit-time augmentation

  1. Filler word + false-start cleanup pass before XML export (saves 30+ min/video)
  2. Hook detection, marks the strongest opener candidate in the rough cut
  3. Asset library semantic indexer (Assets/ across all projects gets CLIP-embedded)
  4. B-roll suggestions, surface relevant clips at script keywords as Premiere markers
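Step 1's filler cleanup and the pause shortening from the ingest table can share one pass over Whisper's word timings: drop filler words, force a cut wherever one was removed, and split any remaining silence longer than `max_gap`, leaving `pad` seconds of air on each side. A sketch under stated assumptions: the thresholds are starting guesses, "like" is left out on purpose because it is too often a real word, and Whisper is known to drop many disfluencies from its transcript by default, so this only catches the fillers that actually survive transcription.

```python
import string

# "like" is omitted on purpose: too often a real word to cut blindly.
FILLERS = {"um", "uh", "erm", "hmm"}


def normalize(token: str) -> str:
    """Whisper word tokens carry leading spaces and punctuation: ' Um,' -> 'um'."""
    return token.strip().lower().strip(string.punctuation)


def keep_ranges(words, max_gap=0.35, pad=0.05):
    """Return (start, end) source ranges to keep, in seconds."""
    ranges = []
    cut_pending = False
    for w in words:
        if normalize(w["word"]) in FILLERS:
            cut_pending = True  # force a cut so the filler's audio is excluded
            continue
        start, end = max(0.0, w["start"] - pad), w["end"] + pad
        if ranges:
            start = max(start, ranges[-1][1])  # never replay audio already kept
            if not cut_pending and start - ranges[-1][1] <= max_gap:
                ranges[-1][1] = end  # short pause: extend the current range
                continue
        ranges.append([start, end])
        cut_pending = False
    return [(round(s, 3), round(e, 3)) for s, e in ranges]
```

The returned ranges plug straight into the cut list the XML generator consumes.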

Done when: the rough cut has filler words removed, hook moment marked, and B-roll suggestions on the timeline.

Phase 3, packaging

  1. Per-platform copy generator: YT long, YT Shorts, IG Reel, TikTok, X copy from script.md
  2. Thumbnail generator: extract best frame, composite text overlays in his style, output 3 variants
  3. Auto-shorts: identify best 30 to 60s clip, render vertical with captions burned in
  4. Multi-aspect-ratio render presets
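A 9:16 render preset can start as a fixed ffmpeg crop before any subject tracking. A sketch, assuming a 16:9 master and a 1080x1920 H.264 target; the `pan` parameter, encoder settings, and caption styling are placeholders. It uses ffmpeg's `crop`, `scale`, and `subtitles` filters (the last needs an ffmpeg build with libass). A fixed window does not follow Justin's face, so it suits desk shots where he stays put.

```python
from typing import Optional


def vertical_cmd(src: str, dst: str, srt: Optional[str] = None, pan: float = 0.5) -> list:
    """ffmpeg argv: crop a 16:9 master to 9:16, scale to 1080x1920, optionally burn captions.

    pan picks the crop window's horizontal position: 0.0 = left edge, 1.0 = right edge.
    """
    filters = [f"crop=ih*9/16:ih:(iw-ow)*{pan}:0", "scale=1080:1920"]
    if srt:
        # paths containing ':' or '\' (Windows drive letters) need extra escaping here
        filters.append(f"subtitles={srt}")
    return [
        "ffmpeg", "-y", "-i", src,
        "-vf", ",".join(filters),
        "-c:v", "libx264", "-crf", "20", "-preset", "medium",
        "-c:a", "aac", "-b:a", "160k",
        dst,
    ]
```

Run it with `subprocess.run(vertical_cmd(...), check=True)`. Returning an argument list instead of a shell string sidesteps quoting problems with spaces in project folder names.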

Done when: export from Premiere triggers social/ folder population with all platform variants ready to upload.

Phase 4, distribution + feedback

  1. Scheduled multi-channel upload via APIs
  2. YouTube + IG analytics nightly puller
  3. Performance attribution model (script structure to retention)
  4. Daily brief integration: "your top performer this week, what's working, what to try"

Done when: Justin publishes from a single forge command and sees analytics-driven feedback in his daily brief.


Don't Do

  • Don't switch Justin off Premiere Pro. He has the subscription, knows TBE, doesn't want a new editor. DaVinci's Python API is more elegant, but that's not a reason to migrate.
  • Don't try to automate taste. B-roll placement, music selection, color polish, layout decisions stay manual. AI surfaces and suggests; Justin decides.
  • Don't build a new caption tool when Whisper exists. One transcription source feeds everything: Premiere captions, social copy, chapters, search index.
  • Don't put cloud LLMs in the hot path for raw footage. Transcription runs locally. Footage doesn't leave Finn unless Justin explicitly opts in.
  • Don't store API tokens in git. YouTube, IG, ElevenLabs, all go in ~/.forge-secrets/.
  • Don't build the analytics feedback loop in Phase 1. It's the highest-value piece long-term but useless without a data history. Foundation first.
  • Don't use em dashes anywhere in generated copy. Run all LLM output through forge_text_sanitize.
  • Don't auto-publish without confirmation. Multi-channel upload requires Justin's explicit go per release.
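For the secrets rule, each script can read its token file from ~/.forge-secrets/ at runtime so nothing lands in the repo. A minimal KEY=VALUE reader, assuming the .env files are plain `KEY=VALUE` lines with optional quotes and `#` comments; the variable names in the test are hypothetical. Refusing group- or world-readable files on POSIX is an extra assumption about how strict to be.

```python
import os
from pathlib import Path

SECRETS_DIR = Path.home() / ".forge-secrets"


def load_env(name: str, secrets_dir: Path = SECRETS_DIR) -> dict:
    """Parse KEY=VALUE lines from <secrets_dir>/<name>.env."""
    path = secrets_dir / f"{name}.env"
    if os.name == "posix" and path.stat().st_mode & 0o077:
        raise PermissionError(f"{path} is readable by other users; chmod 600 it")
    values = {}
    for raw in path.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue  # skip blanks, comments, and malformed lines
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values
```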

Deliverables

  1. Phase 1 working end-to-end: drop footage, get a Premiere XML
  2. JustinWieb-VR/CLAUDE.md with the pipeline rules and conventions
  3. reference_video_pipeline.md topic file + MEMORY.md entry
  4. /new-video skill for cross-device project creation
  5. Test run on a real upcoming video; measure time saved vs. manual

Done When

Phase 1 done when:

- [ ] /new-video <name> creates the full folder scaffold
- [ ] Dropping a .mp4 in Footage/ triggers proxy + transcription within 10 min
- [ ] script.md aligns to the transcript and produces cuts.json
- [ ] project.xml opens cleanly in Premiere with bins, imports, rough cuts, captions track
- [ ] Justin's first real-world test produces a usable starting timeline
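The second checkbox needs a trigger that never starts on a half-copied file. A polling sketch: a clip counts as ready once two consecutive passes see the same non-zero size. Polling rather than inotify is deliberate in case JustinWieb-VR/ ends up NFS-mounted (Open Question 4), because inotify does not report changes made by other NFS clients. The extension list and 30 s interval are placeholders, well inside the 10-minute budget.

```python
import time
from pathlib import Path

VIDEO_EXTS = {".mp4", ".mov", ".mxf"}


def stable_new_files(folder: Path, seen: set, sizes: dict) -> list:
    """One polling pass: return unseen clips whose size matches the previous pass."""
    ready = []
    for path in sorted(folder.iterdir()):
        if path.suffix.lower() not in VIDEO_EXTS or path in seen:
            continue
        size = path.stat().st_size
        if size > 0 and sizes.get(path) == size:
            ready.append(path)  # size stopped changing: copy is (very likely) done
            seen.add(path)
        sizes[path] = size
    return ready


def watch(folder: Path, on_ready, interval: float = 30.0) -> None:
    """Poll forever, calling on_ready(path) once per finished clip."""
    seen, sizes = set(), {}
    while True:
        for path in stable_new_files(folder, seen, sizes):
            on_ready(path)  # e.g. queue proxy + transcription jobs for a worker
        time.sleep(interval)
```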

Pipeline done when:

- [ ] Justin's manual labor per video drops from ~6 hours to ~2 hours
- [ ] Social packaging (titles, descriptions, hashtags, shorts) generated automatically
- [ ] Thumbnail variants drafted automatically
- [ ] Daily brief surfaces performance attribution from past releases
- [ ] Justin can kick off a project from his phone (Telegram or LifeOS), shoot, drop, edit, publish, all without touching a Windows PC script


Open Questions for Phase 1 build session

  1. Whisper inference target: Finn GPU (if available), Console CPU, or cloud API for speed? Justin's call.
  2. Proxy spec: ProRes Proxy or DNxHR LB? Premiere prefers DNxHR on Windows; ProRes is fine on Sol/macOS.
  3. Notion-to-folder script sync: push (Notion webhook to forge) or pull (forge polls a tagged DB view)?
  4. Folder location: does JustinWieb-VR/ live on Finn NFS-mounted to Sol, or local-first on Sol with Finn backup? Affects watch-folder design.
  5. Existing Windows scaffold script: does Justin still have it? Worth porting the structure rather than reinventing.
  6. Sample project for Phase 1 test: what's the next planned VR video? Use it as the live test case.
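Question 2 reduces to a different set of encoder flags in the same ffmpeg call. A sketch assuming 720p-tall proxies in a .mov wrapper with uncompressed PCM audio; `dnxhd` with `-profile:v dnxhr_lb` and `prores_ks` with profile 0 (Proxy) are real ffmpeg encoder/profile combinations, while the `_Proxy` file naming and the 720-line size are assumptions. The scale keeps the source aspect ratio and the audio keeps the source channel count, since Premiere's proxy attach expects both to match the original.

```python
from pathlib import Path

SPECS = {
    "dnxhr_lb": ["-c:v", "dnxhd", "-profile:v", "dnxhr_lb", "-pix_fmt", "yuv422p"],
    "prores_proxy": ["-c:v", "prores_ks", "-profile:v", "0", "-pix_fmt", "yuv422p10le"],
}


def proxy_cmd(src: Path, proxies_dir: Path, spec: str = "dnxhr_lb") -> list:
    """ffmpeg argv for one edit proxy in Proxies/."""
    dst = proxies_dir / f"{src.stem}_Proxy.mov"
    return [
        "ffmpeg", "-y", "-i", str(src),
        "-vf", "scale=-2:720",  # 720 lines tall, width follows the source aspect ratio
        *SPECS[spec],
        "-c:a", "pcm_s16le",    # uncompressed audio; channel count carries over from the source
        str(dst),
    ]
```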