Asset Grabber Playbook
Tools for pulling images / videos / sounds from anywhere on the web, optimizing them, and cataloging Justin's asset library.
What's installed
| Tool | Where | Purpose |
|---|---|---|
| Playwright (python) + Chromium | venv `~/.forge-venvs/assets/` | Headless browser for scraping + search |
| `cwebp` | apt (webp) | WebP encoding |
| `avifenc` | apt (libavif-bin) | AVIF encoding |
| `yt-dlp` | apt | Video downloads (YouTube, generic) |
| `gallery-dl` | pip | Gallery/profile bulk downloads (IG, Twitter, etc.) |
| `Pillow` | pip | Image resize / format conversion |
| `ffmpeg`, `imagemagick`, `exiftool` | apt (pre-existing) | Media handling + EXIF strip |
Venv path: /home/justinwieb/.forge-venvs/assets/bin/python. Scripts auto-resolve it via FORGE_VENV env var or a sensible default.
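The lookup order described above could be sketched as follows. This is a minimal illustration, not the scripts' actual code: it assumes `FORGE_VENV` holds the venv root directory (rather than the interpreter path) and that the fallback is the venv listed under "What's installed". The function name `resolve_venv_python` is hypothetical.

```python
import os
from pathlib import Path

# Assumed fallback: the venv root named in the "What's installed" table.
DEFAULT_VENV = Path.home() / ".forge-venvs" / "assets"


def resolve_venv_python(env=None):
    """Return the venv interpreter path, preferring FORGE_VENV when it is set."""
    env = os.environ if env is None else env
    root = env.get("FORGE_VENV")
    venv = Path(root).expanduser() if root else DEFAULT_VENV
    return venv / "bin" / "python"
```

Passing a plain dict as `env` makes the lookup easy to exercise without touching the real environment.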
Scripts (scripts/assets/)
| Script | What it does |
|---|---|
| `forge_assets_grab.py <url>` | Playwright-powered page scraper. Extracts `<img>`/`<video>`/`<source>` plus direct image-URL anchors. Downloads through the browser context so cookies/referer are honored. Writes `provenance.json`. |
| `forge_assets_search.py <query>` | Image search via Google / Bing / DuckDuckGo (Playwright, no API key). Also supports Unsplash / Pexels APIs if keys are in env. |
| `forge_assets_optimize.py <path>` | WebP + AVIF copies at responsive widths (default 320/640/1280/1920), strips EXIF by default, emits `optimize.json`. |
| `forge_assets_catalog.py` | Scans `/mnt/workspace/Assets` + `forge/assets`, emits `forge/data/assets-catalog.json` with dimensions, sha256, tags, and source URLs. |
| `run` | Thin bash wrapper: `scripts/forge_assets_run.sh {grab\|search\|optimize\|catalog} …` dispatches to the venv. |
| `lib/provenance.py` | Helper that writes `provenance.json` and merges across calls, so you can append to an existing grab dir. |
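To make the extraction rule for the grab script concrete, here is a rough stand-in that collects the same kinds of URLs from static HTML. It deliberately swaps Playwright for the standard library's `html.parser`, so it sees only the raw markup rather than the rendered DOM, and it ignores `srcset`. The class and function names and the extension list are assumptions for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse

# Assumed set of extensions that count as a "direct image URL".
IMAGE_EXTS = (".jpg", ".jpeg", ".png", ".gif", ".webp", ".avif", ".svg")


class MediaCollector(HTMLParser):
    """Collect media URLs from <img>/<video>/<source> and image-link anchors."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.urls = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag in ("img", "video", "source"):
            candidate = attrs.get("src")
        elif tag == "a":
            href = attrs.get("href") or ""
            # Keep only anchors that point straight at an image file.
            is_image = urlparse(href).path.lower().endswith(IMAGE_EXTS)
            candidate = href if is_image else None
        else:
            candidate = None
        if candidate:
            absolute = urljoin(self.base_url, candidate)
            if absolute not in self.urls:  # de-duplicate, keep first-seen order
                self.urls.append(absolute)


def extract_media_urls(html, base_url):
    """Return absolute media URLs found in `html`, resolved against `base_url`."""
    collector = MediaCollector(base_url)
    collector.feed(html)
    return collector.urls
```

Resolving every candidate against the page URL keeps relative `src` values usable once the page itself is no longer loaded.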
Provenance (not a license gate, just a trace)
Every script that downloads files writes a provenance.json in the output dir:

    {
      "generated_at": "2026-04-22T01:30:40-0500",
      "items": [
        {
          "filename": "000_downtown-austin.jpg",
          "source_url": "https://upload.wikimedia.org/.../Downtown_Austin.jpg",
          "page_url": "https://en.wikipedia.org/wiki/Austin,_Texas",
          "engine": "grab",
          "query": "",
          "referer": "https://en.wikipedia.org/wiki/Austin,_Texas",
          "content_type": "image/jpeg",
          "bytes": 103514,
          "sha256": "…",
          "saved_at": "2026-04-22T01:30:40-0500",
          "extra": {"alt": "…", "kind": "image"}
        }
      ]
    }

This is purely an organizational record (where did this come from, and when). There is no license check and no gating; pull anything.
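The merge-on-append behaviour of the provenance helper might look roughly like the sketch below. It is not the contents of `lib/provenance.py`: the function name `merge_provenance` and the "same filename replaces the older entry" rule are assumptions chosen for illustration.

```python
import json
from datetime import datetime
from pathlib import Path


def merge_provenance(out_dir, new_items):
    """Append items to out_dir/provenance.json, replacing same-filename entries."""
    path = Path(out_dir) / "provenance.json"
    if path.exists():
        record = json.loads(path.read_text(encoding="utf-8"))
    else:
        record = {"items": []}
    by_name = {item["filename"]: item for item in record.get("items", [])}
    for item in new_items:
        by_name[item["filename"]] = item  # assumption: the latest entry wins
    record["items"] = list(by_name.values())
    # Local time with a UTC offset, matching the shape of the sample timestamps.
    record["generated_at"] = datetime.now().astimezone().strftime("%Y-%m-%dT%H:%M:%S%z")
    path.write_text(json.dumps(record, indent=2), encoding="utf-8")
    return record
```

Keying on `filename` means a second run into the same directory extends the record instead of overwriting it.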
Where assets land by default
| Source | Default path |
|---|---|
| `forge_assets_grab.py <url>` | `/mnt/workspace/Assets/Web-Grabs/<date>_<host>/` |
| `forge_assets_search.py <query>` | `/mnt/workspace/Assets/Web-Grabs/<date>_<query>/` |
| Brand assets (manual) | `/mnt/workspace/<Brand>/Brand-Assets/` (existing convention) |
| Optimized outputs for a specific page | `forge/sites/<site>/<page>/assets/` (deploy-ready) |
Override with --out DIR on any script.
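As a rough illustration of how these defaults and the `--out` override could be combined, consider the sketch below. The ISO date format is inferred from the sample path `2026-04-22_neon`; the slug rule for queries and the names `slugify` and `default_out_dir` are assumptions, not the scripts' actual behaviour.

```python
import re
from datetime import date
from pathlib import Path
from urllib.parse import urlparse

WEB_GRABS = Path("/mnt/workspace/Assets/Web-Grabs")


def slugify(text):
    """Lowercase and collapse anything outside [a-z0-9.] into single hyphens."""
    return re.sub(r"[^a-z0-9.]+", "-", text.lower()).strip("-")


def default_out_dir(url=None, query=None, out=None, today=None):
    """Pick the output dir: an explicit --out wins, else <date>_<host> or <date>_<query>."""
    if out:
        return Path(out)
    if not (url or query):
        raise ValueError("need a url, a query, or an explicit out directory")
    label = urlparse(url).hostname if url else slugify(query)
    stamp = (today or date.today()).isoformat()
    return WEB_GRABS / f"{stamp}_{label}"
```

The `today` parameter exists only so the date can be pinned when checking the result.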
Common recipes
Grab images from any URL

    scripts/forge_assets_run.sh grab "https://en.wikipedia.org/wiki/Austin,_Texas"

Google Images search

    scripts/forge_assets_run.sh search "neon retro grid" --engine google --count 15 \
        --out /mnt/workspace/Assets/Web-Grabs/2026-04-22_neon

Unsplash (clean API if key set)

    export UNSPLASH_ACCESS_KEY=...
    scripts/forge_assets_run.sh search "coffee beans" --engine unsplash --count 8

Optimize a directory of grabs into deploy-ready assets

    scripts/forge_assets_run.sh optimize /mnt/workspace/Assets/Web-Grabs/2026-04-22_neon \
        --out sites/justinsforge.com/neon/assets --widths 640 1280 1920

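For a sense of what the WebP half of the optimize step involves per image, the sketch below builds one `cwebp` command per responsive width. It only constructs argument lists and runs nothing. Whether `forge_assets_optimize.py` shells out to `cwebp` or goes through Pillow is not stated here, so treat the approach, the quality value, the `<stem>-<width>w.webp` naming, and the function name as assumptions. The AVIF copies are left out.

```python
from pathlib import Path

DEFAULT_WIDTHS = (320, 640, 1280, 1920)


def cwebp_commands(src, out_dir, widths=DEFAULT_WIDTHS, quality=80):
    """Build one cwebp argument list per target width (nothing is executed)."""
    src, out_dir = Path(src), Path(out_dir)
    commands = []
    for width in widths:
        dest = out_dir / f"{src.stem}-{width}w.webp"  # hypothetical naming scheme
        commands.append([
            "cwebp",
            "-q", str(quality),
            "-resize", str(width), "0",  # height 0 keeps the aspect ratio
            "-metadata", "none",         # do not copy EXIF/ICC/XMP into the output
            str(src),
            "-o", str(dest),
        ])
    return commands
```

Each list can be handed to `subprocess.run(cmd, check=True)`. Note that `-resize` will also upscale sources narrower than the target width, so a real pipeline would likely filter the widths first.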
Bulk social-media grabs

    ~/.forge-venvs/assets/bin/gallery-dl "https://www.instagram.com/some_account/"
    # output goes to ./gallery-dl/ by default; configure via ~/.config/gallery-dl/config.json

Video downloads
Catalog

    scripts/forge_assets_run.sh catalog

Auth / cookies
For authenticated fetches (Shopify, Adobe Stock, wherever Justin has a login), drop a persistent Chromium profile at ~/.forge-venvs/assets/browser-profile/ and reuse it:

    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        ctx = p.chromium.launch_persistent_context(
            "/home/justinwieb/.forge-venvs/assets/browser-profile/",
            headless=False,  # first time: log in manually over VNC/X forwarding
        )
        # ... log in (first run) or fetch authenticated pages here ...
        ctx.close()  # close cleanly so the session is written back to the profile

After the first login, run headless; the profile keeps the cookies. Never commit the profile directory to git (it already lives outside the repo).
Useful API keys (optional)
All are optional; the scripts work without them via Playwright scraping.
| Env var | Service | Notes |
|---|---|---|
| `UNSPLASH_ACCESS_KEY` | Unsplash | Free tier: 50 req/hr |
| `PEXELS_API_KEY` | Pexels | Free tier: ~200 req/hr |
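For orientation, a keyed Unsplash search boils down to one authenticated GET against the public `/search/photos` endpoint. The sketch below only builds the request, using the standard library, and returns `None` when the key is absent so a caller can fall back to scraping. The function name and the fallback convention are assumptions, and the search script itself may use a different HTTP client.

```python
import os
from urllib.parse import urlencode
from urllib.request import Request


def unsplash_search_request(query, count, env=None):
    """Build a GET request for Unsplash photo search, or None if no key is set."""
    env = os.environ if env is None else env
    key = env.get("UNSPLASH_ACCESS_KEY")
    if not key:
        return None  # caller can fall back to a scraping engine
    params = urlencode({"query": query, "per_page": count})
    return Request(
        f"https://api.unsplash.com/search/photos?{params}",
        headers={
            "Authorization": f"Client-ID {key}",  # Unsplash's public-access auth scheme
            "Accept-Version": "v1",
        },
    )
```

Sending it is then a matter of `urllib.request.urlopen(req)` and reading the JSON body. Keeping construction separate from sending makes the free-tier request budget easier to control.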
Security posture
- No browser service binds to the public internet. Everything runs local/CLI.
- Session cookies live in ~/.forge-venvs/assets/browser-profile/ if you set one up, outside the git repo.
- All downloads go through the browser's request context with the originating page as Referer.
- Default rate limit: 250ms between downloads inside a single grab run.
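The rate limit in the last bullet amounts to a fixed pause between consecutive downloads. A minimal sketch of that loop follows; `fetch` is a hypothetical callable standing in for the browser-context download, and the injectable `sleep` is there only to make the pacing observable.

```python
import hashlib
import time


def download_all(urls, fetch, delay=0.25, sleep=time.sleep):
    """Fetch each URL in turn, pausing `delay` seconds between downloads."""
    results = []
    for index, url in enumerate(urls):
        if index:  # no pause before the first download
            sleep(delay)
        body = fetch(url)  # expected to return the response body as bytes
        results.append({
            "source_url": url,
            "bytes": len(body),
            "sha256": hashlib.sha256(body).hexdigest(),
        })
    return results
```

The returned dictionaries reuse the `source_url`, `bytes`, and `sha256` field names from the provenance sample, so they could be extended into full provenance items.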