Self-hosted orchestration for channel content. Local inference, procedural
assembly, human review at every gate.
A self-hosted system that takes a topic or a live news source and returns a
finished After Effects project for a human to review and publish -
narration, stills, video, music, timeline. Generation runs locally: ComfyUI
for image, video, TTS, and recognition; local models for prompting and
metadata. Nothing leaves the machine unless it has to.
Two spreadsheets run it. One is a per-channel DSL - voices, visual style,
LoRAs, music, timeline parameters - so a new channel is a new row, not new
code. The other is a state machine tracking where every video sits across
stages, so scripting, production, and upload batch independently with a
human QA gate between each. The timeline is assembled procedurally in
Houdini, not cut by hand: movement, transitions, and overlays fall out of
channel parameters as an After Effects script.
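As a rough illustration of the tracking sheet acting as a state machine, the sketch below models one row per video. The three status labels are the ones the diagram uses; the `approved` flag, the `Published` end state, and the helper names are assumptions made for this example, not the actual sheet layout.

```python
from dataclasses import dataclass

# Status labels as shown in the diagram; PUBLISHED is a hypothetical end state.
AWAITING_REVIEW = "Awaiting review"
AE_TIMELINE_READY = "Ae Timeline Ready"
READY_FOR_UPLOAD = "Ready for upload"
PUBLISHED = "Published"

# current status -> (stage that consumes it, status written on success)
TRANSITIONS = {
    AWAITING_REVIEW: ("production", AE_TIMELINE_READY),
    AE_TIMELINE_READY: ("manual_validation", READY_FOR_UPLOAD),
    READY_FOR_UPLOAD: ("upload", PUBLISHED),
}


@dataclass
class Row:
    video_id: str
    status: str
    approved: bool = False  # hypothetical QA flag, set by a human reviewer


def batch_for(stage, rows):
    """Rows a stage may pick up: the status belongs to it and a human approved."""
    return [
        r for r in rows
        if r.approved and TRANSITIONS.get(r.status, (None, None))[0] == stage
    ]


def advance(row):
    """Move a row to its next status and clear the flag for the next gate."""
    if not row.approved:
        raise PermissionError(f"{row.video_id}: human approval required")
    _, next_status = TRANSITIONS[row.status]
    return Row(row.video_id, next_status, approved=False)
```

Because each stage only reads rows in its own status, scripting, production, and upload can run as separate batches, and nothing moves until a human sets the flag.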
The diagram shows the system at two levels. Overview is the
four-stage backbone; Detailed expands each block with its
steps and stack.
Overview
Channels, config DSL
One spreadsheet drives every block: narration, voices, visual style, LoRAs, music, Houdini settings.
01 Script Writing
News or non-news branch routes through an agent chain. Output is script JSON and a new row in the tracking sheet.
n8n, LLM, newsapi.org
Awaiting review
02 Production
Audio, stills, recognition, video, cull and fix, timeline, AE script.
n8n, ComfyUI, LLM, Whisper, FFMPEG, Houdini
Ae Timeline Ready.
03 Manual Validation
Run generated JS in After Effects, review composition, sign off, final render.
After Effects, human
Ready for upload
04 Upload (separate workflow)
Publish all approved videos
n8n, YouTube API
Detailed
Channels, config DSL
Drives every block of the pipeline.
Identity
Channel ID, Name, Language. Channel Info is a system prompt injected into every LLM step where channel character matters.
Script behaviour
Narration style, News channel, Podcast. Switches Stage 1 between news-pull, podcast, and free-topic modes.
Stills
Stills provider, Visual style, Image prompt suffix, Core LoRAs, Animation style. Which ComfyUI workflow runs, what style is prepended or appended, which LoRAs load.
Cover thumbs
Cover provider, Cover style, Cover prompt suffix, Cover LoRAs. Same controls, separate generation path for YouTube thumbnails.
Voice and audio
Voice TTS, Voices, Music folder, Audio settings. TTS engine, voice characters used, music pool, ComfyUI-side audio settings (voice seed, etc.).
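To make "a new channel is a new row" concrete, here is a minimal sketch of reading one row into a typed config. The column names come from the groups above; the cell encodings (yes/no flags, comma-separated LoRAs), the precedence of News channel over Podcast, and the comma-joined prompt format are assumptions for illustration only.

```python
from dataclasses import dataclass, field


@dataclass
class ChannelConfig:
    channel_id: str
    name: str
    channel_info: str  # system prompt injected into LLM steps
    news_channel: bool
    podcast: bool
    visual_style: str
    image_prompt_suffix: str
    core_loras: list = field(default_factory=list)


def parse_row(row):
    """Build a ChannelConfig from one sheet row (dict of column name -> string)."""
    truthy = {"yes", "true", "1"}  # assumed flag encoding
    loras = [s.strip() for s in row.get("Core LoRAs", "").split(",") if s.strip()]
    return ChannelConfig(
        channel_id=row["Channel ID"],
        name=row["Name"],
        channel_info=row.get("Channel Info", ""),
        news_channel=row.get("News channel", "").strip().lower() in truthy,
        podcast=row.get("Podcast", "").strip().lower() in truthy,
        visual_style=row.get("Visual style", ""),
        image_prompt_suffix=row.get("Image prompt suffix", ""),
        core_loras=loras,
    )


def script_mode(cfg):
    """Pick the Stage 1 mode; news taking precedence is an assumption."""
    if cfg.news_channel:
        return "news-pull"
    if cfg.podcast:
        return "podcast"
    return "free-topic"


def still_prompt(cfg, scene):
    """Prepend the visual style and append the suffix, skipping empty parts."""
    parts = [cfg.visual_style, scene, cfg.image_prompt_suffix]
    return ", ".join(p for p in parts if p)
```

Under these assumptions a row with both flags empty lands in free-topic mode, and every still prompt for that channel carries the same style and suffix without any per-channel code.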
Non-news routes to Non-fiction, Fiction, or Podcast (channel-driven).
Short or Long form selector (channel-driven).
Agent chain: Planner, Writer, Editor.
Write script JSON to dataset.
Append title and metadata to a tracking-sheet row (Sheets).
Awaiting review
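The agent chain above runs in n8n; the Python sketch below only mirrors its shape. The `llm` callable, the role prompts, and the keys of the script JSON and tracking row are hypothetical stand-ins, chosen to show each agent consuming the previous agent's output with the channel's Channel Info as system prompt.

```python
import json


def run_chain(llm, channel_info, topic):
    """Planner -> Writer -> Editor, each step fed the previous step's text.

    llm is any callable (system_prompt, user_prompt) -> str; the interface
    is an assumption, not the system's actual model client.
    """
    outline = llm(channel_info, f"PLANNER: outline a video about: {topic}")
    draft = llm(channel_info, f"WRITER: write narration from this outline:\n{outline}")
    final = llm(channel_info, f"EDITOR: tighten this draft:\n{draft}")
    # Hypothetical script JSON layout.
    return {"topic": topic, "outline": outline, "script": final}


def tracking_row(title):
    """New tracking-sheet row; the column names are illustrative."""
    return {"Title": title, "Status": "Awaiting review"}


def save_script(script, path):
    """Write the script JSON into the dataset folder."""
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(script, fh, ensure_ascii=False, indent=2)
```

Passing the model in as a plain callable keeps the chain testable with a stub and leaves the choice of local or cloud model outside the chain itself.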
02 Production
Create dataset folders, place music, prepare config JSON.
YouTube metadata: titles, descriptions, tags, categories (Cloud, Local LLM).
Cover thumbnail prompts (Local LLM).
TTS from script using channel's Voices (ComfyUI).
Whisper transcription for precise timings (ComfyUI).
Audio mix: voice plus random-shuffled music master (FFMPEG).
Block splitter turns text into scene blocks by time.
Still prompts per block.
Stills and Cover thumbnails, xN (ComfyUI).
Image recognition pass over every still (ComfyUI, Local LLM).
Video prompts built from recognition (not intent).
Video generation (ComfyUI).
Upscale (optional).
Cull bad shots (manual, opt-in).
Smart Fix fills empty slots from remaining shots.
Timeline assembly produces AE JS script (Houdini).
Ae Timeline Ready.
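One way the block splitter step could work is to greedily merge timed transcription segments until a block reaches a target length. The sketch below assumes segments arrive as (start, end, text) spans from the Whisper pass; the `Span` type, the 6-second default, and the rule of folding a short tail into the previous block are all assumptions, not the system's actual splitter.

```python
from dataclasses import dataclass


@dataclass
class Span:
    start: float  # seconds
    end: float    # seconds
    text: str


def _merge(spans):
    """Collapse consecutive spans into one span covering all of them."""
    return Span(spans[0].start, spans[-1].end, " ".join(s.text for s in spans))


def split_blocks(segments, target_seconds=6.0):
    """Greedily group timed segments into blocks of at least target_seconds."""
    blocks, current = [], []
    for seg in segments:
        current.append(seg)
        if current[-1].end - current[0].start >= target_seconds:
            blocks.append(_merge(current))
            current = []
    if current:
        tail = _merge(current)
        if blocks:
            # A short tail is folded into the last block rather than
            # becoming a very brief scene of its own.
            last = blocks.pop()
            blocks.append(Span(last.start, tail.end, last.text + " " + tail.text))
        else:
            blocks.append(tail)
    return blocks
```

Each resulting block carries its own start and end time, which is what lets a still, a clip, and a timeline slot be keyed to the same span of narration.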
03 Manual Validation
Run JS in After Effects. Composition assembled.
Human review and sign off.
Final render.
Ready for upload
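The JS run in this stage is generated upstream. As a simplified sketch of what such a generator might emit, the function below writes After Effects script source that creates a composition and lays clips end to end. The real assembly happens in Houdini and also handles movement, transitions, and overlays; the function name, the clip list format, and the default frame size are assumptions. The emitted calls (`addComp`, `importFile`, `layers.add`, `startTime`, `outPoint`) are standard After Effects scripting API.

```python
import json


def ae_script(comp_name, clips, width=1920, height=1080, fps=30.0):
    """Return After Effects script source placing clips back to back.

    clips is a list of (file_path, duration_seconds) pairs.
    """
    total = sum(duration for _, duration in clips)
    lines = [
        'app.beginUndoGroup("assemble timeline");',
        # addComp(name, width, height, pixelAspect, duration, frameRate)
        f"var comp = app.project.items.addComp({json.dumps(comp_name)}, "
        f"{width}, {height}, 1, {total}, {fps});",
    ]
    t = 0.0
    for path, duration in clips:
        lines += [
            # json.dumps yields a correctly quoted and escaped JS string literal.
            f"var item = app.project.importFile(new ImportOptions(new File({json.dumps(path)})));",
            "var layer = comp.layers.add(item);",
            f"layer.startTime = {t};",
            f"layer.outPoint = {t + duration};",
        ]
        t += duration
    lines.append("app.endUndoGroup();")
    return "\n".join(lines)
```

Emitting a script rather than a project file keeps the hand-off reviewable: the human runs it, sees the composition built, and signs off or rejects before anything is rendered.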
04 Upload (separate workflow)
Scan the tracking sheet for approved videos.
Publish approved videos to YouTube (n8n, YouTube API).
Worked example
One episode end to end: Breakfast World, from a micro-world channel
where tiny civilizations live inside everyday food.
01 Script
A topic seed enters the agent chain (Planner, Writer, Editor) and
comes out a script.
An educational survey of the inhabited world that is served, each
morning, upon a single breakfast tray - a geographical tour of its
regions, peoples, and customs, from the Pancake Plateaus to the
Bacon Ridges, recorded in the manner of a 1950s field study by
observers who know the world will be gone by nine o'clock.
02 Audio
TTS voiceover mixed with a shuffled music bed into one master track
that drives the timeline.
master mix, voiceover plus music bed (excerpt)
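A mix like this can be expressed as a single FFMPEG filter graph. The sketch below only builds the command (it does not run it), assuming the music pool is shuffled, concatenated into one bed, lowered in volume, and cut to the voiceover's length. The function name, the 0.2 gain, and the file names are illustrative; `concat`, `volume`, and `amix` are standard FFMPEG filters.

```python
import random


def mix_command(voice, music_tracks, out, music_gain=0.2, seed=None):
    """Build an ffmpeg argv that lays a shuffled music bed under the voice."""
    if not music_tracks:
        raise ValueError("need at least one music track")
    playlist = list(music_tracks)
    random.Random(seed).shuffle(playlist)  # seed makes the order repeatable

    cmd = ["ffmpeg", "-y", "-i", voice]
    for track in playlist:
        cmd += ["-i", track]

    n = len(playlist)
    music_inputs = "".join(f"[{i}:a]" for i in range(1, n + 1))
    graph = (
        f"{music_inputs}concat=n={n}:v=0:a=1[bed];"       # join tracks into one bed
        f"[bed]volume={music_gain}[quiet];"                # duck the music
        "[0:a][quiet]amix=inputs=2:duration=first[mix]"    # stop when the voice ends
    )
    cmd += ["-filter_complex", graph, "-map", "[mix]", out]
    return cmd
```

With `duration=first` the master ends with the voiceover, so its length is set by the narration alone, which fits a master track that drives the timeline.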
03 Covers
Five thumbnail options generated from the script; one chosen.
04 Stills
A still is generated per block, then described; the description
drives its video prompt.
05 Result
Composition assembled in After Effects, reviewed, rendered, and
published as Breakfast World.