Educational video pipeline

Textbook2Video

A system that turns a PDF or a topic into a narrated explainer video, converting source material into outlines, storyboards, generated Manim scenes, voiceover, and a final MP4.

Problem

Why this mattered

Dense textbooks and technical material are hard to turn into concise video lessons. Manual video production is slow, while one-shot generated video is hard to edit, validate, and reproduce.

Build

What shipped

  • A pipeline that renders PDF pages, runs OCR, produces markdown, summarizes pages, and selects relevant spans under a token budget.
  • Agent stages for outline, storyboard, Manim code generation, validation, manuscript creation, and narration.
  • Deterministic programmatic animation so code remains the editable source of truth for diagrams, charts, callouts, and math scenes.
  • Section rendering, voiceover generation, clip merging, and final MP4 assembly.
  • A monorepo with FastAPI packaging, a React/Tailwind console, Docker/Caddy deployment assets, and CLI/UI entry points.
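The span-selection step in the first bullet can be illustrated with a small sketch. This is not the project's actual code: the `Span` type, the `relevance` score, the four-characters-per-token estimate, and the greedy strategy are all assumptions chosen to show one plausible way to pick relevant spans under a token budget.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Span:
    page: int
    text: str
    relevance: float  # hypothetical score, e.g. produced by a page-summary ranking step


def estimate_tokens(text: str) -> int:
    # Crude assumption: roughly one token per four characters.
    return max(1, len(text) // 4)


def select_spans(spans: list[Span], budget: int) -> list[Span]:
    """Greedily keep the most relevant spans that fit within the token budget."""
    chosen: list[Span] = []
    used = 0
    for span in sorted(spans, key=lambda s: s.relevance, reverse=True):
        cost = estimate_tokens(span.text)
        if used + cost <= budget:
            chosen.append(span)
            used += cost
    # Restore page order so later stages see the material in reading sequence.
    return sorted(chosen, key=lambda s: s.page)


spans = [
    Span(page=1, text="a" * 400, relevance=0.2),  # about 100 tokens
    Span(page=2, text="b" * 800, relevance=0.9),  # about 200 tokens
    Span(page=3, text="c" * 400, relevance=0.7),  # about 100 tokens
]
selected = select_spans(spans, budget=300)
print([s.page for s in selected])  # prints [2, 3]
```

A greedy pass is simple and predictable, though it can miss a better combination of smaller spans; a real pipeline would likely use the tokenizer of the target model rather than a character-count estimate.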

Stack

Tools and systems

Python, FastAPI, React, Tailwind CSS, Manim, ffmpeg, Poppler, OCR, TTS, LLM provider adapters

Decisions

Technical choices

  • Use generated code instead of screen capture so individual scenes can be repaired and regenerated.
  • Keep OCR, LLM, and TTS providers behind adapters so model choices can change without rewriting the pipeline.
  • Store per-run artifacts in case folders for debugging and reproducibility.
  • Let generated scene code call chart and diagram helpers instead of inventing imports ad hoc.
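The adapter and case-folder decisions above might look something like the following minimal sketch. All names here (`TTSProvider`, `SilentTTS`, `narrate_section`, the file naming scheme) are hypothetical and only illustrate the pattern: the pipeline depends on a small interface, and each run writes its outputs and metadata into its own folder.

```python
import json
import tempfile
from pathlib import Path
from typing import Protocol


class TTSProvider(Protocol):
    """Hypothetical adapter interface; any provider with this method can be plugged in."""

    def synthesize(self, text: str) -> bytes: ...


class SilentTTS:
    """Stand-in provider for local testing; returns placeholder bytes, not real audio."""

    def synthesize(self, text: str) -> bytes:
        return b"\x00" * len(text)


def narrate_section(case_dir: Path, section_id: str, script: str, tts: TTSProvider) -> Path:
    """Synthesize narration for one section and store it in the run's case folder."""
    case_dir.mkdir(parents=True, exist_ok=True)
    audio_path = case_dir / f"{section_id}.audio.bin"
    audio_path.write_bytes(tts.synthesize(script))
    # Record the inputs next to the output so the run can be inspected later.
    meta = {"section": section_id, "provider": type(tts).__name__, "chars": len(script)}
    (case_dir / f"{section_id}.meta.json").write_text(json.dumps(meta, indent=2))
    return audio_path


case_dir = Path(tempfile.mkdtemp()) / "case-demo"
out = narrate_section(case_dir, "s01", "Vectors add tip to tail.", SilentTTS())
print(out.name)  # prints s01.audio.bin
```

Swapping in a real TTS service would then mean writing one new class with a `synthesize` method, while `narrate_section` and the case-folder layout stay unchanged.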

Outcome

Proof surface

The public repo, demo video, and running app show a working end-to-end AI pipeline rather than a prompt-only concept.