The basic flow is: place SVG characters on a scene, write dialogue, pick voices, and render to MP4. It handles word timestamps, mouth cues, and lip-sync automatically.
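To make the lip-sync step concrete, here's a toy sketch of how word timestamps from a TTS provider can be turned into mouth cues. All names here are illustrative assumptions, not the actual Cartoon Studio or Jellypod Speech SDK API:

```typescript
// Hypothetical sketch: per-word timestamps in, open/closed mouth cues out.
// Real lip-sync usually maps phonemes to visemes; this is the simplest version.
interface WordTiming { word: string; start: number; end: number } // seconds
interface MouthCue { shape: "open" | "closed"; at: number }       // seconds

function cuesFromTimings(timings: WordTiming[]): MouthCue[] {
  const cues: MouthCue[] = [];
  for (const t of timings) {
    cues.push({ shape: "open", at: t.start }); // mouth opens when the word starts
    cues.push({ shape: "closed", at: t.end }); // and closes when it ends
  }
  return cues;
}
```

The renderer can then swap the character's SVG mouth layer whenever the playhead crosses a cue timestamp.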
This started as me playing around with Jellypod's Speech SDK and HeyGen's HyperFrames. I wanted a small tool that could go from script to video without a big animation pipeline, and the next thing I knew I was building my own South Park-style show. :D
A few details:
- desktop app built with Electron
- supports multiple TTS providers through Jellypod's Speech SDK
- renders via HyperFrames
- lets you upload or generate characters and backdrop scenes
- includes default characters/scenes so you can try it quickly
- open source
It runs from source today. AI features use bring-your-own API keys, but the app itself is fully inspectable and local-first in the sense that there’s no hosted backend or telemetry.
Here are some fun examples of the types of videos you can create:
https://x.com/deepwhitman/status/2046425875789631701
https://x.com/deepwhitman/status/2047040471579697512
And the repo:
https://github.com/Jellypod-Inc/cartoon-studio
Happy to answer questions and appreciate any feedback!