It supports all three Qwen3-TTS generation modes:

- Voice Cloning: clone any voice from a 3+ second audio sample
- Voice Design: generate voices from text descriptions (e.g., "a warm, elderly British woman speaking slowly")
- Preset Voices: 9 built-in speakers with emotion/style control via an `instruct` parameter
Basic usage looks like:
```python
from manim import *
from manim_voiceover import VoiceoverScene
from manim_voiceover_qwen3_tts import Qwen3PresetVoiceService


class MyScene(VoiceoverScene):
    def construct(self):
        self.set_speech_service(Qwen3PresetVoiceService(speaker="Ryan"))
        with self.voiceover(text="Let's draw a circle") as tracker:
            self.play(Create(Circle()), run_time=tracker.duration)
```
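For the cloning and design modes listed above, here is a rough sketch of what the same workflow might look like. The class and parameter names (`Qwen3CloneVoiceService`, `Qwen3DesignVoiceService`, `reference_audio`, `voice_description`) are assumptions modeled on the preset service's naming, not the confirmed API; check the repo README for the real names.

```python
from manim import Circle, Create
from manim_voiceover import VoiceoverScene

# Hypothetical service names modeled on Qwen3PresetVoiceService; the actual
# classes and parameters may differ -- see the repo README.
from manim_voiceover_qwen3_tts import Qwen3CloneVoiceService, Qwen3DesignVoiceService


class ClonedVoiceScene(VoiceoverScene):
    def construct(self):
        # Voice Cloning: point the service at a short (3+ second) reference clip.
        self.set_speech_service(
            Qwen3CloneVoiceService(reference_audio="my_voice_sample.wav")  # assumed parameter
        )
        with self.voiceover(text="This narration uses a cloned voice.") as tracker:
            self.play(Create(Circle()), run_time=tracker.duration)


class DesignedVoiceScene(VoiceoverScene):
    def construct(self):
        # Voice Design: describe the voice you want in plain text.
        self.set_speech_service(
            Qwen3DesignVoiceService(  # assumed class name
                voice_description="a warm, elderly British woman speaking slowly"
            )
        )
        with self.voiceover(text="And this one uses a designed voice.") as tracker:
            self.play(Create(Circle()), run_time=tracker.duration)
```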
It runs locally and needs roughly 4GB of VRAM for the 1.7B models (0.6B models are available for lighter setups). It supports 10 languages, and audio is cached so re-renders are fast.
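As a rough illustration of a lighter setup, something like the sketch below might pick the smaller checkpoint and pass a style instruction. Only the `instruct` parameter and the existence of 0.6B models are stated above; the `model` argument and where `instruct` is passed are assumptions.

```python
from manim_voiceover import VoiceoverScene
from manim_voiceover_qwen3_tts import Qwen3PresetVoiceService


class LightweightScene(VoiceoverScene):
    def construct(self):
        # "model" and the placement of "instruct" here are assumptions; the post
        # only confirms that an instruct parameter controls emotion/style and
        # that 0.6B checkpoints exist for lower-VRAM machines.
        self.set_speech_service(
            Qwen3PresetVoiceService(
                speaker="Ryan",
                model="0.6B",                        # assumed way to select the smaller model
                instruct="speak in an excited tone",  # emotion/style control
            )
        )
        with self.voiceover(text="Smaller model, same workflow.") as tracker:
            self.wait(tracker.duration)
```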
GitHub: https://github.com/DurhamSmith/manim-voiceover-qwen3-tts

Happy to answer any questions.