Hi HN,
I built this tool to solve a specific frustration with current motion control models (like Kling): they are extremely picky about input images. If the subject isn't perfectly framed or standing in a standard T/A-pose, the generation usually fails. This makes it particularly hard to create videos of pets or from casual selfies.
I built a pipeline combining Kling Motion Control and NanoBanana Pro to bridge this gap.
The core logic is a middleware layer that:
1. Automatically out-paints and expands cropped images (e.g., selfies) to fit the required aspect ratio.
2. "Rigs" difficult subjects (especially cats/dogs) into a structure that the motion model can interpret, effectively mapping human dance logic onto non-human agents.
3. Wraps this in a template system so users don't need complex prompting.
The goal was to make the input robust enough that you can throw almost any "imperfect" photo at it and get a coherent dance video.
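For anyone curious how the pieces fit together, here's a rough sketch of the middleware flow. Everything below (the Photo/DanceTemplate types and the stub functions) is illustrative only, not the actual Kling / NanoBanana Pro calls, which happen through their hosted APIs:

    # Rough sketch of the pipeline shape only. The out-painter, rigger, and
    # motion model here are stand-in stubs, not the real API clients.
    from dataclasses import dataclass

    @dataclass
    class Photo:
        width: int
        height: int
        subject: str            # "human", "cat", "dog", ...

    @dataclass
    class DanceTemplate:
        name: str
        prompt: str             # canned prompt so the user never writes one

    def outpaint_to_ratio(photo: Photo, ratio: float = 9 / 16) -> Photo:
        """Step 1 (stub): expand a cropped photo (e.g. a selfie) until it
        matches the portrait aspect ratio the motion model expects."""
        if photo.width / photo.height > ratio:
            return Photo(photo.width, round(photo.width / ratio), photo.subject)
        return Photo(round(photo.height * ratio), photo.height, photo.subject)

    def rig_for_motion(photo: Photo) -> dict:
        """Step 2 (stub): map the subject onto a human-like pose structure so
        dance motion can drive cats and dogs, not just people."""
        mapping = "identity" if photo.subject == "human" else "quadruped-to-biped"
        return {"subject": photo.subject, "skeleton_mapping": mapping}

    def generate_dance(photo: Photo, template: DanceTemplate) -> str:
        """Step 3 (stub): the template supplies the prompt, so the user only
        picks a dance and uploads a photo."""
        framed = outpaint_to_ratio(photo)
        rig = rig_for_motion(framed)
        # The real pipeline would hand `framed` + `rig` to the motion model here.
        return (f"[{template.name}] {framed.width}x{framed.height} frame, "
                f"rig={rig['skeleton_mapping']}, prompt={template.prompt!r}")

    if __name__ == "__main__":
        cat_selfie = Photo(width=1080, height=1080, subject="cat")
        print(generate_dance(cat_selfie, DanceTemplate("disco", "a cat dancing disco")))

The point is that all the messy normalization happens before the motion model ever sees the image, so the model only receives inputs it already knows how to handle.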
It's live at https://aibabydance.com – would love any feedback on how it handles your edge-case photos!