I’ve been experimenting with ACE-Step 1.5 lately and wanted to share a short summary of what actually helped me get more controllable and musical results, based on the official tutorial + hands-on testing.
The biggest realization: ACE-Step works best when you treat prompts as structured inputs rather than a single sentence, much like structured prompting with LLMs.
1. Separate “Tags” from “Lyrics”
Instead of writing one long prompt, think in two layers:
Tags = global control
Use comma-separated keywords to define:
- genre / vibe (funk, pop, disco)
- tempo (112 bpm, up-tempo)
- instruments (slap bass, drum machine)
- vocal type (male vocals, clean, rhythmic)
- era / production feel (80s style, punchy, dry mix)
Being specific here matters a lot more than being poetic.
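To keep the tag layer tidy between runs, I find it easier to build it from named groups than to hand-edit one long string. A minimal sketch in plain Python: `build_tags` and its group names are my own hypothetical helper, not part of ACE-Step, and it only produces the comma-separated string you would paste into the tags field.

```python
def build_tags(genre, tempo, instruments, vocals, production):
    """Join tag groups into one comma-separated string.

    Hypothetical helper, not an ACE-Step API. Blank entries and
    duplicates are dropped; order of first appearance is kept.
    """
    seen = []
    for group in (genre, tempo, instruments, vocals, production):
        for tag in group:
            tag = tag.strip().lower()
            if tag and tag not in seen:
                seen.append(tag)
    return ", ".join(seen)


tags = build_tags(
    genre=["funk", "disco"],
    tempo=["112 bpm"],
    instruments=["slap bass", "drum machine"],
    vocals=["male vocals", "clean"],
    production=["80s style", "punchy", "dry mix"],
)
print(tags)
```

Keeping the groups separate also makes it obvious when one category (say, tempo) is missing entirely.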
2. Use structured lyrics
Lyrics aren’t just text — section labels help a ton:
[intro]
[verse]
[chorus]
[bridge]
[outro]
Even very simple lines work better when the structure is clear. It pushes the model toward “song form” instead of a continuous loop.
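As an illustration of the labeled layout, here is a small sketch that assembles lyrics from (label, lines) pairs. `build_lyrics` is a hypothetical helper of mine, and the blank line between sections is just my formatting habit, not something I can confirm the model requires.

```python
def build_lyrics(sections):
    """Render (label, lines) pairs as bracket-labeled lyric sections.

    Hypothetical helper, not an ACE-Step API. A section with no lines
    (e.g. an instrumental intro) is emitted as the label alone.
    """
    blocks = []
    for label, lines in sections:
        blocks.append("\n".join([f"[{label}]"] + list(lines)))
    return "\n\n".join(blocks)


lyrics = build_lyrics([
    ("intro", []),
    ("verse", ["Step in time, feel the line", "Bass is low, here we go"]),
    ("chorus", ["Move it, move it, don't stop"]),
    ("outro", []),
])
print(lyrics)
```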
3. Think rhythm, not prose
Short phrases, repetition, and percussive wording generate more stable results than long sentences. Treat vocals like part of the groove.
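One rough way to catch prose-like lines before a run is a word-count check. This is only a heuristic sketch: `long_lines` is a hypothetical helper, and the 8-word threshold is an arbitrary assumption on my part, not a number from the tutorial.

```python
def long_lines(lyrics, max_words=8):
    """Return lyric lines with more than max_words words.

    Hypothetical heuristic; the default threshold is an assumption.
    Blank lines and bracketed section labels are skipped.
    """
    flagged = []
    for line in lyrics.splitlines():
        line = line.strip()
        if not line or (line.startswith("[") and line.endswith("]")):
            continue
        if len(line.split()) > max_words:
            flagged.append(line)
    return flagged


draft = """[verse]
Feel the beat, move your feet
I was walking down the street thinking about all the things we said last night
[chorus]
Get up, get down"""

for line in long_lines(draft):
    print("consider shortening:", line)
```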
4. Iterate with small changes
If something feels off:
- tweak tags first (tempo / mood / instruments)
- then adjust one lyric section
No need to rewrite everything each run.
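To make the "one change per run" habit concrete, here is a sketch that swaps a single tag and leaves everything else untouched, so any difference in the output can be attributed to that one change. `swap_tag` is a hypothetical helper operating on the tag string only; it does not call ACE-Step.

```python
def swap_tag(tags, old, new):
    """Return a copy of the comma-separated tag string with one tag replaced.

    Hypothetical helper, not an ACE-Step API. Raises ValueError if the
    tag to replace is not present, so typos do not silently do nothing.
    """
    parts = [t.strip() for t in tags.split(",")]
    if old not in parts:
        raise ValueError(f"tag not found: {old!r}")
    return ", ".join(new if t == old else t for t in parts)


base = "funk, 112 bpm, slap bass, male vocals, dry mix"
variant = swap_tag(base, "112 bpm", "104 bpm")
print(variant)
```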
5. LoRA + prompt synergy
LoRAs help with style, but prompts still control:
- structure
- groove
- energy
Overly strong LoRA weights can easily push outputs into parody.
Overall, ACE-Step feels less like “text-to-music” and more like music-conditioned generation. Once you start thinking in tags + structure, results get much more predictable.
Curious how others here are prompting ACE-Step — especially for groove-based music.
DanielWen•2h ago
Resource: https://github.com/ace-step/ACE-Step-1.5