Veena is a 3B parameter autoregressive transformer model based on the Llama architecture. It is designed to synthesize high-quality speech from text in Hindi and English, including code-mixed scenarios. The model outputs audio at a 24kHz sampling rate using the SNAC neural codec.
Dheemanthreddy•4h ago