This is really cool. Voice cloning + translation in one pipeline is something a lot of content creators would pay for right now, especially for YouTube dubbing, where you want to keep the speaker's original personality.
Are you handling the speech-to-text, translation, and voice synthesis as separate steps or is it more of an end-to-end model? Curious how you deal with things like pacing and intonation that don't always carry over between languages.
andberx•11m ago