Are there any open-weight models that do? I'm not talking about speech-to-text -> LLM -> text-to-speech, btw; I mean a real voice <-> language model.
edit:
It does support real-time conversation! Has anybody here gotten that to work on local hardware? I'm particularly curious whether anybody has gotten it to work with a non-NVIDIA setup.
Especially in the fruit pricing portion of the video for this model. It sounds completely normal, but I can immediately tell it is AI. Maybe it's the intonation or the overly stable rate of speech?
Maybe that's a good thing?
I think ChatGPT has the most lifelike speech with their voice models. They seem to have invested heavily in that area while other labs focused elsewhere.
On the video itself. Interesting, but "ideal" was pronounced wrong in German.
Not their fault frontier labs are letting their speech-to-speech offerings languish.
dvh•30m ago
iFire•27m ago
Weird, as someone not having a database of the web, I wouldn't be able to calculate either result.
iFire•27m ago
dvh•19m ago
esafak•19m ago
parineum•7m ago
OP provided a web link with the answer. Aren't these models supposed to be trained on all of that data?
brookst•17m ago