It looks like the author created it using the "higher quality" ffmpeg command line, except for the "webm" final extension, producing the opposite of what's described as "an MP4 file that's compatible with more devices".
https://github.com/denizsafak/abogen/tree/main/demo#for-high...
The difference is that even weak LLMs are good at magically doing this, so I wonder what the problem is for the TTS mentioned above.
I think there are still basic hurdles to take before we can go epub to audiobook in a quality that can compete with current state of the art.
Or am I missing something?
It's probably possible to do with current systems, though. I believe there are TTS systems that can use context/prompting to change emphasis and other speech qualities, though I'm not sure how reliably.
It's definitely an excited voice, but is it read out as in a battle or as in a romantic scene?
I actually hate this. I like quotes to be read with the tone and inflection implied by the context but I don't like the different voices.
I played a bit with ElevenLabs voices, and while they aren't bad, when I tried to make them read a fragment of a text that I wrote, it sounded chaotic, boring, quite terrible, for anything longer than a sentence or two. But when I tried their v3 voices, which they are currently in the process of rolling out, the same text sounded consistent, emotional, engaging, simply amazing. I think we are just crossing the vocal uncanny valley.
As an aside, while this tool can be used to create an audiobook from a book you have in text format, for your private consumption, having an author employ something like this to create files for distribution is extremely risky, even if they acknowledge its use and intend those files to only be available on their website.
Indie authors struggle a lot to promote their works, and the new normal is that potential readers, the polite ones[^1], use the slightest hint of AI usage to discard their title and move on...as they are entitled to, since there are so many books.
I in particular have started to hire voice actors that have good acting skills and good diction but for whom English is their second language, or it's their first language but they speak something else at home; sometimes I even ask them to go a notch up with their accents. It helps with the non-AI recognition, and it also increases the appeal of the book for people who would like to try out something new. Once, I did an audition for a project and was pleasantly surprised with how much life people from around the Mediterranean basin were able to inject into their renderings, compared with people from Britain and North America.
[^1] Impolite readers set the town on fire, and then go about and spread that fire to neighboring towns, for good measure.
This is especially helpful when you’re on the go but still want to have a visual now and then or highlight text for later.
The problem is that many books don’t offer that feature. There is a built-in read function now in the Kindle app, but it’s crap.
So, if you ask me, I’d prefer a good human-written book with an additional AI voice on top to enable that feature for me.
I'd love to hear any feedback you have. "prefer a good human-written book with an additional AI voice on top to enable that feature for me" is exactly what I prefer when it comes to reading.
I've been meaning to use its position sync protocol with KOReader, but it's not trivial.
Is that the new normal?
My impression is that when it comes to reading text, nobody cares as long as the final product is good.
People don't want AI-written books, but people have been comfortably listening to AI voices reading text for a long time now. Text-to-speech isn't really a controversial thing for listening to articles or books.
(Which is very different from voice acting, for example, which requires acting not just reading.)
> My impression is that when it comes to reading text, nobody cares as long as the final product is good.
Maybe so, but it only takes a vocal minority to ruin things for an author. And that minority tends to use anything, even a non-overtly-critical mention of AI in social media.
The consensus right now is that using LLMs for fixing grammar and typos is acceptable. I personally use them for word completion (especially for the devil incarnate that is the on/in pair of prepositions), but tend to discard suggestions that improve flow, sentence structure and readability, because those increase the odds of triggering "AI detectors". In fact, I've found a renewed taste for unconventional sentence structure and unconventional punctuation; things that three years ago, before the LLM boom, I really didn't care for.
https://github.com/nazdridoy/kokoro-tts
It generates a directory of audio files, along with a metadata file for ebook chapters
You have to use m4b-tool to stitch the audio files together into an audiobook and include the chapter metadata, but it works great:
https://github.com/sandreas/m4b-tool
I've been meaning to write a post on this workflow because it's incredibly useful
Regarding "abogen chunks the text it feeds to Kokoro by sentence", that's not quite correct, it actually splits subtitles by sentence, not the chunks sent to Kokoro.
This might be happening because the "Replace single newlines with spaces" option isn’t enabled. Some books require that setting to work correctly. Could you try enabling it and see if it fixes the issue?
I was running into issues with this one: https://theanarchistlibrary.org/library/kevin-carson-studies..., and this one: https://files.libcom.org/files/Accelerate%20-%20Robin%20Mack... (converted to plain text using MinerU, and double-checked to make sure the text was clean).
> Regarding "abogen chunks the text it feeds to Kokoro by sentence", that's not quite correct, it actually splits subtitles by sentence, not the chunks sent to Kokoro.
Ah, that's odd. So I don't know why abogen'd be doing the weird fading out and skipping words thing then when my tool (https://github.com/alexispurslane/kokoro-audiobook-reliable/) isn't.
> This might be happening because the "Replace single newlines with spaces" option isn’t enabled. Some books require that setting to work correctly. Could you try enabling it and see if it fixes the issue?
I tried that, as well as doing it myself, and it didn't seem to help.
Otherwise, it's a nicely packaged GUI. Well done!
I tried a PDF and the UI to select pages or sections is good and generation is fast on my laptop's GTX 1650.
The result is an .ogg audio file and an .ass subtitle file. Playing them with mpv allows listening and reading along in the terminal. The only issue I have with the result is that visual line breaks from the PDF are preserved, resulting in long pauses "randomly" in the middle of sentences. This greatly interrupts understanding of the audio.
Edit: enabling the skipping of single newlines helps!
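For text you preprocess yourself, the same idea can be sketched in a few lines of Python. This is not abogen's implementation, just an assumption about what such an option does: a lone newline is treated as a visual line wrap and replaced with a space, while blank lines (two or more newlines) are kept as paragraph breaks. The function name is hypothetical.

```python
import re


def unwrap_single_newlines(text):
    """Join hard-wrapped lines while keeping blank-line paragraph breaks."""
    # Normalize Windows line endings so the pattern below sees only "\n".
    text = text.replace("\r\n", "\n")
    # Replace a newline only when it is neither preceded nor followed by
    # another newline, i.e. it is a single line wrap, not a paragraph break.
    return re.sub(r"(?<!\n)\n(?!\n)", " ", text)


if __name__ == "__main__":
    sample = "This sentence was wrapped\nby the PDF layout.\n\nNew paragraph."
    print(unwrap_single_newlines(sample))
```

Note that this does not rejoin words hyphenated across a line break, which PDFs also tend to produce; that would need a separate pass.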
I didn't have the newlines enabled though so it was pretty useless.
Enabling makes this pretty awesome.
af_heart is a great voice to me, while I find af_jessica annoying. That is the main issue I have with audiobooks: the randomness of liking the voice actor or not matters almost as much to me as what the book says.
I knew this day was coming soon and I really am blown away. I have gotten so used to audiobooks that it is hard for me to actually sit and read a full book. I have about 20 books to convert that would never have enough of a market to justify having someone read them, let alone in a voice I really like. Incredible.
Some narrators, like Wil Wheaton, are so entertaining to me I actively search by what they have voiced.
In general, I have to agree the narrator can make or break a series.
I'm with you on this, but my reaction is the opposite -- I'm wondering if there are some books I couldn't stand to listen to, that now I could with a nice neutral narration voice? Instead of the weird untrained voice with weird vocal tics that was the official narration?
I wonder, is there some open source NN that can consume PDF pages and produce a "pure prose" version of it. Say, a page with mixed text and an image of a car engine would be output to the text and then a detailed description of the image, or what it is depicting.
I agree that the project need not be renamed to remove the single syllable that may be an obscure slur, especially since every syllable may be an obscure slur in some language and you can't expect somebody to learn them all just to avoid them.
But there was no need to use that syllable as a slur.
Btw, don't look into the name of a famous Python formatter or you might be offended.
Go fight for chat control under the guise of caring about others. It's been really successful so far. "Free speech rant"...
I completely understand your concern, and I'm grateful that you pointed this out. It's clear that your comment comes from a place of wanting to help, and I really value community members like you who look out for potential issues.
The name was chosen purely based on the technical functionality (audiobook generation), and I had no awareness of the unfortunate similarity you've mentioned. As English is not my native language, I sometimes miss these cultural nuances that native speakers would naturally catch. I appreciate your understanding that this was entirely unintentional.
Thank you again for the thoughtful heads-up.
One should stop assuming everyone is versed in all the slang (or slurs, for that matter) existing in all the languages of the world. The author seems to have a Turkish name, so I will assume he is Turkish and guess that he didn't think much about the name.
So I imagine generated audiobooks to be good in that regard. Another option would be to have a "normalize volume" setting at audible, or other services.