As is, I jumped out of the funnel already.
2. Make it available for all (major) languages.
3. Profit!
Also, integrate with a captcha and rate limiting to prevent abuse. Authentication alone is not a strong enough deterrent to a motivated adversary.
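The rate-limiting half of that suggestion could be handled with a per-user token bucket. This is a minimal sketch under assumptions: `TokenBucket`, `allow_generation`, and the "3 stories per day" policy are hypothetical names and values, and captcha verification is left out because it would come from whichever captcha provider you choose.

```python
import time
from typing import Dict, Optional


class TokenBucket:
    """Allows bursts of up to `capacity` requests, refilling at `rate` tokens/second."""

    def __init__(self, capacity: float, rate: float) -> None:
        self.capacity = capacity
        self.rate = rate
        self.tokens = capacity             # start with a full bucket
        self.last: Optional[float] = None  # timestamp of the previous check

    def allow(self, now: Optional[float] = None) -> bool:
        if now is None:
            now = time.monotonic()
        if self.last is not None:
            # Refill in proportion to elapsed time, never beyond capacity.
            elapsed = max(0.0, now - self.last)
            self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


# Hypothetical policy: bursts of 3 generations, refilled at 3 per day.
STORIES_PER_DAY = 3
_buckets: Dict[str, TokenBucket] = {}


def allow_generation(user_id: str, now: Optional[float] = None) -> bool:
    """Return True if this user may trigger another (costly) story generation."""
    if user_id not in _buckets:
        _buckets[user_id] = TokenBucket(STORIES_PER_DAY, STORIES_PER_DAY / 86_400)
    return _buckets[user_id].allow(now)
```

The in-memory dict only works for a single server process; with several instances, the bucket state would need to live in a shared store.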
I could see using this on a mobile/tablet app.
Also, pricing is not mentioned anywhere on the start page or before signup.
I'm the target audience for this app, but I don't really want to generate my own stories; I want to read. Generating stories is just additional work for me, and I don't know how good the story is before I generate and read it. If I had access to a library of human-reviewed stories, I'd just read those, knowing that they are at least okay. You wouldn't even have to review them yourself; you could have a system for users to rate stories.
You're selling this as an AI product, but I don't care about AI. I care about learning a new language by reading stories.
Then subscribers could like/upvote stories when they like them.
- You could look at running a locally hosted model. There are some good story-writing ones, though I'm not sure how well they handle other languages.
- Help visitors generate example stories for language pairs on your website. If a language pair already has one, maybe show the existing one; if it's a new language pair being tested, perhaps inform the user that it may be shared with others to show them how the system works.
There has to be user value to login or nobody will do it.
To calibrate the content to your reading level, rather than generating the content, it tracks your comprehension and shows you how much of a given webpage or book you already understand.
It has optional Anki integration if you don't want to use the built-in ones. I work on this full-time now and am about to launch a manga reading mode, plus Netflix caption lookups.
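The comprehension-tracking idea in this comment boils down to measuring what fraction of a text's words the learner already knows. A minimal sketch, assuming a space-delimited language and a lowercase known-word set; `known_word_coverage` and `rank_by_coverage` are hypothetical names, not that tool's actual code.

```python
import re
from typing import List, Set

# Assumption: a space-delimited language; Chinese or Japanese would need a
# proper word segmenter instead of this regex.
WORD_RE = re.compile(r"\w+")


def known_word_coverage(text: str, known: Set[str]) -> float:
    """Fraction of word tokens in `text` found in the learner's known-word set."""
    tokens = [t.lower() for t in WORD_RE.findall(text)]
    if not tokens:
        return 0.0
    hits = sum(1 for t in tokens if t in known)
    return hits / len(tokens)


def rank_by_coverage(texts: List[str], known: Set[str]) -> List[str]:
    """Order candidate texts from most to least comprehensible for this learner."""
    return sorted(texts, key=lambda t: known_word_coverage(t, known), reverse=True)


known = {"el", "perro", "come", "gato"}
print(known_word_coverage("El perro come. El gato duerme.", known))  # 5 of 6 tokens known
```

Counting tokens rather than distinct words means frequent words weigh more, which is closer to what a reader actually experiences on the page.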
A couple of suggestions:
- I'm learning Hebrew and I'm at the beginner stage, so it would be good to have niqqud. Even with the STT, it's helpful at this stage.
- For the STT, every time I tried it, it just said something that sounded like "Dodd."
1. A "long" story is still only about 20 sentences.
2. A real translation for each sentence is nice, but you are often still left wondering what each word means. A literal word-by-word translation would be more useful, either as an option or in addition, or the ability to click on any word and see its translation (bonus points for declensions/conjugations too).
But, please focus on 3 or 4 languages. Do those well.
Asian languages (apparently Cantonese isn’t on the list) are very hard to machine-translate.
As is, this is just ChatGPT + AWS Translate + AWS Text to Speech, along with Firebase for user management and a very nice UX front end.
To turn this into a product, I’d select maybe 3 languages (French, Spanish, German) and hire advisors for all 3. Work on creating a few stories edited by your advisors and add basic gamification/quizzes.
I like the idea, though.
So you pass in two texts and get back some form of aligned text. If you have some knowledge of the language you are trying to learn and are ok without a perfect sentence-to-sentence alignment, then this would work.
My motivation is to improve my wife's and my knowledge of each other's languages while reading books to our daughter.
For a moment I got excited that someone else had already built it :)
No idea how to order my food, but if I ever need to find my son, who's a clownfish and was kidnapped, and I need to do it in Italian, I am ready.
(I'm trying to start with a positive tone, since I have only negative things to say about the site itself. I want to make sure that I'm coming across as critical without coming across as mean.)
I spent a few minutes generating a couple of sample stories using their prompts for the pair that I'm most qualified to evaluate, “English”→“Chinese (Traditional)”, and just wasn't very impressed. Honestly, I think the approach is largely a dead end.
Let's set aside that “Chinese (Traditional)” is not a language, and that someone with experience learning or teaching Chinese ought to know this (and, as I will argue, knowing this is critical to producing high-quality educational materials!) That the creators of this tool aren't particularly familiar with the languages themselves is probably much less consequential than that they don't really appear to be familiar with the pedagogy of teaching or learning languages.
One would anticipate that the languages that most learners want to learn are subject to broad market forces, and that, as a consequence, these languages already have a variety of high-quality, human-written primary texts and educational texts (many of which may even be free-to-access!) For the language pair I tested, this is definitely true, and I would encourage every learner to start with those materials (and to avoid anything AI-generated.)
(Of course, if I wanted to learn a less-common language where materials are hard to find, this might be marginally useful. For example, Telugu probably has more total speakers than Italian, but my local high school probably has an Italian class. Still, I would wonder whether the training set would be good enough to accurately reproduce the language. I suppose if I wanted to learn an endangered language, where there may simply not be enough native speakers to maintain a rich catalogue of written language, then someone could train an AI to reproduce this language to aid in learning, but a similar question arises as to whether this kind of preservation or reconstruction is sufficiently “faithful.”)
It's absolutely the case that AI tools are at a point where (for common languages) they are able to reliably generate grammatically accurate language, independent of its factual accuracy. Indeed, while I could spot fluency issues in the sample stories I reviewed (since, of course, “Chinese (Traditional)” is not a language,) I could not spot outright grammatical errors. (This is an impressive accomplishment for AI models!)
But this is really a solution looking for a problem (and, in my opinion, finding the most obvious but also the least useful one.)
Contrast these randomly generated stories with the equivalent from a human-generated educational resource. In the case of a human-generated educational resource, the quality of language may actually be worse than that in the AI-generated resource (even given how sloppy AI writing tends to be!) In fact, in the case of Chinese (“Traditional” or otherwise,) this is absolutely guaranteed to be the case for an introductory text. Almost all introductory texts will be written in a very choppy, repetitive style: e.g., 「那隻狗很可愛。我養的狗也很可愛。」
(It's likely the case that even intermediate and advanced learning materials will not resemble actual primary texts. e.g., I was reading the news the other day and came across the sentence 「北捷重申,無論任何年齡,各車站閘門前的黃色標線內一律禁止喝水等飲食行為,除非是身體不適或母乳哺育」 which is perfectly appropriate for an intermediate learner… except 「閘門」 is simply not useful or appropriate textbook vocabulary!)
So why is the human-generated educational material better? Well, there's a lot of design to writing these kinds of materials. How do we teach and reïterate the most broadly useful grammatical structures and vocabulary? How do we teach this in a way that maximises retention? (And, often, how do we expose the learner to useful cultural background that will help them when they visit a region where the language is spoken?)
All of this is visible in human-generated materials, yet none of this is evident in these AI-generated materials. It is, in fact, this design that makes these materials useful in the first place. In the absence of it, we end up with vocabulary lists that define 「狗:dog」 next to 「呈現:to emerge」 where a human educator would align the difficulty of these terms to the order and process in which a human learner would learn them. Similarly, a human educator knows how to evolve a student's fluency with language and understanding of tone and register, taking them from 「媽媽: mother」 to 「母親: mother」 perhaps even strategically including 「媽咪: mommy」 or even 「阿母 a-bú: mother (台)」 to engage the student. (Real educators do this very often, and students tend to really like it when they get “fun fact”-style local flavour!) I have not seen anyone attempt to introduce any of this design into AI-generated learning materials, and I suspect this is why they always come across as being so bland and mushy. Instead, the AI-generated materials are creating only rote practice items (which is why their prompts typically include things like “limit the generated text to use only vocabulary as published in the prep materials for such-and-such language proficiency exam.”) This kind of practice is, indeed, useful, but it's debatable whether it's measurably more useful than just spaced-repetition with flashcards.
Now, contrast these materials with primary texts (i.e., written language artefacts produced for an audience of native speakers.) Primary texts are often very difficult to incorporate into language learning, especially for languages like Chinese. This is probably because at the introductory level, the materials simply aren't dense enough for an adult learner, and at the advanced level, probably because these materials are far too challenging given the amount of specialised terminology and vocabulary used. (There are, in fact, very appropriate materials that sit between these extremes, such as news magazines or short stories written for middle schoolers, but these materials can be hard to access.)
The benefit of the primary text is that it is very close to the actual goal of the learner: I really don't want to read a story about a lost dog, and I only do it because, with enough practice reading such drivel, I might eventually read ‘Dream of the Red Mansion’ or ‘Red Sorghum.’ As a consequence, what most learners will reach for are “graded readers,” which are adaptations of well-known works with simplified language and grammar. I'm on the fence about how well AI can create these for us. There is both a pedagogical and a creative dimension to producing a good graded reader. The former may be possible to approximate with additional prompting (“use only vocabulary from this list; use only grammatical structures familiar to a learner at this tested level,”) but I'm not sure about the latter. The reader is probably losing a lot when we simplify Gandalf to ‘Run away now!’
So while I'm quite hopeful that AI technologies can improve language learning, this kind of tool just doesn't seem to add anything to what already exists and is already much better.
The approach is just too obvious. I think it's too focused on finding a way to adapt something we know that AI can do well (generate grammatically correct text) to something we want to be able to do more cheaply or effectively (teach language learners how to read) without really considering how to solve this problem.
What would really help is a way to find stories at a pre-defined learner level in a particular language.
I.e., you say you're studying Japanese and your learner level is such-and-such, and it gives you an example of an article at that level (from a blog, a news site, anywhere).
Regardless of how good LLMs are right now, it's still slop if it isn't verified by a human speaker. The ability to find open-access texts written by human beings would be so much more helpful.
Of course a lot of them are not REALLY written by humans anymore but one problem at a time :)
Also, I’m using it on my phone and don’t see a way to get the translation of just one word, since I can’t hover.
And my preference in general would be for a more literal, word-for-word translation so I can learn what each individual word means.
celltalk•9mo ago
That’s when I remembered those books with one language on one page and the translation on the opposite page. Inspired by that concept, I thought, why not use AI to create something similar, but even more interactive?
So, we built DuoBook.
Here's how it works:
1) Start writing your story in your language.
2) Select the language you want to learn.
3) AI helps complete the story, side-by-side with your native language.
It’s still early days, and it might not be perfect, but it's genuinely helping us—and we hope it helps you too!
Check it out: duobook.co
WalterGR•9mo ago
One term for this is "Parallel Text" (https://en.wikipedia.org/wiki/Parallel_text).
Cool idea! I've been thinking about learning German - I'll have to give this a try.
celltalk•9mo ago
gus_massa•9mo ago
Does it highlight the matching words?
celltalk•9mo ago
And we don’t have language alignment yet. Still, it highlights some of the “hard” words.
freddie_mercury•9mo ago
celltalk•9mo ago
gus_massa•9mo ago
Some comments about the example:
It's weird that the Spanish version is complete and the English version appears sentence by sentence. I'd prefer to see both.
Highlighting by sentence is fine. Matching words would be better, but I can imagine it's a nightmare to implement.
Why does it read aloud only the Spanish version? I'd like to hear any of them with a click.
celltalk•9mo ago
vunderba•9mo ago
celltalk•9mo ago
PaulRobinson•9mo ago
Go get something out of Project Gutenberg that is popular, and not too long.
Translate it into 3-4 major languages. Give those away for free.
Even better, test your translations against actual public domain translations if they exist, and be transparent about WER for each language.
If you want this to be a business, you’re going to have to do some serious business-like things.
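Word error rate, as suggested here, is the word-level edit distance (substitutions + deletions + insertions) between a reference and a hypothesis, divided by the number of reference words. A minimal sketch, assuming whitespace tokenization (so not directly usable for Chinese or Japanese) and one reference translation per text; `word_error_rate` is a hypothetical helper. Because many different translations can be equally valid, WER against a single public-domain translation is only a rough signal; translation-specific metrics such as BLEU, chrF, or TER are more commonly reported.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """(substitutions + deletions + insertions) / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j]: edit distance between the reference prefix processed so far and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion (reference word missing)
                curr[j - 1] + 1,     # insertion (extra hypothesis word)
                prev[j - 1] + cost,  # substitution, or match when cost is 0
            )
        prev = curr
    return prev[len(hyp)] / len(ref)


print(word_error_rate("the cat sat", "the dog sat"))  # one substitution out of three words
```

Reporting this per language pair, alongside the reference used, is one way to be transparent about quality differences between languages.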
freddie_mercury•9mo ago
PaulRobinson•9mo ago
Translation models are getting better all the time - it's a weird artefact of transformer architectures that got missed in the GenAI hype, that they're pretty great at translation, especially across languages with smaller training corpuses - but you should definitely know if the text you're reading is only likely to be 90% "correctly" translated.
trinix912•9mo ago
Especially this. I often come across AI products that claim to do well in 100+ languages, show some really good DE/FR/SP/RU examples, then I try it with my language (Slovene) and am just disappointed. If you claim to support all those languages, please have a sample result in all of them. Even if they aren't all equally good, it comes across as more genuine than making bold claims that anyone who speaks a language with < 10 million speakers knows likely aren't true.
PaulRobinson•9mo ago
By far the biggest factor seems to be the amount of translated material in that language, followed by how "obvious" the rules of the language are making it easy to decompose algorithmically.
roel_v•9mo ago
wingerlang•9mo ago
celltalk•9mo ago
https://imgur.com/a/nTbBgdJ
atoav•9mo ago
- uses words and phrases that are either of rare historic origin or entirely made-up new ones
- uses uncommon verb forms; one she used so frequently (the Inflektiv) that it has a second, unofficial name: the Erikativ
- she frequently borrowed from the biggest writers and poets in the German language in her translations
- for the younger characters there is an entirely made-up youth slang that is both appealing and incredibly entertaining to read
The English originals are utterly boring to read in comparison. Her work has a literary and entertainment quality of the kind that made generations realize there is no real border between serious highbrow literature and comics.
benatkin•9mo ago
Edit: Year of Linux on the Desktop is another.
Edit 2: For Year of Linux on the Desktop, it chose 2024 as that year. You might want to add the current date to the prompt and tell the model to place anything imagined in the future after that date. Another thought is to have the LLM suggest a story prompt for you.
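The suggestion to put the current date in the prompt could look something like the sketch below. It only builds the prompt string; `build_story_prompt` and its wording are hypothetical, and no particular LLM API is assumed.

```python
from datetime import date
from typing import Optional


def build_story_prompt(user_prompt: str, today: Optional[date] = None) -> str:
    """Prefix a story request with today's date so 'future' events land after it."""
    today = today or date.today()
    return (
        f"Today's date is {today.isoformat()}. "
        "If the story refers to future events, set them after this date.\n\n"
        f"Story request: {user_prompt}"
    )


print(build_story_prompt("Year of Linux on the Desktop", date(2025, 5, 1)))
```

Passing `today` explicitly keeps the function deterministic and easy to test; in production it would default to the server's current date.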
celltalk•9mo ago
benatkin•9mo ago
yurishimo•9mo ago
The "long" story is not that long. I was expecting something closer to 1000 words or more. Using your Donald Duck example, I doubt the long story was more than 4 or 5 comic book pages. I started with the magic mushroom example, intermediate difficulty, and long length.
I also generated another story using the "lost puppy" prompt, but this time advanced difficulty and long. The difficulty was ramped up which I appreciate, but the length was even shorter than before!
The speech synthesis is garbage, but I'm sure you already know that. For a free service (for now), I understand why some limitations are in place, but it doesn't look good for your product unless you're targeting beginners. I'm not sure what you're offering is useful to anyone above that level. 3 stories per day for me is about 10-15 minutes on advanced (including the generation waiting time).
I wish you luck with the project! IMO, your time is best spent now optimizing your spend so you can provide higher-quality audio to go alongside the text. I'm sure you will have plenty of content generated, but surfacing that to users without it feeling icky might also be a challenge.
bossyTeacher•9mo ago
celltalk•9mo ago
We were thinking about a pricing page with a few pro options. For instance, longer text generation, access to better LLMs, or much better TTS, etc.
bossyTeacher•9mo ago
celltalk•9mo ago