Show HN: Turn native language audio into flashcards and shadowing practice

33•alder•4h ago

Here is a tool I originally built for myself to help with my German and Greek studies. It started as a hack for creating Anki cards from native language audio: it extracts the words, finds their base forms (lemmas) and groups the examples by lemma. At some point I realised that I had a transcription with word-level timestamps, which opens up a lot of other opportunities. So I added a mode where you click the first and last word in the transcript and it starts looping that segment with the right gap and repeat count.
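A minimal sketch of the two ideas above. It assumes the transcript is already a list of words, each with a lemma, word-level start/end times in seconds and a sentence index. The `Word` type, `group_by_lemma`, `loop_schedule` and the `gap_factor` pause heuristic are hypothetical illustrations, not the app's actual code. The sketch groups example sentences by lemma and turns a first/last-word click into a play/pause schedule.

```python
from dataclasses import dataclass


@dataclass
class Word:
    text: str       # surface form as transcribed
    lemma: str      # base form, assumed to come from a lemmatizer upstream
    start: float    # word start time in seconds
    end: float      # word end time in seconds
    sentence: int   # index of the sentence this word belongs to


def group_by_lemma(words: list[Word]) -> dict[str, list[int]]:
    """Map each lemma to the sentence indices where any of its forms occurs."""
    groups: dict[str, list[int]] = {}
    for w in words:
        sentences = groups.setdefault(w.lemma, [])
        if w.sentence not in sentences:  # keep each example sentence once, in order
            sentences.append(w.sentence)
    return groups


def loop_schedule(words: list[Word], first: int, last: int,
                  repeats: int = 3, gap_factor: float = 1.0) -> list[tuple]:
    """Build a play/pause schedule for the span between two clicked words.

    The pause after each playback is the segment length times gap_factor
    (a hypothetical heuristic: roughly enough time to repeat the phrase aloud).
    """
    if first > last:  # allow clicking the two words in either order
        first, last = last, first
    start, end = words[first].start, words[last].end
    gap = (end - start) * gap_factor
    schedule: list[tuple] = []
    for _ in range(repeats):
        schedule.append(("play", start, end))
        schedule.append(("pause", gap))
    return schedule
```

Because the segment boundaries come straight from word timestamps, the loop always starts and stops on word boundaries rather than at arbitrary points in the audio.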

Another feature I use a lot is selecting an audio fragment and sending a predefined prompt to an AI, such as "explain grammar" or "explain nuances of meaning". I am still experimenting with the prompts.
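A minimal sketch of how a selected fragment could be turned into one of those predefined prompts. The template wording, `PROMPTS` and `build_prompt` are hypothetical, since the app's real prompts are not shown in the thread.

```python
# Hypothetical prompt templates; the app's actual prompts are not public.
PROMPTS = {
    "grammar": 'Explain the grammar of this {language} fragment: "{fragment}"',
    "nuance": 'Explain the nuances of meaning in this {language} fragment: "{fragment}"',
}


def build_prompt(transcript: list[str], first: int, last: int,
                 kind: str, language: str) -> str:
    """Join the selected transcript words and fill in a predefined prompt."""
    if first > last:  # selection can be made in either direction
        first, last = last, first
    fragment = " ".join(transcript[first:last + 1])
    return PROMPTS[kind].format(language=language, fragment=fragment)
```

Joining with spaces is a simplification that would not suit Chinese or Japanese, which are written without spaces between words.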

And because shadowing is so easy, I also use it as a player to improve my English pronunciation. (I am not a native English speaker.)

I made a quick video showing the workflow for creating Anki cards and shadowing: https://youtu.be/TaR58uuDBvU?si=o5aGLAi2S-BZ7Zy9

The app supports 15 input languages (Japanese and Chinese are the latest experimental additions), and more than 30 output languages.

I would really appreciate it if you could try it: https://lingochunk.com/try. I know there are other tools with similar functionality, but I built something that fits my workflow, and it is fun to build.

Also I struggled to find public domain audio for the try page. I'd be grateful if anyone could point me to public domain sources (I used LibriVox, Wikimedia and FSI courses), or if you're a creator, let me feature some of your own recordings with credits and links.

Comments

3stacks•1h ago

This is awesome! I’ll be lurking for new data sources. I’m working on a self-hosted language app more focused around cloze and sentence mining into Anki. I love seeing more stuff happening in this space
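For cloze-style sentence mining into Anki, one low-tech route is Anki's plain-text import. That format expects one note per line with tab-separated fields, and Anki's `{{c1::...}}` syntax marks the gap. Below is a minimal sketch, assuming a two-field layout (cloze text, extra info) that would be imported into a note of the built-in Cloze type. `to_cloze` and `write_cloze_tsv` are hypothetical helpers.

```python
import csv
import io
import re


def to_cloze(sentence: str, target: str) -> str:
    """Wrap the first whole-word occurrence of target in Anki cloze syntax."""
    pattern = r"\b" + re.escape(target) + r"\b"
    return re.sub(pattern, lambda m: "{{c1::" + m.group(0) + "}}", sentence, count=1)


def write_cloze_tsv(rows: list[tuple[str, str, str]]) -> str:
    """rows are (sentence, target word, extra info such as the lemma)."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    for sentence, target, extra in rows:
        writer.writerow([to_cloze(sentence, target), extra])
    return buf.getvalue()
```

Python 3 regular expressions treat `\b` as Unicode-aware for `str` patterns, so the same helper works for German or Greek sentences.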

alder•1h ago

Thanks! I am glad you like it! I essentially mine the source audio, and all examples have cloze-style gaps (blurring, in my case) that are revealed on the back of the card. In the built-in SRS system I also beep out the word in the sentence when you play it on the front of the card. Unfortunately that is not implemented in the Anki export, but it is technically possible.
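One way to beep out a word, given word-level timestamps, is to overwrite that time range of the audio with a sine tone. The sketch below is hypothetical (`beep_out` and its defaults are not from the app). It works on a plain list of mono float samples to stay self-contained. A real implementation would more likely operate on a NumPy array or an audio library's buffer.

```python
import math


def beep_out(samples: list[float], sample_rate: int, start: float, end: float,
             freq: float = 1000.0, amplitude: float = 0.3) -> list[float]:
    """Return a copy of mono samples with [start, end) seconds replaced by a sine beep.

    start/end would come from the word-level timestamps of the hidden word.
    """
    out = list(samples)  # leave the original audio untouched
    i0 = max(0, round(start * sample_rate))
    i1 = min(len(out), round(end * sample_rate))
    for i in range(i0, i1):
        t = (i - i0) / sample_rate  # time since the beep started
        out[i] = amplitude * math.sin(2 * math.pi * freq * t)
    return out
```

An abrupt switch between speech and tone can produce audible clicks, so a short fade at each edge of the beep may be worth adding.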

__float•1h ago

I don't know what resolution or display you built this on, but a heads up the initial impression on my 4K monitor is that everything is incredibly tiny.

alder•1h ago

To be honest I haven't tested it on a 4K monitor yet, so I am not surprised. There are two controls above the transcript that change the font size and the line spacing, which should help a bit for now. Something to fix, thanks!

hiAndrewQuinn•1h ago

Very nice work. I'm going for a different thing, but my audio2anki tool [1] is about as streamlined as I could make it to turn a YouTube URL I want to learn into a stack of Anki flashcards, purely locally.

[1]: https://github.com/hiAndrewQuinn/audio2anki

jrrv•1h ago

Is it possible to add traditional characters for Mandarin?

Also, the pinyin for 誰/谁 is coming through as shuí. The character has two pronunciations, but I believe shéi is the more common one.

alder•46m ago

Thanks! Chinese and Japanese as source languages are still experimental. I did my best to support them, but I have to rely on people who actually know the languages, so this kind of feedback is really useful. I'll look into adding traditional characters and fixing the pinyin.

jrrv•40m ago

No worries, I appreciate the effort. I did go back and listen, and they are indeed pronouncing it shéi in the audio too.

I use a Firefox extension to convert simplified to traditional. It looks like it's open source, so it may be of some use to you: https://github.com/tongwentang/tongwentang-extension.

There are some clashes it does not handle, though. For example, 隻 and 只 are both written 只 in simplified, so you have to know from context which one is meant, and the extension fails to convert to 隻 where appropriate.
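The 只/隻 clash is why dictionary-based converters such as OpenCC check phrase-level tables before falling back to single characters. Below is a minimal sketch of that idea using greedy longest-match. The tiny `CHAR_MAP` and `PHRASE_MAP` tables and the `to_traditional` function are hypothetical illustrations.

```python
# Hypothetical, tiny tables for illustration; real converters ship far larger ones.
CHAR_MAP = {"猫": "貓", "两": "兩", "鸟": "鳥"}
PHRASE_MAP = {"一只": "一隻", "两只": "兩隻"}  # measure-word 只 becomes 隻
MAX_PHRASE = max(len(p) for p in PHRASE_MAP)


def to_traditional(text: str) -> str:
    """Greedy longest match over phrases first, then fall back to single characters."""
    out: list[str] = []
    i = 0
    while i < len(text):
        # Try the longest phrase starting at i, down to two characters.
        for n in range(min(MAX_PHRASE, len(text) - i), 1, -1):
            phrase = text[i:i + n]
            if phrase in PHRASE_MAP:
                out.append(PHRASE_MAP[phrase])
                i += n
                break
        else:
            # No phrase matched: convert (or keep) the single character.
            out.append(CHAR_MAP.get(text[i], text[i]))
            i += 1
    return "".join(out)
```

Here 只 inside 只有 ("only") is left alone, while the measure word in 两只 becomes 隻. This works only as far as the phrase table anticipates the context.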

Koaisu•1h ago

Just tried it with an unsupported language and it still worked: I set the language to Chinese, uploaded the audio and still got correct results.

dirteater_•57m ago

What are you doing for Chinese word segmentation/pinyin?

alder•15m ago

For segmentation and POS tagging I rely on spaCy's zh_core_web_sm, and the pinyin comes from the pypinyin library, with a small correction layer on top. But I am not a Chinese language expert, so I can't really judge whether it works; I'll rely on feedback from users to improve it.
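A correction layer like this could be as simple as an override table applied to the (word, pinyin) pairs coming out of segmentation and pinyin lookup. The sketch below is hypothetical (`PINYIN_OVERRIDES` and `apply_corrections` are not from the app). It shows how the shuí/shéi report above might be handled. pypinyin also lets you register custom readings (for example via `load_phrases_dict`), which may be a cleaner place for fixes like this.

```python
# Hypothetical override table: word -> preferred reading.
PINYIN_OVERRIDES = {"谁": "shéi", "誰": "shéi"}


def apply_corrections(tokens: list[tuple[str, str]],
                      overrides: dict[str, str] = PINYIN_OVERRIDES) -> list[tuple[str, str]]:
    """Replace the pinyin of any whole token found in the override table."""
    return [(word, overrides.get(word, pinyin)) for word, pinyin in tokens]
```

A word-level table cannot tell contexts apart, so it suits cases where one reading is clearly preferred.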

jcg591•50m ago

Very cool! I'm also learning Greek and it's amazing how many resources are becoming available.

alder•34m ago

Thanks! Yes, it's getting better for Greek but still not on par with other languages. I completed the only 2 Greek levels on Duolingo and they are really boring compared to the German one I am doing now. Easy Greek is a bit above my level, and the number of YouTubers in Greek is tiny compared to German.

pzagor2•25m ago

I also built a tool to help me study Spanish. I really like the idea of shadowing, so I built a tool that lets you take any YouTube video and generate a sentence-by-sentence exercise to help you repeat the speaker's phrases.

https://talkhabit.com/shadow

For example, here is one exercise: https://talkhabit.com/shadow?videoUrl=https%3A%2F%2Fwww.yout...

Stuff I need to work on:

- It only works with videos that have auto-generated captions
- It works best with monologue videos
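Caption cues rarely line up with sentences, so a sentence-by-sentence exercise needs some merging step. The following is a minimal sketch that merges consecutive cues until one ends with sentence-final punctuation. `Cue` and `cues_to_sentences` are hypothetical, and this is not how the tool above necessarily works. Auto-generated captions may lack punctuation, in which case a pause-length heuristic would be needed instead.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Cue:
    start: float  # seconds
    end: float    # seconds
    text: str


def cues_to_sentences(cues: list[Cue], terminators: str = ".?!") -> list[Cue]:
    """Merge consecutive caption cues until one ends with sentence-final punctuation."""
    sentences: list[Cue] = []
    current: Optional[Cue] = None
    for cue in cues:
        if current is None:
            current = Cue(cue.start, cue.end, cue.text.strip())
        else:  # extend the sentence in progress
            current.end = cue.end
            current.text = f"{current.text} {cue.text.strip()}"
        if current.text.endswith(tuple(terminators)):
            sentences.append(current)
            current = None
    if current is not None:  # trailing text without final punctuation
        sentences.append(current)
    return sentences
```

Each merged `Cue` then carries the start and end time needed to replay one sentence for shadowing.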

deaton•13m ago

This is really cool, just as I'm starting to get towards the back end of the Kaishi 1.5k deck so this will be perfect for my Japanese studies. Thanks for sharing.
