
I'm building a Telegram bot to practice Dutch. GPT-4o-mini kept picking vocabulary words I already knew, so I built a classical NLP pipeline to do it instead.

It takes a short text + learner level (A0–B1) and returns the best words to study, using Stanza for parsing and corpus frequency ranks (SUBTLEX-NL, srLex, SUBTLEX-US) for scoring. It beats GPT-4o-mini at A1/A2 but loses at A0, where the LLM picks more obvious words.
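To give a rough idea of what rank-based selection can look like, here is a minimal sketch. It is not the code from the repo: the rank bands per level, the function name `pick_study_words`, and the sample ranks are all made up for illustration, and the lemmas are passed in directly rather than produced by Stanza.

```python
from typing import Dict, List, Tuple

# Hypothetical frequency-rank bands per learner level (assumed values,
# not taken from the project): a word is a candidate if its corpus rank
# falls inside the band for the level.
LEVEL_BANDS: Dict[str, Tuple[int, int]] = {
    "A0": (1, 500),
    "A1": (300, 1500),
    "A2": (1000, 3000),
    "B1": (2500, 6000),
}


def pick_study_words(
    lemmas: List[str],
    freq_rank: Dict[str, int],
    level: str,
    top_n: int = 5,
) -> List[str]:
    """Return up to top_n lemmas whose rank fits the level's band.

    Lemmas missing from the frequency list are skipped. Within the band,
    more frequent words (lower rank) come first.
    """
    low, high = LEVEL_BANDS[level]
    seen = set()
    candidates = []
    for lemma in lemmas:
        if lemma in seen:
            continue  # score each lemma once per text
        seen.add(lemma)
        rank = freq_rank.get(lemma)
        if rank is None or not (low <= rank <= high):
            continue
        candidates.append((rank, lemma))
    candidates.sort()
    return [lemma for _, lemma in candidates[:top_n]]


# Illustrative ranks only, not real SUBTLEX-NL values.
SAMPLE_RANKS = {"de": 1, "lopen": 350, "hond": 800, "straat": 1200, "vergunning": 4000}
SAMPLE_LEMMAS = ["de", "hond", "lopen", "straat", "hond", "vergunning"]
```

With these sample values, `pick_study_words(SAMPLE_LEMMAS, SAMPLE_RANKS, "A1")` keeps only the mid-frequency lemmas and drops both the very common "de" and the rarer "vergunning".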

I also tried adding multi-word phrases (ADJ+NOUN, VERB+NOUN, phrasal verbs) backed by NPMI-scored collocation whitelists. Couldn't beat GPT there because it just "knows" which phrases matter.
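For anyone unfamiliar with NPMI (normalized pointwise mutual information), here is a small sketch of how a collocation whitelist could be scored with it. The function names, the 0.5 threshold, the minimum count, and the use of a single total for both unigram and bigram counts are my simplifying assumptions for the example, not necessarily what the repo does.

```python
import math
from typing import Dict, Set, Tuple


def npmi(count_xy: int, count_x: int, count_y: int, total: int) -> float:
    """NPMI of a bigram: PMI(x, y) / -log p(x, y), in the range [-1, 1].

    1 means the two words always occur together, 0 means they are
    independent, and -1 is returned when they never co-occur.
    """
    if count_xy == 0:
        return -1.0
    if count_xy == total:
        return 1.0  # p(x, y) = 1, avoid dividing by log(1) = 0
    pmi = math.log(count_xy * total / (count_x * count_y))
    return pmi / -math.log(count_xy / total)


def build_whitelist(
    bigram_counts: Dict[Tuple[str, str], int],
    unigram_counts: Dict[str, int],
    total: int,
    threshold: float = 0.5,  # assumed cutoff, tune per language
    min_count: int = 2,      # assumed floor to drop one-off pairs
) -> Set[Tuple[str, str]]:
    """Keep bigrams that are frequent enough and score above the threshold."""
    keep = set()
    for (x, y), c_xy in bigram_counts.items():
        if c_xy < min_count:
            continue
        score = npmi(c_xy, unigram_counts[x], unigram_counts[y], total)
        if score >= threshold:
            keep.add((x, y))
    return keep
```

The normalization is what makes scores comparable across pairs with very different frequencies, which plain PMI does not give you.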

For the phrase work I had to extract collocations from 100M+ OpenSubtitles lines. Published them as a free dataset: https://huggingface.co/datasets/vladvlasov256/opensubs-collo... There are 43K bigrams across English, Dutch, and Serbian.

Source: https://github.com/vladvlasov256/vocab-nlp

Show HN: Vocab extractor for language learners using Stanza and frequency ranks