I have been working on a problem most language detection libraries quietly fail at: short, messy, conversational text. The kind you see in chat apps, support tickets, SMS, and mixed-language messages.
FastLangML is my attempt to fix that.
It is a multi-backend ensemble (FastText, Lingua, langdetect, pyCLD3, and others) with a voting layer built for real-world text. It handles:
Short messages with almost no statistical signal
Code-switching like Hinglish or Spanglish
Slang, abbreviations, and emojis
Multi-turn conversations where context matters
Confusable languages like ES vs PT or NO vs DA vs SV
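To make the ensemble idea concrete, here is a minimal sketch of a confidence-weighted voting layer over several backends. This is not FastLangML's actual API (the `vote` function and the score format are assumptions for illustration); it just shows how disagreeing detectors on a short message can be combined:

```python
from collections import defaultdict

def vote(predictions):
    """Confidence-weighted plurality vote.

    `predictions` is a list of (lang, confidence) pairs, one per backend.
    Returns the winning language and its normalized share of total weight.
    """
    scores = defaultdict(float)
    for lang, conf in predictions:
        scores[lang] += conf
    winner = max(scores, key=scores.get)
    return winner, scores[winner] / sum(scores.values())

# Three hypothetical backends disagree on a short ES/PT-confusable message:
preds = [("es", 0.6), ("pt", 0.55), ("es", 0.7)]
print(vote(preds))  # "es" wins on combined weight
```

A weighted sum rather than a plain majority lets a single high-confidence backend override two hesitant ones, which matters on short inputs where most backends emit low-confidence guesses.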
A few design choices:
Context-aware detection so you can pass conversation history and get more stable predictions
A hinting system for slang, abbreviations, and custom rules
Extensible backends so you can plug in your own detectors or voting logic
Optional persistence using Redis or disk for multi-turn conversations
Support for more than 170 languages across the ensemble
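As a rough sketch of what context-aware detection can look like (again, a hypothetical illustration, not FastLangML's real interface), one simple approach is to bias each message's scores toward languages seen earlier in the conversation, with the prior decaying over time:

```python
from collections import defaultdict

class ContextTracker:
    """Hypothetical sketch: stabilize short-message predictions using
    an exponentially decayed prior over the conversation's languages."""

    def __init__(self, decay=0.8, prior_weight=0.5):
        self.decay = decay              # how fast old context fades
        self.prior_weight = prior_weight  # how strongly history biases scores
        self.history = defaultdict(float)

    def update(self, raw_scores):
        """raw_scores: {lang: confidence} for one message. Returns best lang."""
        for lang in self.history:
            self.history[lang] *= self.decay
        combined = {
            lang: score + self.prior_weight * self.history.get(lang, 0.0)
            for lang, score in raw_scores.items()
        }
        best = max(combined, key=combined.get)
        self.history[best] += combined[best]
        return best

tracker = ContextTracker()
tracker.update({"es": 0.9})                      # long, clearly Spanish turn
print(tracker.update({"es": 0.4, "pt": 0.45}))   # ambiguous short turn -> "es"
```

Without the history term, the second message would flip to "pt" on a 0.05 margin; the context prior keeps the conversation-level prediction stable.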
Why I built it: most detectors are tuned for long, clean text. They break on "ok", "jaja", "mdr", "brooo", or anything with mixed languages. I needed something that works on real chat data, not idealized text.
I would love feedback from HN on:
How you evaluate language detection quality in production
Whether context-aware detection helps in your workflows
Ideas for improving code-switching accuracy
Additional backends worth integrating
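On the evaluation question: one harness I find useful for short-text detectors is per-language accuracy plus a ranked list of confusion pairs, since aggregate accuracy hides ES/PT-style mixups. A minimal sketch (the `detector` callable and sample format are assumptions, not part of FastLangML):

```python
from collections import Counter

def evaluate(detector, samples):
    """Per-language accuracy and top confusion pairs on labeled short texts.

    `samples` is a list of (text, true_lang); `detector` maps text -> lang.
    """
    correct, total, confusions = Counter(), Counter(), Counter()
    for text, truth in samples:
        pred = detector(text)
        total[truth] += 1
        if pred == truth:
            correct[truth] += 1
        else:
            confusions[(truth, pred)] += 1
    accuracy = {lang: correct[lang] / total[lang] for lang in total}
    return accuracy, confusions.most_common(5)

# Toy usage with a deliberately naive stand-in detector:
detector = lambda text: "es" if "hola" in text else "pt"
samples = [("hola amigo", "es"), ("tudo bem", "pt"), ("obrigado", "es")]
print(evaluate(detector, samples))
```

The confusion-pair ranking is what tells you whether to invest in better NO/DA/SV discrimination versus, say, slang hints.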
Repo: https://github.com/pnrajan/FastLangML
Happy to share benchmarks, architecture notes, or design tradeoffs if people are interested.