Show HN: Pg_kazsearch – Full-text search for Kazakh in PostgreSQL

https://github.com/darkhanakh/pg-kazsearch
2•darkhanakh•1h ago

Comments

darkhanakh•1h ago
hey, so postgres has zero support for kazakh full-text search. no dictionary, no stemmer, nothing. if you try searching kazakh text today you basically get trigram matching which is useless for this language
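to make the trigram point a bit more concrete, here is a rough sketch of pg_trgm-style similarity in python. it only approximates what the extension does (lowercase a single word, pad it with two leading spaces and one trailing space, compare the sets of 3-character windows) and ignores locale handling. the example words are an assumption chosen for illustration: кітап ("book") and кітаптарымызда ("in our books"). they are not taken from the test corpus.

```python
def trigrams(word: str) -> set[str]:
    """Approximate pg_trgm trigram extraction for a single word."""
    # pg_trgm pads each word with two spaces in front and one behind.
    padded = "  " + word.lower() + " "
    return {padded[i:i + 3] for i in range(len(padded) - 2)}


def similarity(a: str, b: str) -> float:
    """Shared trigrams divided by all distinct trigrams of both words."""
    ta, tb = trigrams(a), trigrams(b)
    return len(ta & tb) / len(ta | tb)


if __name__ == "__main__":
    query = "кітап"               # dictionary form, as a user might type it
    document = "кітаптарымызда"   # inflected form, as it might appear in text
    print(f"similarity: {similarity(query, document):.3f}")
```

the longer the suffix chain on the document side, the more unshared trigrams pile up in the denominator, so the score keeps dropping even though both words share the same root.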

kazakh is agglutinative — one word can have like 5-6 suffixes stacked on it so the surface form between query and document almost never matches. i tested on 3000 real news articles and trigram gives you 1 result where pg_kazsearch gives 61 for the same query. overall recall improvement is around 23%
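a minimal sketch of the suffix-stacking idea, in python. this is a toy and not the algorithm pg_kazsearch uses: the suffix list (a few plural, possessive and locative endings), the minimum stem length and the `toy_stem` name are all assumptions made up for illustration. real kazakh morphology needs far more suffixes plus vowel-harmony and consonant rules.

```python
# Hypothetical, deliberately tiny suffix list: plural, 1st-person-plural
# possessive, and locative endings. Not the list used by pg_kazsearch.
SUFFIXES = (
    "лар", "лер", "дар", "дер", "тар", "тер",  # plural
    "ымыз", "іміз",                            # "our"
    "да", "де", "та", "те",                    # locative ("in/at")
)
MIN_STEM_LEN = 3  # assumed guard against stripping a word down to nothing


def toy_stem(word: str) -> str:
    """Repeatedly strip the longest matching suffix from the end of a word."""
    word = word.lower()
    by_length = sorted(SUFFIXES, key=len, reverse=True)
    stripped = True
    while stripped:
        stripped = False
        for suffix in by_length:
            if word.endswith(suffix) and len(word) - len(suffix) >= MIN_STEM_LEN:
                word = word[: -len(suffix)]
                stripped = True
                break  # restart the scan on the shortened word
    return word


if __name__ == "__main__":
    # кітап + тар + ымыз + да = "in our books"
    for form in ("кітап", "кітапта", "кітаптарымызда"):
        print(form, "->", toy_stem(form))
```

once the query and the document are both reduced to the same stem, an ordinary exact-match index lookup can find them, which is what plain surface-form matching misses.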

the core stemmer is written in rust, and it plugs into standard postgres FTS stuff — gin indexes, ts_rank, phrase search etc. you can install via deb package or docker, no need to compile anything

as far as i know this is the first FTS implementation for kazakh in any major database. the stemming approach should be pretty transferable to other turkic languages (uyghur, uzbek, kyrgyz) since they have similar morphology

this is kind of a hobby project so there are probably rough edges i haven't caught yet. would really appreciate feedback, especially from anyone who's worked on postgres extensions or text search for non-english languages. also open to ideas on what to improve next