Hello HN (Hacker News),
I’m the author of KRR v2.1. I have a background in Education, not Computer Science, but I built this because I was frustrated with the inefficiency of the current Korean romanization standard (RR).
The Problem:
Existing standards are designed for "sound," not "data."
They are irreversible (lossy): you cannot always recover the original Hangul from the romanized text.
They are ambiguous (e.g., 'Gang' = River or liver+ㄴ?).
They cause safety issues (the Korean word 니가 transliterates to Niga, which can trigger content filters).
The Solution:
I designed a purely logical, 1:1 mapping protocol.
It uses 3 control keys (\, ~, `) to handle all 11,172 Hangul syllables without collision.
It preserves morphological boundaries (e.g., goog\bab instead of Gukbap), which should make the output easier for LLM tokenizers to split consistently.
It is implemented in Python (about 50 lines of core logic).
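For anyone wondering where 11,172 comes from: Unicode's precomposed Hangul block (U+AC00 to U+D7A3) is laid out as 19 initials × 21 medials × 28 finals (27 consonants plus "no final"), so each syllable maps to exactly one index triple and back. The snippet below is only a minimal sketch of that arithmetic, not the KRR mapping itself; the control-key rules are not reproduced here, and the function names are made up for illustration.

```python
# Unicode Hangul syllable arithmetic (block U+AC00..U+D7A3).
# Illustrative only: this is NOT the KRR romanization table.
HANGUL_BASE = 0xAC00
NUM_INITIALS = 19
NUM_MEDIALS = 21
NUM_FINALS = 28  # 27 final consonants plus "no final"
NUM_SYLLABLES = NUM_INITIALS * NUM_MEDIALS * NUM_FINALS  # 11,172


def decompose(syllable: str) -> tuple[int, int, int]:
    """Return (initial, medial, final) indices for one precomposed syllable."""
    offset = ord(syllable) - HANGUL_BASE
    if not 0 <= offset < NUM_SYLLABLES:
        raise ValueError(f"not a precomposed Hangul syllable: {syllable!r}")
    initial, rest = divmod(offset, NUM_MEDIALS * NUM_FINALS)
    medial, final = divmod(rest, NUM_FINALS)
    return initial, medial, final


def compose(initial: int, medial: int, final: int) -> str:
    """Inverse of decompose: rebuild the syllable from its index triple."""
    return chr(HANGUL_BASE + (initial * NUM_MEDIALS + medial) * NUM_FINALS + final)


def is_lossless() -> bool:
    """Check that all 11,172 syllables round-trip and no two share a triple."""
    seen = set()
    for offset in range(NUM_SYLLABLES):
        syllable = chr(HANGUL_BASE + offset)
        triple = decompose(syllable)
        if triple in seen or compose(*triple) != syllable:
            return False
        seen.add(triple)
    return len(seen) == NUM_SYLLABLES
```

Any romanization that assigns a distinct, unambiguously delimited string to each index triple inherits this reversibility. A round-trip loop like `is_lossless()` is the kind of check one could run against a real mapping table to confirm the "without collision" property.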
It looks structurally different from conventional romanization, but for indexing, searching, and AI processing, I believe a reversible, collision-free mapping is the stronger choice.
The code is open-source (Apache 2.0). I’d love to hear your feedback on the logic.
R8dymade•2h ago