Figuring out how to develop a Unicode collator from scratch for a research group that I was working with in Berlin was one of my formative experiences as a programmer. Ever since then, I've wanted to write something to collect my thoughts on the Unicode Collation Algorithm and the process of building a conformant implementation. Last summer I had a good excuse to do this, when I decided to adapt my collator to Zig as a way of learning that language.
The Unicode standards, and the (relatively) low-level software libraries based on them, do a lot of things for us to make computing possible. We have the luxury of not needing to worry about most of those things most of the time. I find it humbling whenever I do peek under the hood.
adaptit•50m ago
Probably a naive question, but: couldn't you precompute some vector representation of the string once, and reduce collation to a vector comparison? Basically move the cost upfront and get back to the "fast" byte-comparison case?
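The idea in the question corresponds to what the UCA calls a sort key. Each string is mapped once to a byte string: the nonzero primary weights of its collation elements come first, then a low separator, then the secondary weights, then another separator, then the tertiary weights. After that, plain byte comparison gives the collation order. The Python below is a minimal sketch of that layout. It assumes a made-up three-level weight table: base characters get a primary equal to their lowercased code point, every combining mark gets one fixed secondary, and case only affects the tertiary level. These are not real DUCET weights, and it skips contractions, expansions, and variable weighting entirely. `collation_elements` and `sort_key` are hypothetical names used only for illustration.

```python
import unicodedata

WIDTH = 3              # bytes per weight; 3 bytes can hold any code point
SEP = bytes(WIDTH)     # all-zero level separator, lower than any real weight


def collation_elements(s):
    """Yield toy (primary, secondary, tertiary) triples. NOT real DUCET weights."""
    for ch in unicodedata.normalize("NFD", s):
        if unicodedata.combining(ch):
            # Accents are ignorable at level 1 and distinguishing at level 2.
            yield (0, 2, 1)
        else:
            low = ch.lower()
            base = low if len(low) == 1 else ch
            # Case only matters at level 3 (lowercase sorts before uppercase).
            yield (ord(base), 1, 2 if ch.isupper() else 1)


def sort_key(s):
    """Concatenate nonzero weights level by level, with SEP between levels."""
    ces = list(collation_elements(s))
    parts = []
    for level in range(3):
        if level:
            parts.append(SEP)
        parts.extend(ce[level].to_bytes(WIDTH, "big") for ce in ces if ce[level])
    return b"".join(parts)


words = ["résumé", "Resume", "resumes", "resume"]
keys = {w: sort_key(w) for w in words}  # pay the cost once, up front
print(sorted(words, key=keys.__getitem__))
```

With these toy weights the list comes out as `resume`, `Resume`, `résumé`, `resumes`. An accent difference outranks a case difference, and both are outranked by an extra base letter. The tradeoff is that keys are usually longer than the strings they encode, and any stored keys must be regenerated if the weight table or tailoring changes.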
theobeers•1h ago