Losing 1½ Million Lines of Go

https://www.tbray.org/ongoing/When/202x/2026/01/14/Unicode-Properties

96•moks•2w ago

Comments

mroche•2w ago

> Unfortunately, Go’s library doesn’t get updated every time Unicode does. As of now, January 2026, it’s still stuck at Unicode 15.0.0, which dates to September 2023; the latest version is 17.0.0, last September. Which means there are plenty of Unicode characters Go doesn’t know about, and I didn’t want Quamina to settle for that.

I have to say I am surprised about that. Does anyone have any context or guesses as to why this is the case?

EDIT: Go's unicode was actually updated to v17 yesterday:

https://github.com/golang/go/commit/dd39dfb534d2badf1bb2d72d...

watchful_moose•2w ago

Hard to get promoted at Google doing that

matt3210•2w ago

Based on the commit message, which uses "CL" (Google lingo for a Change List in their internal system), I bet this was already available in the internal version and was just ported to the GitHub version after someone pointed it out.

neild•2w ago

Much more prosaic (if slightly embarrassing), I'm afraid: The update was non-trivial (this CL is simple, but there are some accompanying ones in x/text which are not) and it didn't hit the top of the priority list for anyone who understands x/text.

Go is pretty much entirely developed in public; there are some Google-internal customizations but none of them are particularly exciting and almost all changes start in the open source repo and are imported from there.

LukeShu•2w ago

"CL"/"Change List" is the lingo for the Gerrit code review tool, which is how all contributions to Go happen. Creating a GitHub PR simply triggers a bot to create a Gerrit CL, which is where all discussion about the "PR" happens and where the "accept" button gets clicked.

8n4vidtmkvmk•2w ago

Is Gerrit the same as Critique?

tonfa•2w ago

It's a descendant of Critique's predecessor (Mondrian):

https://www.gerritcodereview.com/about.html

fsmv•2w ago

There was a short thread about this on Mastodon involving Rob Pike the other day: https://hachyderm.io/@robpike/115896334649905170

Someone•2w ago

> Sure, these automata are “wide”, with lots of branches, but they’re also shallow, since they run on UTF-8 encoded characters whose maximum length is four and average length is much less

I would consider splitting this task into two:

- extracting the next Unicode code point

- determining whether it’s in the code class

For the second, instead of using an automaton, one could use a perfect hash (https://en.wikipedia.org/wiki/Perfect_hash_function). That could make that part branch-free.

Is that a good idea?

norir•2w ago

A precomputed lookup table would be about 1MB covering all of the code points. The lookup code would first compute the code point (and could also do validation) and then directly look up the class in the table. The lookup table would not need to be directly embedded in Go code and could just be stored in a binary file. But I'd imagine it could also be put in an array literal in its own file that would never be opened by an IDE, if the program needs to be distributed as a single binary.

nektro•1w ago

https://github.com/nektro/zig-unicode-ucd if you'd like reference for another implementation
