
Confusables.txt and NFKC disagree on 31 characters

https://paultendo.github.io/posts/unicode-confusables-nfkc-conflict/
25•pimterry•2d ago

Comments

brazzy•1h ago
> The correct use is to check whether a submitted identifier contains characters that visually mimic Latin letters, and if so, reject it

That is a really bad and user-hostile thing to do. Many of those characters are perfectly valid characters in various non-Latin scripts. If you want to force everyone to use Latin script for identifiers, then own up to it and say so. But rejecting just some of them for being too similar to Latin characters makes the behaviour inconsistent and confusing for users.

orthoxerox•1h ago
The correct approach is to accept [a-z][a-z0-9]* as identifiers and forbid everything else.
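A minimal sketch of the rule proposed in this comment, assuming Python's standard `re` module. The function name `is_valid_identifier` is hypothetical and only illustrates the pattern; it is not taken from the article or from any library.

```python
import re

# Pattern quoted in the comment: one lowercase ASCII letter, then any
# number of lowercase ASCII letters or digits.
IDENTIFIER_RE = re.compile(r"[a-z][a-z0-9]*")


def is_valid_identifier(name: str) -> bool:
    """Return True only if the whole string matches the ASCII pattern."""
    # fullmatch() requires the pattern to cover the entire string, so a
    # trailing newline or a non-ASCII lookalike letter is rejected.
    return IDENTIFIER_RE.fullmatch(name) is not None
```

Under this rule a name containing a Cyrillic "а" is rejected together with every other non-ASCII name, which is the trade-off the replies below object to.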
Zardoz84•52m ago
And you've pissed off nearly half of the world's population.
skrebbel•47m ago
Yeah fuck foreigners who want to be able to spell their own name right.
silon42•43m ago
As someone with a non-ASCII name, I'd like a Unicode whitelist (system-wide if possible).

And special features to mark Cyrillic or other for-me-dangerous characters.

tsimionescu•12m ago
In all cultures, there is an expectation that you have to provide a name for yourself that is intelligible to the culture you're interacting with, both in written language and in speech. If your name is Albert and you are going to interact with many Japanese speakers, you'll have to call yourself アルバート in writing and pronounce your name as something like "Ah roo bay toe" to fit in. If you have a name whose pronunciation depends heavily on tones, such as a Mandarin or Vietnamese name, and you are going to interact with speakers of a non-tonal language, you'll have to come up with a version that you're happy with even if pronounced in the default neutral tone that those people will naturally use. If your name is 高山, you'll have to spell it as Takayama.

Similarly, if you're going to create an identifier for yourself that is supposed to be usable in an international context, you'll have to use the lowest common denominator that is acceptable in that context - and that happens to be a-zA-Z0-9. Why the Latin alphabet and numerals and not, say, Arabic, you might ask? Because Chinese and Indian and Arabic speakers are far more likely to be familiar with the Latin alphabet than with each other's writing systems.

kgeist•4m ago
For logins, we're already used to the fact that they're expected to be in Latin. Having them in the native alphabet is more trouble than it's worth (one system supports it, another breaks, etc.; it's easier to remember one login, in Latin, across systems). I'd be irritated, though, if I couldn't use my native alphabet in the user profile for the first name/last name.
wongarsu•51m ago
What would make sense is to have a blacklist of usernames (like "admin" or "moderator"), then use the confusables map to see if a username or slug is visually confusable with a name from that blacklist.

I initially thought that must surely be what they are doing and they just worded it very, very poorly. But then, of the 31 "disagreements", only one matters: the long s, which is either f or s. All other disagreements map to visually similar symbols, like O and 0, which you should already treat as the same for this check.

Not to mention that this is mostly an issue for URL slugs, so after NFKC normalization. In HTML this is more robustly solved by styling conventions. Even old bb-style forums will display admin and moderator user names in a different color or in bold to show their status. The modern flourish is to put a little icon next to these kinds of names, which also scales well to other identifiers.
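The reserved-name check described in this comment could look roughly like the sketch below. It assumes Python's standard `unicodedata` module. `LOOKALIKES` is a tiny hand-picked table used only for illustration, not the real confusables.txt data, and `RESERVED`, `skeleton` and `is_reserved_lookalike` are hypothetical names.

```python
import unicodedata

# Reserved names from the comment's example.
RESERVED = {"admin", "moderator"}

# Hand-picked illustrative subset of lookalike mappings. A real check
# would load the full confusables.txt data instead of this table.
LOOKALIKES = {
    "\u0430": "a",  # Cyrillic small a
    "\u0435": "e",  # Cyrillic small ie
    "\u043e": "o",  # Cyrillic small o
    "0": "o",       # digit zero
}


def skeleton(name: str) -> str:
    """Normalize with NFKC, fold case, then replace known lookalikes."""
    folded = unicodedata.normalize("NFKC", name).casefold()
    return "".join(LOOKALIKES.get(ch, ch) for ch in folded)


RESERVED_SKELETONS = {skeleton(name) for name in RESERVED}


def is_reserved_lookalike(name: str) -> bool:
    """True if the name is visually confusable with a reserved name."""
    return skeleton(name) in RESERVED_SKELETONS
```

Only names that collide with the reserved list are rejected, so a username written entirely in Cyrillic or any other script passes unless it imitates "admin" or "moderator".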

akersten•1h ago
Unicode is both the best thing that's ever happened to text encoding and the worst. The approach I take here is to treat any text coming from the user as toxic waste. Assume it will say "Administrator" or "Official Government Employee" or be 800 pixels tall because it was built only out of decorative combining characters. Then put it in a fixed box with overflow hidden, and use some other UI element to convey things like "this is an official account."

The worst part, which this article doesn't even touch on, is the risk that your login form doesn't normalize and remap characters but your database does. Suddenly I can re-register an existing account by using a different set of codepoints that the login system doesn't think exists but the auth system maps to somebody else's record.

ElectricalUnion•11m ago
For some sorts of "confusables", you don't even need Unicode. Depending on the cursed combination of font, kerning, rendering and display, `m` and `rn` are also very hard to distinguish.
kccqzy•1h ago
If you allow users to submit arbitrary Unicode strings as text, why would you need to check confusables.txt? Whose confusion are you guarding against?
zahlman•48m ago
I suppose: other users, if you store the first user's text and transmit it to another one.
Liftyee•48m ago
Does the "removing dead code" advantage outweigh the additional complexity of having to maintain 2 different confusables lists: one for when NFKC has been applied first and one without? It didn't sound like applying one after the other caused any errors, just that some previously reachable states are unreachable.
lich_king•42m ago
This is an inexplicable, AI-written article and the obvious answer is no. There's no performance or complexity overhead to not removing a couple of dead characters. There is a complexity overhead to forking off the list or adding pointless special cases to your code.
happytoexplain•45m ago
Tangential - I'm aware of various types of, let's say, "swappability" that Unicode defines (broader than the Unicode concept of "equivalence"):

- Canonical (NF)

- Compatible (NFK)

- Composed vs decomposed

- Confusable (confusables.txt)

Does Unicode not define something like "fuzzy" equivalence? Like "confusable" but more broad, for search bar logic? The most obvious differences would be case and diacritic insensitivity (e, é). Case is easy since any string/regex API supports case insensitivity, but diacritic insensitivity is not nearly as common, and there are other categories of fuzzy equivalence too (e.g. ø, o).

I guess it makes sense for Unicode to not be interested in defining something like this, since it relates neither to true semantics nor security, but it's an incredibly common pattern, and if they offered some standard, I imagine more APIs would implement it.
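The case- and diacritic-insensitive matching described in this comment is often approximated by decomposing the text and dropping combining marks. The sketch below assumes Python's standard `unicodedata` module. `EXTRA_FOLDS` is a hand-picked illustrative table and `fold_for_search` is a hypothetical name; letters such as "ø" have no canonical decomposition, so they need an explicit mapping like this.

```python
import unicodedata

# Letters that do not decompose into base letter + combining mark.
# This short table is only an illustration, not a complete list.
EXTRA_FOLDS = {"\u00f8": "o", "\u0111": "d", "\u0142": "l"}


def fold_for_search(text: str) -> str:
    """Fold case and strip diacritics for forgiving search matching."""
    # casefold() handles case; NFD splits e.g. "é" into "e" + U+0301.
    decomposed = unicodedata.normalize("NFD", text.casefold())
    # combining() is non-zero for combining marks, which are dropped.
    stripped = "".join(
        ch for ch in decomposed if not unicodedata.combining(ch)
    )
    return "".join(EXTRA_FOLDS.get(ch, ch) for ch in stripped)
```

Applying the function to both the query and the indexed text lets "cafe" match "café" and "Soren" match "Søren". The fold is lossy, so it suits search keys rather than stored or displayed text.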

csense•36m ago
My theory: The "long S" in "Congreſs" is an f. They used f instead of s because without modern dental care, a lot of people in the 1600s and 1700s were miffing teeth and fpoke with a lifp.
nkrisc•9m ago
https://en.wikipedia.org/wiki/Long_s

That’s not the case.

joshdata•29m ago
> If your application also runs NFKC normalization (which it should — ENS, GitHub, and Unicode IDNA all require it)

That's not right. Most of the web requires NFC normalization, not NFKC. NFC doesn't lose information in the original string. It reorders and combines code points into equivalent code point sequences, e.g. to simplify equality tests.

In NFKC, the K for "Compatibility" means some characters are replaced with similar, simpler code points. I've found NFKC useful for making text search indexes where you want matches to be forgiving, but it would be both obvious and wrong to use it in most of the web because it would dramatically change what the user has entered. See the examples in https://www.unicode.org/reports/tr15/.
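The difference between the two forms is easy to see with Python's standard `unicodedata.normalize`. The sample strings and the helper name `compare_forms` below are chosen for illustration only.

```python
import unicodedata


def compare_forms(text: str) -> tuple[str, str]:
    """Return the (NFC, NFKC) normalizations of the same input."""
    return (
        unicodedata.normalize("NFC", text),
        unicodedata.normalize("NFKC", text),
    )


# "e" + combining acute accent: both forms compose it to U+00E9.
composed = compare_forms("e\u0301")

# Ligature "fi" (U+FB01): NFC keeps it, NFKC rewrites it to "fi".
ligature = compare_forms("\ufb01")

# Superscript two (U+00B2): NFC keeps it, NFKC turns it into "2".
superscript = compare_forms("x\u00b2")

# Long s (U+017F): NFC keeps it, NFKC maps it to a plain "s".
long_s = compare_forms("\u017f")
```

In the first case both forms agree. In the other three NFC returns the input unchanged while NFKC replaces the character, which is the loss of information this comment describes.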