It's Not Wrong that " ".length == 7

https://hsivonen.fi/string-length/
39•program•1h ago

Comments

bstsb•56m ago
ironic that unicode is stripped out of the post's title here, making it very much wrong ;)

for context, the actual post features an emoji with multiple unicode codepoints in between the quotes

cmeacham98•46m ago
Funny enough I clicked on the post wondering how it could possibly be that a single space was length 7.
ale42•39m ago
Maybe it isn't a space, but a list of invisible Unicode chars...
yread•33m ago
It could also be a byte length of a 3-byte UTF-8 BOM and then some stupid space character like f09d85b3
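yread's arithmetic checks out. Here is a minimal Python sketch. It assumes a hypothetical string made of U+FEFF (the byte order mark, 3 bytes in UTF-8) followed by U+1D173 MUSICAL SYMBOL BEGIN BEAM. That second character is an invisible format character, and its UTF-8 encoding is exactly the f09d85b3 mentioned above.

```python
# Hypothetical string: U+FEFF (byte order mark) + U+1D173 (an invisible format character).
s = "\uFEFF\U0001D173"
encoded = s.encode("utf-8")  # the plain "utf-8" codec adds no BOM of its own

print(encoded.hex())  # efbbbff09d85b3
print(len(encoded))   # 7 bytes, yet only 2 code points
```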
eastbound•29m ago
It can be many Zero-Width Spaces, or a few Hair Spaces.

You never know when you don’t know CSS and try to align your pixels with spaces. Some programmers should start a trend where 1 tab = 3 hairline-width spaces (smaller than 1 char width).

Next up: The <half-br/> tag.
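The invisible-character theory is just as easy to reproduce. This sketch assumes a hypothetical string of one ordinary space followed by six U+200B ZERO WIDTH SPACE characters. It renders like a single space, and both the code point count and the UTF-16 count (what JavaScript's .length reports) come out as 7. UTF-8 differs, because each U+200B takes 3 bytes.

```python
# Hypothetical: one U+0020 SPACE plus six U+200B ZERO WIDTH SPACE characters.
s = " " + "\u200b" * 6

code_points = len(s)                           # Python 3 len() counts code points
utf16_units = len(s.encode("utf-16-le")) // 2  # 16-bit code units, as JavaScript counts
utf8_bytes = len(s.encode("utf-8"))            # 1 + 6 * 3

print(code_points, utf16_units, utf8_bytes)  # 7 7 19
```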

c12•6m ago
I did exactly the same, thinking that maybe it was invisible unicode characters or something I didn't know about.
mrheosuper•54m ago
>We’ve seen four different lengths so far:

- Number of UTF-8 code units (17 in this case)
- Number of UTF-16 code units (7 in this case)
- Number of UTF-32 code units or Unicode scalar values (5 in this case)
- Number of extended grapheme clusters (1 in this case)

We would not have this problem if we all agree to return number of bytes instead.
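The standard library alone can reproduce the first three counts in the quoted list. This sketch assumes the emoji stripped from the title is the five-code-point sequence U+1F926 FACE PALM, U+1F3FC (a skin-tone modifier), U+200D ZERO WIDTH JOINER, U+2642 MALE SIGN, U+FE0F VARIATION SELECTOR-16, which matches the 17/7/5 figures. The fourth count, extended grapheme clusters, needs a UAX #29 segmenter (for example `\X` in the third-party `regex` module), so it is not shown.

```python
# Assumption: the stripped emoji is this five-code-point ZWJ sequence.
s = "\U0001F926\U0001F3FC\u200D\u2642\uFE0F"

# UTF-8 code units are bytes, so the encoded length is the code-unit count.
utf8_units = len(s.encode("utf-8"))

# UTF-16: code points above U+FFFF need a surrogate pair (2 units), the rest need 1.
utf16_units = sum(2 if ord(c) > 0xFFFF else 1 for c in s)

# Python 3 str length counts code points (= scalar values for well-formed text).
scalar_values = len(s)

print(utf8_units, utf16_units, scalar_values)  # 17 7 5
```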

com2kid•45m ago
How would that help? UTF-8, 16, and 32 languages would still report different numbers.
curtisf•44m ago
"number of bytes" is dependent on the text encoding.

UTF-8 code units _are_ bytes, which is one of the things that makes UTF-8 very nice and part of why it has won.

minebreaker•43m ago
> We would not have this problem if we all agree to return number of bytes instead.

I don't understand. It depends on the encoding, doesn't it?
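The encoding dependence is easy to show with the same five code points: each Unicode encoding yields a different byte count. This sketch assumes the stripped emoji is U+1F926 U+1F3FC U+200D U+2642 U+FE0F. Python's plain "utf-16" and "utf-32" codecs also prepend a byte order mark, which changes the answer yet again.

```python
# Assumption: the stripped emoji is U+1F926 U+1F3FC U+200D U+2642 U+FE0F.
s = "\U0001F926\U0001F3FC\u200D\u2642\uFE0F"

# Byte length under several encodings; "utf-16"/"utf-32" without -le/-be add a BOM.
sizes = {codec: len(s.encode(codec))
         for codec in ("utf-8", "utf-16-le", "utf-32-le", "utf-16", "utf-32")}

print(sizes)
# {'utf-8': 17, 'utf-16-le': 14, 'utf-32-le': 20, 'utf-16': 16, 'utf-32': 24}
```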

charcircuit•43m ago
>Number of extended grapheme clusters (1 in this case)

Only if you are using a new enough version of Unicode. If you are using an older version, it is more than 1. As new Unicode updates come out, the number of grapheme clusters a string has can change.

Aissen•47m ago
I'd disagree that the number of Unicode scalar values is useless (in the case of Python 3), but it's a very interesting article nonetheless. Too bad unicode.org decided to break all the URLs in the table at the end.
darkwater•45m ago
(2019), updated in 2022
DavidPiper•39m ago
I think that string length is one of those things that people (including me) don't realise they never actually want. In a production system, I have never actually wanted string length. I have wanted:

- Number of bytes this will be stored as in the DB

- Number of monospaced font character blocks this string will take up on the screen

- Number of bytes that are actually being stored in memory

"String length" is just a proxy for something else, and whenever I'm thinking shallowly enough to want it (small scripts, mostly-ASCII, mostly-English, mostly-obvious failure modes, etc) I like grapheme cluster being the sensible default thing that people probably expect, on average.

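A common approach to the "monospaced character blocks" case is a lookup in the spirit of wcwidth(). This rough Python sketch assumes per-code-point widths are good enough. East Asian Wide and Fullwidth characters count as 2 columns, nonspacing marks and format characters as 0, and everything else as 1. It ignores grapheme clusters, so emoji ZWJ sequences will come out wrong. Real terminals disagree among themselves anyway.

```python
import unicodedata

def approx_columns(s: str) -> int:
    """Rough monospace column count, summed per code point (no grapheme handling)."""
    cols = 0
    for ch in s:
        if unicodedata.category(ch) in ("Mn", "Me", "Cf"):
            continue  # combining marks and format characters (e.g. U+200B, U+200D) take no column
        cols += 2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1
    return cols

print(approx_columns("abc"), approx_columns("日本"), approx_columns("e\u0301"))  # 3 4 1
```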
baq•26m ago
ASCII is very convenient when it fits in the solution space (it’d better be, it was designed for a reason), but in the globally connected, international computing world it doesn’t fit at all. The problem is that all the tutorials, especially low-level ones, assume ASCII so that 1) you can print something to the console and 2) they can avoid mentioning that strings are hard, so folks don’t get discouraged.

Notably Rust did the correct thing by defining multiple slightly incompatible string types for different purposes in the standard library and regularly gets flak for it.

sigmoid10•24m ago
I have wanted string length many times in production systems for language processing, and it is perfectly fine as long as whatever you are using is consistent. I rarely care how many bytes an emoji actually takes unless I'm worried about extreme storage efficiency, or how many monospace characters it uses unless I'm doing very specific UI things. This blog is more of a cautionary tale about what can happen if you unconsciously mix standards, e.g. by using one in the backend and another in the frontend. But this is not a problem of string lengths per se; they are just one instance where modern implementations are all over the place.
xg15•7m ago
It gets more complicated if you do substring operations.

If I do s.charAt(x) or s.codePointAt(x) or s.substring(x, y), I'd like to know which values for x and y are valid and which aren't.
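For UTF-8 byte offsets, at least, the "which x and y are valid" question has a cheap local answer. An offset is a code point boundary unless the byte there is a continuation byte (bit pattern 10xxxxxx). Rust's `str::is_char_boundary` uses the same rule; the Python function below is a hypothetical stand-in. The example string assumes the title's emoji is U+1F926 U+1F3FC U+200D U+2642 U+FE0F. The interior boundaries it finds are safe for slicing code points, but cutting at them still breaks apart the one visible emoji.

```python
def is_utf8_boundary(data: bytes, i: int) -> bool:
    """True if byte offset i does not fall inside a multi-byte UTF-8 sequence."""
    if i < 0 or i > len(data):
        return False
    if i == len(data):
        return True  # the end of the buffer is always a valid cut point
    return (data[i] & 0xC0) != 0x80  # continuation bytes look like 0b10xxxxxx

# Assumption: this is the emoji stripped from the title.
data = "\U0001F926\U0001F3FC\u200D\u2642\uFE0F".encode("utf-8")
boundaries = [i for i in range(len(data) + 1) if is_utf8_boundary(data, i)]
print(boundaries)  # [0, 4, 8, 11, 14, 17]
```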

impure•35m ago
I learned this recently when I encountered a bug caused by cutting an emoji character in two, which left it unable to render.
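Here is one way to avoid the byte-level version of this bug, as a sketch. When truncating to a byte budget (say, a hypothetical database column limit), step back to the nearest code point boundary before cutting. This only keeps each code point whole. It can still drop a skin-tone modifier or ZWJ component and change which emoji is shown, so grapheme-aware truncation would need a UAX #29 segmenter on top. The example emoji is assumed to be U+1F926 U+1F3FC U+200D U+2642 U+FE0F.

```python
def truncate_utf8(s: str, max_bytes: int) -> str:
    """Truncate s to at most max_bytes of UTF-8 without splitting a code point."""
    data = s.encode("utf-8")
    if len(data) <= max_bytes:
        return s
    cut = max(max_bytes, 0)
    # Step back while the cut would land on a continuation byte (0b10xxxxxx).
    while cut > 0 and (data[cut] & 0xC0) == 0x80:
        cut -= 1
    return data[:cut].decode("utf-8")

facepalm = "\U0001F926\U0001F3FC\u200D\u2642\uFE0F"  # assumed title emoji
shortened = truncate_utf8(facepalm, 10)
print(len(shortened), len(shortened.encode("utf-8")))  # 2 8
```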
kazinator•32m ago
Why would I want this to be 17, if I'm representing strings as an array of code points rather than UTF-8?

TXR Lisp:

  1> (len " ")
  5
  2> (coded-length " ")
  17
(Trust me when I say that the emoji was there when I edited the comment.)

The second value takes work; we have to go through the code points and add up their UTF-8 lengths. The coded length is not cached.
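The walk kazinator describes is straightforward to sketch. This is not TXR Lisp's actual implementation. It is a Python illustration that sums per-code-point UTF-8 lengths using the standard encoding ranges. It assumes well-formed text: lone surrogates (which Python strings can hold) are counted as 3 bytes here, but strict UTF-8 cannot encode them.

```python
def utf8_length(s: str) -> int:
    """UTF-8 byte length computed from code points, without building the encoded bytes."""
    total = 0
    for ch in s:
        cp = ord(ch)
        if cp < 0x80:
            total += 1  # U+0000..U+007F: ASCII
        elif cp < 0x800:
            total += 2  # U+0080..U+07FF
        elif cp < 0x10000:
            total += 3  # U+0800..U+FFFF: rest of the Basic Multilingual Plane
        else:
            total += 4  # U+10000..U+10FFFF: supplementary planes
    return total

# Assumption: this is the emoji stripped from the comment.
print(utf8_length("\U0001F926\U0001F3FC\u200D\u2642\uFE0F"))  # 17
```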

troupo•12m ago
Obligatory: Emoji under the hood https://tonsky.me/blog/emoji/
spyrja•5m ago
I really hate to rant on about this. But the gymnastics required to parse UTF-8 correctly are truly insane. Besides that, we now see issues such as invisible glyph injection attacks etc. cropping up all over the place due to this crappy so-called "standard". Maybe we should just go back to the simplicity of ASCII until we can come up with something better?
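Here is a sketch of the checks a strict UTF-8 validator performs, to show what "parse UTF-8 correctly" involves. It follows the well-formed UTF-8 byte sequence table in Chapter 3 of the Unicode Standard. The checks reject overlong forms, UTF-16 surrogate code points, values above U+10FFFF, and truncated sequences. It is illustrative only, not a hardened or fast implementation.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Strict UTF-8 validation following the Unicode well-formed byte sequence table."""
    i, n = 0, len(data)
    while i < n:
        b = data[i]
        if b < 0x80:  # ASCII
            i += 1
            continue
        if 0xC2 <= b <= 0xDF:
            length, lo, hi = 2, 0x80, 0xBF
        elif b == 0xE0:
            length, lo, hi = 3, 0xA0, 0xBF  # excludes overlong 3-byte forms
        elif b == 0xED:
            length, lo, hi = 3, 0x80, 0x9F  # excludes surrogates U+D800..U+DFFF
        elif 0xE1 <= b <= 0xEF:
            length, lo, hi = 3, 0x80, 0xBF
        elif b == 0xF0:
            length, lo, hi = 4, 0x90, 0xBF  # excludes overlong 4-byte forms
        elif 0xF1 <= b <= 0xF3:
            length, lo, hi = 4, 0x80, 0xBF
        elif b == 0xF4:
            length, lo, hi = 4, 0x80, 0x8F  # excludes values above U+10FFFF
        else:
            return False  # 0x80..0xC1 and 0xF5..0xFF never start a sequence
        if i + length > n:
            return False  # truncated sequence
        if not lo <= data[i + 1] <= hi:
            return False  # second byte has the tightest, lead-dependent range
        if any(not 0x80 <= data[k] <= 0xBF for k in range(i + 2, i + length)):
            return False  # remaining bytes must be plain continuation bytes
        i += length
    return True

print(is_valid_utf8("\u2642".encode("utf-8")), is_valid_utf8(b"\xed\xa0\x80"))  # True False
```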

MCP plugins study – 10% of tested plugins „fully exploitable"

https://www.pynt.io/blog/llm-security-blogs/state-of-mcp-security
1•truegoric•29s ago•0 comments

Sapir-Whorf does not apply to Programming Languages

https://buttondown.com/hillelwayne/archive/sapir-whorf-does-not-apply-to-programming/
1•BerislavLopac•55s ago•0 comments

Physics-based simulation and optimization in desktop 3D printing

https://www.fabbaloo.com/news/bambustudio-now-integrates-helio-additive-simulation-for-optimized-3d-print-quality
1•dhar118•2m ago•0 comments

Biotech CEO sues Uber after illegal immigrant driver assault caught on camera

https://www.foxnews.com/us/biotech-ceo-sues-uber-after-illegal-immigrant-driver-assault-caught-camera-downtown-charleston-sc
1•5555624•8m ago•0 comments

The "Super Weight:" How Even a Single Parameter Can Determine a LLM's Behavior

https://machinelearning.apple.com/research/the-super-weight
1•cjrd•11m ago•0 comments

I have no mut and I must borrow

https://old.reddit.com/r/rust/comments/1mwmei6/media_i_have_no_mut_and_i_must_borrow/
1•truegoric•15m ago•0 comments

Organizers Are Demanding Palantir Drop Contracts with ICE and Israeli Military

https://truthout.org/articles/organizers-are-demanding-palantir-drop-contracts-with-ice-and-israeli-military/
1•01-_-•17m ago•0 comments

Show HN: Ultra-fast, embedded KV store in pure Rust

https://github.com/mehrantsi/FeOxDB
3•mehrant•18m ago•2 comments

China cut itself off from the global internet for an hour on Wednesday

https://www.theregister.com/2025/08/21/china_port_443_block_outage/
1•01-_-•19m ago•0 comments

Ask HN: Why is Prolog not gaining traction?

1•0x07ca•21m ago•0 comments

Martyrs to the Unspeakable: The Assassinations of JFK, Martin, Malcolm, and RFK

https://orbisbooks.com/products/martyrs-to-the-unspeakable-vol-2
1•hkhn•23m ago•0 comments

OUI-Spy Is a Slick Bluetooth Low Energy Scanner

https://www.hackster.io/news/colonel-panic-s-oui-spy-is-a-slick-bluetooth-low-energy-scanner-or-a-foxhunting-handset-c16927adad71
1•meilily•23m ago•0 comments

How to load test PostgreSQL database and not miss anything

https://habr.com/en/companies/tantor/articles/936622/
1•amalinovic•24m ago•0 comments

Sonic Liberation Devices

https://sonicliberationdevices.com/
1•crousto•30m ago•1 comments

Claude Code's erratic behavior from May-August 2025

https://github.com/bogdansolga/claude-code-summer-2025-erratic-behavior
4•bogdansolga•38m ago•1 comments

Germany's Ecosia proposes stewardship to run Google Chrome

https://www.reuters.com/business/germanys-ecosia-proposes-stewardship-run-google-chrome-2025-08-21/
4•13324•41m ago•0 comments

Demand-Side Platform

https://en.wikipedia.org/wiki/Demand-side_platform
2•doener•51m ago•0 comments

Grok chats exposed in Google results

https://www.bbc.com/news/articles/cdrkmk00jy0o
2•mdhb•52m ago•2 comments

When People Giggle at Your Name, or the 2025 Hugo Awards Incident

https://grigorylukin.com/2025/08/21/when-people-giggle-at-your-name-or-the-2025-hugo-awards-incident/
2•Rokesmith•56m ago•0 comments

The Minecraft code no one has solved (2024) [video]

https://www.youtube.com/watch?v=nz2LeXwJOyI
3•zichy•57m ago•0 comments

Mitigating Backpressure from High Join Amplification with Unaligned Joins

https://risingwave.com/blog/unaligned-joins-risingwave/
1•Sheldon_fun•1h ago•0 comments

Trump administration to vet all 55M foreigners with U.S. visas

https://www.washingtonpost.com/national-security/2025/08/21/us-visa-vetting-foreigners-immigration-tourism/
9•KnuthIsGod•1h ago•0 comments

Saneject: Dependency Injection the Unity Way

https://github.com/alexanderlarsen/Saneject
1•msk-lywenn•1h ago•0 comments

Insights from 100 Years of Research with Probiotic E. Coli

https://pmc.ncbi.nlm.nih.gov/articles/PMC5063008/
2•luu•1h ago•0 comments

Too Much of a Good Thing: How Genericide Sends Trademarks to the Graveyard

https://uclawreview.org/2023/03/21/how-genericide-sends-trademarks-to-the-graveyard/
2•thunderbong•1h ago•0 comments

What the Hell Is Going On?

https://catskull.net/what-the-hell-is-going-on-right-now.html
4•todsacerdoti•1h ago•0 comments

Do blogs need to be so lonely?

https://thehistoryoftheweb.com/do-blogs-need-to-be-so-lonely/
1•Brajeshwar•1h ago•0 comments

Ryan Dancey on the Acquisition of TSR

https://www.insaneangel.com/insaneangel/RPG/Dancey.html
2•Michelangelo11•1h ago•0 comments

My first project in Go is a terminal dashboard (what a programming language)

4•vinserello•1h ago•3 comments

Opt-In Event Phases for Reliably Fast DOM Operations

https://github.com/MicrosoftEdge/MSEdgeExplainers/blob/main/EventPhases/explainer.md
1•robin_reala•1h ago•0 comments