
What to know about encodings and character sets to work with text (2011)

https://kunststube.net/encoding/
40•ColinWright•4mo ago

Comments

ColinWright•4mo ago
Full title:

"What every programmer absolutely, positively needs to know about encodings and character sets to work with text"

____tom____•4mo ago
> Because Unicode is not an encoding.

> Overall, Unicode is yet another encoding scheme.

?

Terr_•4mo ago
Yeah, author seems to have made a mistake there.

> Unicode is a large table mapping characters to numbers and the different UTF encodings specify how these numbers are encoded as bits. Overall, Unicode is yet another encoding scheme.

I would guess this represents a confusion between the narrow abstract definition of Unicode versus the way it is casually used as an umbrella term which includes stuff like Transformation Formats.
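To make the quoted distinction concrete, here is a minimal Python sketch (an illustration added here, not taken from the article) that takes one character, looks up its code point, and serialises that same number with three different UTF encodings. The variable names are arbitrary.

```python
# Illustrative only: one code point, three different byte serialisations.
ch = "\u6c34"  # the character 水

code_point = ord(ch)             # the number Unicode assigns to the character
utf8 = ch.encode("utf-8")        # variable width, 3 bytes for this code point
utf16 = ch.encode("utf-16-be")   # one 16-bit unit, big-endian, no BOM
utf32 = ch.encode("utf-32-be")   # one 32-bit unit, big-endian, no BOM

print(f"U+{code_point:04X}")
print("UTF-8: ", utf8.hex(" "))
print("UTF-16:", utf16.hex(" "))
print("UTF-32:", utf32.hex(" "))
```

The number stays the same in all three cases; only the bytes used to write it down change.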

jibal•4mo ago
The author doesn't understand what a character is, despite the Unicode standard making it very clear that character != codepoint

btilly•4mo ago
That's just somewhat sloppy.

Unicode is not an encoding of text to bits. It is an encoding of text to numbers. There are a variety of encodings of text to bits based on how those numbers are to be encoded into bits.

Though technically Unicode isn't even quite that. For example "é" can be encoded as U+00E9 or as U+0065,U+0301. Going the other way, "水", U+6C34, is drawn differently in simplified Chinese, Japanese, and traditional Chinese. Unicode calls this, "language-sensitive glyph variation".

Which means that the correspondence between text and Unicode is many to many both ways. And then the Unicode can show up in bits and bytes again in multiple ways.
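The "é" case above can be checked with Python's standard `unicodedata` module. This is only a sketch of the many-to-one relationship between code point sequences and visible text; the variable names are made up for the example.

```python
import unicodedata

precomposed = "\u00e9"   # é as a single code point, U+00E9
decomposed = "e\u0301"   # e followed by U+0301 COMBINING ACUTE ACCENT

# The two strings render the same but are different code point sequences.
equal_as_strings = precomposed == decomposed

# Normalisation converts between the two spellings.
nfc = unicodedata.normalize("NFC", decomposed)    # compose
nfd = unicodedata.normalize("NFD", precomposed)   # decompose

# And each spelling has its own UTF-8 byte sequence.
precomposed_utf8 = precomposed.encode("utf-8")
decomposed_utf8 = decomposed.encode("utf-8")

print(equal_as_strings, len(precomposed), len(decomposed))
print(precomposed_utf8.hex(" "), "|", decomposed_utf8.hex(" "))
```

Comparing user-supplied text without normalising first is therefore likely to treat visually identical strings as different.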

ryandrake•4mo ago
Joel covered this topic[1] over 20 years ago (!!) and we still regularly see "senior" programmers who just casually think of text as a string and strings as text, and that's all there is to it. I still regularly see websites full of ????? and U+FFFD and apostrophes becoming â€™ everywhere.

1: https://www.joelonsoftware.com/2003/10/08/the-absolute-minim...
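The three symptoms listed above can be reproduced in a few lines of Python. This is a sketch of the usual causes, assuming the text started out as UTF-8 and was then handled with the wrong codec; real sites may break in other ways.

```python
original = "it\u2019s"                 # "it’s" with a typographic apostrophe
utf8_bytes = original.encode("utf-8")

# 1. UTF-8 bytes decoded as Windows-1252: the classic garbled apostrophe.
mojibake = utf8_bytes.decode("cp1252")

# 2. Bytes a decoder cannot interpret get replaced by U+FFFD.
replaced = utf8_bytes.decode("ascii", errors="replace")

# 3. Characters an encoder cannot represent get replaced by "?".
question_marks = original.encode("ascii", errors="replace")

# The first kind of damage is reversible as long as nothing was dropped.
repaired = mojibake.encode("cp1252").decode("utf-8")

print(mojibake, replaced, question_marks, repaired)
```

Only the first case can be undone after the fact; the replacement characters in the other two have already discarded the original bytes.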

TacticalCoder•4mo ago
> Text is either encoded in UTF-8 or it's not. If it's not, it's encoded in ASCII, ISO-8859-1, UTF-16 or some other encoding.

Nitpicking but if it's encoded in ASCII, it's by definition a validly encoded UTF-8 file.
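A quick way to convince yourself of this, sketched in Python: every one of the 128 ASCII byte values decodes to the same text under both codecs, while the converse does not hold for anything outside that range.

```python
# All 128 ASCII byte values, decoded with both codecs.
ascii_bytes = bytes(range(128))
as_ascii = ascii_bytes.decode("ascii")
as_utf8 = ascii_bytes.decode("utf-8")
same_text = as_ascii == as_utf8

# The converse fails: UTF-8 for a non-ASCII character is not valid ASCII.
try:
    "\u00e9".encode("utf-8").decode("ascii")
    converse_fails = False
except UnicodeDecodeError:
    converse_fails = True

print(same_text, converse_fails)
```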

jibal•4mo ago
This accurate comment was previously dead. Glad that it got resurrected.

nick49488171•4mo ago
Bitmaps. Anything outside of ASCII should be a bitmap.

bloomca•4mo ago
How would that work? How many bytes per character? How would different fonts work?

nick49488171•4mo ago
Sorry, misplaced humor.

Uehreka•4mo ago
This is the encodings equivalent of the “there should just be one timezone” take.

random3•4mo ago
The best things are those that get out of the way.

dang•4mo ago
Related:

What Every Programmer Absolutely, Positively Needs To Know About Encodings (2011) - https://news.ycombinator.com/item?id=30384223 - Feb 2022 (58 comments)

What programmers need to know about encodings and charsets (2011) - https://news.ycombinator.com/item?id=24162499 - Aug 2020 (22 comments)

What to know about encodings and character sets - https://news.ycombinator.com/item?id=9788253 - June 2015 (30 comments)

What Every Programmer Needs To Know About Encodings And Character Sets - https://news.ycombinator.com/item?id=4771987 - Nov 2012 (5 comments)

jibal•4mo ago
> Everybody is aware of this at some level, but somehow this knowledge seems to suddenly disappear in a discussion about text, so let's get it out first: A computer cannot store "letters", "numbers", "pictures" or anything else. The only thing it can store and work with are bits.

This is wrong and it goes downhill from there. I don't want to take the time and effort to fisk it, but it's full of errors like mistaking characters for codepoints and saying things like "In other words, ASCII maps 1:1 unto UTF-8" -- a bizarre and wrong way to say what he said in the previous sentence: "All characters available in the ASCII encoding only take up a single byte in UTF-8 and they're the exact same bytes as are used in ASCII".

torstenvl•4mo ago
It isn't wrong. Computers, broadly speaking, can only store binary digits.

I'm not sure if you're thinking of the Mark II, or the term as meaning human arithmeticians, or what, but that seems pedantic to the point of sophistry.

jibal•4mo ago
I've pointed out your mistakes elsewhere and won't respond to you otherwise. I just want to alert you to the fact that, when you told someone several weeks ago that "Your behavior has no place here" you were addressing an HN public moderator.

jibal•4mo ago
"pedantic to the point of sophistry"

Gotta love how those who can't comprehend reach for the ad hominem. And it's so absurdly hypocritical ... the claim that "A computer cannot store "letters", "numbers", "pictures" or anything else. The only thing it can store and work with are bits" is extraordinarily pedantic sophistry and WRONG. It comes from people who have no understanding of the concepts of a representation and abstraction and either don't know how digital storage works or are pretending not to. The many bi-state mechanisms we use for digital storage are not bits, they represent bits. And CPUs don't contain (or "store") bits, they are made of transistors that control the flow of electrons ... modeling this as "bits" is an abstraction.

But hey, I guess John von Neumann was a pedant and a sophist when he talked about stored program computers rather than stored bit computers.

torstenvl•3mo ago
An ad hominem is a fallacious argument pertaining to one or more individual characteristics of one or more persons. Criticism of an argument as pedantic and/or sophistry cannot possibly be an ad hominem, because it is a criticism of the argument itself.

By contrast, your attempt to discredit me by reference to some other interaction we seem to have had is an ad hominem. More interestingly, your reference to John von Neumann and your reference to a moderator are both also a subclass of ad hominems known as "appeals to authority."

https://ethics.org.au/ethics-explainer-ad-hominem-fallacy/

fainpul•4mo ago
Highly related recommendation: https://i18n-puzzles.com/

It's a series of tasks ("puzzles") in the style of Advent of Code. Some deal with text handling, some with dates and times.

In my opinion it's a fun way to really get this stuff in your brain (by doing, not just reading about it) and especially learn about what your programming language of choice has to offer in this department.

I find the later puzzles have a bit of an artificial difficulty increase, which makes them seem a bit far fetched and unrealistic. But the first few are definitely reasonable and applicable to real-world scenarios. You also don't have to do them in order. Unlike with AoC, all the puzzles are available from the start.

geocar•4mo ago
> Say, your app must accept files uploaded in GB18030, but internally you are handling all data in UTF-32. A tool like iconv can cleanly convert the uploaded file with a one-liner like iconv('GB18030', 'UTF-32', $string). That is, it will preserve the characters while changing the underlying bits:

Oh for goodness sake please please don't do this: Despite the appearance of the "representations" given, GB18030 is bigger than Unicode so this potentially destroys information. Almost any other 'character (encoding) set' would have been a better example, but definitely not this one, and unless you already know why, it might work for a long time until you discover a problem.

Actually, I do not generally recommend converting anything ever; I try to save the original customer/user submission and then any derivative use of it that needs some specific conversion can use that. If you save the bytes you were given, you can fix problems like this when they come up, but if you normalise everything before saving your golden record in your database, you might actually lose something important.
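A minimal Python sketch of the "keep the bytes you were given" practice described above. The function name `convert_keep_original` and the record layout are hypothetical, invented for this example; the sketch does not test the claim about GB18030 and Unicode, it only shows one way to store the original next to a derived conversion and flag inputs that do not round-trip.

```python
def convert_keep_original(raw: bytes, source_encoding: str = "gb18030") -> dict:
    """Hypothetical helper: derive a UTF-32 copy but never discard the upload."""
    text = raw.decode(source_encoding)
    return {
        "original_bytes": raw,                  # the golden record
        "source_encoding": source_encoding,     # what we believed it was
        "utf32": text.encode("utf-32-be"),      # derived, can be regenerated
        # If re-encoding does not reproduce the input, something was lost.
        "round_trips": text.encode(source_encoding) == raw,
    }

upload = "\u6c34".encode("gb18030")   # stand-in for an uploaded file
record = convert_keep_original(upload)
print(record["round_trips"], record["utf32"].hex(" "))
```

Because the original bytes survive, a conversion bug found later can be fixed by re-deriving the UTF-32 copy instead of trying to repair it.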

Three other things to know about "encoding and character sets" that I feel like are more important than code points:

1. If you don't know the language, you can't sort/compare, so if you think this saves you keeping track of the 'character set', well you _should_ have been tracking 'character set+language' anyway, so even if UTF32 worked, you'd still need the field for language anyway. And yeah, this affects "latin" languages too.

2. If you don't know the font, you can't figure out how big something is, draw it, wrap it, count the "characters", and so on. If you're beginning to wonder what you can do with text you can't read, you're starting to get the idea.

3. Microsoft is a massive fucking company and can't get RTL right. Bananas, right? You have no hope if you do not talk to actual human beings that use the language. This guy https://www.notarabic.com gave a talk a few years ago which I recommend if that sounds incredible.
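Point 1 can be illustrated with a toy Python example. The two key functions below are deliberately crude stand-ins written for this sketch, not real collation (a real application would use a locale-aware library such as ICU); they only encode the well-known rule that German dictionaries sort "ä" with "a", while Swedish puts "å", "ä", "ö" after "z".

```python
words = ["zebra", "\u00c4pfel", "apple"]   # "\u00c4pfel" is "Äpfel"

# Plain sorted() compares code points, which matches no particular language.
by_code_point = sorted(words)

def german_like_key(word: str) -> str:
    # Toy rule: treat "ä" like "a".
    return word.casefold().replace("\u00e4", "a")

def swedish_like_key(word: str) -> list:
    # Toy rule: "å", "ä", "ö" are separate letters that sort after "z".
    tail = "\u00e5\u00e4\u00f6"
    return [1000 + tail.index(ch) if ch in tail else ord(ch)
            for ch in word.casefold()]

german_order = sorted(words, key=german_like_key)
swedish_order = sorted(words, key=swedish_like_key)
print(by_code_point, german_order, swedish_order)
```

The same three strings come out in a different order depending on which language the reader expects, which is why the language has to be tracked alongside the text.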

tl;dr: text is hard, let's go to the beach.

danhau•4mo ago
At my job I have to deal with an old system that invented its own encoding, named TSS. The idea was to unify multiple charsets and encodings into one, before Unicode was a thing. But instead of coming up with one big charset and assigning codepoints plus an encoding scheme, they thought it was wise to just repackage other encodings and charsets. Think Matroska, but for text. And yes, I do mean charsets AND encodings. Sometimes they repackage an encoding, sometimes just a charset where the codepoints are the encoding.

TSS supports the ISO-8859 charsets and corresponding (but deviating) Windows codepages, traditional and simplified Chinese, half- and fullwidth Japanese, Korean via Wansung and Johab, and others I'm forgetting right now. And in newer versions of the software, they also support Unicode, but using a custom encoding.

Thankfully a good chunk of all that is well documented, like the byte values introducing a fullwidth Japanese character, for example. But they don't describe what charset or encoding is actually used. EUC-JP? Shift-JIS? Turns out it's JIS X 0208. You'd think they would just use Shift-JIS, which gives them both full- and halfwidth Japanese in one shot, but no. They package those explicitly as JIS X 0208 and JIS X 0201. Similar questions arise for Chinese and the others. It took a lot of reverse engineering to figure that stuff out. But if you think that is hard, have fun finding tables to map those old encodings to Unicode and back. Java is a godsend in this case. Charset.availableCharsets has them all!
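TSS itself is not publicly available, so the following Python sketch only uses the standard-library codecs as a stand-in. It shows the two points made above: the same Japanese character has different bytes under Shift-JIS and EUC-JP, and Shift-JIS covers halfwidth katakana (the JIS X 0201 range) as single bytes alongside two-byte fullwidth characters.

```python
full = "\u30ab"   # カ, fullwidth katakana KA
half = "\uff76"   # ｶ, halfwidth katakana KA

sjis_full = full.encode("shift_jis")   # two bytes
sjis_half = half.encode("shift_jis")   # one byte
euc_full = full.encode("euc_jp")       # two bytes, but not the same two
euc_half = half.encode("euc_jp")       # prefixed form in EUC-JP

for label, data in [("Shift-JIS full", sjis_full), ("Shift-JIS half", sjis_half),
                    ("EUC-JP full", euc_full), ("EUC-JP half", euc_half)]:
    print(f"{label:15} {data.hex(' ')}")
```

Without knowing which of these schemes produced a byte stream, the byte values alone are not enough to recover the text, which is why the reverse engineering described above was necessary.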

What's kinda charming is that TSS also contains text formatting commands. "Make all following text bold! Make it underlined! Now make it both bold and underlined!" Stuff like that.

What's less charming is that TSS is actually a superset of (an extension of) the ISO-8859 family, similar to how ISO-8859 is a superset of ASCII. In other words, all ISO-8859-1 (or any other variant) is perfectly valid TSS, but not all TSS is valid ISO-8859-1. This creates a lot of fun meetings with other departments when they query the database and are puzzled as to where those weird characters in their ISO-8859-1 text came from.
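The "weird characters" effect can be imitated in Python, with the caveat that Windows-1252 is used here purely as an analogy for an encoding that extends ISO-8859-1's byte range with extra meanings; it is not TSS. The key observation is that an ISO-8859-1 decoder accepts every byte value, so misreading extended data never raises an error and only shows up as odd output.

```python
# ISO-8859-1 (latin-1) assigns a character to every byte, so decoding never fails.
all_bytes = bytes(range(256))
decoded_everything = all_bytes.decode("latin-1")

# ASCII-only data reads the same under both codecs.
plain = b"plain text"
ascii_compatible = plain.decode("ascii") == plain.decode("latin-1")

# Analogy: bytes written as Windows-1252, then read as ISO-8859-1.
stored = "\u20ac5".encode("cp1252")    # "€5"
misread = stored.decode("latin-1")     # no error, just an invisible control char

print(len(decoded_everything), ascii_compatible, repr(misread))
```

Since nothing fails loudly, the consumer of the data only notices once the stray characters reach a report or a screen.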