Lo and behold, his input method automatically collapsed two consecutive dashes into an en-dash (`–-f`), and the "option" was instead treated as a regular positional argument.
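A cheap defensive trick (a sketch, not from the original story) is to scan `argv` for typographic dashes before handing it to the parser, since an autocorrecting input method will have substituted an en- or em-dash for the leading ASCII hyphens:

```python
import sys

# Dash look-alikes a "smart" input method may substitute for "--":
# en-dash U+2013, em-dash U+2014, minus sign U+2212.
SMART_DASHES = ("\u2013", "\u2014", "\u2212")

def undo_smart_dashes(argv):
    """Conservatively map a leading typographic dash back to '--'."""
    fixed = []
    for arg in argv:
        if arg and arg[0] in SMART_DASHES:
            arg = "--" + arg[1:]
        fixed.append(arg)
    return fixed

if __name__ == "__main__":
    print(undo_smart_dashes(sys.argv[1:]))
```

Whether `–f` should really map to `--f` (rather than `-f`) depends on what the input method collapsed, so treating it as a hard error with a helpful message is arguably safer than silently rewriting.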
Apparently Word has a habit of inserting these into fields, whether the context needs them or not, when language packs supporting any right-to-left language are installed. Once added they are silently maintained, and depending on exactly what you select they may get included when you copy the text out to paste elsewhere, or when you use some form of automation to read the field value directly from the document or from Word itself.
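The bidi marks Word inserts (e.g. RIGHT-TO-LEFT MARK, U+200F) are Unicode *format* characters, category `Cf`, so one way to scrub text pulled out of a document is to drop that whole category. A minimal sketch using only the standard library:

```python
import unicodedata

def strip_format_chars(text):
    """Remove Unicode format characters (category Cf): LRM/RLM bidi
    marks, zero-width joiners, soft hyphens, and similar invisibles."""
    return "".join(ch for ch in text
                   if unicodedata.category(ch) != "Cf")
```

Note this is deliberately blunt: it also strips ZWJs that some scripts and emoji sequences legitimately need, so it fits field values like IDs better than free text.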
--------
[1] I noticed it while digging into some output to analyse a related issue: the file had been mashed together from content with different codepages in a way that meant it included invalid code points.
This was the 2000s, so it was all scripts (SQL scripts and VBScripts, I seem to remember). As part of it, we ended up cleaning the customer data of a myriad of bugs: inconsistent capitalization, leading and trailing spaces, and this. Weird characters you didn't even know existed.
Over time more and more of these hidden characters were added to the script, because back then it wasn't a case of googling it or asking on SO.
I have a friend who works as a data analyst for a local council. He hates school-report season, as the data from the schools comes in with all sorts of weird consistency problems.
But the first half of the post really is an interesting problem -- what to do about invisible Unicode characters that wind up in a username login field, turning it into an invalid username, because the value was copy-pasted from a source that inserted them. The post lists potential sources as:
> Copy-paste from PDFs or Word docs: Rich-text formats often inject hidden control characters.
> Email clients and chat apps: Some insert soft hyphens, directionality markers, or non-breaking spaces.
> Keyboards and IMEs: Certain language input systems add combining marks or zero-width joiners.
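All three sources boil down to characters in a handful of Unicode categories, so a login field can defend itself with one sanitizing pass. A sketch (the function name and exact policy are my own, not from the post), covering format characters, exotic spaces, and accent composition:

```python
import unicodedata

def clean_username(raw):
    """Sanitize a pasted username: drop invisibles, tame spaces,
    normalize accents. One possible policy, not a standard."""
    # Drop format characters (category Cf): soft hyphens,
    # directionality markers, zero-width joiners.
    s = "".join(ch for ch in raw
                if unicodedata.category(ch) != "Cf")
    # Map non-breaking and other space separators (Zs) to ASCII space.
    s = "".join(" " if unicodedata.category(ch) == "Zs" else ch
                for ch in s)
    # NFC so precomposed and decomposed accents compare equal.
    return unicodedata.normalize("NFC", s).strip()
```

Rejecting the input outright when invisibles are found is often friendlier than silently rewriting it, since the user can then fix the source of the paste.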
But of course it's part of a broader Unicode problem, like the fact that there are two ways of representing common accented characters (precomposed vs decomposed) that are also not equivalent, or that multiple accents can be in a different order. Normalization handles those cases, but it doesn't do anything about nonprinting characters.
Is there not any common method for Unicode we should be using to check for, essentially, "grapheme comparison" that doesn't just normalize but ignores non-printing codepoints?
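I'm not aware of a single blessed algorithm for this, but a common home-grown approximation is to build a comparison key: normalize, then discard the non-printing categories before comparing. A sketch of that idea (Unicode's own machinery, like the confusable "skeletons" in UTS #39 or NFKC casefolding, goes considerably further):

```python
import unicodedata

def loose_equal(a, b):
    """Compare two strings after NFC normalization, ignoring format
    (Cf) and control (Cc) codepoints. A heuristic, not a standard."""
    def key(s):
        s = unicodedata.normalize("NFC", s)
        return "".join(ch for ch in s
                       if unicodedata.category(ch) not in ("Cf", "Cc"))
    return key(a) == key(b)
```

This still treats visually identical but distinct letters (Latin "a" vs. Cyrillic "а") as different, which is exactly the gap the confusable-detection data is meant to close.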
It was back in the 1.8.7 days, just before proper Unicode support in 1.9, but I don’t remember if that was relevant to this story.
He was deleting code until the bug disappeared, and then we zeroed in and found the character.
It was in the Textmate days, and it didn’t highlight such characters.