I spent almost a month chasing a silent data corruption issue that turned out to be floating-point non-determinism between x86 and ARM chips. It completely changed how I look at "reliable" memory.
What was your "white whale" bug of the year?
I have a write-only table in MariaDB where the ordering of records matters. I realised that the database has no such thing as an append-only table that stores records in the order they are submitted. Every record has one or more indices, and it is these indices that dictate the ordering, and only for the data they index.

What I had overlooked is that when transaction A starts and then transaction B starts, transaction A may hold records with smaller keys, since it started first, yet transaction B can commit first with higher keys, leaving me with out-of-order entries. Whether that is a problem depends on the context, and in my case the context was readers constantly waiting for new records. If a reader reads after transaction B commits but before transaction A commits, it will never see the new records from transaction A. I solved it by blocking readers based on the number of active transactions, with ordering taken into account.
I wrote about it in this blog post, in the "Event Log and proper ordering of events" section: https://gethly.com/blog/how-of-gethly/event-sourcing-right-w...
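To make the blocking idea concrete, here is a minimal Python sketch of a commit watermark (all names are hypothetical, and assigning one sequence per transaction is a simplification of what my code actually does): readers only consume records below the start of the oldest still-active transaction, so a late-committing transaction can never be skipped.

    import threading

    class CommitWatermark:
        """Tracks in-flight writer transactions so readers only consume
        records below the oldest still-active transaction.
        (Hypothetical sketch, not MariaDB internals.)"""

        def __init__(self):
            self._lock = threading.Lock()
            self._next_seq = 0    # next sequence number to hand out
            self._active = set()  # sequences of in-flight transactions

        def begin(self) -> int:
            with self._lock:
                seq = self._next_seq
                self._next_seq += 1
                self._active.add(seq)
                return seq

        def commit(self, seq: int) -> None:
            with self._lock:
                self._active.discard(seq)

        def safe_upto(self) -> int:
            """Exclusive upper bound for readers: everything below the
            oldest active transaction is committed and in order."""
            with self._lock:
                return min(self._active) if self._active else self._next_seq

A reader then polls safe_upto() and only reads records with a sequence below that bound; once the slow transaction A commits, the watermark advances past it and A's records become visible in order.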
I spent 6 months chasing 'ghosts' in my backtests that turned out to be floating-point drift between my Mac and the production Linux server. I realized exactly what you said: if state isn't replayable bit-for-bit, it's not engineering.
I actually ended up rewriting HNSW using Q16.16 fixed-point math just to force 'reality to line up' again. It's painful to lose the raw speed of AVX floats, but getting 'Engineering' back was worth it. Check it out: https://github.com/varshith-Git/Valori-Kernel
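For anyone curious what Q16.16 buys you, here is a generic Python illustration (not code from that repo): values become scaled integers, so both the multiplies and the accumulation order are exact and bit-identical on any host.

    # Q16.16: a value x is stored as the integer round(x * 2**16).
    # Integer ops are bit-identical across x86 and ARM, unlike float
    # pipelines that differ in FMA contraction or SIMD reduction order.
    FRAC_BITS = 16
    ONE = 1 << FRAC_BITS

    def to_fix(x: float) -> int:
        return round(x * ONE)

    def from_fix(q: int) -> float:
        return q / ONE

    def fix_mul(a: int, b: int) -> int:
        # Full-width product, then shift back down to Q16.16.
        return (a * b) >> FRAC_BITS

    def fix_dot(u: list[int], v: list[int]) -> int:
        # Fixed accumulation order -> identical distances everywhere.
        acc = 0
        for a, b in zip(u, v):
            acc += fix_mul(a, b)
        return acc

    u = [to_fix(x) for x in (0.5, -1.25, 3.0)]
    v = [to_fix(x) for x in (2.0, 0.75, -0.5)]
    print(from_fix(fix_dot(u, v)))  # -1.4375 on every platform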
After profiling, I found two bottlenecks. First, converting frames to RGB was happening on the CPU and was quite costly, so I rendered the decoded YUV frames directly on the GPU without any conversion. Second, I moved all the playback logic off the main thread, since our heavy UI was competing for the same resources.
The main-thread issue was that I was iterating through the frame buffer multiple times per second to select the appropriate frame for rendering. When heavy UI animations occurred, the main thread would block and the iteration would finish late; by then the target frame's timestamp had already passed, so that frame got skipped and only the next one was drawn, creating visible stuttering.
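A rough Python sketch of the pattern (all names hypothetical; the real player is presumably not Python): the frame-selection loop runs on its own thread, so main-thread jank can no longer delay the timestamp check.

    import threading, time
    from collections import deque

    class FrameScheduler:
        """Selects frames by presentation timestamp on a dedicated
        thread instead of the UI/main thread. 'render' is a stand-in
        callback that hands a frame to the GPU."""

        def __init__(self, render, fps=30):
            self.buffer = deque()   # decoded frames: (pts_seconds, frame)
            self.lock = threading.Lock()
            self.render = render
            self.interval = 1.0 / fps
            threading.Thread(target=self._loop, daemon=True).start()

        def push(self, pts, frame):
            with self.lock:
                self.buffer.append((pts, frame))

        def _loop(self):
            start = time.monotonic()
            while True:
                now = time.monotonic() - start
                chosen = None
                with self.lock:
                    # Keep the newest frame whose timestamp is due,
                    # dropping any older ones it superseded.
                    while self.buffer and self.buffer[0][0] <= now:
                        chosen = self.buffer.popleft()
                if chosen:
                    self.render(chosen[1])  # e.g. upload YUV planes to the GPU
                time.sleep(self.interval / 2)  # poll at twice frame rate

Dropping everything older than the due frame is what avoids the catch-up stutter: the scheduler always presents the freshest frame whose timestamp has passed, no matter how long the UI was blocked.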