Show HN: Fixing a single pointer bug unlocked 1M+ row JSON parsing on Windows

3•hilti•1h ago
I've been building a cross-platform JSONL viewer app that handles multi-GB files. It worked perfectly on macOS (my development machine), but consistently crashed on Windows at exactly 2,650 KB. Here's the debugging journey and the tiny fix that made all the difference.

The Problem

- macOS: Handles 5GB+ files effortlessly
- Windows: Crashes at 2,650 KB every time
- Same codebase, cross-compiled from an Apple Silicon Mac to Windows using MinGW

The Investigation

Added detailed logging to track execution. The crash happened during string interning after successfully parsing ~6,000 rows. Not during parsing, not during file I/O, but during the merge phase.
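
A minimal sketch of the kind of timestamped logging described above. The helper names (formatLogLine, logLine) and the line format are assumptions for illustration, not the app's actual logger. The point is to flush after every line so the last message before a crash still reaches the file.

```cpp
#include <chrono>
#include <ostream>
#include <sstream>
#include <string>
#include <string_view>

// Hypothetical helper: formats one log line as "[<milliseconds>] <message>\n".
std::string formatLogLine(long long ms, std::string_view message) {
    std::ostringstream line;
    line << '[' << ms << "] " << message << '\n';
    return line.str();
}

// Hypothetical helper: stamps the message with the current wall-clock time
// (milliseconds since the epoch) and flushes immediately, so the line is
// handed to the OS before the next statement runs.
void logLine(std::ostream& out, std::string_view message) {
    using namespace std::chrono;
    const auto ms = duration_cast<milliseconds>(
        system_clock::now().time_since_epoch()).count();
    out << formatLogLine(static_cast<long long>(ms), message) << std::flush;
}
```

Passing a std::ofstream opened on a log file as `out` and calling logLine before and after each phase (parse, intern, merge) narrows a crash down to the last line that was written.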

The Root Cause

My StringPool class used std::unordered_map<std::string_view, uint32_t> to deduplicate strings. The string_views pointed into a std::vector<std::string>.

When the vector grew and reallocated, all the string_view keys became dangling pointers. The hash map was full of invalid references.

Why did it work on macOS? Reading through a dangling string_view is undefined behavior, so it can appear to work by luck. The likely differences: memory allocator behavior, default stack sizes (8MB vs 1MB), and reallocation patterns.
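
The effect can be observed without touching freed memory. The sketch below is an illustration, not code from the app: it records where a short string's characters live, forces the vector to reallocate, and checks whether they moved. It assumes the standard library stores short strings inline (small-string optimization), which the common implementations do, so moving the string object moves its characters too.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Returns true if the character data of the first (short) string lives at a
// different address after the vector reallocates its storage.
bool shortStringMovesOnRealloc() {
    std::vector<std::string> strings;
    strings.push_back("abc");  // short enough to be stored inline (assumed SSO)

    // Save the address as an integer so we never dereference a stale pointer.
    const auto before = reinterpret_cast<std::uintptr_t>(strings[0].data());
    const std::size_t oldCapacity = strings.capacity();

    // Push until the vector is forced to reallocate.
    while (strings.capacity() == oldCapacity) {
        strings.push_back("filler");
    }
    assert(strings[0] == "abc");  // the value itself survives the move

    const auto after = reinterpret_cast<std::uintptr_t>(strings[0].data());
    return before != after;  // a string_view created earlier would now dangle
}
```

A string_view taken from strings[0] before the loop would still hold the old address, which is exactly what the hash map keys were doing.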

The Fix

Before (broken):

    uint32_t intern(std::string_view str) {
        auto it = indices_.find(str);
        if (it != indices_.end()) return it->second;
        
        uint32_t idx = strings_.size();
        strings_.push_back(std::string(str));
        indices_[std::string_view(strings_.back())] = idx;  // DANGER: dangles once strings_ reallocates
        return idx;
    }

After (fixed):

    uint32_t intern(const std::string& str) {
        // Already interned: return the existing index.
        auto it = indices_.find(std::string_view(str));
        if (it != indices_.end()) return it->second;

        // Grow explicitly before push_back can reallocate, then re-point
        // every key at the strings' new locations.
        if (strings_.size() >= strings_.capacity()) {
            strings_.reserve(strings_.capacity() * 2);
            rebuildIndices();  // Fix all string_views!
        }

        uint32_t idx = static_cast<uint32_t>(strings_.size());
        strings_.push_back(str);  // cannot reallocate here
        indices_[std::string_view(strings_.back())] = idx;
        return idx;
    }

    // Drops every (possibly stale) key and re-inserts views of the current strings.
    void rebuildIndices() {
        indices_.clear();
        for (size_t i = 0; i < strings_.size(); i++) {
            indices_[std::string_view(strings_[i])] = static_cast<uint32_t>(i);
        }
    }
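
The rebuild is needed only because std::vector moves its elements when it grows. As an alternative sketch (not what the fix above does), the strings could live in a std::deque, whose push_back and emplace_back leave references to existing elements valid, so the string_view keys never need fixing. The class name StableStringPool and the get() and size() helpers are assumptions for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <string>
#include <string_view>
#include <unordered_map>

// Hypothetical variant of the pool: same interning logic, different storage.
class StableStringPool {
public:
    uint32_t intern(std::string_view str) {
        auto it = indices_.find(str);
        if (it != indices_.end()) return it->second;

        const auto idx = static_cast<uint32_t>(strings_.size());
        strings_.emplace_back(str);  // appending to a deque never moves existing elements
        indices_.emplace(std::string_view(strings_.back()), idx);
        return idx;
    }

    const std::string& get(uint32_t idx) const {
        assert(idx < strings_.size());
        return strings_[idx];
    }

    std::size_t size() const { return strings_.size(); }

private:
    std::deque<std::string> strings_;  // owns the characters the keys point at
    std::unordered_map<std::string_view, uint32_t> indices_;
};
```

The trade-off is that a deque is not contiguous and indexing costs slightly more than with a vector, so whether it beats the rebuild approach would need measuring.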

The Result

- 1 million rows: 6 seconds on Windows
- Multi-GB files: No crashes
- ~166,000 rows/second throughput
- Cross-platform stability

Lessons Learned

1. std::string_view is powerful but dangerous - It's a non-owning reference. When the underlying storage moves, you're holding garbage.

2. Cross-platform testing is essential - The bug was invisible on macOS due to different allocator behavior and larger default stack sizes.

3. Structured logging beats debuggers for cross-compilation - I was cross-compiling from Mac to Windows. Adding timestamped logging to a file made the crash point obvious immediately.

4. Small changes, huge impact - One function, ~15 lines of code, turned "crashes at ~2.6MB" into "handles 5GB+ files".

5. Performance stayed excellent - The rebuild only happens during vector reallocation (exponential growth), so amortized cost is negligible.
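
The amortized claim in point 5 can be checked with a small model. This is a simulation of capacity doubling under the assumption that a rebuild happens exactly when the pool is full, not a measurement of the real pool. It counts how many rebuilds occur and how many keys they re-insert in total.

```cpp
#include <cassert>
#include <cstddef>

struct RebuildCost {
    std::size_t rebuilds;    // number of times the index was rebuilt
    std::size_t reinserted;  // total keys re-inserted across all rebuilds
};

// Models `inserts` unique strings being interned with capacity doubling.
RebuildCost simulateDoubling(std::size_t inserts) {
    std::size_t size = 0, capacity = 0, rebuilds = 0, reinserted = 0;
    for (std::size_t i = 0; i < inserts; ++i) {
        if (size >= capacity) {
            capacity = (capacity == 0) ? 1 : capacity * 2;
            ++rebuilds;
            reinserted += size;  // a rebuild re-inserts every existing key
        }
        ++size;
    }
    // Geometric series: total rebuild work stays below two re-inserts per string.
    assert(inserts == 0 || reinserted < 2 * inserts);
    return {rebuilds, reinserted};
}
```

For one million unique strings this model gives 21 rebuilds and 1,048,575 re-inserted keys in total, roughly one extra hash insert per string.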

The Tech Stack

- simdjson (v4.2.2) for parsing
- Multi-threaded parsing (20 threads on my test machine)
- Columnar storage for memory efficiency
- C++17, cross-compiled with MinGW-w64

This was a humbling reminder that the most critical bugs are often the simplest ones, hiding in plain sight behind platform differences.

Happy to discuss the implementation details, simdjson usage, or cross-platform C++ debugging techniques!
