Show HN: Fixing a single pointer bug unlocked 1M+ row JSON parsing on Windows

4•hilti•2mo ago
I've been building a cross-platform JSONL viewer app that handles multi-GB files. It worked perfectly on macOS (my development machine), but consistently crashed on Windows at exactly 2,650 KB. Here's the debugging journey and the tiny fix that made all the difference.

The Problem

- macOS: Handles 5GB+ files effortlessly
- Windows: Crashes at 2,650 KB every time
- Same codebase, cross-compiled from an Apple Silicon Mac to Windows using MinGW

The Investigation

Added detailed logging to track execution. The crash happened during string interning after successfully parsing ~6,000 rows. Not during parsing, not during file I/O, but during the merge phase.
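For anyone wanting to reproduce this kind of tracing, here is a minimal sketch of a timestamped file logger. It is not the logger from the app: the class name `FileLogger`, the line format, and the file name in the usage note are all assumptions made for illustration. The one detail that matters for crash hunting is flushing after every line, so the last message written before the crash is not stuck in a buffer.

```cpp
#include <chrono>
#include <cstdio>
#include <fstream>
#include <string>

// Hypothetical helper for illustration, not the app's actual logger.
class FileLogger {
public:
    explicit FileLogger(const std::string& path)
        : out_(path, std::ios::app),
          start_(std::chrono::steady_clock::now()) {}

    // Writes "[+<elapsed>ms] <phase>: <message>" and flushes right away,
    // so the line has left the process even if it crashes on the next call.
    void log(const std::string& phase, const std::string& message) {
        using namespace std::chrono;
        auto ms = duration_cast<milliseconds>(steady_clock::now() - start_).count();
        out_ << "[+" << ms << "ms] " << phase << ": " << message << '\n';
        out_.flush();
    }

private:
    std::ofstream out_;
    std::chrono::steady_clock::time_point start_;
};
```

Typical use would be something like `FileLogger log("viewer.log");` followed by `log.log("intern", "row 6000");` at each phase boundary. The last line in the file then brackets where the crash happened.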

The Root Cause

My StringPool class used std::unordered_map<std::string_view, uint32_t> to deduplicate strings. The string_views pointed into a std::vector<std::string>.

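When the vector grew and reallocated, the std::string objects moved to new storage. Any string short enough to use the small-string optimization keeps its characters inside the object itself, so the string_view keys for those strings became dangling pointers. The hash map was left holding invalid references.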

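Why did it work on macOS? Reading through a dangling string_view is undefined behavior, so it can appear to work for a long time. Different memory allocator behavior and different reallocation patterns most likely kept the stale memory readable there. Default stack sizes also differ (8MB vs 1MB), but since the strings live on the heap, that is probably not the deciding factor.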
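The mechanism behind the root cause can be observed without ever touching a dangling pointer. The sketch below is an illustration, not code from the app, and the function name `dataMovesOnReallocation` is made up. It records the address of a string's character data as an integer, forces the vector to reallocate, and checks whether the address changed. It assumes a mainstream standard library where short strings are stored inline in the std::string object.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Returns true if the character data of the stored string moved when the
// vector reallocated. The old address is saved as an integer before the
// reallocation, so no invalid pointer is ever dereferenced.
bool dataMovesOnReallocation(const std::string& value) {
    std::vector<std::string> strings;
    strings.reserve(1);
    strings.push_back(value);
    auto before = reinterpret_cast<std::uintptr_t>(strings[0].data());

    // Requesting more than the current capacity forces a reallocation,
    // which move-constructs every element into the new buffer.
    strings.reserve(strings.capacity() * 4 + 1);
    auto after = reinterpret_cast<std::uintptr_t>(strings[0].data());
    return before != after;
}
```

For a short value such as `"id"` the data address changes, which is exactly the case where a stored string_view is left dangling. For long strings the heap buffer is typically handed over to the moved-to object and the address stays the same, but that is an implementation detail, not something to rely on.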

The Fix

Before (broken):

    uint32_t intern(std::string_view str) {
        auto it = indices_.find(str);
        if (it != indices_.end()) return it->second;
        
        uint32_t idx = strings_.size();
        strings_.push_back(std::string(str));
        indices_[std::string_view(strings_.back())] = idx;  // DANGER!
        return idx;
    }

After (fixed):

    uint32_t intern(const std::string& str) {
        auto it = indices_.find(std::string_view(str));
        if (it != indices_.end()) return it->second;

        // The next push_back would reallocate: grow first, then re-point
        // every key at the strings' new locations. With an empty pool this
        // is a no-op, which is fine because there are no keys yet.
        if (strings_.size() >= strings_.capacity()) {
            strings_.reserve(strings_.capacity() * 2);
            rebuildIndices();  // Fix all string_views!
        }

        auto idx = static_cast<uint32_t>(strings_.size());
        strings_.push_back(str);
        indices_[std::string_view(strings_.back())] = idx;
        return idx;
    }

    // Drop the stale keys and re-create them from the moved strings.
    void rebuildIndices() {
        indices_.clear();
        for (size_t i = 0; i < strings_.size(); i++) {
            indices_[std::string_view(strings_[i])] = static_cast<uint32_t>(i);
        }
    }

The Result

- 1 million rows: 6 seconds on Windows
- Multi-GB files: No crashes
- ~166,000 rows/second throughput
- Cross-platform stability

Lessons Learned

1. std::string_view is powerful but dangerous - It's a non-owning reference. When the underlying storage moves, you're holding garbage.

2. Cross-platform testing is essential - The bug was invisible on macOS, most likely because of different allocator and reallocation behavior. Undefined behavior that happens to work on one platform is still a bug.

3. Structured logging beats debuggers for cross-compilation - I was cross-compiling from Mac to Windows. Adding timestamped logging to a file made the crash point obvious immediately.

4. Small changes, huge impact - One function, ~15 lines of code, turned "crashes at ~2.6MB" into "handles 5GB+ files".

5. Performance stayed excellent - The rebuild only happens when the vector reallocates. Because capacity doubles each time, the total rebuild work over N inserts stays proportional to N, so the amortized cost per insert is negligible.
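Lessons 1 and 5 also suggest a possible alternative that needs no rebuild at all: store the strings in a container that never relocates existing elements. The sketch below is not the fix from this post. `StablePool` and its `get` and `size` members are hypothetical names, and it swaps std::vector for std::deque, whose push_back/emplace_back invalidates iterators but leaves references and pointers to existing elements valid.

```cpp
#include <cstddef>
#include <cstdint>
#include <deque>
#include <string>
#include <string_view>
#include <unordered_map>

// Hypothetical variant of the pool, shown only to illustrate the idea.
class StablePool {
public:
    StablePool() = default;
    // Copies would carry keys that point into the original pool's strings,
    // so copying is disabled in this sketch.
    StablePool(const StablePool&) = delete;
    StablePool& operator=(const StablePool&) = delete;

    std::uint32_t intern(std::string_view str) {
        auto it = indices_.find(str);
        if (it != indices_.end()) return it->second;

        auto idx = static_cast<std::uint32_t>(strings_.size());
        // Appending to a deque does not move existing elements, so the
        // views stored as keys for earlier strings remain valid.
        strings_.emplace_back(str);
        indices_.emplace(std::string_view(strings_.back()), idx);
        return idx;
    }

    std::string_view get(std::uint32_t idx) const { return strings_[idx]; }
    std::size_t size() const { return strings_.size(); }

private:
    std::deque<std::string> strings_;
    std::unordered_map<std::string_view, std::uint32_t> indices_;
};
```

The trade-off is that a deque stores elements in separate chunks, so anything that relies on the strings being contiguous in memory would need to change. Whether that matters depends on how the rest of the columnar storage reads the pool.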

The Tech Stack

- simdjson (v4.2.2) for parsing
- Multi-threaded parsing (20 threads on my test machine)
- Columnar storage for memory efficiency
- C++17, cross-compiled with MinGW-w64

This was a humbling reminder that the most critical bugs are often the simplest ones, hiding in plain sight behind platform differences.

Happy to discuss the implementation details, simdjson usage, or cross-platform C++ debugging techniques!
