I just built the first version of Nebark, an A/B testing platform for LLM system prompts. It aims to solve a very specific pain point: tracking prompt performance without forcing developers to wire trace IDs all the way through their backend to their frontend.
The Problem

If you want to know which system prompt variant generates better user feedback (upvotes, downvotes, or copy-to-clipboard events), the standard approach is intrusive. You have to generate a trace ID in your backend, pass it down to your client, attach it to your UI components, and send it back to your analytics DB. It creates friction and litters your API responses with telemetry metadata.
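Concretely, the boilerplate I mean looks something like this (a rough sketch; the routes, helpers, and field names are illustrative, not tied to any particular stack):

```js
// The intrusive baseline: the backend mints a trace ID, leaks it into the API
// response, and the client has to echo it back with every feedback event.
// All names here are illustrative.
import express from "express";
import { randomUUID } from "node:crypto";

const callModel = async (messages) => "stubbed model reply"; // stand-in for your LLM call
const recordTrace = async (row) => {};                       // stand-in for your analytics write

const app = express();
app.use(express.json());

app.post("/chat", async (req, res) => {
  const traceId = randomUUID();                     // telemetry ID minted server-side
  const reply = await callModel(req.body.messages);
  await recordTrace({ traceId, variant: "A" });
  res.json({ reply, traceId });                     // the API payload now carries telemetry
});

// ...and on the client, every upvote/downvote/copy handler must echo it back:
//   fetch("/feedback", { method: "POST",
//     body: JSON.stringify({ traceId: lastResponse.traceId, event: "upvote" }) });

app.listen(3000);
```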
The Solution: Context Hashing

We decoupled the telemetry entirely, using what we call "Context Hashing" to bridge the backend and frontend asynchronously.
Here is how the architecture works:
The Proxy (Backend): You point your OpenAI baseURL to our gateway. We intercept the request, inject Variant A or B of your system prompt, and stream the response back. Once the stream closes, our proxy calculates a unique cryptographic hash based on the interaction's content and stores it as a blind trace.
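In practice the backend change is just the client constructor; here is a minimal sketch with the official openai Node SDK (the gateway URL is a placeholder, not our real endpoint):

```js
// Minimal backend sketch: the only change is the base URL.
// The gateway URL below is a placeholder, not the real endpoint.
import OpenAI from "openai";

const client = new OpenAI({
  apiKey: process.env.OPENAI_API_KEY,
  baseURL: "https://gateway.nebark.example/v1", // hypothetical gateway URL
});

const stream = await client.chat.completions.create({
  model: "gpt-4o-mini",
  stream: true,
  messages: [
    // No system prompt here: the proxy injects Variant A or B before forwarding.
    { role: "user", content: "Summarize this support ticket for me." },
  ],
});

for await (const chunk of stream) {
  process.stdout.write(chunk.choices[0]?.delta?.content ?? "");
}
// When the stream closes, the proxy hashes the interaction's content and stores
// it as a blind trace; nothing extra flows through this code.
```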
The SDK (Frontend): A lightweight vanilla JS script watches the DOM. It smartly waits for the AI's response to finish streaming and rendering on the screen. It then extracts the visible text and calculates the exact same unique hash locally, without intercepting any network traffic.
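A rough sketch of what that looks like (the selector, quiet period, and normalization below are my assumptions for illustration, not the SDK's actual internals):

```js
// DOM-side hashing sketch: wait for the streamed message to stop mutating,
// grab the visible text, and hash it with Web Crypto. Selector, quiet period,
// and normalization are illustrative assumptions.
const QUIET_MS = 800; // consider the message "done" after this long with no mutations

async function sha256Hex(text) {
  const buf = await crypto.subtle.digest("SHA-256", new TextEncoder().encode(text));
  return [...new Uint8Array(buf)].map((b) => b.toString(16).padStart(2, "0")).join("");
}

function watchAssistantMessage(el, onHash) {
  let timer;
  const finish = async () => {
    observer.disconnect();
    const visibleText = el.innerText; // what the user actually sees, post-render
    onHash(await sha256Hex(visibleText.normalize("NFC").trim()));
  };
  const bump = () => {                // restart the quiet-period timer on every mutation
    clearTimeout(timer);
    timer = setTimeout(finish, QUIET_MS);
  };
  const observer = new MutationObserver(bump);
  observer.observe(el, { childList: true, characterData: true, subtree: true });
  bump();                             // also covers content that is already fully rendered
}

// Usage: point it at whatever element your app streams the assistant reply into.
watchAssistantMessage(document.querySelector("[data-role='assistant']"), (hash) => {
  console.log("context hash:", hash);
});
```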
The Match: The SDK injects the feedback UI (👍/👎). When a user clicks, the frontend sends this calculated Hash and a local Session ID to our DB. We match this Hash against the Proxy's traces to attribute the vote to the correct prompt variant.
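The vote itself is tiny; something like this (the endpoint, storage key, and field names are illustrative, not our documented API):

```js
// Sketch of the feedback call the injected UI makes on a click.
// Endpoint, storage key, and field names are illustrative.
function getSessionId() {
  let id = localStorage.getItem("nebark_session");
  if (!id) {
    id = crypto.randomUUID();
    localStorage.setItem("nebark_session", id);
  }
  return id;
}

async function sendFeedback(contextHash, event) {
  await fetch("https://api.nebark.example/feedback", { // hypothetical endpoint
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      hash: contextHash,         // matched server-side against the proxy's blind traces
      sessionId: getSessionId(), // purely local; never touches your backend
      event,                     // e.g. "upvote" | "downvote" | "copy"
    }),
  });
}
```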
Why it’s interesting
Zero Backend Config: You only change the base URL. The backend remains completely unaware of the A/B test or the telemetry.
Semantic Caching Immunity: If your backend uses Redis to serve a cached response and skips our Proxy, the frontend will generate a Hash that doesn't exist in our DB, so cached hits naturally drop out instead of skewing the A/B data.
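In attribution terms, an unmatched Hash is simply dropped rather than guessed at; conceptually something like:

```js
// Conceptual attribution step (illustrative, not the actual matching code):
// a feedback hash that no proxy trace ever produced is discarded, so responses
// served from your own cache (which never passed through the proxy) can't vote.
function attribute(feedback, tracesByHash) {
  const trace = tracesByHash.get(feedback.hash);
  if (!trace) return null; // cached hit or render mismatch: excluded from results
  return { variant: trace.variant, event: feedback.event, sessionId: feedback.sessionId };
}
```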
The Edge Cases (Where I need your feedback)

The biggest risk with DOM hashing is hydration/rendering discrepancies. If a client's frontend uses an aggressive Markdown parser that strips out specific characters before rendering the text, the frontend hash won't match the proxy hash. We built a strict internal normalization engine on both ends to mitigate this, but it is an ongoing challenge.
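For a sense of what that normalization has to do on both ends before hashing (the exact rules below are a guess at the category of transformations involved, not the real engine's spec):

```js
// Illustrative normalization applied identically on the proxy and in the SDK
// before hashing. The specific rules are a guess, not the real engine's spec.
function normalizeForHash(text) {
  return text
    .normalize("NFKC")          // fold Unicode variants (smart quotes, ligatures, etc.)
    .replace(/\r\n/g, "\n")     // unify line endings
    .replace(/[ \t]+/g, " ")    // collapse runs of spaces and tabs
    .replace(/\s*\n\s*/g, "\n") // trim whitespace around line breaks
    .trim();
}
// The failure mode: if a client-side Markdown renderer drops characters that this
// keeps (or vice versa), the two hashes diverge and the vote is silently lost.
```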
I’d love to hear your thoughts on this architecture. Is there a glaring edge case with DOM extraction or SSE proxying that I’m missing? It’s free for now. Tear it apart.