
An Open-Source Framework for Building Stable and Reliable LLM-Powered Systems

https://chatbot-testing-framework.readthedocs.io/en/latest/
2•alexostrovskyy•4mo ago

Comments

alexostrovskyy•4mo ago
I think many of us have felt the pain of building a cool LLM-powered application or RAG pipeline, only to find it's too brittle and unpredictable for real-world use. The core problem is that these systems are black boxes: when they fail, it's hard to know why.

I've been focused on this problem of "productionizing" AI workflows. It's not just about testing; it's about deep observability, performance tuning, and building systems you can trust to be stable.

I wrote up a guide on a methodology I've found very effective. It's based on an open-source framework that uses decorators to trace the entire execution path of a chatbot. This gives you the data to:

- Pinpoint Performance Bottlenecks: See the exact latency of every LLM call, tool use, and retrieval step.
- Automate Quality Control: Use an LLM-as-a-judge to programmatically check for hallucinations (groundedness), safety violations, and adherence to custom rules.
- Create a Feedback Loop for Improvement: When you change a prompt or logic, you can run the test suite and get a concrete report on whether performance and reliability have improved or worsened.
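
To make the decorator-based tracing idea concrete, here is a minimal sketch in plain Python. It is not the framework's actual API: the `traced` decorator, the `Span` record, the module-level `TRACE` list, and the stub `retrieve`/`generate`/`answer` functions are all hypothetical names chosen for illustration, and the `time.sleep` calls merely stand in for a real retrieval step and a real model call.

```python
import functools
import time
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Span:
    """One recorded step of the pipeline (hypothetical record type)."""
    name: str
    kind: str
    duration_s: float
    error: Optional[str] = None


# Module-level list that collects spans in the order the calls finish.
TRACE: List[Span] = []


def traced(kind: str):
    """Decorator that records wall-clock latency and any error for each call."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            error = None
            try:
                return fn(*args, **kwargs)
            except Exception as exc:
                error = repr(exc)
                raise
            finally:
                # Runs on success and on failure, so failed steps are traced too.
                TRACE.append(
                    Span(fn.__name__, kind, time.perf_counter() - start, error)
                )
        return wrapper
    return decorator


@traced("retrieval")
def retrieve(query: str) -> List[str]:
    time.sleep(0.01)  # stand-in for a vector store lookup
    return ["Paris is the capital of France."]


@traced("llm")
def generate(query: str, context: List[str]) -> str:
    time.sleep(0.02)  # stand-in for a model call
    return f"Answer based on {len(context)} document(s)."


@traced("pipeline")
def answer(query: str) -> str:
    return generate(query, retrieve(query))


if __name__ == "__main__":
    print(answer("What is the capital of France?"))
    for span in TRACE:
        print(f"{span.kind:<10} {span.name:<10} {span.duration_s * 1000:.1f} ms")
```

Because each span is appended when its call finishes, inner steps appear before the enclosing pipeline step. A real setup would likely also attach inputs, outputs, and a per-request trace ID so that a judge model or a regression report can consume the same records.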

You can read the guide here:

- LangChain-based application: https://alexostrovskyy.com/the-glass-box-why-your-chatbot-ne...
- LlamaIndex-based application: https://alexostrovskyy.com/production-llm-chatbot-tracing-an...

I created this open-source project for use in my own work and to help other creators. My goal is a framework that helps us build stable, trustworthy AI systems, not just clever demos.

I'd be very interested to hear feedback from other engineers and creators.