frontpage.
newsnewestaskshowjobs

Made with ♥ by @iamnishanth

Open Source @Github

fp.

EVs Are a Failed Experiment

https://spectator.org/evs-are-a-failed-experiment/
1•ArtemZ•4m ago•1 comments

MemAlign: Building Better LLM Judges from Human Feedback with Scalable Memory

https://www.databricks.com/blog/memalign-building-better-llm-judges-human-feedback-scalable-memory
1•superchink•5m ago•0 comments

CCC (Claude's C Compiler) on Compiler Explorer

https://godbolt.org/z/asjc13sa6
1•LiamPowell•7m ago•0 comments

Homeland Security Spying on Reddit Users

https://www.kenklippenstein.com/p/homeland-security-spies-on-reddit
2•duxup•10m ago•0 comments

Actors with Tokio (2021)

https://ryhl.io/blog/actors-with-tokio/
1•vinhnx•11m ago•0 comments

Can graph neural networks for biology realistically run on edge devices?

https://doi.org/10.21203/rs.3.rs-8645211/v1
1•swapinvidya•23m ago•1 comments

Deeper into the shareing of one air conditioner for 2 rooms

1•ozzysnaps•25m ago•0 comments

Weatherman introduces fruit-based authentication system to combat deep fakes

https://www.youtube.com/watch?v=5HVbZwJ9gPE
2•savrajsingh•26m ago•0 comments

Why Embedded Models Must Hallucinate: A Boundary Theory (RCC)

http://www.effacermonexistence.com/rcc-hn-1-1
1•formerOpenAI•28m ago•2 comments

A Curated List of ML System Design Case Studies

https://github.com/Engineer1999/A-Curated-List-of-ML-System-Design-Case-Studies
3•tejonutella•32m ago•0 comments

Pony Alpha: New free 200K context model for coding, reasoning and roleplay

https://ponyalpha.pro
1•qzcanoe•36m ago•1 comments

Show HN: Tunbot – Discord bot for temporary Cloudflare tunnels behind CGNAT

https://github.com/Goofygiraffe06/tunbot
1•g1raffe•39m ago•0 comments

Open Problems in Mechanistic Interpretability

https://arxiv.org/abs/2501.16496
2•vinhnx•44m ago•0 comments

Bye Bye Humanity: The Potential AMOC Collapse

https://thatjoescott.com/2026/02/03/bye-bye-humanity-the-potential-amoc-collapse/
2•rolph•49m ago•0 comments

Dexter: Claude-Code-Style Agent for Financial Statements and Valuation

https://github.com/virattt/dexter
1•Lwrless•50m ago•0 comments

Digital Iris [video]

https://www.youtube.com/watch?v=Kg_2MAgS_pE
1•vermilingua•55m ago•0 comments

Essential CDN: The CDN that lets you do more than JavaScript

https://essentialcdn.fluidity.workers.dev/
1•telui•56m ago•1 comments

They Hijacked Our Tech [video]

https://www.youtube.com/watch?v=-nJM5HvnT5k
1•cedel2k1•1h ago•0 comments

Vouch

https://twitter.com/mitchellh/status/2020252149117313349
34•chwtutha•1h ago•5 comments

HRL Labs in Malibu laying off 1/3 of their workforce

https://www.dailynews.com/2026/02/06/hrl-labs-cuts-376-jobs-in-malibu-after-losing-government-work/
4•osnium123•1h ago•1 comments

Show HN: High-performance bidirectional list for React, React Native, and Vue

https://suhaotian.github.io/broad-infinite-list/
2•jeremy_su•1h ago•0 comments

Show HN: I built a Mac screen recorder Recap.Studio

https://recap.studio/
1•fx31xo•1h ago•1 comments

Ask HN: Codex 5.3 broke toolcalls? Opus 4.6 ignores instructions?

1•kachapopopow•1h ago•0 comments

Vectors and HNSW for Dummies

https://anvitra.ai/blog/vectors-and-hnsw/
1•melvinodsa•1h ago•0 comments

Sanskrit AI beats CleanRL SOTA by 125%

https://huggingface.co/ParamTatva/sanskrit-ppo-hopper-v5/blob/main/docs/blog.md
1•prabhatkr•1h ago•1 comments

'Washington Post' CEO resigns after going AWOL during job cuts

https://www.npr.org/2026/02/07/nx-s1-5705413/washington-post-ceo-resigns-will-lewis
4•thread_id•1h ago•1 comments

Claude Opus 4.6 Fast Mode: 2.5× faster, ~6× more expensive

https://twitter.com/claudeai/status/2020207322124132504
1•geeknews•1h ago•0 comments

TSMC to produce 3-nanometer chips in Japan

https://www3.nhk.or.jp/nhkworld/en/news/20260205_B4/
3•cwwc•1h ago•0 comments

Quantization-Aware Distillation

http://ternarysearch.blogspot.com/2026/02/quantization-aware-distillation.html
2•paladin314159•1h ago•0 comments

List of Musical Genres

https://en.wikipedia.org/wiki/List_of_music_genres_and_styles
1•omosubi•1h ago•0 comments
Open in hackernews

Ask HN: Good LLM Observability Platforms?

6•seany62•3mo ago
My company has been through 3 different "LLM Observability" vendors and they each have failed to give us the one (simple) thing we want. Willing to pay for this.

The ONLY thing we care about is the ability to: - Log an LLM completion, and be able to press a button that lets us re-run the exact same completion in a UI (industry seems to call this the "playground"). We can rerun this completion exactly how it was in production.

What we DO NOT care about: - "datasets" - "scores" - "prompt enhancers"

Comments

uaas•3mo ago
I am curious, what’s the point of re-running these interactions on a UI?
muzani•3mo ago
Reproduction I suppose. I would like the same things as OP too.

LLM outputs are qualitative; they can't really be automatically scored and prompt enhancements tend to multiply the bug. It can solve a problem, but introduce a new one. It's practical just to do it manually.

thiago_fm•3mo ago
I'm sure if you ask Claude Code exactly that, they will develop what you want.

Tell it to create an API for the LLM data ingestion, then integrate with it on your software.

BTW, this is far from what an LLM Observability tool will offer you. You are a bit confused what O11Y is.

debadyutirc•3mo ago
What entails the LLM Completion are you talking sequence of prompts with files / mcp servers. Could you share a bit more, cause I have spent some time with this and have something that might be precisely what you are asking for...
Wonnk13•3mo ago
When I think of LLM / Agent observability I think of some combination of open telemetry and like Influxdb, but I don't think that's what your asking for?