frontpage.

Deep Dive into G-Eval: How LLMs Evaluate Themselves

https://medium.com/@zlatkov/deep-dive-into-g-eval-how-llms-evaluate-themselves-743624d22bf7
9•zlatkov•1h ago

Comments

eeasss•1h ago
Are there any LLMs in particular that work best with G-Eval?
zlatkov•1h ago
I haven’t come across any research showing that a specific LLM consistently outperforms others for this. It generally works best with strong reasoning models that produce consistent outputs.
lyuata•1h ago
An LLM benchmark leaderboard for common evals sounds like a fun idea to me.
kirchoni•1h ago
Interesting overview, though I still wonder how stable G-Eval really is across different model families. Auto-CoT helps with consistency, but I’ve seen drift even between API versions of the same model.
zlatkov•1h ago
That's true. Even small API or model version updates can shift evaluation behavior. G-Eval helps reduce that variance, but it doesn’t eliminate it completely. I think long-term stability will probably require some combination of fixed reference models and calibration datasets.

Study: The Musk Partisan Effect on Tesla Sales

https://www.nber.org/papers/w34413
2•BeetleB•1m ago•0 comments

Simple net worth tracker built in vanilla JavaScript

https://ballpark.cc
1•rnmp•3m ago•0 comments

A new tool for understanding chromosome abnormalities in the eggs of older women

https://news.yale.edu/2025/11/03/new-tool-understanding-chromosome-abnormalities-eggs-older-women
1•gmays•5m ago•0 comments

A Practical Experiment in Building an AI Agent Swarm

https://obie.medium.com/a-practical-experiment-in-building-an-ai-agent-swarm-d9f7e989f8f2
1•obiefernandez•6m ago•0 comments

TidesDB – A persistent key-value store for fast storage

https://tidesdb.com
1•alexpadula•6m ago•0 comments

Myna: Monospace typeface designed for symbol-heavy programming languages

https://github.com/sayyadirfanali/Myna
1•todsacerdoti•7m ago•0 comments

Sorry, Pixel 9 and 10 owners: Google won't be fixing that speakerphone issue

https://www.androidauthority.com/pixel-10-speakerphone-bug-3612429/
1•josephcsible•9m ago•0 comments

How to check if a .onion is alive or dead (with Hidden Service Descriptors)

https://tech.michaelaltfield.net/2025/11/05/onion-service-alive-dead/
2•maltfield•10m ago•1 comments

OpenAI for Science

https://openai.com/science/
1•wavelander•10m ago•0 comments

World Economic Forum chief warns of three possible 'bubbles' in global economy

https://www.reuters.com/world/americas/world-economic-forum-chief-warns-three-possible-bubbles-gl...
2•speckx•13m ago•0 comments

Perplexity AI accuses Amazon of bullying with Comet legal threat

https://www.cnbc.com/2025/11/04/perplexity-ai-amazon-bullying-comet-browser.html
1•kjhughes•13m ago•0 comments

Can Melatonin Cause Heart Failure? What to Know About Claims of Health Risks

https://www.nytimes.com/2025/11/05/well/melatonin-heart-health-study.html
2•donohoe•14m ago•0 comments

From userscript to Chrome extension: say goodbye to Greasemonkey, Tampermonkey

https://prahladyeri.github.io/blog/2022/10/converting-userscripts-to-chrome-extensions.html
1•toomuchtodo•14m ago•0 comments

Amazon sues AI startup over browser's automated shopping and buying feature

https://www.theguardian.com/technology/2025/nov/05/amazon-perplexity-ai-lawsuit
1•barbazoo•14m ago•2 comments

Zensical – A modern static site generator by the creators of Material for MkDocs

https://zensical.org/docs/get-started/
1•kalendos•14m ago•0 comments

Stop vibe coding your unit tests

https://www.andy-gallagher.com/blog/stop-vibe-coding-your-unit-tests/
1•AirMax98•15m ago•0 comments

Grammarly is changing its name to Superhuman

https://www.theverge.com/news/808472/grammarly-superhuman-ai-rebrand-relaunch
1•ilamont•19m ago•1 comments

Multi-View Omnidirectional Vision/Structured Light for High-Precision Mapping

https://www.mdpi.com/1424-8220/25/20/6485
1•PaulHoule•19m ago•0 comments

Knife Juggling: How We're Building a New Free AI Normal

https://cto.new/blog/knife-juggling-how-we-re-building-a-new-free-ai-normal
1•janpio•20m ago•0 comments

Runc container breakouts: CVE-2025-31133, CVE-2025-52565, and CVE-2025-52881

https://www.openwall.com/lists/oss-security/2025/11/05/3
1•eyberg•20m ago•0 comments

Text rendering and effects using GPU-computed distances

https://blog.pkh.me/p/47-text-rendering-and-effects-using-gpu-computed-distances.html
1•mpweiher•21m ago•0 comments

Google Maps navigation gets a powerful boost with Gemini

https://blog.google/products/maps/gemini-navigation-features-landmark-lens/
2•ChrisArchitect•22m ago•0 comments

Using XDP for Egress Traffic

https://www.loopholelabs.io/blog/xdp-for-egress-traffic
1•todsacerdoti•23m ago•0 comments

I want a good parallel language

https://docs.google.com/presentation/d/1Kz8UIS5-ynJvDumKPNMX-5XUMsDhkbyWsL_jl1d4yrM/edit?slide=id...
1•mpweiher•23m ago•0 comments

Structured data access layer for AI agents

https://docs.pylar.ai
1•Hoshang07•23m ago•0 comments

Ask HN: Should LLM be able to translate C to rust as easy as English to Japanese

2•tonyplee•23m ago•2 comments

Streaming AI Agent Desktops with Gaming Protocols

https://blog.helix.ml/p/technical-deep-dive-on-streaming
1•quesobob•24m ago•0 comments

'Code quality' doesn't matter because it won't make you successful

https://www.businessinsider.com/block-cto-code-quality-sucess-solving-problems-dhanji-prasanna-20...
1•flail•24m ago•0 comments

Show HN: Bookkeeping tool we built after missing our tax deadline

https://www.layernext.ai
1•bmadduma•25m ago•2 comments

Show HN: We tested 9 AI models with 37K+ security tests

https://www.modelred.ai/leaderboard
1•NabilModelRed•26m ago•0 comments