Show HN: I benchmarked how good LLMs are at proofreading English

https://github.com/reviseio/errata-bench

3•artursapek•1h ago

Comments

kouteiheika•1h ago

It'd be nice if you could add a separate leaderboard for open-weight models on your results page (or add the ability to filter out proprietary models).

Also, why use an agent for this? This doesn't make much sense to me, considering it's supposed to be "measuring how well models can find and fix errors in human-written text" -- here you're measuring the model's agentic capabilities just as much as its ability to correct the text.

I suppose this is a somewhat interesting benchmark too, but if I were interested in cost-effective proofreading of a ton of text I'd just do it the old-fashioned way: split my text into chunks, write a nice prompt telling the model to proofread the given text and return the result, attach the prompt to each chunk of text to proofread, and let it rip.
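A rough sketch of that chunk-and-prompt approach, for illustration only. The prompt wording, the `max_chars` limit, and the `call_model` callable are all assumptions; `call_model` stands in for whatever LLM client you actually use and is not part of any real library or of the benchmark.

```python
from typing import Callable, List

# Hypothetical prompt; tune the wording for your model and text.
PROMPT = (
    "Proofread the following text. Fix spelling, grammar, and punctuation "
    "errors, change nothing else, and return only the corrected text.\n\n"
)


def split_into_chunks(text: str, max_chars: int = 2000) -> List[str]:
    """Group paragraphs into chunks of at most max_chars characters.

    A single paragraph longer than max_chars is kept whole rather than cut
    mid-sentence.
    """
    chunks: List[str] = []
    current = ""
    for para in text.split("\n\n"):
        para = para.strip()
        if not para:
            continue
        candidate = f"{current}\n\n{para}" if current else para
        if len(candidate) <= max_chars or not current:
            current = candidate
        else:
            chunks.append(current)
            current = para
    if current:
        chunks.append(current)
    return chunks


def proofread(
    text: str,
    call_model: Callable[[str], str],  # assumed: takes a prompt, returns the reply
    max_chars: int = 2000,
) -> str:
    """Attach the prompt to each chunk, send it, and stitch the replies together."""
    corrected = [call_model(PROMPT + chunk) for chunk in split_into_chunks(text, max_chars)]
    return "\n\n".join(corrected)
```

Each chunk is an independent request, so the calls could also be batched or run in parallel to cut cost and latency.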

Context Is Finite. Who Maintains It?

https://blog.gchinis.com/posts/2026/04/self-organizing-agents/

1•gchinis•19s ago•0 comments

Release PiClaw v2.0.1 – Ferrix

https://github.com/rcarmo/piclaw/releases/tag/v2.0.1

1•rcarmo•4m ago•0 comments

Trump ousts National Science Board members

https://www.washingtonpost.com/science/2026/04/25/national-science-board-members-dismissed/

2•acdha•5m ago•0 comments

Is the World Ready for a Car Without a Rear Window?

https://www.wsj.com/lifestyle/cars/polestar-4-dan-neil-review-407f11a6

1•bookofjoe•7m ago•1 comments

Why your AI assistant is suddenly selling to you

https://www.economist.com/business/2026/04/19/why-your-ai-assistant-is-suddenly-selling-to-you

1•edward•8m ago•0 comments

Automate TLS for Dynamic Domains with Traefik and Hetzner DNS

https://matija.eu/posts/dynamic-domain-certs-traefik-hetzner/

1•mmunj•8m ago•0 comments

AI Might Be Lying to Your Boss

https://williamoconnell.me/blog/post/ai-ide/

3•annjose•9m ago•0 comments

Go quirks: function closures capturing mutable references

https://rednafi.com/go/closure-mutable-refs/

1•Brajeshwar•10m ago•0 comments

Can you stop beans from making you gassy?

https://www.seriouseats.com/how-to-reduce-bean-gas-tested-11883862

1•jstrieb•12m ago•0 comments

Show HN: The Order of the Agents – Make Codex and Claude Create the Perfect PRD

https://github.com/btahir/agent-order

1•bilater•14m ago•0 comments

RFC: Oden: The Server-First, JavaScript-Esque Runtime

https://rfchub.com/phobos/rfc5-oden-the-server-first-javascript-esque-runtime

1•tlhunter•18m ago•0 comments

The U.K. Smoking Ban Is Illiberal

https://www.theatlantic.com/ideas/2026/04/case-against-uk-smoking-ban/686949/

2•JumpCrisscross•18m ago•1 comments

Building Semantic Version Control in Rust

https://therohansharma.com/semantic-version-control-rust

1•lukastyrychtr•19m ago•0 comments

Logs say success. The system says otherwise

https://blog.bridgexapi.io/why-your-logs-say-everything-worked-even-when-it-didnt

1•Bridgexapi•20m ago•1 comments

Show HN: Good AI Task – a tool for asking AI what it can and can't do

https://goodaitask.com

1•jmt710•21m ago•0 comments

Nicholas Carlini – Black-hat LLMs [video]

https://www.youtube.com/watch?v=1sd26pWhfmg

6•simonebrunozzi•25m ago•0 comments

Show HN: Useknockout open source background removal API 40× cheaper -remove.bg

https://github.com/useknockout/api

3•tlorents•25m ago•0 comments

Show HN: AI Visibility Monitor – Track if your site gets cited by GPT/Claude

https://github.com/WorkSmartAI-alt/ai-visibility-monitor

3•balance006•26m ago•0 comments

Check Cloudflare D1, R2, Workers usage – see remaining limits for today/month

https://dialtoneapp.com/cloudflare

3•fcpguru•28m ago•1 comments

LLM-Rosetta: Zero-Dep API Translator for OpenAI, Anthropic, Google and Streaming

https://github.com/Oaklight/llm-rosetta

2•Oaklight•32m ago•0 comments

Artifacts Are Alive (and Photographs Are Dead)

https://worksonmymachine.ai/p/artifacts-are-alive-and-photographs

5•Stwerner•34m ago•1 comments

Show HN: Mapping Sonnet's thinking process via flame charts

https://adamsohn.com/lambda-variance/

2•dataviz1000•34m ago•0 comments

You're about to feel the AI money squeeze

https://www.theverge.com/ai-artificial-intelligence/917380/ai-monetization-anthropic-openai-token...

4•negura•35m ago•1 comments

Adding a team was the wrong strategic decision

https://learnings.aleixmorgadas.dev/p/adding-a-team-was-the-wrong-strategic

2•milkglass•36m ago•0 comments

The Stanford Freshmen Who Want to Rule the World

https://www.theatlantic.com/ideas/2026/04/stanford-students-power/686920/

8•apparent•37m ago•1 comments

Zerodep: Performant single-file, zero-dep Python modules (protobuf, YAML, etc.)

https://github.com/Oaklight/zerodep

2•Oaklight•42m ago•0 comments

Self-Hosted AI Red Team Tools

https://aetherverseintel.gumroad.com/l/vpzqnk

2•valuria•46m ago•0 comments

Agentic AI Chip Design Built a Full RISC-V Core

https://spectrum.ieee.org/ai-chip-design

2•rbanffy•46m ago•0 comments

Microsoft Reportedly Looking at Rebasing Azure Linux on Fedora

https://www.phoronix.com/news/MS-Azure-Linux-Fedora-Based

3•rbanffy•47m ago•0 comments

With TPU 8, Google Makes GenAI Systems Better, Not Just Bigger

https://www.nextplatform.com/compute/2026/04/24/with-tpu-8-google-makes-genai-systems-much-better...

3•rbanffy•48m ago•0 comments