kouteiheika•2h ago
It'd be nice if you could add a separate leaderboard for open-weight models on your results page (or add the ability to filter out proprietary models).
Also, why use an agent for this? This doesn't make much sense to me, considering it's supposed to be "measuring how well models can find and fix errors in human-written text" -- here you're measuring the model's agentic capabilities just as much as its ability to correct the text.
I suppose this is somewhat of an interesting benchmark too, but if I were interested in cost-effective proofreading of a ton of text I'd just do it the old-fashioned way: split my text into chunks, write a good prompt telling the model to proofread the given text and return the result, attach the prompt to each chunk, and let it rip.
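A minimal sketch of that chunk-and-prompt approach, assuming a hypothetical `call_model` function standing in for whatever completion API you'd actually use (the chunk size and prompt wording are illustrative, not from the thread):

```python
PROMPT = ("Proofread the following text. Fix spelling, grammar, and "
          "punctuation only. Return the corrected text and nothing else.\n\n")

def chunk_text(text: str, max_chars: int = 4000) -> list[str]:
    """Split text into chunks on paragraph boundaries, each under max_chars."""
    chunks, current = [], ""
    for para in text.split("\n\n"):
        if current and len(current) + len(para) + 2 > max_chars:
            chunks.append(current)
            current = para
        else:
            current = current + "\n\n" + para if current else para
    if current:
        chunks.append(current)
    return chunks

def proofread(text: str, call_model, max_chars: int = 4000) -> str:
    # One plain completion call per chunk; no agent loop, no tools.
    return "\n\n".join(call_model(PROMPT + chunk)
                       for chunk in chunk_text(text, max_chars))
```

In practice `call_model` would wrap your provider's API client; the point is that each chunk is an independent, stateless call, so the whole job parallelizes trivially.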
artursapek•1h ago
Good idea about the leaderboard for open vs closed models!
Point taken on using an agent. I went that route because part of the goal for this benchmark is to inform which models I push in my agentic word processor, which uses tools for focused proofreading/editing. It's much faster and generally cheaper to use tools for surgical changes on large documents, rather than having the model spit out the entire document with all issues corrected. So yes, I am trying to measure agentic abilities here.
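The thread doesn't show the actual tool interface, but a surgical-edit tool of the kind described is roughly (names and signature are hypothetical):

```python
def apply_edit(doc: str, old: str, new: str) -> str:
    """Apply one surgical edit: replace the first exact occurrence of
    `old` with `new`. Requiring an exact anchor keeps the model from
    silently rewriting text it shouldn't touch, and the output cost is
    just the edit, not a full re-emission of the document."""
    if old not in doc:
        raise ValueError(f"anchor not found: {old!r}")
    return doc.replace(old, new, 1)
```

On a long document, the model emits a handful of small `(old, new)` pairs instead of regenerating every token, which is where the speed and cost advantage comes from.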
A simple one-pass full-rewrite test would also make an interesting benchmark, though.