GDPVal: Measuring the performance of our models on real-world tasks

https://openai.com/index/gdpval/
42•BGyss•4mo ago

Comments

westurner•4mo ago
"GDPVal: Measuring AI model performance on real world economically viable tasks" (2025) https://cdn.openai.com/pdf/d5eb7428-c4e9-4a33-bd86-86dd4bcf1...

GDP? GlobalGoals ... The Sustainable Development Goals (SDGs) include 17 goals, 169 targets, and over 230 indicators.

For strategic alignment, see:

Strategic alignment: https://en.wikipedia.org/wiki/Strategic_alignment

Sustainable Development Goals: https://en.wikipedia.org/wiki/Sustainable_Development_Goals

IIUC, the SDGs were produced by clustering the world's problems in an international collaborative exercise, as the successor to the MDGs (2000-2015).

Each country voluntarily produces an annual SDG report on its progress toward its Targets, measured by the Indicators.

IMHO, priorities should include clean energy and AI efficiency, given the growth projections for AI energy use (and for our electricity bills, given the expected continued energy supply shortages).

Which real-world SDG tasks can be AI eval'd?

Snuggly73•4mo ago
Apparently producing a React component that returns a piece of HTML with ARIA attributes set up. Long horizon my ass.
westurner•4mo ago
Did the LLM in that case suggest adopting an open-source UI library that already implements and tests W3C ARIA accessibility features, like React-Aria or one of its alternatives?

Or did it just do the job as prompted, without suggesting continuous improvements like reusing tested open-source components?

Snuggly73•4mo ago
Not sure how it went in their tests. I've tried Opus and GPT5, and it was a few lines of React + tests, so I guess 'no'.
nextworddev•4mo ago
Couldn’t find their open-source evals dataset.
Snuggly73•4mo ago
https://huggingface.co/datasets/openai/gdpval/viewer/default...
nextworddev•4mo ago
thanks!
esafak•4mo ago
They reported the competitors' performance for a change. Especially curious because OpenAI doesn't come out first. Kudos?
CuriouslyC•4mo ago
Claude's low-noise message style and good common sense bait people into thinking they can rely on it for hard stuff.