GDPVal: Measuring the performance of our models on real-world tasks

https://openai.com/index/gdpval/
42•BGyss•4mo ago

Comments

westurner•4mo ago
"GDPval: Evaluating AI Model Performance on Real-World Economically Valuable Tasks" (2025) https://cdn.openai.com/pdf/d5eb7428-c4e9-4a33-bd86-86dd4bcf1...

GDP? GlobalGoals ... The Sustainable Development Goals (SDGs) include 17 goals, 169 targets, and over 230 indicators.

For strategic alignment,

Strategic alignment: https://en.wikipedia.org/wiki/Strategic_alignment

Sustainable Development Goals: https://en.wikipedia.org/wiki/Sustainable_Development_Goals

IIUC, to produce the SDGs they clustered the world's problems in an international collaborative exercise, as the successor to the MDGs (2000-2015).

Each country voluntarily produces an annual SDG report on their progress on their Targets according to the Indicators.

IMHO, priorities should include clean energy and AI efficiency, given the growth projections for AI energy use (and for our electricity bills, given the expected continued energy supply shortages).

Which real-world SDG tasks can be AI eval'd?

Snuggly73•4mo ago
Apparently producing a React component that returns a piece of HTML with ARIA attributes set up. Long horizon my ass.
westurner•4mo ago
Did the LLM in that case suggest adopting an open-source UI library that already has tests for and implements support for W3C ARIA accessibility features, like React-Aria or other alternatives?

Or did it just do the job as prompted and not mention suggestions for continuous improvement like reusing tested open source components?

Snuggly73•4mo ago
Not sure how it went in their tests - I've tried Opus and GPT5 and it was a few lines of React + tests, so I guess 'no'
nextworddev•4mo ago
Couldn’t find their open source evals dataset
Snuggly73•4mo ago
https://huggingface.co/datasets/openai/gdpval/viewer/default...
nextworddev•4mo ago
thanks!
esafak•4mo ago
They reported the competitors' performance for a change. Especially curious because OpenAI is not first. Kudos?
CuriouslyC•4mo ago
Claude's low-noise message style and good common sense bait people into thinking they can rely on it for hard stuff.