frontpage.
newsnewestaskshowjobs

Made with ♥ by @iamnishanth

Open Source @Github

fp.

Crypto firm accidentally sends $40B in Bitcoin to users

https://finance.yahoo.com/news/crypto-firm-accidentally-sends-40-055054321.html
1•CommonGuy•34s ago•0 comments

Magnetic fields can change carbon diffusion in steel

https://www.sciencedaily.com/releases/2026/01/260125083427.htm
1•fanf2•1m ago•0 comments

Fantasy football that celebrates great games

https://www.silvestar.codes/articles/ultigamemate/
1•blenderob•1m ago•0 comments

Show HN: Animalese

https://animalese.barcoloudly.com/
1•noreplica•1m ago•0 comments

StrongDM's AI team build serious software without even looking at the code

https://simonwillison.net/2026/Feb/7/software-factory/
1•simonw•2m ago•0 comments

John Haugeland on the failure of micro-worlds

https://blog.plover.com/tech/gpt/micro-worlds.html
1•blenderob•2m ago•0 comments

Velocity

https://velocity.quest
1•kevinelliott•3m ago•1 comments

Corning Invented a New Fiber-Optic Cable for AI and Landed a $6B Meta Deal [video]

https://www.youtube.com/watch?v=Y3KLbc5DlRs
1•ksec•4m ago•0 comments

Show HN: XAPIs.dev – Twitter API Alternative at 90% Lower Cost

https://xapis.dev
1•nmfccodes•5m ago•0 comments

Near-Instantly Aborting the Worst Pain Imaginable with Psychedelics

https://psychotechnology.substack.com/p/near-instantly-aborting-the-worst
1•eatitraw•11m ago•0 comments

Show HN: Nginx-defender – realtime abuse blocking for Nginx

https://github.com/Anipaleja/nginx-defender
2•anipaleja•11m ago•0 comments

The Super Sharp Blade

https://netzhansa.com/the-super-sharp-blade/
1•robin_reala•13m ago•0 comments

Smart Homes Are Terrible

https://www.theatlantic.com/ideas/2026/02/smart-homes-technology/685867/
1•tusslewake•14m ago•0 comments

What I haven't figured out

https://macwright.com/2026/01/29/what-i-havent-figured-out
1•stevekrouse•15m ago•0 comments

KPMG pressed its auditor to pass on AI cost savings

https://www.irishtimes.com/business/2026/02/06/kpmg-pressed-its-auditor-to-pass-on-ai-cost-savings/
1•cainxinth•15m ago•0 comments

Open-source Claude skill that optimizes Hinge profiles. Pretty well.

https://twitter.com/b1rdmania/status/2020155122181869666
2•birdmania•15m ago•1 comments

First Proof

https://arxiv.org/abs/2602.05192
2•samasblack•17m ago•1 comments

I squeezed a BERT sentiment analyzer into 1GB RAM on a $5 VPS

https://mohammedeabdelaziz.github.io/articles/trendscope-market-scanner
1•mohammede•18m ago•0 comments

Kagi Translate

https://translate.kagi.com
2•microflash•19m ago•0 comments

Building Interactive C/C++ workflows in Jupyter through Clang-REPL [video]

https://fosdem.org/2026/schedule/event/QX3RPH-building_interactive_cc_workflows_in_jupyter_throug...
1•stabbles•20m ago•0 comments

Tactical tornado is the new default

https://olano.dev/blog/tactical-tornado/
2•facundo_olano•22m ago•0 comments

Full-Circle Test-Driven Firmware Development with OpenClaw

https://blog.adafruit.com/2026/02/07/full-circle-test-driven-firmware-development-with-openclaw/
1•ptorrone•22m ago•0 comments

Automating Myself Out of My Job – Part 2

https://blog.dsa.club/automation-series/automating-myself-out-of-my-job-part-2/
1•funnyfoobar•22m ago•1 comments

Dependency Resolution Methods

https://nesbitt.io/2026/02/06/dependency-resolution-methods.html
1•zdw•23m ago•0 comments

Crypto firm apologises for sending Bitcoin users $40B by mistake

https://www.msn.com/en-ie/money/other/crypto-firm-apologises-for-sending-bitcoin-users-40-billion...
1•Someone•24m ago•0 comments

Show HN: iPlotCSV: CSV Data, Visualized Beautifully for Free

https://www.iplotcsv.com/demo
2•maxmoq•25m ago•0 comments

There's no such thing as "tech" (Ten years later)

https://www.anildash.com/2026/02/06/no-such-thing-as-tech/
2•headalgorithm•25m ago•0 comments

List of unproven and disproven cancer treatments

https://en.wikipedia.org/wiki/List_of_unproven_and_disproven_cancer_treatments
1•brightbeige•26m ago•0 comments

Me/CFS: The blind spot in proactive medicine (Open Letter)

https://github.com/debugmeplease/debug-ME
1•debugmeplease•26m ago•1 comments

Ask HN: What are the word games do you play everyday?

1•gogo61•29m ago•1 comments
Open in hackernews

Problems in LLM Benchmarking and Evaluation

https://www.xent.tech/blog/problems-in-llm-benchmarking-and-evaluation
14•acegod•5mo ago

Comments

sbilstein•5mo ago
Personally I think LLM benchmarks make agents worse. All these companies chase the benchmarks, overfit, and think being able to cheat at the math olympiad is gonna get us to AGI. Instead researchers should peer in and get me an agent that can reliably count the number of "i"'s in mississippi.
upperhalfplane•5mo ago
I don't quite think they cheat at math olympiads, but obviously there are blindspots for the unspectacular tasks. That being said, Mississippi is both a good and a bad question to ask. On the one hand, it's "the bare minimum" to require, on the other hand, is it really a feat? Like, most models can write a piece of code that would compute that. If you show me a task I'm not designed to solve (like count the number of i's in this text), the smart thing is actually to write a program to count them (which LLMs can do).

The best way to measure intelligence is probably to have a model know its strengths and weaknesses, and deal with them in an efficient way. And the most important thing for eval is that ability.

tianlong•5mo ago
What's the TLDR about how to solve that benchmark problem?
upperhalfplane•5mo ago
TLDR: games are a good way to go.