frontpage.
newsnewestaskshowjobs

Made with ♥ by @iamnishanth

Open Source @Github

fp.

Open in hackernews

Ask HN: What are the main measures of AI progress?

3•ericlamb89•1mo ago
I’m interested in how AI progress is currently evaluated and trying to build a list of the major approaches people actually use.

I’m aware that all of these measures have limitations and that many are controversial or imperfect. My goal is discovery and understanding, not to defend or attack any particular framework.

I’d love to hear:

- What measures, benchmarks, or methodologies you think belong on this list

- What you see as their key strengths and failure modes

- How (or whether) you personally use them to interpret AI progress

Comments

johnnyfived•1mo ago
There'd first have to be an intense evaluation and standardization process for AI / measuring AGI now. All current benchmarks are tailored to one use case (e.g. SWE) or are evaluations that can be gamed and manipulated.

I think this would take the form of something more abstract instead of concrete with raw numbers, like a revised Turing Test.

kayo_20211030•1mo ago
Yeah. I think the Turing Test has passed its sell-by date. As all things inevitably do. I'd be interested in how the "revised Turing Test" you propose looks. I'm not smart enough to know what that'd be, but it'd be interesting as a starting point.
johnnyfived•1mo ago
It's a great question that I haven't seen discussed on HN yet (though I'm not that active), I think this crowd is still very attuned to interesting but more deterministic problems technically.

This might sound basic but I keep coming back to this idea again and again. Alex Garland really did have the right idea with Ex-Machina, where in the film Caleb claims that he purposefully designed Ava (the AI robot) to have all the internal mechanisms shown, so people would understand always they are interacting with a machine. The point of his Turing test was to show whether they could see past the machine and still empathize with it as a human.

Zram as Swap

https://wiki.archlinux.org/title/Zram#Usage_as_swap
1•seansh•3m ago•0 comments

Green’s Dictionary of Slang - Five hundred years of the vulgar tongue

https://greensdictofslang.com/
1•mxfh•5m ago•0 comments

Nvidia CEO Says AI Capital Spending Is Appropriate, Sustainable

https://www.bloomberg.com/news/articles/2026-02-06/nvidia-ceo-says-ai-capital-spending-is-appropr...
1•virgildotcodes•7m ago•2 comments

Show HN: StyloShare – privacy-first anonymous file sharing with zero sign-up

https://www.styloshare.com
1•stylofront•9m ago•0 comments

Part 1 the Persistent Vault Issue: Your Encryption Strategy Has a Shelf Life

1•PhantomKey•13m ago•0 comments

Show HN: Teleop_xr – Modular WebXR solution for bimanual robot teleoperation

https://github.com/qrafty-ai/teleop_xr
1•playercc7•15m ago•1 comments

The Highest Exam: How the Gaokao Shapes China

https://www.lrb.co.uk/the-paper/v48/n02/iza-ding/studying-is-harmful
1•mitchbob•20m ago•1 comments

Open-source framework for tracking prediction accuracy

https://github.com/Creneinc/signal-tracker
1•creneinc•22m ago•0 comments

India's Sarvan AI LLM launches Indic-language focused models

https://x.com/SarvamAI
2•Osiris30•23m ago•0 comments

Show HN: CryptoClaw – open-source AI agent with built-in wallet and DeFi skills

https://github.com/TermiX-official/cryptoclaw
1•cryptoclaw•25m ago•0 comments

ShowHN: Make OpenClaw respond in Scarlett Johansson’s AI Voice from the Film Her

https://twitter.com/sathish316/status/2020116849065971815
1•sathish316•28m ago•2 comments

CReact Version 0.3.0 Released

https://github.com/creact-labs/creact
1•_dcoutinho96•29m ago•0 comments

Show HN: CReact – AI Powered AWS Website Generator

https://github.com/creact-labs/ai-powered-aws-website-generator
1•_dcoutinho96•30m ago•0 comments

The rocky 1960s origins of online dating (2025)

https://www.bbc.com/culture/article/20250206-the-rocky-1960s-origins-of-online-dating
1•1659447091•35m ago•0 comments

Show HN: Agent-fetch – Sandboxed HTTP client with SSRF protection for AI agents

https://github.com/Parassharmaa/agent-fetch
1•paraaz•37m ago•0 comments

Why there is no official statement from Substack about the data leak

https://techcrunch.com/2026/02/05/substack-confirms-data-breach-affecting-email-addresses-and-pho...
8•witnessme•41m ago•1 comments

Effects of Zepbound on Stool Quality

https://twitter.com/ScottHickle/status/2020150085296775300
2•aloukissas•44m ago•1 comments

Show HN: Seedance 2.0 – The Most Powerful AI Video Generator

https://seedance.ai/
2•bigbromaker•47m ago•0 comments

Ask HN: Do we need "metadata in source code" syntax that LLMs will never delete?

1•andrewstuart•53m ago•1 comments

Pentagon cutting ties w/ "woke" Harvard, ending military training & fellowships

https://www.cbsnews.com/news/pentagon-says-its-cutting-ties-with-woke-harvard-discontinuing-milit...
6•alephnerd•56m ago•2 comments

Can Quantum-Mechanical Description of Physical Reality Be Considered Complete? [pdf]

https://cds.cern.ch/record/405662/files/PhysRev.47.777.pdf
1•northlondoner•56m ago•1 comments

Kessler Syndrome Has Started [video]

https://www.tiktok.com/@cjtrowbridge/video/7602634355160206623
2•pbradv•59m ago•0 comments

Complex Heterodynes Explained

https://tomverbeure.github.io/2026/02/07/Complex-Heterodyne.html
4•hasheddan•59m ago•0 comments

MemAlign: Building Better LLM Judges from Human Feedback with Scalable Memory

https://www.databricks.com/blog/memalign-building-better-llm-judges-human-feedback-scalable-memory
1•superchink•1h ago•0 comments

CCC (Claude's C Compiler) on Compiler Explorer

https://godbolt.org/z/asjc13sa6
2•LiamPowell•1h ago•0 comments

Homeland Security Spying on Reddit Users

https://www.kenklippenstein.com/p/homeland-security-spies-on-reddit
38•duxup•1h ago•9 comments

Actors with Tokio (2021)

https://ryhl.io/blog/actors-with-tokio/
1•vinhnx•1h ago•0 comments

Can graph neural networks for biology realistically run on edge devices?

https://doi.org/10.21203/rs.3.rs-8645211/v1
1•swapinvidya•1h ago•1 comments

Deeper into the shareing of one air conditioner for 2 rooms

1•ozzysnaps•1h ago•0 comments

Weatherman introduces fruit-based authentication system to combat deep fakes

https://www.youtube.com/watch?v=5HVbZwJ9gPE
3•savrajsingh•1h ago•0 comments