
Measuring AI Ability to Complete Long Tasks

https://spectrum.ieee.org/large-language-model-performance
47•pseudolus•7mo ago

Comments

greenchair•7mo ago
thanks that was a good one!
revskill•7mo ago
Is there any limit?
coderatlarge•7mo ago
“If the idea of LLMs improving themselves strikes you as having a certain singularity-robocalypse quality to it, Kinniment wouldn’t disagree with you. But she does add a caveat: ‘You could get acceleration that is quite intense and does make things meaningfully more difficult to control without it necessarily resulting in this massively explosive growth,’ she says. It’s quite possible, she adds, that various factors could slow things down in practice. ‘Even if it were the case that we had very, very clever AIs, this pace of progress could still end up bottlenecked on things like hardware and robotics.’”
tbalsam•7mo ago
The only limit is yourself

Source: One of the most classic internet websites, zombo.com (sound on)

tbalsam•7mo ago
For those curious: https://en.m.wikipedia.org/wiki/Zombo.com
LorenDB•7mo ago
Why would you benchmark the LLMs at 50% success? I expect 100% success, or nearly so, to make an LLM a practical replacement for a human. 50% success is far too unreliable.

Edit: notice that I said "100%, or nearly so". I realize that 100% is an unrealistic metric for an LLM, but come on, the robots should be at least as competent as the humans they replace, and ideally much more so.

the__alchemist•7mo ago
100% - that's quite the metric!
wobblyasp•7mo ago
Yah I'm not really getting why that was chosen. Maybe not 100%, but something closer to 75% would be totally workable
elif•7mo ago
So every time you try to fix a bug, you succeed on the first try in the expected amount of time?

Of course not.

This idea that AI should be correct 100% of the time is like expecting autonomous vehicles to have a 0% crash rate to be successful. It is just a coping metric that allows humans to feel superior. In reality they already outperform humans in terms of crash rates.

tbalsam•7mo ago
There are versions of this kind of benchmark with a higher threshold, however, it only seems to adjust the timetables by a linear amount, so you're only buying 1-2 years or so depending on what you want that % success rate to be.
tsilmanr•7mo ago
The 50% success threshold has the lowest variance. If you chose 100% or even 90%, the plot would have much higher variance and a harder-to-discern trend (and would require many more test samples).
gcanyon•7mo ago
I'm not saying 50% is the right number, but I can imagine that a 100% requirement would produce very noisy results, while 50% might achieve a much smoother graph that produces a (hopefully) better view into the future. It might be reasonable to look at where a 95-100% result appears relative to the 50% result for past evaluations, and then project out from the projected 50% results in the future by the same amount -- but again, it might be the case that looking for 100% results relative to 50% leads to a very noisy result that can't be generalized into the future well.
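The variance argument in the two comments above can be checked with a quick simulation. This is a sketch of the threshold effect only, with entirely hypothetical numbers, not METR's actual methodology: noisy per-task success rates are generated from an assumed log-logistic success curve, a crude grid search fits the curve back, and the time horizon is read off at different reliability thresholds. The horizon at extreme thresholds inherits the uncertainty in the fitted slope, so its estimates spread out more.

```python
import random
import statistics

random.seed(0)

def p_success(t, h, k):
    """Chance a model finishes a task of human-length t (minutes):
    50% at t == h, falling off with steepness k (log-logistic)."""
    return 1.0 / (1.0 + (t / h) ** k)

TRUE_H, TRUE_K = 60.0, 1.5               # hypothetical "true" model
TIMES = [2 ** i for i in range(1, 11)]   # task lengths: 2..1024 minutes
N_PER_TASK = 30                          # attempts per task length

def horizon_spread(threshold, trials=100):
    """Relative spread (coefficient of variation) of the estimated
    time horizon at a given reliability threshold."""
    horizons = []
    for _ in range(trials):
        # simulate noisy empirical success rates for each task length
        rates = [sum(random.random() < p_success(t, TRUE_H, TRUE_K)
                     for _ in range(N_PER_TASK)) / N_PER_TASK
                 for t in TIMES]
        # crude grid-search fit of (h, k) to the noisy rates
        (h, k), best_err = (None, None), float("inf")
        for hc in [2 ** (j / 4) for j in range(4, 41)]:
            for kc in [0.5 + 0.25 * j for j in range(11)]:
                err = sum((r - p_success(t, hc, kc)) ** 2
                          for r, t in zip(rates, TIMES))
                if err < best_err:
                    (h, k), best_err = (hc, kc), err
        # time at which the fitted curve crosses the threshold
        horizons.append(h * ((1 - threshold) / threshold) ** (1 / k))
    return statistics.stdev(horizons) / statistics.mean(horizons)

for q in (0.5, 0.8, 0.95):
    print(f"{q:.0%} threshold: relative spread = {horizon_spread(q):.2f}")
```

At 50% the horizon estimate is just the fitted midpoint, so the slope uncertainty cancels out; at 95% the same slope uncertainty gets amplified into the horizon, which is consistent with the comments above.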
stocksinsmocks•7mo ago
How many humans achieve that level of success? Even with QA processes that quintuple development cost, is it even 100%? The software I regularly use is not even 95% defect free.

I am pleasantly amused to see it’s the cutting edge tech bros shaking their fist at the young LLMs on their lawn. We got free intern-quality work. Take the win.

nickpeterson•7mo ago
The Skynet Funding Bill is passed. The system goes on-line August 4th, 1997. Human decisions are removed from strategic defense. Skynet begins to learn at a geometric rate. It becomes self-aware at 2:14 a.m. Eastern time, August 29th
fendy3002•7mo ago
Because I always believe the Pareto principle applies to most aspects of computing (https://en.wikipedia.org/wiki/Pareto_principle), I believe it applies in this case too, and I find that it tracks with the progress of LLMs/AIs.

Breaking past 80% accuracy and solving the remaining 20% of problems will be the main challenge for the next generation (or two) of LLMs, not to mention they still need to bring down computing costs.

EDIT: that said, solving 80% of problems with 80% accuracy and significant time savings is a solution worth considering, though we need to stay sceptical, because the remaining 20% may get much worse if the 80% solved is of poor quality.

Yoric•7mo ago
There is a big difference between LLMs and most other tech improvements, though: with most technologies that I can think of that solve 80% of the problem, it's easy to find out whether the technology works. When you're working with an LLM, though, it's really hard to know whether the answer is correct/usable or not.
fl0id•7mo ago
I call BS. That graph seems very misleading; just getting faster is not improving exponentially. By "improving exponentially" most people would understand getting smarter.
ecocentrik•7mo ago
It's already very misleading that they have used "answering a question" as the most trivial task to anchor their trend line. In the middle of their trend line they have humans taking 8 minutes to "find a fact on the web". Both of those tasks have a large variance in time requirements and outcomes.
dom96•7mo ago
It takes a human 167 hours to start a new company? What does that even mean?
AnimalMuppet•7mo ago
Legal paperwork, maybe?

But if so... no, I do not want an LLM filling out the legal paperwork that my company depends on.

untitled2•7mo ago
The classic mistake is assuming that if 1 worker produces 10 products a day, 10 workers will produce 100. In fact, what one software developer will do in a week, ten will do in a year. Copypasta can be fast and very inaccurate today -- it will be faster and much more inaccurate later.
timr•7mo ago
For those people who won’t read anything more than the headline, this is a silly paper based on a metric that considers only “task completion time” at “a specified degree of reliability, such as 50 percent” for “human programmers”.

Then, in a truly genius stroke of AI science, the current article extrapolates this to infinity and beyond, while hand-waving away the problem of “messiness”, which clearly calls the extrapolation into question:

> At the heart of the METR work is a metric the researchers devised called “task-completion time horizon.” It’s the amount of time human programmers would take, on average, to do a task that an LLM can complete with some specified degree of reliability, such as 50 percent. A plot of this metric for some general-purpose LLMs going back several years [main illustration at top] shows clear exponential growth, with a doubling period of about seven months. The researchers also considered the “messiness” factor of the tasks, with “messy” tasks being those that more resembled ones in the “real world,” according to METR researcher Megan Kinniment. Messier tasks were more challenging for LLMs [smaller chart, above]
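The doubling trend quoted above lends itself to a back-of-envelope extrapolation. This assumes, hypothetically, a current 50%-reliability horizon of about 1 hour and takes the ~7-month doubling period at face value; it is exactly the kind of straight-line projection being criticized here, shown only to make the arithmetic behind the "2030" claim explicit:

```python
import math

DOUBLING_MONTHS = 7          # reported doubling period of the time horizon
current_horizon_hours = 1.0  # assumed starting point (hypothetical)
target_hours = 167           # one month of 40-hour workweeks

# number of doublings needed, then elapsed calendar time
doublings = math.log2(target_hours / current_horizon_hours)
months = doublings * DOUBLING_MONTHS
print(f"{doublings:.1f} doublings -> {months / 12:.1f} years")
# → 7.4 doublings -> 4.3 years
```

Roughly four to five years from a 1-hour horizon lands around 2030, which is where the article's headline claim comes from; the entire conclusion rests on the doubling period holding and on the starting horizon being right.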

dang•7mo ago
What would be a more accurate and neutral headline?
Y_Y•7mo ago
The paper and blog posts referenced are both called "Measuring AI Ability to Complete Long Tasks"; this might do better.
timr•7mo ago
Agreed. Or "AI models are getting faster", which seems defensible.
recursivecaveat•7mo ago
They're not saying that the models are getting faster. They're saying that the models are becoming capable at all of completing tasks that take humans longer and longer. The task completion time for humans is a proxy for complexity of the task, or some notion of how far the model can get without human intervention.
dang•7mo ago
Ok, belatedly changed. Thanks!
pu_pe•7mo ago
We can see exponential improvement in LLM performance in all sorts of metrics. The key question is whether this improvement will be sustained in coming years.
donkey_brains•7mo ago
I’m sure someone more knowledgeable and well-spoken than I will provide a more scathing takedown of this article soon, but even I can laugh at its breathless endorsement of some very dubious claims with no supporting evidence.

“AI might write a decent novel by 2030”? Have you read the absolute dreck they produce today? An LLM will NEVER produce a decent novel, for the same reason it will never independently create a decent game or movie: It can’t read the novel, play the game, or watch the movie, and have an emotional response to it or gauge its entertainment value. It has no way to judge if a work of art will have an emotional impact on its audience, or dial in the art to enhance that impact, or make a statement that resonates with people. Only people can do that.

All in all, this article is unscientific, filled with hand-waving “and then a miracle occurs”, and meaningless graphs that in no way indicate that LLMs will undergo the kind of step change transformation needed to reliably and independently accomplish complex tasks this decade. The study authors themselves give the game away when they use “50% success rate” as the yardstick for an LLM. You know what we call a human with a 50% success rate in the professional world? Fired.

I don’t think it was responsible of IEEE to publish this article and I expect better from the organization.

ysofunny•7mo ago
The LLMs will do something else to novels:

I think it'll be possible to publish a "book" as a series of prompts,

which the LLMs can expand out into the narrative story.

It's a novel you can chat with. The new novel for the post-LLM era is more like publishing the whole author... whom you can then "interview" as an LLM (reminiscent of Harry Potter, when Ron's sister finds the evil journal and basically "chats" with the notebook).

kcplate•7mo ago
No idea why you are getting downvoted for this, it seems to me this would be exactly the kind of thing you could do…even hallucinations would contribute in a meaningful way.
kcplate•7mo ago
Likely due to my nearly 40 years of experience in the tech industry, and knowing where we were then compared to where we are now, I am floored by what LLMs are doing and how much better they have become even over the last 2 years I have been tracking them.

That said, I will make no definitive statements like “never” and “can’t” as it relates to AI in the next 5 years because it is already doing things that I would have thought unlikely just 5 years ago…and frankly would have thought functionally impossible back 40 years ago.

timr•7mo ago
LLMs are cool and they're amazing for what they are, but the hype is just ridiculous right now, and the extrapolation fallacy is still a fallacy. Without a good structural reason to assume exponential growth (e.g. organism reproduction, which is itself not actually exponential), it's kind of the Godwin's Law of AI debate: the first person to say "if we only project forward X years..." terminates the conversation.

I appreciate your unwillingness to say "never" here, but I think the parent comment deserves credit for calling out something important that rarely gets discussed: the importance of emotion for producing great art. This is one of the classic themes of Asimov's entire Robot oeuvre, which spends many books digging into the differences between (far more advanced) AI and actual human intelligence.

There are fundamental, definable, structural deficiencies that separate LLMs from human thought, it's plainly incorrect to pretend otherwise, and the...extrapolationists...are neglecting that we have no idea how to solve these problems.

kcplate•7mo ago
> deserves credit for calling out something important that rarely gets discussed: the importance of emotion for producing great art

That’s subjective though. An opinion I agree with, but still subjective.

I think it’s within the realm of possibility given the advances we have seen so far that a near future AI given enough input describing emotion could simulate it just enough for people to accept that it created a “decent” work. Likely undetectable by most people as AI created vs human created.

> we have no idea how to solve these problems.

Yet. Obviously time and talent moved us from Eliza to ChatGPT/Gemini. Is it really unlikely that time and talent can’t push us over the artificial emotion precipice as well? I am not betting against it.

timr•7mo ago
> Yet. Obviously time and talent moved us from Eliza to ChatGPT/Gemini.

Time being the big word there. Eliza came out in the 1960s.

>Is it really unlikely that time and talent can’t push us over the artificial emotion precipice as well? I am not betting against it.

Nor am I. I'm just not making hyperbolic claims concerning the timeline. For all we know, the next major advance will take another 60 years.

kcplate•7mo ago
> the next major advance will take another 60 years

It might, or it might be next week.

dang•7mo ago
I thought there had been more threads about this but could only find the following. Others?

Predictions from the METR AI scaling graph are based on a flawed premise - https://news.ycombinator.com/item?id=43885051 - May 2025 (25 comments)

AI's Version of Moore's Law - https://news.ycombinator.com/item?id=43835146 - April 2025 (1 comment)

Forecaster reacts: METR's bombshell paper about AI acceleration - https://news.ycombinator.com/item?id=43758936 - April 2025 (74 comments)

Measuring AI Ability to Complete Long Tasks – METR - https://news.ycombinator.com/item?id=43423691 - March 2025 (1 comment)

satisfice•7mo ago
These comments are a balm to my soul. But usually when I make them I get voted down for being mean to AI.
Y_Y•7mo ago
Lies, damn lies, statistics, confident LLM hallucinations, tech hype journalism
chmod775•7mo ago
What sort of nonsense chart is that? I can trivially come up with tasks that a competent human can complete in a minute, but that LLMs will absolutely face-plant on. In fact I could probably make that line go any direction I wanted to.

Does that tell us anything useful? No. They're LLMs, not chess engines, "word count" software, or a game of hangman*. You might as well add "make a sandwich" to the list of tasks.

Also, 50% is the bar? In most jobs, trainees only start actually being worth their wage once they reach about 99%; anything below wastes the time of someone more competent.

I wonder how much money is being collectively wasted on trying to shove LLMs into areas where you'd really need AGI, rather than focusing resources on improving LLMs for those areas where they're actually useful.

* Though I do recommend attempting to play hangman with an LLM. It's highly entertaining.

actuallyalys•7mo ago
I feel like it takes a human a month to write a novel or start up a company only if you’re talking about a very constrained version of the task. People do write novels in a month — that’s the whole premise of National Novel Writing Month, or NaNoWriMo — but they aren’t finished products, they’re first drafts.

Similarly, while I’m sure you could make good progress on starting a business in a month, it seems like that would take longer to genuinely complete from start to finish. Also, it seems like it’s necessarily a task that relies on external factors: Waiting for approval to come from various agencies, hiring employees, waiting for other parties to sign contracts, etc.

bgwalter•7mo ago
"By 2030, the most advanced LLMs should be able to complete, with 50 percent reliability, a software-based task that takes humans a full month of 40-hour workweeks."

That is nothing. "git clone" can, with 100% reliability, "complete" tasks in a minute that take over 1,000,000 man hours. It even keeps the license.

It is a shame the IEEE now promotes this theft.

bgwalter•7mo ago
Since neural networks can approximate any function, surely RSA-4096 will soon be factored with exponential progress!
spacemadness•7mo ago
What the hell, IEEE? I expect better from you than this fluff. “It might write a decent novel by 2030” is a pure garbage take.
jruohonen•7mo ago
It is kind of sad that the IEEE too is neck-deep in the hype cycle. And have they even heard of the concept of validity? I.e., I don't think the metrics are reasonable.