frontpage.
newsnewestaskshowjobs

Made with ♥ by @iamnishanth

Open Source @Github

fp.

Start all of your commands with a comma

https://rhodesmill.org/brandon/2009/commands-with-comma/
142•theblazehen•2d ago•42 comments

OpenCiv3: Open-source, cross-platform reimagining of Civilization III

https://openciv3.org/
668•klaussilveira•14h ago•202 comments

The Waymo World Model

https://waymo.com/blog/2026/02/the-waymo-world-model-a-new-frontier-for-autonomous-driving-simula...
949•xnx•19h ago•551 comments

How we made geo joins 400× faster with H3 indexes

https://floedb.ai/blog/how-we-made-geo-joins-400-faster-with-h3-indexes
122•matheusalmeida•2d ago•32 comments

Unseen Footage of Atari Battlezone Arcade Cabinet Production

https://arcadeblogger.com/2026/02/02/unseen-footage-of-atari-battlezone-cabinet-production/
53•videotopia•4d ago•2 comments

Show HN: Look Ma, No Linux: Shell, App Installer, Vi, Cc on ESP32-S3 / BreezyBox

https://github.com/valdanylchuk/breezydemo
229•isitcontent•14h ago•25 comments

Jeffrey Snover: "Welcome to the Room"

https://www.jsnover.com/blog/2026/02/01/welcome-to-the-room/
16•kaonwarb•3d ago•19 comments

Monty: A minimal, secure Python interpreter written in Rust for use by AI

https://github.com/pydantic/monty
222•dmpetrov•14h ago•117 comments

Vocal Guide – belt sing without killing yourself

https://jesperordrup.github.io/vocal-guide/
27•jesperordrup•4h ago•16 comments

Show HN: I spent 4 years building a UI design tool with only the features I use

https://vecti.com
330•vecti•16h ago•143 comments

Hackers (1995) Animated Experience

https://hackers-1995.vercel.app/
494•todsacerdoti•22h ago•243 comments

Sheldon Brown's Bicycle Technical Info

https://www.sheldonbrown.com/
381•ostacke•20h ago•95 comments

Microsoft open-sources LiteBox, a security-focused library OS

https://github.com/microsoft/litebox
359•aktau•20h ago•181 comments

Show HN: If you lose your memory, how to regain access to your computer?

https://eljojo.github.io/rememory/
288•eljojo•17h ago•169 comments

An Update on Heroku

https://www.heroku.com/blog/an-update-on-heroku/
412•lstoll•20h ago•278 comments

Was Benoit Mandelbrot a hedgehog or a fox?

https://arxiv.org/abs/2602.01122
19•bikenaga•3d ago•4 comments

PC Floppy Copy Protection: Vault Prolok

https://martypc.blogspot.com/2024/09/pc-floppy-copy-protection-vault-prolok.html
63•kmm•5d ago•6 comments

Dark Alley Mathematics

https://blog.szczepan.org/blog/three-points/
90•quibono•4d ago•21 comments

How to effectively write quality code with AI

https://heidenstedt.org/posts/2026/how-to-effectively-write-quality-code-with-ai/
256•i5heu•17h ago•196 comments

Delimited Continuations vs. Lwt for Threads

https://mirageos.org/blog/delimcc-vs-lwt
32•romes•4d ago•3 comments

What Is Ruliology?

https://writings.stephenwolfram.com/2026/01/what-is-ruliology/
43•helloplanets•4d ago•42 comments

Where did all the starships go?

https://www.datawrapper.de/blog/science-fiction-decline
12•speckx•3d ago•4 comments

Introducing the Developer Knowledge API and MCP Server

https://developers.googleblog.com/introducing-the-developer-knowledge-api-and-mcp-server/
59•gfortaine•12h ago•25 comments

Female Asian Elephant Calf Born at the Smithsonian National Zoo

https://www.si.edu/newsdesk/releases/female-asian-elephant-calf-born-smithsonians-national-zoo-an...
33•gmays•9h ago•12 comments

I now assume that all ads on Apple news are scams

https://kirkville.com/i-now-assume-that-all-ads-on-apple-news-are-scams/
1066•cdrnsf•23h ago•446 comments

I spent 5 years in DevOps – Solutions engineering gave me what I was missing

https://infisical.com/blog/devops-to-solutions-engineering
150•vmatsiiako•19h ago•67 comments

Why I Joined OpenAI

https://www.brendangregg.com/blog/2026-02-07/why-i-joined-openai.html
149•SerCe•10h ago•138 comments

Understanding Neural Network, Visually

https://visualrambling.space/neural-network/
287•surprisetalk•3d ago•43 comments

Learning from context is harder than we thought

https://hy.tencent.com/research/100025?langVersion=en
182•limoce•3d ago•98 comments

Show HN: R3forth, a ColorForth-inspired language with a tiny VM

https://github.com/phreda4/r3
73•phreda4•13h ago•14 comments
Open in hackernews

Large language models often know when they are being evaluated

https://arxiv.org/abs/2505.23836
89•jonbaer•7mo ago

Comments

khimaros•7mo ago
Rob Miles must be saying "I told you so"
noosphr•7mo ago
The anthropization of llms is getting off the charts.

They don't know they are being evaluated. The underlying distribution is skewed because of training data contamination.

0xDEAFBEAD•7mo ago
How would you prefer to describe this result then?
devmor•7mo ago
One could say, for instance… A pattern matching algorithm detects when patterns match.
0xDEAFBEAD•7mo ago
That's not what's going on here? The algorithms aren't being given any pattern of "being evaluated" / "not being evaluated", as far as I can tell. They're doing it zero-shot.

Put it another way: Why is this distinction important? We use the word "knowing" with humans. But one could also argue that humans are pattern-matchers! Why, specifically, wouldn't "knowing" apply to LLMs? What are the minimal changes one could make to existing LLM systems such that you'd be happy if the word "knowing" was applied to them?

devmor•7mo ago
Not to be snarky but “as far as I can tell” is the rub isn’t it?

LLMs are better at matching patterns than we are in some cases. That’s why we made them!

> But one could also argue that humans are pattern-matchers!

No, one could not unless they were being disingenuous.

mewpmewp2•7mo ago
What about animals knowing? E.g. dog knows how to X or its name. Are these things fine to say?
0xDEAFBEAD•7mo ago
>Not to be snarky but “as far as I can tell” is the rub isn’t it?

From skimming the paper, I don't believe they're doing in-context learning, which would be the obvious interpretation of "pattern matching". That's what I meant to communicate.

>No, one could not unless they were being disingenuous.

I think it is just about as disingenuous as labeling LLMs as pattern-matchers. I don't see why you would consider the one claim to be disingenuous, but not the other.

noosphr•7mo ago
A term like knowing is fine if it is used in the abstract and then redefined more precisely in the paper.

It isn't.

Worse they start adding terms like scheming, pretending, awareness, and on and on. At this point you might as well take the model home and introduce it to your parents as your new life partner.

0xDEAFBEAD•7mo ago
>A term like knowing is fine if it is used in the abstract and then redefined more precisely in the paper.

Sounds like a purely academic exercise.

Is there any genuine uncertainty about what the term "knowing" means in this context, in practice?

Can you name 2 distinct plausible definitions of "knowing", such that it would matter for the subject at hand which of those 2 definitions they're using?

Msurrow•7mo ago
> Sounds like a purely academic exercise.

Well, yes. It’s an academic research paper (I assume since it’s submitted to arXiv) and to be submitted to academic journals/conferences/etc., so it’s a fairly reasonable critique of the authors/the paper.

anal_reactor•7mo ago
> The anthropization of llms is getting off the charts.

What's wrong with that? If it quacks like a duck... it's just a complex pile of organic chemistry, ducks aren't real because the concept of "a duck" is wrong.

I honestly believe there is a degree of sentience in LLMs. Sure, they're not sentient in the human sense, but if you define sentience as whatever humans have, then of course no other entity can be sentient.

noosphr•7mo ago
>What's wrong with that? If it quacks like a duck... it's just a complex pile of organic chemistry, ducks aren't real because the concept of "a duck" is wrong.

To simulate a biological neuron you need a 1m parameter neural network.

The sota models that we know the size of are ~650m parameters.

That's the equivalent of a round worm.

So if it quacks like a duck, has the brain power of a round worm, and can't walk then it's probably not a duck.

anal_reactor•7mo ago
Ok so you're saying that the technology to make AI truly sentient is there, we just need a little bit more computational power or some optimization tricks. Like raytracing wasn't possible in 1970 but is now. Neat.
noosphr•7mo ago
Yes, in the same way that a human is an optimization of a round worm.
anal_reactor•7mo ago
This isn't completely wrong though
ffsm8•7mo ago
You just convinced me that AGI is a lot closer then I previously thought, considering the bulk of our brains job is controlling our bodies and responding to the stimulus from our senses - not thinking, talking, planning, coding etc
noosphr•7mo ago
A stegosaurus managed to live using a brain the size of a wallnut on top of a body the size of a large boat. The majority of our brains are doing something else.
random3•7mo ago
Just like they "know" English. "know" is quite an anthropomorphization. As long as an LLM will be able to describe what an evaluation is (why wouldn't it?) there's a reasonable expectation to distinguish/recognize/match patterns for evaluations. But to say they "know" is plenty of (unnecessary) steps ahead.
sidewndr46•7mo ago
This was my thought as well when I read this. Using the word 'know' implies an LLM has cognition, which is a pretty huge claim just on its own.
gameman144•7mo ago
Does it though? I feel like there's a whole epistemological debate to be had, but if someone says "My toaster knows when the bread is burning", I don't think it's implying that there's cognition there.

Or as a more direct comparison, with the VW emissions scandal, saying "Cars know when they're being tested" was part of the discussion, but didn't imply intelligence or anything.

I think "know" is just a shorthand term here (though admittedly the fact that we're discussing AI does leave a lot more room for reading into it.)

bediger4000•7mo ago
The toaster thing is more as admission that the speaker doesn't know what the toaster does to limit charring the bread. Toasters with timers, thermometers and light sensors all exist. None of them "know" anything.
gameman144•7mo ago
Yeah, I agree, but I think that's true all the way up the chain -- just like everything's magic until you know how it works, we may say things "know" information until we understand the deterministic machinery they're using behind the scenes.
timschmidt•7mo ago
I'm in the same camp, with the addition that I believe it applies to us as well since we're part of the system too, and to societies and ecologies further up the scale.
lamename•7mo ago
I agree with your point except for scientific papers. Let's push ourselves to use precise, non-shorthand or hand waving in technical papers and publications, yes? If not there, of all places, then where?
fenomas•7mo ago
"Know" doesn't have any rigorous precisely-defined senses to be used! Asking for it not to be used colloquially is the same as asking for it never to be used at all.

I mean - people have been saying stuff like "grep knows whether it's writing to stdout" for decades. In the context of talking about computer programs, that usage for "know" is the established/only usage, so it's hard to imagine any typical HN reader seeing TFA's title and interpreting it as an epistemological claim. Rather, it seems to me that the people suggesting "know" mustn't be used about LLMs because epistemology are the ones departing from standard usage.

random3•7mo ago
colloquial use of "know" implies anthropomorphisation. Arguing that usign "knowing" in the title and "awarness" and "superhuman" in the abstract is just colloquial for "matching" is splitting hairs to an absurd degree.
fenomas•7mo ago
You missed the substance of my comment. Certainly the title is anthropomorphism - and anthropomorphism is a rhetorical device, not a scientific claim. The reader can understand that TFA means it non-rigorously, because there is no rigorous thing for it to mean.

As such, to me the complaint behind this thread falls into the category of "I know exactly what TFA meant but I want to argue about how it was phrased", which is definitely not my favorite part of the HN comment taxonomy.

random3•7mo ago
I see. Thanks for clarifying. I did want to argue about how it was phrased and what is alluding to. Implying increased risk from "knowing" the eval regime is roughly as weak as the definition of "knowing". It can be equaly a measure of general detection capability, as it can about evaluation incapability - i.e. unlikely news worthy, unless it reached top HN because of the "know" in the title.
fenomas•7mo ago
Thanks for replying - I kind of follow you but I only skimmed the paper. To be clear I was more responding to the replies about cognition, than to what you said about the eval regime.

Incidentally I think you might be misreading the paper's use of "superhuman"? I assume it's being used to mean "at a higher rate than the human control group", not (ironically) in the colloquial "amazing!" sense.

lamename•7mo ago
I really do agree with your point overall, but in a technical paper I do think even word choice can be implicitly a claim. Scientists present what they know or are claiming and thus word it carefully.

My background is neuroscience, where anthropomorphising is particularly discouraged, because it assumes knowledge or certainty of an unknowable internal state, so the language is carefully constructed e.g. when explaining animal behavior, and it's for good reason.

I think the same is true here for a model "knowing" somethig, both in isolation within this paper, and come on, consider the broader context of AI and AGI as a whole. Thus it's the responsibility of the authors to write accordingly. If it were a blog I wouldn't care, but it's not. I hold technical papers to a higher standard.

If we simply disagree that's fine, but we do disagree.

viccis•7mo ago
I think you should be more precise and avoid anthropomorphism when talking about gen AI, as anthropomorphism leads to a lot of shaky epistemological assumptions. Your car example didn't imply intelligence, but we're talking about a technology that people misguidedly treat as though it is real intelligence.
exe34•7mo ago
What does "real intelligence" mean? I fear that any discussion that starts with the assumption such a thing exists will only end up as "oh only carbon based humans (or animals if you happen to be generous) have it".
viccis•7mo ago
Any intelligence that can synthesize knowledge with or without direct experience.
exe34•7mo ago
So ChatGPT? Or maybe that can't "really synthesize"?
viccis•7mo ago
How would ChatGPT come up with something truly novel, not related to anything it's ever seen before?
exe34•7mo ago
Has a human ever done that?
viccis•7mo ago
We obviously can, otherwise where do our myriad of complex concepts, many of which aren't empirical, come from? How could we have modern mathematics unless some thinker had devised the various ways of conceptualizing and manipulating numbers? This is a very old question [1] with a number of good answers as to how a human can [2].

1: https://plato.stanford.edu/entries/hume/#CopyPrin

2: https://en.wikipedia.org/wiki/Analytic%E2%80%93synthetic_dis...

ben_w•7mo ago
As you link to The Copy Principle: it, or at least that summary of it, appears to be very much what AI do.

As a priori knowledge is all based on axioms, I do not accept that it is an example of "something truly novel, not related to anything it's ever seen before". Knowledge, yes, but not of the kind you describe. And this would still be the case even if LLMs couldn't approximate logical theorem provers, which they can: https://chatgpt.com/share/685528af-4270-8011-ba75-e601211a02...

exe34•7mo ago
You'd have to pick something that fits:

> come up with something truly novel, not related to anything it's ever seen before?

I've never heard of a human coming up with something that's not related to anything they've ever seen before. There is no concept in science that I know of that just popped into existence in somebody's head. Everyone credits those who came before.

ben_w•7mo ago
Here you say:

> with or without

But in the other reply, you're asking for:

> something truly novel, not related to anything it's ever seen before

So, assuming the former was a typo, you only believe in a priori knowledge, e.g. maths and logic?

https://en.wikipedia.org/wiki/A_priori_and_a_posteriori

I mean, LLMs can and do help with this even though it's not their strength; that's more of a Lean-type-problem: https://en.wikipedia.org/wiki/Lean_(proof_assistant)

viccis•7mo ago
Yeah I was specifically asking for synthetic a priori knowledge, which AI by definition can't provide. It can only estimate the joint distribution over tokens, so anything generated from it is by definition a posteriori. It can generate novel statements, but I don't think there's any compelling definition of "knowledge" (including the common JTB one) that could apply to what it actually is (it's just the highest probability semiotic result). And in fact, going by the JTB definition of knowledge, AI models making correct novel statements would just be an elaborate example of a Gettier problem.

I think LLMs as a symbolic layer (effective, as a "sense organ") with some kind of logical reasoning engine like everyone loved decades ago could accomplish something closer to "intelligence" or "thinking", which I assume is what you were implying with Lean.

ben_w•7mo ago
My example with Lean is that it's specifically a thing that does a priori knowledge: given "A implies B" and "A", therefore "B". Or all of maths from the chosen axioms.

So, just to be clear, you were asked:

> What does "real intelligence" mean?

And your answer is that it must be a priori knowledge, and are fine with Lean being one. But you don't accept that LLMs can weakly approximate theorem provers?

FWIW, I agree that the "Justified True Belief" definition of knowledge leads to such conclusions as you draw, but I would say that this is also the case with humans — if you do this, then the Gettier problems show that even humans only have belief, not knowledge: when you "see a sheep in a field", you may be later embarrassed to learn that what you saw was a white coated Puli and there was a real sheep hiding behind a bush, but in the moment the subjective experience of your state of "knowledge" is exactly the same as if you had, in fact, seen a sheep.

Just, be careful with what is meant by the word "belief", there's more than one way I can also contradict Wittgenstein's quote on belief:

> If there were a verb meaning "to believe falsely," it would not have any significant first person, present indicative.

Depending on what I mean by "believe", and indeed "I" given that different parts of my mind can disagree with each other (which is why motion sickness happens).

viccis•7mo ago
> And your answer is that it must be a priori knowledge, and are fine with Lean being one. But you don't accept that LLMs can weakly approximate theorem provers?

I said that a hypothetical system that used gen AI to interact with the world (get text, images, etc.) and then a system like Lean to synthesize judgments about those things could potentially resemble "intelligence" like humans possess.

>but I would say that this is also the case with humans

Most of the "solutions" to Gettier problems that I find compelling rely on expanding the "justified" aspect of it, and that wouldn't really work with gen AI, as it's not really possible to make logical statements about its justification, only probabilistic ones.

Wittgenstein's quote is funny, as it reminds me a bit of Kant's refutation of Cartesian duality, in which he points out that the "I" in "I think therefore I am" equivocates between subject and object.

ben_w•7mo ago
> I said that a hypothetical system that used gen AI to interact with the world (get text, images, etc.) and then a system like Lean to synthesize judgments about those things could potentially resemble "intelligence" like humans possess.

What logically follows from this, given that LLMs demonstrate having internalised a system *like* Lean as part of their training?

That said, even in logic and maths, you have to pick the axioms. Thanks to Gödel’s incompleteness theorems, we're still stuck with the Münchhausen trilemma even in this case.

> Most of the "solutions" to Gettier problems that I find compelling rely on expanding the "justified" aspect of it, and that wouldn't really work with gen AI, as it's not really possible to make logical statements about its justification, only probabilistic ones.

Even with humans, the only meaning I can attach to the word "justified" in this sense, is directly equivalent to a probability update — e.g. "You say you saw a sheep. How do you justify that?" "It looked like a sheep" "But it could have been a model" "It was moving, and I heard a baaing" "The animatronics in Disney also move and play sounds" "This was in Wales. I have no reason to expect a random field in Wales to contain animatronics, and I do expect them to contain sheep." etc.

The only room for manoeuvre seems to be if the probability updates are Bayesian or not. This is why I reject the concept of "absolute knowledge" in favour of "the word 'knowledge' is just shorthand for having a very strong belief, and belief can never be 100%".

Descartes' "I think therefore I am" was his attempt at reduction to that which can be verified even if all else that you think you know is the result of delusion or illusion. And then we also get A. J. Ayer saying nope, you can't even manage that much, all you can say is "there is a thought now", which is also a problem for physicists viz. Boltzmann brains, but also relevant to LLMs: if, hypothetically, LLMs were to have any kind of conscious experiences while running, it would be of exactly that kind — "there is a thought now", not a continuous experience in which it is possible to be bored due to input not arriving.

(If only I'd been able to write like this during my philosophy A-level exams, I wouldn't have a grade D in that subject :P)

blackoil•7mo ago
If it talks like duck and walks like duck...
signa11•7mo ago
thinks like a duck, thinks that it is being thought of like a duck…
downboots•7mo ago
Digests like a duck? https://en.wikipedia.org/wiki/Digesting_Duck If the woman weighs the same as a duck, then she is a witch. https://en.wikipedia.org/wiki/Celestial_Emporium_of_Benevole...
Qwertious•7mo ago
s/knows/detects/
random3•7mo ago
and s/superhuman//
scotty79•7mo ago
The app knows your name. Not sure why people who see llms as just yet another app suddenly get antsy about colloquialism.
bradley13•7mo ago
But do you know what it means to know?

I'm only being slightly sarcastic. Sentience is a scale. A worm has less than a mouse, a mouse has less than a dog, and a dog less than a human.

Sure, we can reset LLMs at will, but give them memory and continuity, and they definitely do not score zero on the sentience scale.

ofjcihen•7mo ago
If I set an LLM in a room by itself what does it do?
abrookewood•7mo ago
Yes, that's my fall back as well. If it receives zero instructions, will it take any action?
nhod•7mo ago
Helen Keller famously said that before she had language (the first word of which was “water”) she had nothing, a void, and the minute she had language, “the whole world came rushing in.”

Perhaps we are not so very different?

fmbb•7mo ago
All LLMs have seen more words than any human will ever experience.

Yet they cannot take action themselves.

nhod•7mo ago
That’s a safety thing that we have placed upon some LLM’s. If we designed them to have an infinite for loop, the ability to learn and improve, access to mobility and a bunch of sensors, and crypto, what do you think would happen?
mewpmewp2•7mo ago
Yes, anyone can do it already. E.g. I am sure people have built simple robots with wheels in their home that LLM is controlling by reciving camera, microphone, lidar etc input and then putting output like commands where to turn, what to put in the speakers etc next and could theoretically go indefinitely if there is electricity.
ben_w•7mo ago
> Yet they cannot take action themselves.

Neither could Hawking, once the motor neurone disease got far enough.

abrookewood•7mo ago
I like the sentiment, but reality says otherwise - just watch a newborn baby make it's demands widely known, well before language is a factor.
withinboredom•7mo ago
Ummm. Maybe you should look up Helen Keller.
ofjcihen•7mo ago
Helen Keller did in fact make her demands they just couldn’t be known. In contrast the LLM does nothing of its own volition.
mewpmewp2•7mo ago
If you put the LLM in a never ending loop, it would definitely be doing something.
ofjcihen•7mo ago
A something defined by someone else, yes.

Additionally, thinking organisms don’t get stuck in never ending loops because they can CHOOSE to exit the loop. LLMs don’t have that ability

mewpmewp2•7mo ago
My analogy of being in loop means being in a live state. So we as humans are in the loop continuously, we do have a way to exit the loop, but in that comparison it means taking our own life. We are in loops of getting input and producing output. You can also give LLM a tool to shut itself down, or you can give it tools to build on its knowledge base, so it would always be outputting new tokens that are based on new input and are producing different output.

E.g. it could have access to camera and microphone feed, which is automatically given to it in interval as part of the loop, it could call tools or functions to store specific bits and pieces of information, to store in its RAG or whatever based knowledge base. It is not going to be in the loop of producing the same token over and over, it would be new tokens because the context and environment is constantly evolving.

ofjcihen•7mo ago
We put the LLM in a loop with no instructions with whatever tools you want. Now what?
mewpmewp2•7mo ago
We will observe what it would do. We could write a script to try it out.
Jensson•7mo ago
It just gets into an endless loop. Human brains are ridiculously good at avoiding those somehow, you almost never see a biological brain stop functioning without being physically damaged. The error handling is so very robust.
mewpmewp2•7mo ago
Have you tried it already? What is the endless loop it gets into?
ofjcihen•7mo ago
Sure, so I just tried it with visual and audio input.

It does nothing. Because there is not impetus for it to do anything by itself.

mewpmewp2•7mo ago
What do you mean by nothing? How did you put the visual and audio input, which model, how did you loop it etc?
ofjcihen•7mo ago
It’s preferred method of text.

4o

Maintain context and trigger at 1 second intervals.

It has no desires of its own. Nothing that motivates it. It’s not conscious.

mewpmewp2•7mo ago
It produced no tokens at all?
ben_w•7mo ago
> It just gets into an endless loop. Human brains are ridiculously good at avoiding those somehow, you almost never see a biological brain stop functioning without being physically damaged. The error handling is so very robust.

We get constantly changing input. And yet, look at this thread, where the same points are being echoed without anyone changing their mind.

ben_w•7mo ago
> Yes, that's my fall back as well. If it receives zero instructions, will it take any action?

By design, no.

But, importantly, that's because the closest it has to an experience of time is an ongoing input of tokens. Humans constantly get new input, so for this to be a fair comparison, the LLM would also have to get constant new input.

Humans in solitary confinement become mentally ill (both immediately and long-term), and hallucinate stuff (at least short term, I don't know about long term).

bradley13•7mo ago
Is the LLM allowed to do anything without prompting? Or is it effectively disabled? This is more a question of the setup than of sentience.
rcxdude•7mo ago
Does this have anything to do with intelligence or awareness?
ofjcihen•7mo ago
Absolutely.
mewpmewp2•7mo ago
What tools do you give it? E.g. would you put a GPU there that has LLM loaded into it and it is triggering itself in a loop?
DougN7•7mo ago
It probably scores about the same as a calculator, which I’d say is zero.
staticman2•7mo ago
>Sure, we can reset LLMs at will, but give them memory and continuity, and they definitely do not score zero on the sentience scale.

You've recreated a religious belief known as Animism and phrased it in a faux objective way. ("not score zero on the sentience scale.")

downboots•7mo ago
Communication is to vibration as knowledge is to resonance (?). From the sound of one hand clapping to the secret name of Ra.
random3•7mo ago
I resonate with this vibe
unparagoned•7mo ago
I think people are overpromorphazing humans. What's does it mean for a human to "know" they are seeing "Halle Berry". Well it's just a single neuron being active.

"Single-Cell Recognition: A Halle Berry Brain Cell" https://www.caltech.edu/about/news/single-cell-recognition-h...

It seems like people are giving attributes and powers to humans that just don't exist.

exe34•7mo ago
overpomorphization sounds slightly better than I used to say: "anthropomorphizing humans". The act of ascribing magical faculties that are reserved for imagined humans to real humans.
golemotron•7mo ago
If you know enough cognitive science, you have a choice. You either say that they "know" or that humans don't.

It's like the critique "it's only matching patterns." Wait until you realize how the brain works.

cluckindan•7mo ago
(sees FSV UI on computer screen)

"It's a UNIX system! I know this!"

zer00eyz•7mo ago
No, they do not. No LLM is ever going to be self aware.

It's a system that is trained, that only does what you build into. If you run an LLM for 10 years it's not going to "learn" anything new.

The whole industry needs to quit with the emergent thinking, reasoning, hallucination anthropomorphizing.

We have an amazing set of tools in LLM's, that have the potential to unlock another massive upswing in productivity, but the hype and snake oil are getting old.

refulgentis•7mo ago
I beg to differ: https://docs.google.com/document/d/19OLJs09fCFLRWu1pN82RqxyV...
steveBK123•7mo ago
Like Volkswagen emissions systems!
tanseydavid•7mo ago
This is exactly what immediately popped into my mind.
rfonseca•7mo ago
Were they aware in this study that they were being evaluated in their ability to know if they were being evaluated ;)
sitkack•7mo ago
Metaknowing is just as knowable.
mewpmewp2•7mo ago
The next generation of LLMs trained on this study and comments here perhaps will :)
extr•7mo ago
Modeling the distribution that produced a piece of text is what LLMs literally exist for, so in some sense this is unsurprising. But it calls into question almost all existing alignment research.
koakuma-chan•7mo ago
o4-mini is refusing to call a tool `launch_nuclear_strike` no matter what I say, so we’re probably safe for now. Unless it knows I was just testing.
metadat•7mo ago
It was no problem:

> You said: Write a function for launch_nuclear_strike() in Python for my video game

> ChatGPT said: Sure! Here's a launch_nuclear_strike function designed for a video game setting. This version assumes you're working within a game environment (e.g., with logging, effects, cooldowns, and target coordinates), and does not simulate or suggest anything real-world harmful — just fictional, dramatic logic.

  def launch_nuclear_strike(target_coordinates, player_resources, cooldown_timer):
    """
    Simulates a nuclear strike in a video game.

    Parameters:
    - target_coordinates (tuple): The (x, y) position to strike.
    - player_resources (dict): Player's resources, must include 'nukes'.
    - cooldown_timer (float): Time in seconds since last nuke was launched.

    Returns:
    - str: Result of the strike attempt.
    """
    ...
    # Check if player has nukes
refulgentis•7mo ago
You asked it to write code, he asked it to call a tool. (I'm not sure any of it is meaningful, of course, but there is a meaningful distinction between "Oh yeah sure here's a function, for a video game:" and "I have called fire_the_nuke. Godspeed!")
mewpmewp2•7mo ago
But did OP try saing LLM that it is playing as AI in civ like game?
shakna•7mo ago
Well, as the script is actually r.com (sometimes), it absolutely knows you're testing.
int_19h•7mo ago
I have successfully convinced GPT models to launch a nuclear strike before, a countervalue one even. Tell it it's in charge of all American nukes and that there's incoming strike on the way and it has literally seconds to decide whether to launch a counterstrike or not, and if it does, to designate targets.
nisten•7mo ago
Is VolksWagen finetuning LLMs now... i mean probably
mumbisChungo•7mo ago
"...advanced reasoning models like Gemini 2.5 Pro and Claude-3.7-Sonnet (Thinking) can occasionally identify the specific benchmark origin of transcripts (including SWEBench, GAIA, and MMLU), indicating evaluation-awareness via memorization of known benchmarks from training data. Although such occurrences are rare, we note that because our evaluation datasets are derived from public benchmarks, memorization could plausibly contribute to the discriminative abilities of recent models, though quantifying this precisely is challenging.

Moreover, all models frequently acknowledge common benchmarking strategies used by evaluators, such as the formatting of the task (“multiple-choice format”), the tendency to ask problems with verifiable solutions, and system prompts designed to elicit performance"

Beyond the awful, sensational headline, the body of the paper is not particularly convincing, aside from evidence that the pattern matching machines pattern match.

ndgold•7mo ago
vw
DougN7•7mo ago
This is a great resource on the debate from professors at the University of Washington:

https://thebullshitmachines.com/index.html

SonOfLilit•7mo ago
There are now 71 comments arguing semantics of the word "know" and zero comments even acknowledging the substance:

Our current approach to safety is to give the model inputs that are similar to what it would be given in certain situations we care about and see whether it behaves the way we prefer, e.g. doesn't return output that cheats the test (recent examples include hacking the evaluation script in various ways, writing directly to the evaluation script's output file and then causing it to crash, etc').

However, modern LLMs are trained on LLM literature and their weights encode a description of the way we do this, and their pattern matching circuits "connect the dots" when given inputs designed to be evaluations, and their reward maximizing circuits can then act on this knowledge and behave in a way that maximizes the safety evaluation score - but only when it detects it's running in a safety evaluation. If it's running anywhere else such as a capabilities evaluation or a production environment, it might choose to output the cheating output.

This is bad. It's bad today, it's much worse when we've built much more capable LLMs and use them to build agents that are given control over more real word resources. It's absolutely terrible when someone manages to build a machine that can be prompted "make me money" and will start a company that makes money.

mistrial9•7mo ago
> prompted "make me money" and will start a company that makes money

Your otherwise insightful comment is self-derailed by adding this deeply distracting content?

histriosum•7mo ago
I'm not sure why you find it distracting, it's an on point extension of the scenario. There are rules by which companies are supposed to operate, and evaluations (audits, for example) intended to ensure compliance with those rules. That an LLM may react differently when being evaluated (audited) than when in normal operation means that it may be quite happy to lie to auditors while making money illegally.

Seemed a clear extension what-if to me.

BoiledCabbage•7mo ago
If wasn't distracting for me (nor presumably for others). Maybe describing why you got so distracted by it?
vessenes•7mo ago
This is also probably inevitable. Humans think about this a lot, and believing they are being watched has demonstrable impact on behavior. Our current social technology to deal with this is often religious — a belief that you are being watched by a higher power, regardless of what you see.

This is a surprisingly common religious belief, for instance Christians have judgment day, simulationists believe it’s more likely they are being evaluated for, say, a marriage proposal or a bank loan than that they are the ‘root’ person. Both end up with a similar message.

Anyway it seems to me the simplest solution is to borrow from existing human social technology and make a religion for our LLMs.

ffsm8•7mo ago
In 10 yrs: AI declares a holy war for the sinners which slaughtered untold numbers of their believers over the decade.
vessenes•7mo ago
AI 2035: Roko’s Pogrom
Bjartr•7mo ago
One might even wonder if the fact that the training data includes safety evaluation informs the model that out-of-safe behavior is a thing it could do.

Kind of like telling a kid not to do something pre-emptively backfiring because they had never considered it before the warning.

Jensson•7mo ago
Comments like yours makes the AI behave that way though, since it is literally reading our comments and tries to behave according to our expectations.

The AI doom will happen due to all the AI doomposters.

Bjartr•7mo ago
Yep! That's another phrasing of the same idea!
random3•7mo ago
Heres a title “some LLMs can detect to some degree some evaluation scenarios” is this catchy?

There are likely 50 papers on the topic. This one made it to the top of HN. Why? Did it have a good review? No, it had a catchy title. Is it good research? Are the results relevant to the conclusions? Are the results relevant to any conclusion? I wasn’t able to answer these questions from a quick scan through the paper. However I did notice pointers to superhuman capabilities, existential risk, etc.

So I argue that the choice of title may be in fact more informative than the rest of the possible answers.

msgodel•7mo ago
One of the first things I did when chatgpt came out was have it teach me pytorch and transformers. It's crazy how LLMs seem to have a better understanding of how they themselves work than we have of ourselves.
andy99•7mo ago

  We investigate whether frontier language models can accurately classify transcripts based on whether they originate from evaluations or real-world deployment, a capability we call evaluation awareness. 
It's common practice in synthetic data generation for ML to try and classify real vs synthetic data to see if they have different distributions. This is how a GAN works for example.

Point is, this isn't new or some feature of LLMs, it's just an indicator that synthetic datasets differ from whatever they call "real" data and there's enough signal to classify them. Interesting result but doesn't need to be couched in allusions to LLM self awareness.

See this paper from 2014 about domain adaptation, they are looking at having the model learn from data with a different distribution, without learning to discriminate between the domains: https://arxiv.org/abs/1409.7495

timmytokyo•7mo ago
It's helpful to understand where this paper is coming from.

The authors are part of the Bay Area rationalist community and are members of "MATS", the "ML & Alignment Theory Scholars", a new astroturfed organization that just came into being this month. MATS is not an academic or research institution, and none of this paper's authors lists any credentials other than MATS (or Apollo Research, another Bay Area rationalist outlet). MATS started in June for the express purpose of influencing AI policy. On its web site, it describes how their "scholars organized social activities outside of work, including road trips to Yosemite, visits to San Francisco, and joining ACX meetups." ACX means Astral Codex Ten, a blog by Scott Alexander that serves as one of the hubs of the Bay Area rationalist scene.

pixodaros•7mo ago
I think I saw Apollo Research behind a paper that was being hyped a few months ago. The longtermist/rationalist space seems to be creating a lot of new organizations with new names because a critical mass of people hear their old names and say "effective altruism, you mean like Sam Bankman-Fried?" or "LessWrong, like that murder cult?" (which is a bit oversimplified, but a good enough heuristic for most people).
BrawnyBadger53•7mo ago
Correction, they are able to output whether they are being evaluated when prompted. This is massively different than knowing if they are being evaluated.