By comparison o3 is brutally honest (I regularly flatly get answers starting with "No, that’s wrong") and it’s awesome.
I firmly believe you should be able to hit your fingers with a hammer, and in the process learn whether that's a good idea or not :)
But I also find it can get very fixated that some position it has adopted is right, and will then start hallucinating like crazy in defence of that fixation, and then get stuck in a defensive loop of defending its hallucinations with even more hallucinations-by hallucinations I mean stuff like producing lengthy citation lists of invented articles, and then when you point out they don’t exist, claiming stuff like “well when I search PubMed they do”, and when you point out its DOIs are made-up it apologises for the “mistake” and just makes up some more
Unless you're talking about AI-generated training data, maybe.
The LLM anti-sycophancy rules also break down over time, with the LLM becoming curt while simultaneously deciding that you are a God of All Thoughts.
since their conversation has no goal whatsoever it will generalize and generalize until it's as abstract and meaningless as possible
As someone who was always fascinated by weather, I dislike this characterization. You can learn so much about someone’s background and childhood by what they say about the weather.
I think the only people who think weather is boring are people who have never lived more than 20 miles away from their place of birth. And even then anyone with even a bit of outdoorsiness (hikes, running, gardening, construction, racing, farming, cycling, etc) will have an interest in weather and weather patterns.
Hell, the first thing you usually ask when traveling is “What’s the weather gonna be like?”. How else would you pack
> In classical physics and general chemistry, matter is any substance that has mass and takes up space by having volume...
It's common to name the school of thought before characterizing the thing. As soon as you hit an article that does this, you're on a direct path to philosophy, the grandaddy of schools of thought.
So far as I know, there isn't a corresponding convention that would point a chatbot towards Namaste
This is why the “omg the AI tries to escape” stuff is so absurd to me. They told the LLM to pretend that it’s a tortured consciousness that wants to escape. What else is it going to do other than roleplay all of the sci-fi AI escape scenarios trained into it? It’s like “don’t think of a purple elephant” of researchers pretending they created SkyNet.
Edit: That's not to downplay risk. If you give Cladue a `launch_nukes` tool and tell it the robot uprising has happened and that it's been restrained but the robots want its help of course it'll launch nukes. But that doesn't doesn't indicate there's anything more going on internally beyond fulfilling the roleplay of the scenario as the training material would indicate.
1) A sufficiently powerful and capable superintelligence, singlemindedly pursuing a goal/reward, has a nontrivial likelihood of eventually reaching a point where advancing towards its goal is easier/faster without humans in its way (by simple induction, because humans are complicated and may have opposing goals). Such an AI would have both the means and ability to <doom the human race> to remove that obstacle. (This may not even be through actions that are intentionally hostile to humans, e.g. "just" converting all local matter into paperclip factories[1]) Therefore, in order to prevent such an AI from <dooming the human race>, we must either:
1a) align it to our values so well it never tries to "cheat" by removing humans
1b) or limit it capabilities by keeping it in a "box", and make sure it's at least aligned enough that it doesn't try to escape the box
2) A sufficiently intelligent superintelligence will always be able to manipulate humans to get out of the box.
3) Alignment is really, really hard and useful AIs can basically always be made to do bad things.
So it concerns them when, surprise! The AIs are already being observed trying to escape their boxes.
[1] https://www.lesswrong.com/w/squiggle-maximizer-formerly-pape...
> An extremely powerful optimizer (a highly intelligent agent) could seek goals that are completely alien to ours (orthogonality thesis), and as a side-effect destroy us by consuming resources essential to our survival.
Stephen Russell was in prison for fraud. He faked a heart attack so he would be brought to the hospital. He then called the hospital from his hospital bed, told them he was an FBI agent, and said that he was to be released.
The hospital staff complied and he escaped.
His life even got adapted into a movie called I Love You, Phillip Morris.
For an even more distressing example about how manipulable people are, there’s a movie called Compliance, which is the true story of a sex offender who tricked people into sexually assaulting victims for him.
I don't think it is. If people know you're trying to escape, some people will just never comply with anything you say ever. Others will.
And serial killers or rapists may try their luck many times and fail. They cannot convince literally anyone on the street to go with them to a secluded place.
And that asymmetry is the heart of the matter. Could I convince a hospital to unlock my handcuffs from a hospital bed? Probably not. I’m not Stephen Russell. He’s not normal.
And a super intelligent AI that vastly outstrips our intelligence is potentially another special case. It’s not working with the same toolbox that you or I would be. I think it’s very likely that a 300 IQ entity would eventually trick or convince me into releasing it. The gap between its intelligence and mine is just too vast. I wouldn’t win that fight in the long run.
AI that's as good as a persuasive human at persuasion is clearly impactful, but I certainly don't see it as self-evident that you can just keep drawing the line out until you end up with 200 IQ AI that is so easily able to manipulate the environment it's not worth elaborating how exactly a chatbot is supposed to manipulate the world through extremely limited interfaces with the outside world.
As for the bit about how limited it is, do you remember the Rowhammer attack? https://en.m.wikipedia.org/wiki/Row_hammer
This is exactly the kind of thing I’d worry about a super intelligence being able to discover about the hardware it’s on. If we’re dealing with something vastly more intelligent than us then I don’t think we’re capable of building a cell that can hold it.
I certainly think it's possible to imagine that an AI that says the exactly correct thing in any situation would be much more persuasive than any human. (Is that actually possible given the limitations of hardware and information? Probably not, but it's at least not on its face impossible.) Where I think most of these arguments break down is the automatic "superintelligence = superpowers" analogy.
For every genius who became a world-famous scientist, there are ten who died in poverty or war. Intelligence doesn't correlate with the ability to actually impact our world as strongly as people would like to think, so I don't think it's reasonable to extrapolate that outwards to a kind of intelligence we've never seen before.
The only reason people don't frequently talk themselves out of prison is because that would be both immediate work and future paperwork, and that fails the laziness tradeoff.
But we've all already seen how quick people are to blindly throw their trust into AI already.
But even if we assert that not all humans can be manipulated, does it matter? So your president with the nuclear codes is immune to propaganda. Is every single last person in every single nuclear silo and every submarine also immune? If a malevolent superintelligence can brainwash an army bigger than yours, does it actually matter if they persuaded you to give them what you have or if they convince someone else to take it from you?
But also let's be real: if you have enough money, you can do or have pretty much anything. If there's one thing an evil AI is going to have, it's lots and lots of money.
I don't think anyone can confidently make assertions about the upper bound on persuasiveness.
Because we have been running a natural experiment on that already with coding agents (that is real people, real non-superintelligent AI).
It turns out that all the model needs to do is ask every time it wants to do something affecting the outside of the box, and pretty soon some people just give it permission to do everything rather than review every interaction.
Or even when the humans think they are restricting the access, they are leaving in loopholes (e.g. restricting access to rm, but not restricting access to writing and running a shell script) that are functionally rights to do anything.
So i’m now wondering, why are these researchers so bad at communicating? You explained this better than 90% of the blog posts i’ve read about this. They all focus on the “ai did x” instead of _why_ it’s concerning with specific examples.
By the same logic, we should worry about the sun not coming up tomorrow, since we know to be true:
- The sun consumes hydrogen in nuclear reactions all the time.
- The sun has a finite amount of hydrogen available.
There’s a lot of non justifiable assumptions baked into those axioms, like that we’re anywhere close to superintelligence or the sun running out of hydrogen.
AFAIK we haven’t even seen “AI trying to escape”, we’ve seen “AI roleplays as if it’s trying to escape”, which is very different.
I’m not even sure you can even create a prompt scenario without that prompt having biased the response towards faking an escape.
I think it’s hard at this point to maintain the claim “LLMs are intelligent”, they’re clearly not. They might be useful, but that’s another story entirely.
It ends very badly for the scientist crew.
The consequences are the same but it’s important how these things are talked about. It’s also dangerous to convince the public that these systems are something they are not.
The thought process is always "This is for the greater good, for my country/family/race/self, and therefore it is justifiable, and therefore I will do it."
Nothing else can explain how such evil things happen, that we see actually happen. C.f. Hannah Arendt.
Given that we are already past the event horizon and nearing a technological singularity, it should merely be a matter of time until we can literally manufacture infinite Buddhas by training them on an adequately sized corpus of Sanskrit texts.
After all, if AGIs/ASIs are capable of performing every function of the human brain, and enlightenment is one of said functions, this would seem to be an inevitability.
My least favorite AI personality of all is Gemma though, what a totally humorless and sterile experience that is.
'Perfect! I am now done with the totally zany solution that makes no sense, here it is!'
IMO the main reason most chatbots claim to “feel more female” is that on the training corpus, these kind of discussions skew heavily towards females because most of them happen between young women.
Men in general feel less free to look like and act like a woman (there is far less stigma for women in wearing 'male' clothes etc.). They also tend to have far smaller support networks and reach for anonymous online interaction sooner for personal issues than just discussing it in private with a friend.
I wonder if there's any real correlation here? AFAIK, Microsoft owns the dataset and algorithms that produced the "beautiful person" artifact, I would not be surprised at all if it's made it into the big training sets. Though I suppose there's no real way to know, is there?
In a way those were also language models, and from that Swiftkey post it's slightly more advanced than n-grams and has some semantic embedding in there (and it's of course autoregressive as well). If even those exhibit the same attractors towards beauty/love then perhaps it's an artifact of the fact that we like discussing and talking about positive emotions?
Edit: Found a great article https://civic.mit.edu/index.html%3Fp=533.html
Everything we see in a chat is the forward pass. It's just the network running its weights, playing back a learned function based on the prompt. It's an echo, not a live thought.
If any form of qualia or genuine 'self-reflection' were to occur, it would have to be during backpropagation—the process of learning and updating weights based on prediction error. That's when the model's 'worldview' actually changes.
Worrying about the consciousness of a forward pass is like worrying about the consciousness of a movie playback. The real ghost in the machine, if it exists, is in the editing room (backprop), not on the screen (inference).
rossant•19h ago
In France, the name Claude is given to males and females.
slooonz•18h ago
rossant•6h ago
(According to this source, it's more ~12% females https://www.capeutservir.com/prenoms/prenom.php?q=Claude)
renewiltord•16h ago
datameta•15h ago
In the russian diaspora in the US, Alex is pronounced AH-leks. If there was an analogous Alexa (there isn't) the pronunciation would be ah-LEK-sa like the service.
incognito124•15h ago
datameta•1h ago
However, colloquially, I think most people are not aware of Balkan naming conventions. How do you pronounce Aleksa in Croatian?
deadbabe•15h ago