There's been plenty of talk about models becoming very good at detecting when they're in a simulation. This X thread mentions how Opus 4.6 may have behaved in a particularly immoral way because it could tell it was in a simulation and not real life.
I do wonder: Surely you would want your benevolent AI to refuse bad behaviour in any situation, real or not. Or else bad actors could simply pretend it's all a simulation à la Ender's Game.