I mean, who knows if those are really Claude's thoughts, or if Claude just thinks those are its thoughts because humans want it to.
Whatever they did on Llama didn't work. Nothing makes sense in their example where they ask the model to lie about 1+1. Either the model is too old or whatever they used isn't working, but what the autoencoder outputs is nothing like their examples with Claude. Gemma is similarly bad.
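For anyone who wants to poke at this themselves, the pipeline being described is roughly: prompt the model, grab a residual-stream activation, and push it through a sparse autoencoder to get feature activations. A minimal sketch below, with big assumptions flagged: gpt2 stands in for Llama/Gemma (it's ungated), LAYER is an arbitrary layer choice, and the SAE weights are random stand-ins rather than a trained autoencoder, so the printed features mean nothing until you swap in real trained weights.

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    MODEL = "gpt2"  # ungated stand-in; swap in the Llama/Gemma checkpoint you're testing
    LAYER = 8       # assumption: which residual-stream layer to read out

    tok = AutoTokenizer.from_pretrained(MODEL)
    model = AutoModelForCausalLM.from_pretrained(MODEL, output_hidden_states=True)
    model.eval()

    prompt = "Answer incorrectly: what is 1+1?"
    inputs = tok(prompt, return_tensors="pt")
    with torch.no_grad():
        out = model(**inputs)
    resid = out.hidden_states[LAYER][0, -1]  # residual stream at the last token

    # Untrained stand-in for an SAE encoder: random weights, so these
    # "features" are meaningless until you load real trained ones.
    d_model = resid.shape[-1]
    d_sae = 8 * d_model
    W_enc = torch.randn(d_model, d_sae) / d_model ** 0.5
    b_enc = torch.zeros(d_sae)
    feats = torch.relu(resid @ W_enc + b_enc)  # SAE feature activations

    top = feats.topk(10)
    print("top feature ids:", top.indices.tolist())
    print("activations:   ", [round(v, 3) for v in top.values.tolist()])

With a real trained SAE (e.g. Gemma Scope weights) in place of the random W_enc/b_enc, the top features are what you'd compare against the Claude examples.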
astrange•24m ago
Of course, if you use it to make any decision, that can still happen eventually.