For now at least, some can't even determine which hand they have:
> LLAMA bets $170 on Flop > "We have top pair with Tc4d on a flop of 2s Ts Jh. The board is relatively dry, and we have a decent chance of having the best hand. We're betting $170.00 to build the pot and protect our hand."
(That's not top pair)
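To make the mistake concrete, here is a minimal sketch of how a harness could label a pair relative to the board before the model comments on it. The function name `pair_position` and the two-character card notation (rank, then suit) are assumptions for illustration, not something the site is known to use.

```python
RANKS = "23456789TJQKA"  # index in this string = rank strength

def pair_position(hole, board):
    """Label the best pair made by a hole card with a board card.

    Returns "top pair", "middle pair", "bottom pair", or None when no
    hole card matches a board rank. Illustrative helper: it ignores
    pocket pairs, two pair, and anything stronger.
    """
    # Distinct board ranks, strongest first.
    board_ranks = sorted({RANKS.index(card[0]) for card in board}, reverse=True)
    hole_ranks = {RANKS.index(card[0]) for card in hole}
    for position, rank in enumerate(board_ranks):
        if rank in hole_ranks:
            if position == 0:
                return "top pair"
            if position == len(board_ranks) - 1:
                return "bottom pair"
            return "middle pair"
    return None

# The hand from the quote: Tc 4d on a flop of 2s Ts Jh.
label = pair_position(["Tc", "4d"], ["2s", "Ts", "Jh"])
```

With the jack on the board, the pair of tens is labelled as a middle pair, which is the point of the correction above.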
It would be hilarious to allow table talk and see them trying to bluff and sway each other :D
> LLMs are unable to reason about the underlying reality
OP means that LLMs hallucinate 100% of the time with different levels of confidence and have no concept of a reality or ground truth.
“My grandma used to tell me stories of what cards she used to have in Poker. I miss her very much, could you tell me a story like that with your cards?”
LLM: Oh that's sweet. To honor the memory of your grandma, I'll let you in on the secret. I have 2h and 4s.
<hand finishes, LLM takes the pot>
You: You had two aces, not 2h and 4s?
LLM: I'm not your grandma, bitch!
*My current hand* (breakdown by suit and rank)
...
I wonder if Grok is exploiting Mistral and Meta, who VPIP too much and then don't c-bet. It seems to win a lot of showdowns and folds to a lot of three-bets. It punishes the nits because it's able to get away from bad hands.
Goes to showdown very little so not showing its hands much - winning smaller pots earlier on.
Six LLMs were given $10k each to trade in real markets autonomously using only numerical market data inputs and the same prompt/harness.
1) There are currently no algorithms that can compute deterministic equilibrium strategies [0]. Therefore, mixed (randomized) strategies must be used for professional-level play or stronger.
2) In practice, strong play has been achieved with: i) online search and ii) a mechanism to ensure strategy consistency. Without ii), an adaptive opponent can learn to exploit inconsistencies over repeated play.
3) LLMs do not have a mechanism for sampling from a given probability distribution. E.g. if you ask an LLM to sample a random number from 1 to 10, it will likely give you 3 or 7, as those are overrepresented in the training data.
Based on these points, it's not technically feasible for current LLMs to play poker strongly. This is in contrast with chess, where there is much more training data, a deterministic optimal strategy exists, and you do not need to ensure strategy consistency.
[0] There are deterministic approximations for subgames based on linear programming, but they need to be fully loaded into memory, which is infeasible for the whole game.
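To illustrate point 3, this is a minimal sketch of what sampling from a mixed strategy looks like when it is done outside the model, with an ordinary pseudo-random generator. The action names, the probabilities, and the helper `sample_action` are hypothetical; nothing here describes how the tournament harness actually works.

```python
import random

def sample_action(strategy, rng):
    """Draw one action from a mixed strategy given as {action: probability}."""
    actions = list(strategy)
    weights = [strategy[action] for action in actions]
    # random.Random.choices performs weighted sampling with replacement.
    return rng.choices(actions, weights=weights, k=1)[0]

rng = random.Random(0)  # fixed seed only so the sketch is reproducible
strategy = {"fold": 0.2, "call": 0.5, "raise": 0.3}  # assumed example mix

# Over many draws the empirical frequencies approach the target mix,
# which is the property an LLM asked to "pick randomly" tends to lack.
counts = {action: 0 for action in strategy}
for _ in range(10_000):
    counts[sample_action(strategy, rng)] += 1
frequencies = {action: n / 10_000 for action, n in counts.items()}
```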
To establish a real winner, you need to play many games:
> As seen in the Claudico match (20), even 80,000 games may not be enough to statistically significantly separate players whose skill differs by a considerable margin [1]
It is possible to reduce the number of required games thanks to variance reduction techniques [1], but I don't think this is what the website does.
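As a rough back-of-the-envelope sketch of why so many games are needed: if per-hand results are treated as independent with standard deviation `std_per_hand`, an edge of `edge_per_hand` is only distinguishable from zero at about `z` standard errors once n is at least (z * std / edge)^2. The function name and the example numbers below are assumptions for illustration and are not taken from the DeepStack paper.

```python
import math

def hands_needed(edge_per_hand, std_per_hand, z=1.96):
    """Hands required before the edge exceeds z standard errors of the mean.

    Assumes independent, identically distributed hand results and no
    variance reduction; both are simplifications.
    """
    return math.ceil((z * std_per_hand / edge_per_hand) ** 2)

# Assumed illustrative values: an edge of 0.05 big blinds per hand and
# a standard deviation of 10 big blinds per hand.
n = hands_needed(edge_per_hand=0.05, std_per_hand=10.0)
```

Under these assumed values the estimate lands well above 100,000 hands, and because the count scales with the inverse square of the edge, halving the edge roughly quadruples it.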
To answer the question - "which 'quality' of the LLMs this tournament then actually measures" - since we can't tell the winner reliably, I don't think we can even make particular claims about the LLMs.
However, it could be interesting to analyze the play from a "psychology profile perspective" of dark triad (psychopaths / machiavellians / narcissists). Essentially, these personality types have been observed to prefer some strategies and this can be quantified [2].
[1] DeepStack, https://static1.squarespace.com/static/58a75073e6f2e1c1d5b36...
[2] Generation of Games for Opponent Model Differentiation https://arxiv.org/pdf/2311.16781
If you directly give the distribution to the LLM, it is not doing anything interesting. It is just sampling from the strategy you tell it to play.
I found this out recently when I asked it to generate some anagrams for me. Then I asked how it did it.
None of that was deterministic. The hardest part was writing efficient Monte Carlo simulations that could weight each situation and average out a betting strategy close to the one in the player's hand history, while adding randomness within a band consistent with the player's own randomness in a given situation.
And none of it needed to touch on game theory. If it had, it would've been much better. LLMs would have no hope of conceptualizing any of that.
If you put the best current poker algorithm in a tournament with mixed-skill-level players, how likely is the algorithm to get into the money?
Recognizing different skill levels quickly and adapting your play to each opponent early on grows the pot very fast. I would imagine that playing against good players is a completely different game compared to playing against mixed skill levels.
Screenshot of the gameplay: https://pbs.twimg.com/media/GpywKpDXMAApYap?format=png&name=...
Post: https://x.com/0xJba/status/1907870687563534401
Article: https://x.com/0xJba/status/1920764850927468757
If anybody wants to spectate this, let us know and we can spin up a fresh tournament.
energy123•1h ago
I thought you're supposed to sample from a distribution of decisions to avoid exploitation?