SpinGPT

Training a Poker LLM

Narada Maugin

LAMSADE, Paris Dauphine University - PSL, CNRS, Paris, France

Tristan Cazenave

LAMSADE, Paris Dauphine University - PSL, CNRS, Paris, France

What is poker?

  • A family of card games
    • Most widespread variant: No-Limit Texas Hold’em
    • 2 to 10 players
    • Dealing and betting rounds (preflop, flop, turn, river)
  • Originated in the United States in the 1820s
  • ~ 100 million players
  • Imperfect-information game
  • Enormous game tree

Timeline

The rise of poker AIs:

  • 2006: First Annual Computer Poker Competition (ACPC)
  • 2007: Counterfactual Regret Minimization (CFR) introduced (Zinkevich et al.)
  • 2014: Cepheus essentially solves Heads-Up Limit Hold’em (Bowling et al.); game size 10^13
  • 2017: Libratus beats top pros at Heads-Up No-Limit (Brown & Sandholm); game size 10^160
  • 2019: Pluribus beats pros in six-player No-Limit Hold’em (Brown & Sandholm); game size > 10^220

What challenges remain for poker AIs?

  • Tournaments are the main challenge
    • Varying player counts and stack sizes → non-i.i.d. dynamics
    • The chip-EV objective needs to be reconsidered
  • Spin & Go
    • 3-player mini-tournament → easier to tackle
    • The most played online format

LLMs for poker

  • Limitations:
    • LLMs perform poorly at poker with zero-shot or few-shot prompting.
    • Few open, high-quality training resources online.
  • Advantages:
    • Prior knowledge (rules, basic strategies, probabilities,…)
    • Strong ability to learn and produce structured text representations
    • → Fine-tuning is effective (Huang 2024; Zhuang 2025)

Data collection

  • Hands from my own professional play
    • Dates: 2018-2020
    • Format: Spin & Go (2 or 3 players)
    • Buy-ins: €50, €100, €250
    • Sample size: n = 320,000

Data processing

  • Original hand:
    Game #20396217527 starts.
    Game #<do not remove this line!> starts.
    ***** Hand History for Game 20396217527 *****
    NL Texas Hold’em €250 EUR Buy-in - Friday, July 10, 18:41:29 CEST 2020
    Table 250€ SIT’N GO JAQKPOT (276572612) Table #1 (Real Money)
    Seat 2 is the button
    Total number of players : 2/3
    Seat 1: Dimitrov98 ( 605 )
    Seat 2: FrenchBaguette ( 895 )
    Trny: 276572612 Level: 3
    Blinds(20/40)
    FrenchBaguette posts small blind [20].
    Dimitrov98 posts big blind [40].
    ** Dealing down cards **
    Dealt to FrenchBaguette [ Kd 7s ]
    FrenchBaguette raises [60]
    Dimitrov98 calls [40]
    ** Dealing Flop ** [ 5h, 8d, 4s ]
    Dimitrov98 checks
    FrenchBaguette checks
    ** Dealing Turn ** [ 4d ]
    Dimitrov98 checks
    FrenchBaguette checks
    ** Dealing River ** [ Td ]
    Dimitrov98 checks
    FrenchBaguette checks
    Dimitrov98 shows [ 9d, Qh ]a pair of Fours.
    FrenchBaguette shows [ Kd, 7s ]a pair of Fours with King kicker.
    FrenchBaguette wins 160 chips from the main pot with a pair of Fours with King kicker.
  • Processed hand (conversion sketch below):
    {"instruction":"pos:H=SB stacks:H=15.1,BB=22.4 hand:Kd7s | pre: H r2,BB c | flop:5h8d4s BB x,H x | turn:4d BB x,H x | river:Td BB x, H:","output":"x","input":""}
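
A minimal sketch of this conversion, assuming the fields have already been parsed out of the raw history. The function name and token scheme (H for hero, r2 for a raise to 2 BB, x for check, c for call) mirror the example above; the actual preprocessing code is not shown, so treat this as illustrative.

```python
def encode_state(hero_pos: str, stacks_bb: dict[str, float], hole: str,
                 streets: list[tuple[str, str, str]]) -> str:
    """Toy encoder: parsed hand fields -> compact instruction string.

    streets: (name, board, actions), e.g. ("flop", "5h8d4s", "BB x,H x").
    """
    header = (f"pos:H={hero_pos} "
              "stacks:" + ",".join(f"{p}={s}" for p, s in stacks_bb.items()) +
              f" hand:{hole}")
    body = " | ".join(f"{name}:{board} {acts}" if board else f"{name}: {acts}"
                      for name, board, acts in streets)
    return f"{header} | {body}"

instruction = encode_state(
    "SB",
    {"H": 15.1, "BB": 22.4},  # 605 and 895 chips at a 40-chip big blind
    "Kd7s",
    [("pre", "", "H r2,BB c"),
     ("flop", "5h8d4s", "BB x,H x"),
     ("turn", "4d", "BB x,H x"),
     ("river", "Td", "BB x, H:")],
)
print(instruction)  # reproduces the instruction string above
```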

Supervised fine-tuning (SFT)

  • Model: Llama 3.1-8B-Instruct
  • Advantages:
    • Open weights
    • Small enough for a single GPU
  • Limitations:
    • Beginner level at poker
    • Lower capacity than larger models
  • Training time: 10 hours on one GPU (training sketch below)
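
One way such a fine-tune could be launched, sketched with Hugging Face TRL and LoRA. The dataset path, LoRA ranks, and batch settings are placeholders, not the configuration used for SpinGPT-SFT.

```python
from datasets import load_dataset
from peft import LoraConfig
from trl import SFTConfig, SFTTrainer

# Placeholder file: 320k processed hands in the instruction/output format above.
data = load_dataset("json", data_files="spin_hands_train.jsonl", split="train")
data = data.map(lambda ex: {"text": f"{ex['instruction']} {ex['output']}"})

trainer = SFTTrainer(
    model="meta-llama/Llama-3.1-8B-Instruct",
    train_dataset=data,
    peft_config=LoraConfig(r=16, lora_alpha=32, task_type="CAUSAL_LM"),  # guessed
    args=SFTConfig(output_dir="spingpt-sft",
                   per_device_train_batch_size=8,   # placeholder
                   num_train_epochs=1),             # placeholder
)
trainer.train()
```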

SpinGPT-SFT: imitation results

  • Test set (n = 32,000)
  • Accuracy (scoring sketch below):
    • 80% exact-match
    • 84% with tolerance
  • Illegal actions: 1.4%
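
A sketch of how such scores can be computed. The tolerance rule here (same action type, raise size within ±20%) is a guess at what "with tolerance" means; the paper's exact rule may differ.

```python
def action_match(pred: str, gold: str, tol: float = 0.2) -> tuple[bool, bool]:
    """(exact, tolerant) match for actions like 'x', 'c', 'f', 'r2.5'."""
    if pred == gold:
        return True, True
    if pred[:1] == gold[:1] == "r":  # same action type: a raise
        try:
            p, g = float(pred[1:]), float(gold[1:])
            return False, abs(p - g) <= tol * g  # sizes close enough (guessed rule)
        except ValueError:
            return False, False
    return False, False

pairs = [("r2", "r2"), ("x", "x"), ("r3", "r2.5")]
matches = [action_match(p, g) for p, g in pairs]
print(sum(e for e, _ in matches) / len(pairs),   # exact:    ~0.67
      sum(t for _, t in matches) / len(pairs))   # tolerant:  1.0
```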

Benchmark: Slumbot

  • Slumbot
    • ACPC 2018 champion poker AI
    • Common benchmark (easy evaluation via its public API; query sketch below)
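
Slumbot is queried over HTTP. The sketch below follows the endpoints and action strings used in Slumbot's published sample client; treat the exact request and response shapes as assumptions rather than a verified spec.

```python
import requests

HOST = "https://slumbot.com"

def new_hand(token: str | None = None) -> dict:
    """Start a new 200 BB heads-up hand; the token threads the session."""
    payload = {"token": token} if token else {}
    resp = requests.post(f"{HOST}/api/new_hand", json=payload)
    resp.raise_for_status()
    return resp.json()  # assumed to include token, hole cards, action so far

def act(token: str, incr: str) -> dict:
    """Send our action: 'k' check, 'c' call, 'f' fold, 'b<chips>' bet/raise."""
    resp = requests.post(f"{HOST}/api/act", json={"token": token, "incr": incr})
    resp.raise_for_status()
    return resp.json()
```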

Warning

Depth mismatch: SpinGPT is trained at 0–35 BB stack depths, while Slumbot plays 200 BB deep.
Applying a short-stack policy at deep stacks → over-shoving → losses.

  • Patch: replace all-in actions with a 2/3-pot bet when not facing a bet, and with a 3x raise otherwise (sketch below)
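
A toy version of that patch, applied to the model's raw action before it is sent to Slumbot. The action tokens and arguments are assumptions consistent with the encoding shown earlier, not the authors' actual code.

```python
def patch_action(action: str, pot: int, bet_to_call: int) -> str:
    """Toy version of the patch: soften all-ins at 200 BB depth."""
    if action != "allin":             # assumed token for an all-in
        return action
    if bet_to_call == 0:
        return f"b{(2 * pot) // 3}"   # no bet faced: bet 2/3 pot instead
    return f"b{3 * bet_to_call}"      # bet faced: raise to 3x the bet
```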

Results vs Slumbot (1)

Agent         Win rate (BB/100)   95% CI (BB/100)   Hands played
SpinGPT-SFT   13.4                ± 12.9            30,000

Interpreting win rate (BB/100)

  • < 0: loses
  • ≈ 0: break-even
  • 0–6: wins
  • > 6: wins by a wide margin
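
The win rate and CI columns in these tables follow from per-hand results in big blinds; below is a standard normal-approximation sketch (the paper's exact interval method is not stated).

```python
import math

def winrate_bb100(results_bb: list[float]) -> tuple[float, float]:
    """Mean win rate in BB/100 and the half-width of a 95% normal CI."""
    n = len(results_bb)
    mean = sum(results_bb) / n
    var = sum((x - mean) ** 2 for x in results_bb) / (n - 1)  # sample variance
    return 100 * mean, 100 * 1.96 * math.sqrt(var / n)

# With 30,000 per-hand profits in BB, this yields numbers like the
# SpinGPT-SFT row above: (13.4, 12.9).
```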

Results vs Slumbot (2)

Agent            Win rate (BB/100)   95% CI (BB/100)   Hands played   Year
BabyTartanian8   3.6                 ± 1.2             N/A            2016
ReBeL            4.5                 ± 1.0             N/A            2020
AlphaHoldem      11.2                ± 1.6             100,000        2022
PokerGPT         15.8                ± 4.9             10,000         2024
SpinGPT-SFT      13.4                ± 12.9            30,000         2025

Reinforcement Learning (RL)

Objective: retrain SpinGPT-SFT to approach Game-Theory-Optimal (GTO) play.

  • InstaGTO (solver) data
  • ORPO (Odds Ratio Preference Optimization): best vs. worst action per state (sketch below)
  • 320,000 training hands:
    • 270,000 InstaGTO (synthetic)
    • 50,000 pro hands (to prevent catastrophic forgetting)
  • Training time: 10 hours on a single GPU
  • Final model: SpinGPT
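
A minimal sketch of this stage with TRL's ORPOTrainer. The preference records pair the solver's best action ("chosen") against its worst ("rejected") for the same state, per the bullet above; file names and hyperparameters are placeholders.

```python
from datasets import load_dataset
from trl import ORPOConfig, ORPOTrainer

# Placeholder file; each record is one game state with a preference pair:
# {"prompt": "<state string>", "chosen": "r2.5", "rejected": "f"}
prefs = load_dataset("json", data_files="instagto_prefs.jsonl", split="train")

trainer = ORPOTrainer(
    model="spingpt-sft",                 # the SFT checkpoint from stage 1
    train_dataset=prefs,
    args=ORPOConfig(output_dir="spingpt",
                    beta=0.1,            # ORPO preference weight: guessed value
                    per_device_train_batch_size=8,
                    num_train_epochs=1),
)
trainer.train()
```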

SpinGPT imitation results

  • On the pro test set (n = 32,000):
    • Accuracy: 79% (83% with tolerance)
  • On the solver test set (n = 30,000):
    • Accuracy: 72% (78% with tolerance)

SpinGPT > SpinGPT-SFT

  • Head-to-head: SpinGPT vs SpinGPT-SFT
    • 10,000 hands (duplicate format; sketch below)
    • Heads-up (1v1), 25 BB stacks
    • SpinGPT wins by 13.2 ± 7.24 BB/100
  • → RL was effective
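
In duplicate format, each deal is played twice with the seats swapped, so both agents receive the same cards once and most card luck cancels out of the difference. A toy scorer, where play_hand is a hypothetical helper returning the first agent's profit in BB for one hand:

```python
def duplicate_winrate(agent, baseline, decks, play_hand) -> float:
    """Agent's win rate in BB/100, each deck played once from each seat."""
    total = 0.0
    for deck in decks:
        total += play_hand(agent, baseline, deck)  # agent in seat 1
        total -= play_hand(baseline, agent, deck)  # seats swapped (zero-sum)
    return 100 * total / (2 * len(decks))          # 2 hands per deck
```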

Main limitations

  • Absolute strength of SpinGPT unknown:
    • No matches vs. Spin & Go specialist AIs
    • No matches vs. humans
  • Numeric-understanding issues (e.g., the model can rank 5.11 above 5.2)

Improvements and outlook

  • Stronger base model
  • Hyperparameter tuning
  • Hybrid with a solver (preflop, postflop HU, push-or-fold)
  • Opponent adaptation
  • Explanation of decisions

Conclusion

  • Two-stage pipeline: SFT on expert data, then ORPO on solver data
  • SpinGPT-SFT beats Slumbot (13.4 ± 12.9 BB/100)
  • SpinGPT outperforms SpinGPT-SFT (+13.2 ± 7.24 BB/100)
  • Tolerant accuracy: 83% on the human test set, 78% on the solver test set
  • Benchmark ≠ target format (Spin & Go): no human evaluation yet (planned)
  • Try SpinGPT: spingpt.lamsade.fr
  • Questions?