Texas Hold'em


This environment is part of the classic environments. Please read that page first for general information.

| Name | Value |
|------|-------|
| Actions | Discrete |
| Agents | 2 |
| Parallel API | Yes |
| Manual Control | No |
| Action Shape | Discrete(4) |
| Action Values | Discrete(4) |
| Observation Shape | (72,) |
| Observation Values | [0, 1] |
| Import | `from pettingzoo.classic import texas_holdem_v4` |
| Agents | `agents= ['player_0', 'player_1']` |

Agent Environment Cycle

AEC diagram for Texas Hold'em

Arguments

texas_holdem_v4.env(num_players=2)

num_players: Sets the number of players in the game. Minimum is 2.
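
A minimal sketch of creating the environment with this argument (the `reset(seed=...)` call assumes a recent PettingZoo release; older versions seed the environment separately):

```python
from pettingzoo.classic import texas_holdem_v4

# Create the two-player AEC environment and start a new hand.
env = texas_holdem_v4.env(num_players=2)
env.reset(seed=42)

print(env.agents)  # ['player_0', 'player_1']
```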

Observation Space

The observation is a dictionary containing an 'observation' element, which is the usual RL observation described below, and an 'action_mask' element, which holds the legal moves described in the Legal Actions Mask section.

The main observation space is a vector of 72 boolean integers. The first 52 entries depict the current player’s hand plus any community cards as follows:

| Index | Description |
|---------|-------------|
| 0 - 12 | Spades<br>0: A, 1: 2, …, 12: K |
| 13 - 25 | Hearts<br>13: A, 14: 2, …, 25: K |
| 26 - 38 | Diamonds<br>26: A, 27: 2, …, 38: K |
| 39 - 51 | Clubs<br>39: A, 40: 2, …, 51: K |
| 52 - 56 | Chips raised in Round 1<br>52: 0, 53: 1, …, 56: 4 |
| 57 - 61 | Chips raised in Round 2<br>57: 0, 58: 1, …, 61: 4 |
| 62 - 66 | Chips raised in Round 3<br>62: 0, 63: 1, …, 66: 4 |
| 67 - 71 | Chips raised in Round 4<br>67: 0, 68: 1, …, 71: 4 |
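
For illustration, a small helper (not part of the PettingZoo API; the names below are ours) that decodes this 72-element vector into readable card names and per-round raise counts, assuming the one-hot layout above:

```python
# Illustrative helper, not part of PettingZoo: decode the 72-element observation.
SUITS = ["Spades", "Hearts", "Diamonds", "Clubs"]
RANKS = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]

def decode_obs(vec):
    """Return (visible cards, chips raised in each of the four betting rounds)."""
    cards = [f"{RANKS[i % 13]} of {SUITS[i // 13]}" for i in range(52) if vec[i]]
    raises = [
        # Each round is a one-hot block of 5 entries; default to 0 if none is set.
        next((k for k in range(5) if vec[52 + 5 * rnd + k]), 0)
        for rnd in range(4)
    ]
    return cards, raises
```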

The legal moves available to the current agent are found in the action_mask element of the dictionary observation. The action_mask is a binary vector where each index of the vector represents whether the action is legal or not. The action_mask will be all zeros for any agent except the one whose turn it is. Taking an illegal move ends the game with a reward of -1 for the illegally moving agent and a reward of 0 for all other agents.
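
A minimal random-legal-move loop built on this mask (a sketch: the five-value return of `env.last()` assumes a recent PettingZoo release, while older releases return `(observation, reward, done, info)`):

```python
import numpy as np
from pettingzoo.classic import texas_holdem_v4

env = texas_holdem_v4.env(num_players=2)
env.reset(seed=42)

for agent in env.agent_iter():
    observation, reward, termination, truncation, info = env.last()
    if termination or truncation:
        action = None                               # finished agents step with None
    else:
        legal = np.flatnonzero(observation["action_mask"])
        action = int(np.random.choice(legal))       # random choice among legal actions
    env.step(action)
env.close()
```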

Action Space

| Action ID | Action |
|-----------|--------|
| 0 | Call |
| 1 | Raise |
| 2 | Fold |
| 3 | Check |
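
An illustrative mapping (the dictionary and helper below are ours, not part of the API) for turning an action mask into readable legal moves:

```python
# Illustrative only: map action IDs to their names.
ACTION_NAMES = {0: "Call", 1: "Raise", 2: "Fold", 3: "Check"}

def legal_action_names(action_mask):
    """Names of the actions allowed by a binary action_mask vector."""
    return [ACTION_NAMES[i] for i, ok in enumerate(action_mask) if ok]

print(legal_action_names([1, 0, 1, 1]))  # ['Call', 'Fold', 'Check']
```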

Rewards

| Winner | Loser |
|-----------------|-----------------|
| +raised chips/2 | -raised chips/2 |

Version History

  • v4: Upgrade to RLCard 1.0.3 (1.11.0)
  • v3: Fixed bug in arbitrary calls to observe() (1.8.0)
  • v2: Bumped RLCard version, bug fixes, legal action mask in observation replaced illegal move list in infos (1.5.0)
  • v1: Bumped RLCard version, fixed observation space, adopted new agent iteration scheme where all agents are iterated over after they are done (1.4.0)
  • v0: Initial versions release (1.0.0)