This environment is part of the classic environments. Please read that page first for general information.
|Observation Shape||(6, 5, 15)|
Dou Dizhu, or Fighting the Landlord, is a shedding game involving 3 players and a standard 52-card deck plus 2 jokers, with suits being irrelevant. One player is designated the “Landlord” and the other two become the “Peasants”. The objective of the game is to be the first player to have no cards left. If the first player to empty their hand is a “Peasant”, then both members of the “Peasant” team receive a reward of +1. If the “Landlord” wins, then only the “Landlord” receives a reward of +1.
The “Landlord” plays first by putting down a combination of cards. The next player may pass or put down a higher combination of cards that beats the previous play.
Our implementation wraps RLCard and you can refer to its documentation for additional details. Please cite their work if you use this game in research.
`opponents_hand_visible`: Set to `True` to observe the entire observation space as described in Observation Space below. Setting it to `False` removes any observation of the opponents’ hands, and the observation space will only include planes 0, 2, 3, and 4.
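To illustrate what hiding the opponents’ hands means for the observation array, here is a minimal NumPy sketch. It assumes the reduced observation is simply the full (6, 5, 15) tensor restricted to planes 0, 2, 3, and 4; `hide_opponents` and `VISIBLE_PLANES` are hypothetical names, not part of the PettingZoo API.

```python
import numpy as np

# Hypothetical full observation in the documented (6, 5, 15) layout.
full_obs = np.zeros((6, 5, 15), dtype=np.int8)

# Planes kept when opponents' hands are hidden (assumption based on the text above).
VISIBLE_PLANES = [0, 2, 3, 4]


def hide_opponents(obs: np.ndarray) -> np.ndarray:
    """Keep only the listed planes, dropping plane 1 (the other players' hands)."""
    return obs[VISIBLE_PLANES]  # integer-array indexing along the plane axis


reduced = hide_opponents(full_obs)
print(reduced.shape)  # (4, 5, 15)
```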
The observation is a dictionary which contains an `'obs'` element, which is the usual RL observation described below, and an `'action_mask'` element, which holds the legal moves described in the Legal Actions Mask section.
The main Observation Space is encoded in 6 planes, each with 5x15 entries. For each plane, the 5 rows represent 0, 1, 2, 3, or 4 cards of the same rank and the 15 columns represent all possible ranks (“3, 4, 5, 6, 7, 8, 9, 10, J, Q, K, A, 2, Black Joker, and Red Joker”). The meaning of each plane is described in the table below:
|Plane||Description|
|0||Current player’s hand|
|1||Union of the other players’ hands|
|2 - 4||The three most recent actions, in order (Plane 2 is the most recent action)|
|5||Union of all played cards|
|Plane Row Index||Description|
|0||0 matching cards of the same rank|
|1||1 matching card of the same rank|
|2||2 matching cards of the same rank|
|3||3 matching cards of the same rank|
|4||4 matching cards of the same rank|
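|Plane Column Index||Description|
|0||3|
|1||4|
|2||5|
|3||6|
|4||7|
|5||8|
|6||9|
|7||10|
|8||J|
|9||Q|
|10||K|
|11||A|
|12||2|
|13||Black Joker|
|14||Red Joker|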
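The row and column layout can be illustrated by encoding a single hand into one 5x15 plane. This is a sketch under the assumption that each column is one-hot over the card count (row 0 set when the rank is absent); `RANKS` and `encode_plane` are hypothetical names, and this is not RLCard’s actual encoding code.

```python
from collections import Counter

import numpy as np

# Column order of each plane, following the rank list in the observation description.
RANKS = ["3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K", "A", "2",
         "Black Joker", "Red Joker"]


def encode_plane(cards: list[str]) -> np.ndarray:
    """One-hot encode a multiset of ranks into a 5x15 plane.

    Assumed layout: entry (k, j) is 1 when exactly k cards of rank RANKS[j]
    are present, so every column contains a single 1.
    """
    plane = np.zeros((5, 15), dtype=np.int8)
    counts = Counter(cards)
    for col, rank in enumerate(RANKS):
        plane[counts[rank], col] = 1  # row 0 marks an absent rank
    return plane


hand = ["3", "3", "3", "5", "K", "Red Joker"]
plane = encode_plane(hand)
print(plane[:, 0])  # column for rank "3": [0 0 0 1 0]
```

Stacking six such planes (current hand, other hands, three recent actions, played cards) would give the (6, 5, 15) observation described above.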
The legal moves available to the current agent are found in the `action_mask` element of the dictionary observation. The `action_mask` is a binary vector where each index indicates whether the corresponding action is legal. The `action_mask` will be all zeros for any agent except the one whose turn it is. Taking an illegal move ends the game with a reward of -1 for the illegally moving agent and a reward of 0 for all other agents.
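A common way to use the mask is to choose only among the indices it marks as legal. The sketch below samples uniformly from the legal actions; it assumes the mask has one entry per abstract action (309 entries), and `sample_legal_action` plus the hand-built observation are hypothetical, not part of the PettingZoo API.

```python
import numpy as np

rng = np.random.default_rng(0)


def sample_legal_action(observation: dict, rng: np.random.Generator) -> int:
    """Pick a uniformly random legal action from observation['action_mask']."""
    mask = np.asarray(observation["action_mask"])
    legal = np.flatnonzero(mask)  # indices whose mask entry is non-zero
    if legal.size == 0:
        # An all-zero mask means it is not this agent's turn.
        raise ValueError("no legal actions: it is not this agent's turn")
    return int(rng.choice(legal))


# Hypothetical observation for the agent whose turn it is.
mask = np.zeros(309, dtype=np.int8)
mask[[0, 15, 30]] = 1
observation = {"obs": np.zeros((6, 5, 15), dtype=np.int8), "action_mask": mask}
action = sample_legal_action(observation, rng)
print(action in (0, 15, 30))  # True
```

Sampling only from masked-in indices guarantees the agent never triggers the illegal-move penalty.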
The raw action space of Dou Dizhu has 27,472 actions. To keep it tractable, our implementation of Dou Dizhu abstracts the action space into 309 actions, as shown below. Actions are abstracted by keeping only the main combination and ignoring the kicker (e.g. a trio with single “4445” is represented by “444*”). As a reminder, suits are irrelevant in Dou Dizhu.
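The kicker abstraction from the “4445” example can be sketched for the trio-with-single case alone. `abstract_trio_with_single` is a hypothetical helper that assumes single-character ranks (it does not handle “10”); it only illustrates why many concrete plays collapse onto one abstract action and is not RLCard’s implementation.

```python
from collections import Counter


def abstract_trio_with_single(cards: str) -> str:
    """Replace the kicker of a trio-with-single play by '*'.

    Assumes single-character ranks, e.g. "4445"; the "10" rank is not handled.
    """
    counts = Counter(cards)
    trio = [rank for rank, n in counts.items() if n == 3]
    if len(cards) != 4 or len(trio) != 1:
        raise ValueError(f"not a trio with single: {cards!r}")
    return trio[0] * 3 + "*"  # keep the trio, hide the kicker


print(abstract_trio_with_single("4445"))  # 444*
print(abstract_trio_with_single("5444"))  # 444*
```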
|Action Type||Description||Raw Number of Actions||Number of Actions after Abstraction||Action ID|
|Solo||Any single card||15||15||0-14|
|Pair||Two matching cards of equal rank||13||13||15-27|
|Trio||Three matching cards of equal rank||13||13||28-40|
|Trio with single||Three matching cards of equal rank + single card as the kicker (e.g. 3334)||182||13||41-53|
|Trio with pair||Three matching cards of equal rank + pair of cards as the kicker (e.g. 33344)||156||13||54-66|
|Chain of solo||At least five consecutive solo cards||36||36||67-102|
|Chain of pair||At least three consecutive pairs||52||52||103-154|
|Chain of trio||At least two consecutive trios||45||45||155-199|
|Plane with solo||At least two consecutive trios + a distinct solo kicker for each trio (e.g. 33344456)||21822||38||200-237|
|Plane with pair||At least two consecutive trios + a distinct pair for each trio (e.g. 3334445566)||2939||30||238-267|
|Quad with solo||Four matching cards of equal rank + 2 distinct solo cards (e.g. 333345)||1326||13||268-280|
|Quad with pair||Four matching cards of equal rank + 2 distinct pairs (e.g. 33334455)||858||13||281-293|
|Bomb||Four matching cards of equal rank||13||13||294-306|
|Rocket||Black Joker + Red Joker||1||1||307|
|Pass||Pass||1||1||308|
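The ID ranges in the table can be inverted into a lookup from an abstract action ID to its combination type. The sketch below is a hypothetical helper, not an RLCard or PettingZoo function; it only covers the 14 combination types with IDs 0-307 and rejects any other ID.

```python
from bisect import bisect_right

# (first action ID, action type) pairs taken from the action table.
ACTION_RANGES = [
    (0, "Solo"),
    (15, "Pair"),
    (28, "Trio"),
    (41, "Trio with single"),
    (54, "Trio with pair"),
    (67, "Chain of solo"),
    (103, "Chain of pair"),
    (155, "Chain of trio"),
    (200, "Plane with solo"),
    (238, "Plane with pair"),
    (268, "Quad with solo"),
    (281, "Quad with pair"),
    (294, "Bomb"),
    (307, "Rocket"),
]
STARTS = [start for start, _ in ACTION_RANGES]
LAST_COMBINATION_ID = 307


def action_type(action_id: int) -> str:
    """Return the combination type of an abstract action ID in 0-307."""
    if not 0 <= action_id <= LAST_COMBINATION_ID:
        raise ValueError(f"action ID {action_id} is outside 0-{LAST_COMBINATION_ID}")
    # bisect_right finds the last range whose first ID is <= action_id.
    return ACTION_RANGES[bisect_right(STARTS, action_id) - 1][1]


print(action_type(30))   # Trio
print(action_type(237))  # Plane with solo
```

Storing only the first ID of each range keeps the lookup in sync with the table: each type ends where the next one starts.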
For example, you would use action 0 to play a single “3” card or action 30 to play a trio of “5”.
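These two examples are consistent with IDs inside each single-rank type running in rank order (3, 4, 5, ...), so ID 30 is the third trio. Assuming that ordering holds for solos, pairs, trios, and bombs, a hypothetical decoder could look like the sketch below; `decode_single_rank_action` is an illustration, not part of the library.

```python
# Rank order assumed within each single-rank action type.
RANKS = ["3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K", "A", "2",
         "Black Joker", "Red Joker"]

# (first ID, number of IDs, cards per play) for the single-rank types in the table.
SINGLE_RANK_TYPES = {
    "Solo": (0, 15, 1),
    "Pair": (15, 13, 2),
    "Trio": (28, 13, 3),
    "Bomb": (294, 13, 4),
}


def decode_single_rank_action(action_id: int) -> list[str]:
    """Return the cards played by a solo, pair, trio, or bomb action ID."""
    for first, count, size in SINGLE_RANK_TYPES.values():
        if first <= action_id < first + count:
            return [RANKS[action_id - first]] * size
    raise ValueError(f"{action_id} is not a solo, pair, trio, or bomb action")


print(decode_single_rank_action(0))   # ['3']
print(decode_single_rank_action(30))  # ['5', '5', '5']
```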
We modified the reward structure compared to RLCard. Instead of giving a reward of 0 to the losing player, we assign a reward of -1 to each losing agent.
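Combining the win condition with the modified losing reward, the payoff at the end of a legally finished game can be sketched as follows. `assign_rewards` and its player-index arguments are hypothetical; the illegal-move case (-1 for the offender, 0 for everyone else) is not modelled here.

```python
def assign_rewards(winner: int, landlord: int, num_players: int = 3) -> list[int]:
    """Rewards after a legally finished game.

    The landlord alone, or both peasants together, get +1; every loser gets -1.
    """
    landlord_won = winner == landlord
    rewards = []
    for player in range(num_players):
        # A player is on the winning side if their role matches the winner's role.
        on_winning_side = (player == landlord) == landlord_won
        rewards.append(1 if on_winning_side else -1)
    return rewards


print(assign_rewards(winner=0, landlord=0))  # [1, -1, -1]
print(assign_rewards(winner=2, landlord=0))  # [-1, 1, 1]
```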