Mahjong


This environment is part of the classic environments. Please read that page first for general information.

| Name | Value |
|------|-------|
| Actions | Discrete |
| Agents | 4 |
| Parallel API | false |
| Manual Control | No |
| Action Shape | Discrete(38) |
| Action Values | Discrete(38) |
| Observation Shape | (6, 34, 4) |
| Observation Values | [0, 1] |
| Import | `from pettingzoo.classic import mahjong_v0` |
| Agents | `agents= ['player_0', 'player_1', 'player_2', 'player_3']` |
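
The environment can be instantiated directly from the import above. A minimal usage sketch, assuming the standard `env()` constructor exposed by the PettingZoo classic environments:

```python
# Minimal usage sketch; assumes the env() constructor that PettingZoo classic
# environments expose (exact options may differ between versions).
from pettingzoo.classic import mahjong_v0

env = mahjong_v0.env()
env.reset()
print(env.agents)  # ['player_0', 'player_1', 'player_2', 'player_3']
```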

Agent Environment Cycle

(Figure: agent environment cycle diagram for this environment.)

Mahjong

Mahjong is a tile-based game with 4 players. It uses a deck of 136 tiles comprising 4 identical sets of 34 unique tiles. The objective is to form 4 sets and a pair with the 14th drawn tile. If no player wins, no player receives a reward.

Our implementation wraps RLCard, and you can refer to its documentation for additional details. Please cite their work if you use this game in research.

Observation Space

The observation space has a (6, 34, 4) shape with the first index representing the encoding plane. The contents of each plane are described in the table below:

| Plane | Description |
|-------|-------------|
| 0 | Current Player's hand |
| 1 | Played tiles on the table |
| 2 | Public piles of player_0 |
| 3 | Public piles of player_1 |
| 4 | Public piles of player_2 |
| 5 | Public piles of player_3 |

Encoding per Plane

| Plane Row Index | Description |
|-----------------|-------------|
| 0 - 8 | Bamboo (0: 1, 1: 2, …, 8: 9) |
| 9 - 17 | Characters (9: 1, 10: 2, …, 17: 9) |
| 18 - 26 | Dots (18: 1, 19: 2, …, 26: 9) |
| 27 | Dragons Green |
| 28 | Dragons Red |
| 29 | Dragons White |
| 30 | Winds East |
| 31 | Winds West |
| 32 | Winds North |
| 33 | Winds South |

| Plane Column Index | Description |
|--------------------|-------------|
| 0 | Tile Set 1 |
| 1 | Tile Set 2 |
| 2 | Tile Set 3 |
| 3 | Tile Set 4 |
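
As a concrete illustration of this layout, the sketch below decodes plane 0 (the current player's hand) into tile counts. It assumes the RLCard-style encoding in which each of a row's four columns marks one of the four identical copies of that tile, so summing a row gives the number of copies held; `TILE_NAMES` and `hand_counts` are hypothetical helpers, not part of the environment API.

```python
# Sketch: decode plane 0 (the current player's hand) of a (6, 34, 4) observation.
# Assumes the RLCard-style encoding where each of the 4 columns flags one of the
# 4 identical copies of a tile, so summing a row gives the number of copies held.
import numpy as np

TILE_NAMES = (  # hypothetical helper; row order follows the table above
    [f"Bamboo {n}" for n in range(1, 10)]
    + [f"Characters {n}" for n in range(1, 10)]
    + [f"Dots {n}" for n in range(1, 10)]
    + ["Dragons Green", "Dragons Red", "Dragons White",
       "Winds East", "Winds West", "Winds North", "Winds South"]
)

def hand_counts(observation):
    """Return {tile name: copies held} read from plane 0 of the observation."""
    hand_plane = np.asarray(observation)[0]        # shape (34, 4)
    counts = hand_plane.sum(axis=1).astype(int)    # copies of each of the 34 tile types
    return {TILE_NAMES[row]: int(c) for row, c in enumerate(counts) if c > 0}
```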

Action Space

The action space, as described by RLCard, is:

| Action ID | Action |
|-----------|--------|
| 0 - 8 | Bamboo (0: 1, 1: 2, …, 8: 9) |
| 9 - 17 | Characters (9: 1, 10: 2, …, 17: 9) |
| 18 - 26 | Dots (18: 1, 19: 2, …, 26: 9) |
| 27 | Dragons Green |
| 28 | Dragons Red |
| 29 | Dragons White |
| 30 | Winds East |
| 31 | Winds West |
| 32 | Winds North |
| 33 | Winds South |
| 34 | Pong |
| 35 | Chow |
| 36 | Gong |
| 37 | Stand |

For example, you would use action 34 to pong or action 37 to stand.
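
A small sketch that names each of the 38 action IDs, following the table above (`ACTION_NAMES` and `describe_action` are hypothetical conveniences, not part of the environment API):

```python
# Sketch: readable names for the Discrete(38) action IDs, following the table above.
# ACTION_NAMES and describe_action are hypothetical conveniences for this example.
_SUITS = ["Bamboo", "Characters", "Dots"]
_HONORS = ["Dragons Green", "Dragons Red", "Dragons White",
           "Winds East", "Winds West", "Winds North", "Winds South"]
ACTION_NAMES = (
    [f"{suit} {n}" for suit in _SUITS for n in range(1, 10)]  # actions 0 - 26
    + _HONORS                                                  # actions 27 - 33
    + ["Pong", "Chow", "Gong", "Stand"]                        # actions 34 - 37
)

def describe_action(action_id):
    """Return the human-readable name of an action ID, e.g. 34 -> 'Pong'."""
    return ACTION_NAMES[action_id]
```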

Rewards

| Winner | Loser |
|--------|-------|
| +1 | -1 |

The legal moves available for each agent, found in `env.infos[agent]['legal_moves']`, are updated after each step. Taking an illegal move ends the game with a reward of -1 for the illegally moving agent and a reward of 0 for all other agents.
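
Putting this together, a random agent that only plays legal moves could look like the sketch below. This assumes the AEC loop style from the PettingZoo documentation (`env.agent_iter()` / `env.last()`); the exact return signature of `last()` and the done-handling convention have varied across PettingZoo versions, so treat this as illustrative rather than exact.

```python
# Sketch: play random *legal* moves until the game ends. Assumes the PettingZoo
# AEC interface with env.agent_iter() / env.last(); exact signatures have varied
# across PettingZoo versions.
import random

from pettingzoo.classic import mahjong_v0

env = mahjong_v0.env()
env.reset()
for agent in env.agent_iter():
    observation, reward, done, info = env.last()
    if done:
        break  # the game is over for this agent
    action = random.choice(env.infos[agent]['legal_moves'])
    env.step(action)

print(env.rewards)  # final reward for each player
```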