This environment is part of the classic environments. Please read that page first for general information.
Import | from pettingzoo.classic import chess_v5 |
Actions | Discrete |
Parallel API | Yes |
Manual Control | No |
Agents | agents= ['player_0', 'player_1'] |
Agents | 2 |
Action Shape | Discrete(4672) |
Action Values | Discrete(4672) |
Observation Shape | (8,8,20) |
Observation Values | [0,1] |
Chess is one of the oldest studied games in AI. Our observation and action spaces for chess follow those used by the AlphaZero method, with two small changes.
The observation is a dictionary which contains an 'obs' element, which is the usual RL observation described below, and an 'action_mask' element, which holds the legal moves and is described in the Legal Actions Mask section.
Like AlphaZero, the main observation space is an 8x8 image representing the board. It has 20 channels representing:
Like AlphaZero, the board is always oriented towards the current agent (the current agent's king starts on the 1st row). In other words, the two players are looking at mirror images of the board, not the same board.
Unlike AlphaZero, the observation space does not stack the observations from previous moves by default. Stacking can be enabled using the frame_stacking argument of our wrapper.
The legal moves available to the current agent are found in the action_mask element of the dictionary observation. The action_mask is a binary vector where each index of the vector represents whether the action is legal or not. The action_mask will be all zeros for any agent except the one whose turn it is. Taking an illegal move ends the game with a reward of -1 for the illegally moving agent and a reward of 0 for all other agents.
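As a minimal sketch of how the mask might be used, the snippet below picks a uniformly random legal action from a binary mask vector. The helper name sample_legal_action and the toy mask are hypothetical and only for illustration; in practice the mask would come from the action_mask element of the observation dictionary.

```python
import numpy as np


def sample_legal_action(action_mask, rng=None):
    """Return a random index whose mask entry is nonzero, or None if no action is legal.

    Hypothetical helper, not part of the environment's API.
    """
    rng = np.random.default_rng() if rng is None else rng
    legal = np.flatnonzero(np.asarray(action_mask))  # indices of legal actions
    if legal.size == 0:
        # All zeros: this is what a non-acting agent would see.
        return None
    return int(rng.choice(legal))


# Toy mask standing in for a real action_mask of length 4672.
mask = np.zeros(4672, dtype=np.int8)
mask[[10, 877, 4000]] = 1

action = sample_legal_action(mask, np.random.default_rng(0))
print(action)
```

Sampling only from indices where the mask is 1 avoids the illegal-move penalty described above.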
From the AlphaZero chess paper:
[In AlphaChessZero, the] action space is a 8x8x73 dimensional array. Each of the 8×8 positions identifies the square from which to “pick up” a piece. The first 56 planes encode possible ‘queen moves’ for any piece: a number of squares [1..7] in which the piece will be moved, along one of eight relative compass directions {N, NE, E, SE, S, SW, W, NW}. The next 8 planes encode possible knight moves for that piece. The final 9 planes encode possible underpromotions for pawn moves or captures in two possible diagonals, to knight, bishop or rook respectively. Other pawn moves or captures from the seventh rank are promoted to a queen.
We instead flatten this into a discrete action space of size 8×8×73 = 4672.
You can get back the original (x,y,c) coordinates from the integer action a with the following expression, using integer division: (a // (8*73), (a // 73) % 8, a % 73)
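The conversion can be sketched in both directions as follows. This assumes the row-major flattening a = (x*8 + y)*73 + c, which is what the sizes 8×8×73 imply. The function names decode_action and encode_action are hypothetical helpers, not part of the environment's API.

```python
NUM_PLANES = 73  # move planes per source square
BOARD_SIZE = 8
NUM_ACTIONS = BOARD_SIZE * BOARD_SIZE * NUM_PLANES  # 4672


def decode_action(a):
    """Split a flat action index into (x, y, c), assuming row-major flattening."""
    if not 0 <= a < NUM_ACTIONS:
        raise ValueError(f"action must be in [0, {NUM_ACTIONS}), got {a}")
    x, rest = divmod(a, BOARD_SIZE * NUM_PLANES)
    y, c = divmod(rest, NUM_PLANES)
    return x, y, c


def encode_action(x, y, c):
    """Inverse of decode_action under the same assumed layout."""
    return (x * BOARD_SIZE + y) * NUM_PLANES + c


print(decode_action(877))      # (1, 4, 1)
print(encode_action(1, 4, 1))  # 877
```

Under this layout, c indexes the 73 move planes, while x and y identify the square from which the piece is picked up.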
Winner | Loser | Draw |
---|---|---|
+1 | -1 | 0 |