This environment is part of the MAgent environments. Please read that page first for general information.
Import | from pettingzoo.magent import battlefield_v4 |
Actions | Discrete |
Parallel API | Yes |
Manual Control | No |
Agent Names | agents= [red_[0-11], blue_[0-11]] |
Number of Agents | 24 |
Action Shape | (21) |
Action Values | Discrete(21) |
Observation Shape | (13,13,5) |
Observation Values | [0,2] |
State Shape | (80, 80, 5) |
State Values | (0, 2) |
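The agent naming pattern in the table can be expanded into explicit names. The following is a minimal sketch, assuming the notation red_[0-11] means indices 0 through 11 inclusive and that red agents come before blue ones (the ordering is an assumption, not something this page states). The helper name expected_agent_names is hypothetical.

```python
# Expand the naming pattern "red_[0-11], blue_[0-11]" into explicit names.
TEAM_SIZE = 12  # indices 0..11 for each team, per the table above


def expected_agent_names(team_size=TEAM_SIZE):
    """Return the agent names implied by the pattern (ordering is assumed)."""
    return [f"{team}_{i}" for team in ("red", "blue") for i in range(team_size)]


names = expected_agent_names()
print(len(names))           # 24
print(names[0], names[-1])  # red_0 blue_11
```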
Same as battle but with fewer agents arrayed in a larger space with obstacles.
A small-scale team battle, where agents have to figure out the optimal way to coordinate their small team in a large space and maneuver around obstacles in order to defeat the opposing team. Agents are rewarded for their individual performance, and not for the performance of their neighbors, so coordination is difficult. Agents slowly regain HP over time, so it is best to kill an opposing agent quickly. Specifically, agents have 10 HP, take 2 HP of damage from each attack, and recover 0.1 HP every turn.
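The effect of HP recovery on how long a kill takes can be worked through with a small simulation. This is only a sketch of the arithmetic, not the engine's actual update rule: it assumes damage is applied before recovery within a turn, that HP is capped at 10, and that an agent dies once its HP reaches 0 or below. The function name turns_to_kill is hypothetical.

```python
def turns_to_kill(attackers, max_hp=10.0, damage=2.0, regen=0.1):
    """Count turns until a target dies when `attackers` agents hit it every turn.

    Assumed turn order (not confirmed by this page): damage first, then recovery.
    """
    if attackers < 1:
        raise ValueError("need at least one attacker")
    hp = max_hp
    turns = 0
    while True:
        turns += 1
        hp -= attackers * damage      # every attacker lands one hit this turn
        if hp <= 0:
            return turns              # target is dead
        hp = min(max_hp, hp + regen)  # survivor recovers a little HP


for n in (1, 2, 5):
    print(n, turns_to_kill(n))
```

Under these assumptions a lone attacker needs six turns rather than the five that 10 HP / 2 damage would suggest, because the recovered HP forces one extra hit, while focused fire from several agents shortens the fight considerably.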
Like all MAgent environments, agents can either move or attack each turn. An attack against another agent on their own team will not be registered.
battlefield_v4.env(map_size=80, minimap_mode=False, step_reward=-0.005,
dead_penalty=-0.1, attack_penalty=-0.1, attack_opponent_reward=0.2,
max_cycles=1000, extra_features=False)
map_size
: Sets dimensions of the (square) map. Minimum size is 46.
minimap_mode
: Turns on global minimap observations. These observations include your and your opponents' piece densities binned over the 2D grid of the observation space. They also include your agent_position, the absolute position on the map (rescaled from 0 to 1).
step_reward
: Reward added unconditionally
dead_penalty
: Reward added when killed
attack_penalty
: Reward added for attacking
attack_opponent_reward
: Reward added for attacking an opponent
max_cycles
: Number of frames (a step for each agent) until the game terminates
extra_features
: Adds additional features to observation (see table). Default False
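The arguments above can be collected and checked before constructing the environment. The following is a minimal sketch: the default values and the minimum map size are taken from this page, while the helper names build_kwargs and make_env are hypothetical. The import is deferred so that the argument handling can be used without the MAgent backend installed.

```python
# Defaults as documented in the constructor signature on this page.
DEFAULTS = {
    "map_size": 80,
    "minimap_mode": False,
    "step_reward": -0.005,
    "dead_penalty": -0.1,
    "attack_penalty": -0.1,
    "attack_opponent_reward": 0.2,
    "max_cycles": 1000,
    "extra_features": False,
}
MIN_MAP_SIZE = 46  # minimum stated for map_size


def build_kwargs(**overrides):
    """Merge overrides into the documented defaults and validate map_size."""
    unknown = set(overrides) - set(DEFAULTS)
    if unknown:
        raise TypeError(f"unknown arguments: {sorted(unknown)}")
    kwargs = {**DEFAULTS, **overrides}
    if kwargs["map_size"] < MIN_MAP_SIZE:
        raise ValueError(f"map_size must be at least {MIN_MAP_SIZE}")
    return kwargs


def make_env(**overrides):
    """Create the environment (requires a pettingzoo install with MAgent)."""
    from pettingzoo.magent import battlefield_v4  # import path from the table

    return battlefield_v4.env(**build_kwargs(**overrides))


kwargs = build_kwargs(map_size=60, minimap_mode=True)
print(kwargs["map_size"], kwargs["minimap_mode"])  # 60 True
```

Calling make_env(map_size=60, minimap_mode=True) would then pass the merged arguments to battlefield_v4.env.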
Key: move_N means N separate actions, one to move to each of the N nearest squares on the grid.
Action options: [do_nothing, move_12, attack_8]
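The group sizes in the action options add up to the size of the discrete action space. A short sketch of that bookkeeping follows. The generated labels such as move_0 are hypothetical names for illustration, and their order is not meant to reflect which integer index the environment assigns to each action (this page does not specify that mapping).

```python
# Sizes of each action group, read from "Action options" above.
ACTION_GROUPS = {"do_nothing": 1, "move": 12, "attack": 8}


def action_labels(groups=ACTION_GROUPS):
    """Expand group sizes into one illustrative label per action."""
    labels = []
    for name, count in groups.items():
        if count == 1:
            labels.append(name)
        else:
            labels.extend(f"{name}_{i}" for i in range(count))
    return labels


labels = action_labels()
print(len(labels))  # 21, matching Discrete(21)
```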
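Reward is given through the terms configured above. With the default arguments these are -0.005 every step (step_reward), -0.1 for attacking (attack_penalty), 0.2 for attacking an opponent (attack_opponent_reward), and -0.1 for dying (dead_penalty).
If multiple options apply, rewards are added.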
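The additive rule can be illustrated with the configurable reward arguments and their default values. This sketch models only the terms exposed as constructor arguments on this page, and it assumes that an attack which hits an opponent incurs attack_penalty as well as attack_opponent_reward. The function name total_reward and its boolean flags are hypothetical.

```python
def total_reward(
    attacked=False,
    hit_opponent=False,
    died=False,
    step_reward=-0.005,
    dead_penalty=-0.1,
    attack_penalty=-0.1,
    attack_opponent_reward=0.2,
):
    """Sum every configurable reward term that applies on one step."""
    reward = step_reward                  # added unconditionally
    if attacked or hit_opponent:
        reward += attack_penalty          # assumed to apply to any attack
    if hit_opponent:
        reward += attack_opponent_reward  # the attack landed on an opponent
    if died:
        reward += dead_penalty            # the agent was killed this step
    return reward


print(round(total_reward(), 3))                                  # -0.005
print(round(total_reward(attacked=True), 3))                     # -0.105
print(round(total_reward(attacked=True, hit_opponent=True), 3))  # 0.095
```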
The observation space is a 13x13 map with the channels below (in order):
feature | number of channels |
---|---|
obstacle/off the map | 1 |
my_team_presence | 1 |
my_team_hp | 1 |
my_team_minimap(minimap_mode=True) | 1 |
other_team_presence | 1 |
other_team_hp | 1 |
other_team_minimap(minimap_mode=True) | 1 |
binary_agent_id(extra_features=True) | 10 |
one_hot_action(extra_features=True) | 21 |
last_reward(extra_features=True) | 1 |
agent_position(minimap_mode=True) | 2 |
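The number of observation channels for a given configuration can be tallied from the table. The sketch below encodes each row with the option that enables it and sums the rows that are active. It is derived from the table alone, so the resulting shapes for non-default options should be treated as an expectation to check against the actual environment. The names OBS_CHANNELS and observation_shape are hypothetical.

```python
# (feature, channels, enabling option or None if always present), in table order.
OBS_CHANNELS = [
    ("obstacle/off the map", 1, None),
    ("my_team_presence", 1, None),
    ("my_team_hp", 1, None),
    ("my_team_minimap", 1, "minimap_mode"),
    ("other_team_presence", 1, None),
    ("other_team_hp", 1, None),
    ("other_team_minimap", 1, "minimap_mode"),
    ("binary_agent_id", 10, "extra_features"),
    ("one_hot_action", 21, "extra_features"),
    ("last_reward", 1, "extra_features"),
    ("agent_position", 2, "minimap_mode"),
]


def observation_shape(minimap_mode=False, extra_features=False, view=13):
    """Return (view, view, channels) implied by the table for these options."""
    enabled = {"minimap_mode": minimap_mode, "extra_features": extra_features}
    channels = sum(
        count
        for _, count, option in OBS_CHANNELS
        if option is None or enabled[option]
    )
    return (view, view, channels)


print(observation_shape())                   # (13, 13, 5)
print(observation_shape(minimap_mode=True))  # (13, 13, 9)
```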
The state space is an 80x80 map. It contains the following channels (in order):
feature | number of channels |
---|---|
obstacle map | 1 |
team_0_presence | 1 |
team_0_hp | 1 |
team_1_presence | 1 |
team_1_hp | 1 |
binary_agent_id(extra_features=True) | 10 |
one_hot_action(extra_features=True) | 21 |
last_reward(extra_features=True) | 1 |
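The state shape follows from the map size and the channel table in the same way. A brief sketch, assuming the state always spans the full map_size x map_size grid (consistent with the 80x80 default listed on this page); the function name state_shape is hypothetical.

```python
BASE_STATE_CHANNELS = 5         # obstacle map + presence and HP for both teams
EXTRA_STATE_CHANNELS = 10 + 21 + 1  # binary_agent_id, one_hot_action, last_reward


def state_shape(map_size=80, extra_features=False):
    """Return (map_size, map_size, channels) implied by the state table."""
    channels = BASE_STATE_CHANNELS
    if extra_features:
        channels += EXTRA_STATE_CHANNELS
    return (map_size, map_size, channels)


print(state_shape())  # (80, 80, 5)
```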