This environment is part of the butterfly environments. Please read that page first for general information.
|Action Shape||(3,), (2,)|
|Action Values||[-1, 1]|
|Observation Shape||(150, 150, 3), (154, 154, 3)|
|Observation Values||(0, 255)|
|State Shape||(720, 1280, 3)|
|State Values||(0, 255)|
|Average Total Reward||0|
This game is inspired by gold panning in the American “wild west” movies. There’s a blue river at the bottom of the screen, which contains gold. 4 “prospector” agents can move and touch the river and pan from it, and get a gold nugget (visibly held by them). Prospectors can only hold one nugget at a time, and nuggets stay in the same position relative to the prospector’s orientation.
Prospector agents take a 3-element vector of continuous values between -1 and 1, inclusive.
The action space is
(y, x, r), where
y is used for forward/backward movement,
x is used for left/right movement, and
r is used for clockwise/counter-clockwise rotation.
There are a handful of bank chests at the top of the screen. The prospector agents can hand their held gold nugget to the 3 “banker” agents, to get a reward. The banker agents can’t rotate, and the prospector agents must give the nuggets (which are held in the same position relative to the prospector’s position and rotation) to the front of the bankers (within a plus or minus 45 degree tolerance). The bankers then get the gold, and can deposit it into the chests to receive a reward.
Bankers take a 2-element vector of continuous values between -1 and 1, inclusive.
The action space is
(y, x), where
y is used for forward/backward movement and
x is used for left/right movement.
The observation space size for prospectors
(150, 150, 3) and the size for
bankers is slightly bigger at
(154, 154, 3). The diameter of both
prospector and banker agents is 30 pixels, so
the observation space sizes are about 5 times larger
than the diameter of each agent. There are fewer
bankers, so they’re given a slightly
larger observation space to give them more
information on their surroundings.
Rewards are issued for a prospector retrieving a nugget, a prospector handing a nugget off to a banker, a banker receiving a nugget from a prospector, and a banker depositing the gold into a bank. There is an individual reward, a group reward (for agents of the same type), and an other-group reward (for agents of the other type).
By default, if a prospector retrieves a nugget from the water, then that prospector receives a reward of 0.8, other prospectors will receive a reward of 0.1 and all bankers receive a reward of 0.1.
By default, if a prospector hands off a gold nugget to a banker (so the banker receives a gold nugget from a prospector), there are two rewards that are processed: the handing-off prospector gets 0.8, the other prospectors get 0.1, and all of the bankers get 0.1. Similarly, the banker receiving the gold nugget gets 0.8, the other bankers get 0.1, and all of the prospectors get 0.1.
Finally, by default when a banker deposits gold in the bank, that banker receives a reward of 0.8, the other bankers receive rewards of 0.1, and all of the prospectors receive 0.1.
"prospector_0" is the first sprite you control. Use the left and
right arrow keys to switch between the agents.
Move prospectors using the ‘WASD’ keys for forward/left/backward/right movement. Rotate using ‘QE’ for counter-clockwise/clockwise rotation.
Move the bankers using the ‘WASD’ keys for forward/left/backward/right movement.
The game lasts for 900 frames by default.
prospector_v4.env(ind_reward=0.8, group_reward=0.1, other_group_reward=0.1, prospec_find_gold_reward=1, prospec_handoff_gold_reward=1, banker_receive_gold_reward=1, banker_deposit_gold_reward=1, max_cycles=150
ind_reward: The reward multiplier for a single agent completing an objective.
group_reward: The reward multiplier that agents of the same type
as the scoring agent will earn.
other_group_reward: The reward multiplier that agents of the other type
as the scoring agent will earn. If the scoring agent is a prospector,
then this is the reward that the bankers get, and vice versa.
ind_reward + group_reward + other_group_reward == 1.0
prospec_find_gold_reward: The reward for a prospector
retrieving gold from the water at the bottom of the screen.
prospec_handoff_gold_reward: The reward for a prospector
handing off gold to a banker.
banker_receive_gold_reward: The reward for a banker receiving
gold from a prospector.
banker_deposit_gold_reward: The reward for a banker depositing
gold into a bank.
max_cycles: The number of frames the game should run for.
max_cycles, bumped PyMunk version (1.6.0)