Implementing our first Multi-agent RL
Questions are more important than the answers.
What’s Multi-agent RL (sometimes abbreviated to MARL)?
In the normal RL setup, we have one agent communicating with the environment using observations, rewards, and actions. But in some problems, which often arise in reality, several agents are involved in the environment interaction; for example (a sketch of such a loop follows this list):
- Multiplayer games, like Dota 2 or StarCraft II, where the agent needs to control several units competing against other players’ units.
- Autonomous driving is a multi-agent setting where the host vehicle must apply sophisticated negotiation skills with other road users when overtaking, giving way, merging, taking left and right turns, and pushing ahead on unstructured urban roadways.
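Conceptually, the interaction loop stays the same as in single-agent RL, but observations, actions, and rewards become per-agent collections. The following is a minimal, self-contained sketch of that idea; the agent names and the reset() and step() functions here are made up for illustration and are not the API of any particular library:
import random

# Toy multi-agent loop: at every step, each agent receives its own observation
# and reward and contributes its own action. Purely illustrative.
AGENTS = ["tiger_1", "tiger_2", "deer_1"]

def reset():
    # hypothetical initial observation per agent
    return {agent: [0.0, 0.0] for agent in AGENTS}

def step(actions):
    # hypothetical transition: random observations and rewards per agent
    obs = {agent: [random.random(), random.random()] for agent in AGENTS}
    rewards = {agent: random.choice([0.0, 1.0]) for agent in AGENTS}
    done = random.random() < 0.05
    return obs, rewards, done

observations = reset()
done = False
while not done:
    # every agent selects its own (here random) action from its own observation
    actions = {agent: random.randint(0, 3) for agent in observations}
    observations, rewards, done = step(actions)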
The MAgent environment
The high-level concept of MAgent is simple and efficient: it provides a simulation of a grid world inhabited by 2D agents.
For example, the first environment that we will consider is a predator-prey model, where “tigers” hunt “deer” and obtain a reward for doing so.
A random environment
To start, let’s visualize a random environment in which all agents act randomly.
Let’s implement the code step by step.
import os
import sys
sys.path.append(os.path.join(os.getcwd(), "MAgent/python"))
As MAgent is not installed as a package, we extend the Python path so that its modules can be imported from the local MAgent directory.
import magent
from magent.builtin.rule_model import RandomActor

MAP_SIZE = 64
We import the main package provided by MAgent. In addition, we define the size of our environment, which is a 64x64 grid.
if __name__ == "__main__":
    env = magent.GridWorld("forest", map_size=MAP_SIZE)
    env.set_render_dir("render")
First of all, we create the environment, which is represented by the GridWorld class.
    deer_handle, tiger_handle = env.get_handles()
    models = [
        RandomActor(env, deer_handle),
        RandomActor(env, tiger_handle),
    ]
We obtain the handles of the two agent groups (deer and tigers) and create a random-action model for each of them.
    env.reset()
    env.add_walls(method="random", n=MAP_SIZE * MAP_SIZE * 0.04)
    env.add_agents(deer_handle, method="random", n=5)
    env.add_agents(tiger_handle, method="random", n=2)
In MAgent terminology, reset() clears the grid completely, which is different from Gym. The preceding code turns 4% of the grid cells into walls with add_walls(), and randomly places five deer and two tigers.
    v = env.get_view_space(tiger_handle)
    r = env.get_feature_space(tiger_handle)
    print("Tiger view: %s, features: %s" % (v, r))
    vv = env.get_view_space(deer_handle)
    rr = env.get_feature_space(deer_handle)
    print("Deer view: %s, features: %s" % (vv, rr))
In MAgent, the observation of every agent is divided into two parts: the view space, a spatial tensor describing the agent’s surroundings, and the feature space, a flat vector with additional information about the agent.
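If we later want to feed both parts into a single network, one simple option is to flatten the spatial view per agent and concatenate it with the feature vector. Below is a minimal NumPy sketch; the shapes are made up for illustration and do not correspond to the actual "forest" configuration:
import numpy as np

# Hypothetical shapes: 5 agents, a 3x3 view with 5 planes, and 16 extra features.
view = np.zeros((5, 3, 3, 5), dtype=np.float32)    # view space part
features = np.zeros((5, 16), dtype=np.float32)     # feature space part

# Flatten the spatial view per agent and concatenate it with the feature
# vector, producing one flat observation vector per agent.
flat_obs = np.concatenate([view.reshape(view.shape[0], -1), features], axis=1)
print(flat_obs.shape)   # (5, 61)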
    done = False
    step_idx = 0
    while not done:
        deer_obs = env.get_observation(deer_handle)
        tiger_obs = env.get_observation(tiger_handle)
        if step_idx == 0:
            print("Tiger obs: %s, %s" % (
                tiger_obs[0].shape, tiger_obs[1].shape))
            print("Deer obs: %s, %s" % (
                deer_obs[0].shape, deer_obs[1].shape))
        print("%d: HP deers: %s" % (
            step_idx, deer_obs[0][:, 1, 1, 2]))
        print("%d: HP tigers: %s" % (
            step_idx, tiger_obs[0][:, 4, 4, 2]))
We start the step loop, where we get the observations of both groups, print their shapes on the first step, and show the health points of every agent.
        deer_act = models[0].infer_action(deer_obs)
        tiger_act = models[1].infer_action(tiger_obs)
        env.set_action(deer_handle, deer_act)
        env.set_action(tiger_handle, tiger_act)
Then we ask the models to select actions from the observations (the actions are chosen randomly) and pass those actions to the environment.
        env.render()
        done = env.step()
        env.clear_dead()
        t_reward = env.get_reward(tiger_handle)
        d_reward = env.get_reward(deer_handle)
        print("Rewards: deer %s, tiger %s" % (d_reward, t_reward))
        step_idx += 1
First of all, we ask the environment to save information about the agents and their locations so that it can be viewed later. Then we call env.step() to perform one time step in the grid-world simulation. This function returns a single Boolean flag, which becomes True once all the agents are dead. After that, we remove the dead agents with clear_dead(), get the reward vector for each group, show it, and iterate the loop again.
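If we wanted to track how each group performs over the whole episode, we could accumulate the reward vectors returned at every step. The following is a small illustrative sketch; the helper function and the example reward vectors are made up, but they have the same structure as the vectors returned by get_reward() above:
import numpy as np

# Illustrative bookkeeping: accumulate each group's total reward per episode.
# The reward vectors contain one entry per living agent, so their length can
# shrink over time as agents die; summing them per step still works.
episode_rewards = {"deer": 0.0, "tiger": 0.0}

def accumulate(totals, d_reward, t_reward):
    totals["deer"] += float(np.sum(d_reward))
    totals["tiger"] += float(np.sum(t_reward))

# Example with made-up reward vectors from two consecutive steps:
accumulate(episode_rewards, d_reward=[0.0, 0.0, 1.0], t_reward=[0.5, 0.0])
accumulate(episode_rewards, d_reward=[0.0, 1.0], t_reward=[1.0, 0.0])
print(episode_rewards)   # {'deer': 2.0, 'tiger': 1.5}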