Key Concepts

Agents and Goals

Each agent has a start position and a target position on the grid. The agent's objective is to navigate from start to target while avoiding obstacles and other agents.

Partial Observability

Agents can only see a local region around themselves, controlled by obs_radius. The default radius is 5, giving an 11x11 observation window (2 * 5 + 1 cells per side). Agents have no knowledge of the global map or of other agents outside their field of view.
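
To make the window arithmetic concrete, here is a minimal sketch of cropping a local view from a global obstacle grid. The function name `local_window`, the list-of-lists grid, and the choice to treat out-of-map cells as obstacles are assumptions made for illustration, not the library's implementation.

```python
def local_window(grid, x, y, obs_radius=5):
    """Return the (2R+1) x (2R+1) obstacle window centred on (x, y).

    Hypothetical helper: grid is indexed as grid[y][x], and cells outside
    the map are reported as obstacles (an assumption of this sketch).
    """
    height, width = len(grid), len(grid[0])
    window = []
    for dy in range(-obs_radius, obs_radius + 1):
        row = []
        for dx in range(-obs_radius, obs_radius + 1):
            gx, gy = x + dx, y + dy
            inside = 0 <= gx < width and 0 <= gy < height
            row.append(grid[gy][gx] if inside else 1.0)
        window.append(row)
    return window


# A 20x20 free map with a single obstacle directly above the agent.
grid = [[0.0] * 20 for _ in range(20)]
grid[3][4] = 1.0  # obstacle at x=4, y=3
window = local_window(grid, x=4, y=4)
```

With the default radius the agent sits at index [5][5] of the window, and anything farther than 5 cells away along either axis is simply not represented.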

Actions

Each agent selects one of five discrete actions per step:

Action	ID	Movement
Wait	0	Stay in place
Up	1	y - 1
Down	2	y + 1
Left	3	x - 1
Right	4	x + 1
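
The table above can be read as a lookup from action ID to a coordinate change. The sketch below encodes it that way; `ACTION_DELTAS` and `apply_action` are illustrative names rather than part of the library, and the code ignores obstacles and collisions.

```python
# Action ID -> (dx, dy), following the table: Up decreases y, Down increases y.
ACTION_DELTAS = {
    0: (0, 0),   # Wait
    1: (0, -1),  # Up
    2: (0, 1),   # Down
    3: (-1, 0),  # Left
    4: (1, 0),   # Right
}


def apply_action(position, action):
    """Return the cell an agent at `position` would move to (no validity checks)."""
    x, y = position
    dx, dy = ACTION_DELTAS[action]
    return (x + dx, y + dy)
```

Note that "Up" lowers the y coordinate, which matches the usual image-style convention where row 0 is at the top.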

Observations

Default observations are arrays of shape (3, 2*R+1, 2*R+1) where R = obs_radius:

Channel	Content	Values
0	Obstacles	1.0 = obstacle, 0.0 = free
1	Other agents	1.0 = agent present
2	Target direction	Encodes relative goal position

See Observation Types for alternative formats.
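
As a quick check on the shape formula and channel order, the following sketch builds an empty default-format observation with plain nested lists. The helper names and the channel constants are hypothetical, and the [channel][row][column] indexing is an assumption; the library may return arrays from a numerical package instead.

```python
# Channel indices taken from the table above (constant names are illustrative).
OBSTACLES, OTHER_AGENTS, TARGET = 0, 1, 2


def observation_shape(obs_radius=5):
    """Shape of the default observation: (3, 2*R+1, 2*R+1)."""
    side = 2 * obs_radius + 1
    return (3, side, side)


def empty_observation(obs_radius=5):
    """Nested-list stand-in for an all-zero observation."""
    channels, rows, cols = observation_shape(obs_radius)
    return [[[0.0] * cols for _ in range(rows)] for _ in range(channels)]


obs = empty_observation(obs_radius=5)
centre = 5  # the observing agent sits in the middle of the window
obs[OBSTACLES][centre][centre + 1] = 1.0     # obstacle one cell to the right
obs[OTHER_AGENTS][centre - 1][centre] = 1.0  # another agent one cell up
```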

Task Modes (`on_target`)

Mode	Behavior	Use Case
`'nothing'`	Agent stays on its goal; all agents must be on their goals at the same time	Classical MAPF
`'restart'`	Agent gets a new goal upon reaching current one	Lifelong MAPF
`'finish'`	Agent disappears upon reaching goal	Simplified MAPF

See Task Modes for details.
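
The three modes differ only in what happens at the moment an agent steps onto its target. A rough sketch of that branching is shown below; the dict-based agent, the `active` flag, and the `sample_new_goal` callback are assumptions for illustration and do not mirror the library's internals.

```python
def on_target_reached(agent, mode, sample_new_goal):
    """Update a dict-based agent once it stands on its target (illustrative only)."""
    if mode == 'nothing':
        pass  # agent remains on the grid, sitting on its goal
    elif mode == 'restart':
        agent['target'] = sample_new_goal()  # lifelong: keep assigning goals
    elif mode == 'finish':
        agent['active'] = False  # agent is removed from the grid
    else:
        raise ValueError(f"unknown on_target mode: {mode!r}")
    return agent


agent = {'position': (3, 3), 'target': (3, 3), 'active': True}
agent = on_target_reached(agent, 'restart', sample_new_goal=lambda: (7, 1))
```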

Collision Systems

System	Behavior
`'block_both'`	Both colliding agents stay in place
`'priority'`	Higher-index agent moves, lower is blocked
`'soft'`	Agents can overlap freely

See Collision Systems for details.
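
To illustrate how the three systems differ, the sketch below resolves the simplest conflict: several agents proposing to enter the same cell in one step. It follows the table literally (under `'priority'` the highest-index agent wins) and deliberately ignores swaps and chain effects where a blocked agent in turn blocks another. The function name and data layout are hypothetical.

```python
def resolve_vertex_conflicts(positions, proposals, system):
    """Return final cells for agents that may have proposed the same cell.

    positions and proposals are lists of (x, y) tuples indexed by agent ID.
    Simplified: only same-cell conflicts are handled, in a single pass.
    """
    if system not in ('block_both', 'priority', 'soft'):
        raise ValueError(f"unknown collision system: {system!r}")
    if system == 'soft':
        return list(proposals)  # overlaps are allowed

    # Group agent indices by the cell they want to enter.
    claims = {}
    for idx, cell in enumerate(proposals):
        claims.setdefault(cell, []).append(idx)

    result = list(proposals)
    for cell, agents in claims.items():
        if len(agents) < 2:
            continue  # no conflict on this cell
        winner = max(agents) if system == 'priority' else None
        for idx in agents:
            if idx != winner:
                result[idx] = positions[idx]  # blocked: stay in place
    return result


positions = [(0, 0), (2, 0)]
proposals = [(1, 0), (1, 0)]  # both agents want the cell between them
```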

Episode Lifecycle

env.reset() — generate or load map, place agents and targets
env.step(actions) — all agents move simultaneously
The episode ends when every agent is either terminated or truncated
Metrics are available in info[0]['metrics'] on the final step
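
The lifecycle above can be sketched as a plain loop. Because this section does not show how the real environment is constructed, the example runs against `ToyEnv`, a hypothetical stand-in that only mimics the described behaviour. The per-agent list return values, the Gymnasium-style `(obs, rewards, terminated, truncated, infos)` tuple, and the `episode_length` metric key are all assumptions of the sketch.

```python
class ToyEnv:
    """Hypothetical stand-in environment, not the library's implementation."""

    def __init__(self, num_agents=2, max_steps=3):
        self.num_agents = num_agents
        self.max_steps = max_steps
        self.t = 0

    def reset(self):
        self.t = 0
        obs = [None] * self.num_agents
        infos = [{} for _ in range(self.num_agents)]
        return obs, infos

    def step(self, actions):
        assert len(actions) == self.num_agents  # one action per agent
        self.t += 1
        out_of_time = self.t >= self.max_steps
        obs = [None] * self.num_agents
        rewards = [0.0] * self.num_agents
        terminated = [False] * self.num_agents
        truncated = [out_of_time] * self.num_agents
        infos = [{} for _ in range(self.num_agents)]
        if out_of_time:
            # Placeholder metric; real metric names are not given in this section.
            infos[0]['metrics'] = {'episode_length': self.t}
        return obs, rewards, terminated, truncated, infos


env = ToyEnv()
obs, infos = env.reset()
while True:
    actions = [0] * env.num_agents  # every agent waits (action ID 0)
    obs, rewards, terminated, truncated, infos = env.step(actions)
    # The episode is over once each agent is terminated or truncated.
    if all(te or tr for te, tr in zip(terminated, truncated)):
        break
metrics = infos[0]['metrics']
```

Reading the metrics only after the loop exits matters, since they are described as present on the final step rather than on every step.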