A* Baseline Policy
POGEMA includes a built-in A* pathfinding baseline for evaluation and benchmarking. The approach is fully decentralized and works under partial observability by reconstructing the map in memory.
BatchAStarAgent (Multi-Agent)
from pogema import pogema_v0, GridConfig, BatchAStarAgent
env = pogema_v0(GridConfig(
num_agents=4,
size=16,
observation_type='POMAPF', # Required — A* needs coordinates
seed=42,
))
agent = BatchAStarAgent()
obs, info = env.reset()
while True:
actions = agent.act(obs)
obs, reward, terminated, truncated, info = env.step(actions)
if all(terminated) or all(truncated):
break
agent.reset_states() # Call between episodes
print(info[0]['metrics'])
AStarAgent (Single-Agent)
from pogema import AStarAgent
agent = AStarAgent()
action = agent.act(obs_dict) # Single observation dict
agent.clear_state() # Call between episodes
How It Works
- Each agent maintains a
GridMemory— a sparse map built from observations - At each step, the agent updates its memory with newly observed obstacles
- A* search finds the shortest path from current position to target
- If the path is blocked or the agent is stuck, it takes a random action
- Each agent plans independently (no communication)
Observation Requirements
A* agents require observation_type='POMAPF' or 'MAPF', which provides:
'xy': Agent's current position'target_xy': Target position'obstacles': Local obstacle map'agents': Local agent positions