Seeding

Seeds control random map generation, agent placement, and target assignment. Setting a seed makes experiments fully reproducible.

Reproducible Environments

Pass seed to GridConfig for deterministic behavior. Two environments with the same config produce identical episodes:

from pogema import pogema_v0, GridConfig

cfg = GridConfig(seed=42, num_agents=4, size=10, density=0.3)

env1 = pogema_v0(cfg)
obs1, _ = env1.reset()

env2 = pogema_v0(cfg)
obs2, _ = env2.reset()

assert all((o1 == o2).all() for o1, o2 in zip(obs1, obs2))

The seed determines obstacle layout, agent start positions, and target positions.
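To make that concrete, here is a hypothetical sketch (plain NumPy, not pogema's actual internals) of how a single seed can deterministically drive all three sampling steps, so that the whole layout becomes a pure function of the seed:

```python
import numpy as np

def generate(seed, size=10, density=0.3, num_agents=4):
    # One RNG seeded once drives every sampling step in order,
    # so obstacles, starts, and targets all follow from the seed.
    rng = np.random.default_rng(seed)
    obstacles = rng.random((size, size)) < density
    free = np.argwhere(~obstacles)
    # Draw distinct free cells: first half are starts, second half targets.
    picks = rng.choice(len(free), size=2 * num_agents, replace=False)
    return obstacles, free[picks[:num_agents]], free[picks[num_agents:]]

a = generate(42)
b = generate(42)
assert all(np.array_equal(x, y) for x, y in zip(a, b))  # same seed, same world
```

The names `generate` and its parameters are illustrative; the point is only that seeding one generator at the top makes every downstream draw reproducible.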

Different Seeds — Different Environments

Each seed produces a unique obstacle layout, agent placement, and target assignment. Here are three environments generated with different seeds on the same grid size:

from pogema import pogema_v0, GridConfig, BatchAStarAgent

for seed in (42, 7, 256):
    env = pogema_v0(GridConfig(
        seed=seed, num_agents=4, size=8, density=0.3,
        observation_type='POMAPF', max_episode_steps=64,
    ))
    env.enable_animation()
    agent = BatchAStarAgent()
    obs, info = env.reset()

    while True:
        actions = agent.act(obs)
        obs, reward, terminated, truncated, info = env.step(actions)
        if all(terminated) or all(truncated):
            break

    agent.reset_states()

Notice how each seed produces a completely different map, start positions, and targets.

Reseeding on Reset

Use env.reset(seed=...) to change the seed at runtime (standard Gymnasium API). Each new seed produces a new deterministic environment:

from pogema import pogema_v0, GridConfig

env = pogema_v0(GridConfig(num_agents=2))

obs_a, _ = env.reset(seed=100)
obs_b, _ = env.reset(seed=200)  # different environment
obs_c, _ = env.reset(seed=100)  # same as obs_a

assert all((o1 == o2).all() for o1, o2 in zip(obs_a, obs_c))

Random Mode

When seed=None (default), a different random environment is generated on each reset. This is useful for training:

from pogema import pogema_v0, GridConfig

env = pogema_v0(GridConfig(seed=None, num_agents=4))

obs1, _ = env.reset()
obs2, _ = env.reset()  # likely different map and positions
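Conceptually, seed=None means the generator is seeded from OS entropy instead of a fixed value, so each construction draws a fresh layout. A minimal NumPy sketch of the idea (not pogema's code):

```python
import numpy as np

# With seed=None the RNG is seeded from OS entropy, so every
# construction samples a new, unpredictable layout.
layout1 = np.random.default_rng(None).random((8, 8)) < 0.3
layout2 = np.random.default_rng(None).random((8, 8)) < 0.3
# The two layouts almost surely differ (a collision is astronomically rare).
```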

Seeds with Custom Maps

When a map is provided, obstacles are fixed regardless of the seed. The seed still controls agent and target placement on free cells:

from pogema import pogema_v0, GridConfig, BatchAStarAgent

# Example map string: '#' is an obstacle, '.' is a free cell.
grid = """
....#....
....#....
.........
....#....
....#....
"""

for seed in (1, 2):
    env = pogema_v0(GridConfig(
        map=grid, num_agents=4, seed=seed,
        observation_type='POMAPF', max_episode_steps=64,
    ))
    env.enable_animation()
    agent = BatchAStarAgent()
    obs, info = env.reset()

    while True:
        actions = agent.act(obs)
        obs, reward, terminated, truncated, info = env.step(actions)
        if all(terminated) or all(truncated):
            break

    agent.reset_states()

The obstacle layout (the # wall) stays the same — only agent and target positions change between seeds.
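This split of responsibilities — fixed obstacles, seeded placement — can be pictured with the following hypothetical NumPy sketch (an illustrative model, not pogema's real implementation):

```python
import numpy as np

# 1 marks a '#' wall; the map itself never depends on the seed.
grid = np.array([[0, 0, 1, 0, 0],
                 [0, 0, 1, 0, 0],
                 [0, 0, 0, 0, 0]])

def place(seed, num_agents=2):
    # Only agent/target placement consumes randomness.
    rng = np.random.default_rng(seed)
    free = np.argwhere(grid == 0)
    picks = rng.choice(len(free), size=2 * num_agents, replace=False)
    return free[picks[:num_agents]], free[picks[num_agents:]]

starts_a, targets_a = place(seed=1)
starts_b, targets_b = place(seed=2)  # same walls, new placement
```

Calling `place` twice with the same seed returns identical positions; changing the seed reshuffles agents and targets while `grid` stays untouched.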

When positions are fully specified via named agents (a/A, b/B, ...), the seed has no effect — everything is deterministic by definition:

from pogema import pogema_v0, GridConfig, BatchAStarAgent

# Example map: lowercase letters are agent starts, uppercase their targets.
grid = """
a...#..B
........
b...#..A
"""

env = pogema_v0(GridConfig(
    map=grid,
    observation_type='POMAPF', max_episode_steps=64,
))
env.enable_animation()
agent = BatchAStarAgent()
obs, info = env.reset()

while True:
    actions = agent.act(obs)
    obs, reward, terminated, truncated, info = env.step(actions)
    if all(terminated) or all(truncated):
        break

agent.reset_states()

Here agents a/b navigate to their targets A/B on a fully specified map — no randomness involved.

Seeds in Lifelong Mode

In lifelong mode (on_target='restart'), the seed controls initial placement and the sequence of new targets. A per-agent RNG is derived from the main seed, so each agent gets a reproducible but independent stream of targets:

from pogema import pogema_v0, GridConfig, BatchAStarAgent

env = pogema_v0(GridConfig(
    on_target='restart', seed=42, num_agents=4, size=8,
    density=0.3, observation_type='POMAPF', max_episode_steps=64,
))
env.enable_animation()
agent = BatchAStarAgent()
obs, info = env.reset()

while True:
    actions = agent.act(obs)
    obs, reward, terminated, truncated, info = env.step(actions)
    if all(terminated) or all(truncated):
        break

agent.reset_states()

The seed ensures the same sequence of regenerated targets on every run.
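The per-agent RNG derivation can be sketched with NumPy's SeedSequence, a standard way to split one main seed into independent child streams (a hypothetical model — pogema's internals may differ):

```python
import numpy as np

def target_streams(main_seed, num_agents, n_targets, n_free_cells):
    # Spawn one child seed per agent from the main seed, then give each
    # agent its own generator and its own reproducible target sequence.
    children = np.random.SeedSequence(main_seed).spawn(num_agents)
    rngs = [np.random.default_rng(child) for child in children]
    return [rng.integers(n_free_cells, size=n_targets) for rng in rngs]

run1 = target_streams(42, num_agents=4, n_targets=5, n_free_cells=40)
run2 = target_streams(42, num_agents=4, n_targets=5, n_free_cells=40)
# Same main seed: every agent replays the same target sequence on both runs,
# yet the agents' streams remain statistically independent of one another.
```

`target_streams` and its parameters are invented for illustration; the design point is that spawning child seeds keeps the streams both reproducible (from the main seed) and independent (no shared generator state between agents).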