ocean: add space_invaders env by Infatoshi · Pull Request #535 · PufferAI/PufferLib

Infatoshi · 2026-04-20T04:42:06Z

Summary

Adds a new discrete-action ocean env, `space_invaders`, following the three-file convention (`env.h` / `binding.c` / `env.c`) used by pong/breakout/asteroids, plus a config ini.

- `ocean/space_invaders/space_invaders.h`: game logic + rendering
- `ocean/space_invaders/binding.c`: pufferlib glue (42 lines)
- `ocean/space_invaders/space_invaders.c`: standalone keyboard demo
- `config/ocean/space_invaders.ini`: hyperparameters + vec config

4 files, 611 lines total. No changes outside the new env.

Game

A 5x11 invader grid moves side to side, drops on edge contact, and speeds up as ranks thin. The player ship can have one bullet on screen at a time; there can be up to three enemy bullets. There are 4 actions: noop / left / right / fire. An episode ends when lives reach 0 or the invaders reach the player line; clearing the grid gives a +10 bonus. Reward is shaped as `+pts/10` per kill and `-1` per life lost.
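To make the reward shaping concrete, here is a minimal sketch of the per-step reward and termination rule described above. The identifiers (`ACTION_*`, `step_reward`, `episode_done`) and the 0-3 action numbering are illustrative assumptions, not the names used in `space_invaders.h`, and the sketch assumes the +10 clear bonus is added to the reward rather than only to the score.

```c
#include <stdbool.h>

/* Hypothetical action numbering; the source only fixes the set
 * noop / left / right / fire, not the integer values. */
enum { ACTION_NOOP = 0, ACTION_LEFT = 1, ACTION_RIGHT = 2, ACTION_FIRE = 3 };

/* Assumption: the clear bonus is applied to the reward signal. */
#define CLEAR_BONUS 10.0f

/* Reward for one step: +pts/10 per kill, -1 per life lost, +10 on clear. */
static float step_reward(int kill_points, int lives_lost, bool cleared) {
    float r = (float)kill_points / 10.0f;
    r -= (float)lives_lost;
    if (cleared) r += CLEAR_BONUS;
    return r;
}

/* Episode ends when lives hit 0 or the formation reaches the player line. */
static bool episode_done(int lives, bool invaded) {
    return lives <= 0 || invaded;
}
```

For example, killing a 30-point invader in a step with no life lost would yield a reward of 3 under these assumptions.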

Performance

The player-bullet-vs-formation collision is the hot path. A naive implementation scans all 55 invader cells per frame. This implementation caches the grid's min/max row and column (updated incrementally on invader death) and additionally narrows the column loop to the 1-2 columns the bullet's x extent actually overlaps, so the hot path is ~1-2 AABB checks rather than 55. The `invaded` check (formation-reached-player) is also O(1) against the cache.
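The idea can be sketched roughly as follows. This is not the code from `space_invaders.h`: the `Formation` struct, its field names, the 16x16 pitch and 12x8 sprite size are all assumptions made for illustration. The sketch also narrows rows the same way it narrows columns, which the description above does not state.

```c
#include <stdbool.h>

#define ROWS 5
#define COLS 11

/* Illustrative layout; field names and pixel sizes are assumptions. */
typedef struct {
    bool alive[ROWS][COLS];
    int alive_count;
    int origin_x, origin_y;                 /* top-left of cell (0,0) */
    int cell_w, cell_h;                     /* grid pitch in pixels */
    int inv_w, inv_h;                       /* sprite size, <= pitch */
    int min_row, max_row, min_col, max_col; /* cached bounds of live cells */
} Formation;

static void formation_init(Formation *f, int origin_x, int origin_y) {
    for (int r = 0; r < ROWS; r++)
        for (int c = 0; c < COLS; c++) f->alive[r][c] = true;
    f->alive_count = ROWS * COLS;
    f->origin_x = origin_x; f->origin_y = origin_y;
    f->cell_w = 16; f->cell_h = 16; f->inv_w = 12; f->inv_h = 8;
    f->min_row = 0; f->max_row = ROWS - 1;
    f->min_col = 0; f->max_col = COLS - 1;
}

static int clampi(int v, int lo, int hi) { return v < lo ? lo : (v > hi ? hi : v); }

static bool row_empty(const Formation *f, int r) {
    for (int c = 0; c < COLS; c++) if (f->alive[r][c]) return false;
    return true;
}

static bool col_empty(const Formation *f, int c) {
    for (int r = 0; r < ROWS; r++) if (f->alive[r][c]) return false;
    return true;
}

/* Kill one invader; shrink the cached bounds only if an edge emptied. */
static void kill_invader(Formation *f, int r, int c) {
    f->alive[r][c] = false;
    if (--f->alive_count == 0) return;
    while (row_empty(f, f->min_row)) f->min_row++;
    while (row_empty(f, f->max_row)) f->max_row--;
    while (col_empty(f, f->min_col)) f->min_col++;
    while (col_empty(f, f->max_col)) f->max_col--;
}

/* O(1) check: has the lowest live row reached the player line? */
static bool reached_player_line(const Formation *f, int player_y) {
    return f->alive_count > 0 &&
           f->origin_y + f->max_row * f->cell_h + f->inv_h >= player_y;
}

/* Returns true and writes the hit cell if the bullet AABB overlaps a live invader. */
static bool bullet_hit(const Formation *f, int bx, int by, int bw, int bh,
                       int *hit_row, int *hit_col) {
    if (f->alive_count == 0) return false;
    /* Early-out against the cached bounding box of the live formation. */
    int left = f->origin_x + f->min_col * f->cell_w;
    int right = f->origin_x + f->max_col * f->cell_w + f->inv_w;
    int top = f->origin_y + f->min_row * f->cell_h;
    int bottom = f->origin_y + f->max_row * f->cell_h + f->inv_h;
    if (bx + bw <= left || bx >= right || by + bh <= top || by >= bottom) return false;
    /* Narrow to the cells the bullet's extent can touch (1-2 per axis
     * when the bullet is smaller than the grid pitch). */
    int c0 = clampi((bx - f->origin_x) / f->cell_w, f->min_col, f->max_col);
    int c1 = clampi((bx + bw - 1 - f->origin_x) / f->cell_w, f->min_col, f->max_col);
    int r0 = clampi((by - f->origin_y) / f->cell_h, f->min_row, f->max_row);
    int r1 = clampi((by + bh - 1 - f->origin_y) / f->cell_h, f->min_row, f->max_row);
    for (int r = r1; r >= r0; r--) {        /* lowest row first */
        for (int c = c0; c <= c1; c++) {
            if (!f->alive[r][c]) continue;
            int ix = f->origin_x + c * f->cell_w;
            int iy = f->origin_y + r * f->cell_h;
            if (bx < ix + f->inv_w && ix < bx + bw &&
                by < iy + f->inv_h && iy < by + bh) {
                *hit_row = r; *hit_col = c;
                return true;
            }
        }
    }
    return false;
}
```

The row and column scans in `kill_invader` run only when an invader dies, so the per-frame cost stays at the bounding-box test plus a handful of cell checks.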

Measured throughput on a 9950X3D (16 physical / 32 SMT):

| Config | env-steps/s | frames/s |
| --- | --- | --- |
| N=32768, T=16, B=32 (config default) | 186 M | 746 M |
| Single-threaded peak | 12.5 M | 50 M |

`num_buffers` scaling matters: optimal `num_buffers ≈ total_agents / 1024`.
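As a worked instance of that rule of thumb, the helper below (a hypothetical function, not part of PufferLib) rounds `total_agents / 1024` to the nearest integer and floors it at 1. For 32768 agents it gives 32, which lines up with the config default above if `B` denotes `num_buffers`; that reading of `B` is an assumption.

```c
/* Rule of thumb from the text: num_buffers ~= total_agents / 1024.
 * Hypothetical helper; rounds to nearest and never returns less than 1. */
static int suggested_num_buffers(int total_agents) {
    int n = (total_agents + 512) / 1024;
    return n < 1 ? 1 : n;
}
```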

Training sanity check

At config defaults:

- 70 s training (400M steps) → `score ≈ 950`, `perf ≈ 0.78` (clearing 78% of invaders per episode), `episode_return ≈ 95`
- Always-fire heuristic baseline: score 342, perf 0.42, return 31

One design note: `c_reset` randomizes the player's initial x. Without this, starting always in the center makes "stay still and fire" a local optimum that PPO gets stuck in — the policy converges to exactly the always-fire numbers above and never learns to move.
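A minimal sketch of that kind of reset, assuming a hypothetical env struct and made-up screen dimensions (the real `c_reset` in `space_invaders.h` may use different fields, constants, and RNG):

```c
#include <stdlib.h>

/* Illustrative constants; the actual values in the env are not stated. */
#define SCREEN_W 160
#define PLAYER_W 12
#define START_LIVES 3

/* Hypothetical subset of the env state. */
typedef struct {
    int player_x;   /* left edge of the player ship, in pixels */
    int lives;
} SpaceInvadersSketch;

/* Reset the player with a uniformly random x so the ship does not
 * always start centered under the same invader columns. */
static void reset_player(SpaceInvadersSketch *env) {
    env->lives = START_LIVES;
    env->player_x = rand() % (SCREEN_W - PLAYER_W + 1);
}
```

Sampling over `[0, SCREEN_W - PLAYER_W]` keeps the whole ship on screen for any start position.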

Known follow-ups (happy to do either in this PR or as a separate one)

- Hyperparameters in the ini are sensible defaults, not swept. Breakout ships with proper swept values; mine are ad hoc. A `puffer sweep space_invaders` run would produce a tuned set.
- No demo weights shipped. `breakout.c` plays a trained policy via puffernet; my `space_invaders.c` is keyboard-only. Would need to export weights and add the puffernet forward call to the demo loop.

Let me know which (if either) you want bundled into this PR before merge.

Unrelated issue flagged

While developing this I hit #534 (BF16 `load_weights` produces degenerate eval). That's independent of this env — any env trained in BF16 hits it. Worked around locally with `--float`.

🤖 Generated with Claude Code

New discrete-action Atari-style env, following the three-file ocean convention (env.h / binding.c / env.c + config/ocean/*.ini).

Mechanics: 5x11 invader grid that moves side-to-side and drops on edge contact, speeds up as ranks thin. Player ship with one bullet on screen and up to three enemy bullets. 4 actions: noop, left, right, fire. Episode ends on lives=0 or invaders reaching the player line; +10 bonus on clear.

Performance: player-bullet-vs-formation collision uses cached integer grid bounds plus bullet-x column narrowing, so the hot path is ~1-2 AABB checks rather than 55 per frame. Measured 186 M env-steps/s on a 9950X3D at the config defaults (N=32768, T=16, B=32).

Training: c_reset randomizes the player's initial x so PPO doesn't collapse to "stay centered and fire"; reaches score ~950, perf ~0.78 in 70 s of training at 10 M SPS.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Infatoshi wants to merge 1 commit into PufferAI:4.0 from Infatoshi:ocean-space-invaders
