ocean: add space_invaders env #535

Draft
Infatoshi wants to merge 1 commit into PufferAI:4.0 from Infatoshi:ocean-space-invaders

@Infatoshi
Contributor

Summary

Adds a new discrete-action ocean env, `space_invaders`, following the three-file convention used by pong/breakout/asteroids.

  • `ocean/space_invaders/space_invaders.h` — game logic + rendering
  • `ocean/space_invaders/binding.c` — pufferlib glue (42 lines)
  • `ocean/space_invaders/space_invaders.c` — standalone keyboard demo
  • `config/ocean/space_invaders.ini` — hyperparameters + vec config

4 files, 611 lines total. No changes outside the new env.

Game

A 5x11 invader grid moves side to side, drops on edge contact, and speeds up as ranks thin. The player ship can have one bullet on screen; the invaders can have up to three. 4 actions: noop / left / right / fire. The episode ends on lives=0 or when the invaders reach the player line; there is a +10 bonus on clear. Reward is shaped as `+pts/10` per kill and `-1` per life lost.
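As a rough illustration of the reward shaping and termination rule described above, here is a minimal sketch. It is not the code in `space_invaders.h`: `StepEvents`, `StepResult`, and `score_step` are hypothetical names, and the sketch assumes the +10 clear bonus is added in reward units and that a clear does not by itself end the episode (the description above does not say either way).

```c
#include <stdbool.h>

/* Hypothetical summary of what happened during one env step. */
typedef struct {
    int  points_scored;  /* raw points from invaders killed this step */
    bool life_lost;      /* player was hit this step */
    bool cleared;        /* last invader was destroyed this step */
    bool invaded;        /* formation reached the player line */
    int  lives_left;     /* lives remaining after this step */
} StepEvents;

typedef struct {
    float reward;
    bool  terminal;
} StepResult;

/* Reward: +pts/10 per kill, -1 per life lost, +10 on clear (assumed to be
   in reward units). Terminal on lives == 0 or invasion. */
StepResult score_step(StepEvents ev) {
    StepResult out = {0.0f, false};
    out.reward += (float)ev.points_scored / 10.0f;
    if (ev.life_lost) out.reward -= 1.0f;
    if (ev.cleared)   out.reward += 10.0f;
    out.terminal = (ev.lives_left == 0) || ev.invaded;
    return out;
}
```

Keeping the event flags separate from the reward arithmetic makes it easy to change the scaling (for example the `/10`) without touching the game logic.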

Performance

The player-bullet-vs-formation collision is the hot path. A naive implementation scans all 55 invader cells per frame. This implementation caches the grid's min/max row and column (updated incrementally on invader death) and additionally narrows the column loop to the 1-2 columns the bullet's x extent actually overlaps, so the hot path is ~1-2 AABB checks rather than 55. The `invaded` check (formation reached the player line) is also O(1) against the cache.
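The idea can be sketched roughly as follows. This is an illustrative reconstruction, not the code in `space_invaders.h`: the `Formation` struct, the layout constants, and the function names are all assumptions. The sketch also walks the cached row range bottom-up inside each candidate column; how the real code orders or narrows the row loop is not stated here.

```c
#include <stdbool.h>

#define ROWS 5
#define COLS 11
/* Hypothetical layout constants, for illustration only. */
#define CELL_W 16.0f  /* pitch between invader columns */
#define CELL_H 12.0f  /* pitch between invader rows */
#define INV_W  12.0f  /* invader width  (< CELL_W) */
#define INV_H   8.0f  /* invader height (< CELL_H) */

typedef struct {
    float x, y;                  /* top-left corner of cell (0, 0) */
    bool  alive[ROWS][COLS];
    int   min_row, max_row;      /* cached bounds of living invaders */
    int   min_col, max_col;
    int   num_alive;
} Formation;

void formation_init(Formation *f, float x, float y) {
    f->x = x; f->y = y;
    for (int r = 0; r < ROWS; r++)
        for (int c = 0; c < COLS; c++)
            f->alive[r][c] = true;
    f->min_row = 0; f->max_row = ROWS - 1;
    f->min_col = 0; f->max_col = COLS - 1;
    f->num_alive = ROWS * COLS;
}

static bool row_empty(const Formation *f, int r) {
    for (int c = 0; c < COLS; c++) if (f->alive[r][c]) return false;
    return true;
}

static bool col_empty(const Formation *f, int c) {
    for (int r = 0; r < ROWS; r++) if (f->alive[r][c]) return false;
    return true;
}

/* Bounds only ever shrink inward, so update them on death, not per frame. */
static void kill_invader(Formation *f, int r, int c) {
    f->alive[r][c] = false;
    if (--f->num_alive == 0) return;  /* nothing left to bound */
    while (col_empty(f, f->min_col)) f->min_col++;
    while (col_empty(f, f->max_col)) f->max_col--;
    while (row_empty(f, f->min_row)) f->min_row++;
    while (row_empty(f, f->max_row)) f->max_row--;
}

/* Returns true and kills the invader if the bullet AABB overlaps one. */
bool bullet_hit(Formation *f, float bx, float by, float bw, float bh,
                int *hit_r, int *hit_c) {
    if (f->num_alive == 0) return false;

    /* O(1) reject against the cached bounding box of living invaders. */
    float left   = f->x + f->min_col * CELL_W;
    float right  = f->x + f->max_col * CELL_W + INV_W;
    float top    = f->y + f->min_row * CELL_H;
    float bottom = f->y + f->max_row * CELL_H + INV_H;
    if (bx + bw <= left || bx >= right || by + bh <= top || by >= bottom)
        return false;

    /* Columns the bullet's x extent can touch: at most two when bw < CELL_W.
       Truncation is fine here because the result is clamped to the cache. */
    int c0 = (int)((bx - f->x) / CELL_W);
    int c1 = (int)((bx + bw - f->x) / CELL_W);
    if (c0 < f->min_col) c0 = f->min_col;
    if (c1 > f->max_col) c1 = f->max_col;

    for (int c = c0; c <= c1; c++) {
        float ix = f->x + c * CELL_W;
        if (bx >= ix + INV_W || bx + bw <= ix) continue;  /* in the gap */
        /* The bullet travels upward, so test the lowest row first. */
        for (int r = f->max_row; r >= f->min_row; r--) {
            if (!f->alive[r][c]) continue;
            float iy = f->y + r * CELL_H;
            if (by < iy + INV_H && by + bh > iy) {
                kill_invader(f, r, c);
                *hit_r = r; *hit_c = c;
                return true;
            }
        }
    }
    return false;
}
```

With the same cache, a formation-reached-player test reduces to comparing `f->y + f->max_row * CELL_H + INV_H` against the player line, which is why that check can be O(1).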

Measured throughput on a 9950X3D (16 physical / 32 SMT):

| Config | env-steps/s | frames/s |
| --- | --- | --- |
| N=32768, T=16, B=32 (config default) | 186 M | 746 M |
| Single-threaded peak | 12.5 M | 50 M |

`num_buffers` scaling matters: optimal `num_buffers ≈ total_agents / 1024`.

Training sanity check

At config defaults:

  • 70 s training (400M steps) → `score ≈ 950`, `perf ≈ 0.78` (clearing 78% of invaders per episode), `episode_return ≈ 95`
  • Always-fire heuristic baseline: score 342, perf 0.42, return 31

One design note: `c_reset` randomizes the player's initial x. Without this, starting always in the center makes "stay still and fire" a local optimum that PPO gets stuck in — the policy converges to exactly the always-fire numbers above and never learns to move.
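A minimal sketch of what randomizing the start position on reset could look like. The `Env` struct, the screen and ship widths, the per-env xorshift RNG, and the name `reset_player_x` are all assumptions for illustration; the actual `c_reset` may draw its random numbers differently.

```c
#include <stdint.h>

/* Hypothetical dimensions, for illustration only. */
#define SCREEN_W 160.0f
#define SHIP_W    12.0f

typedef struct {
    float    player_x;  /* left edge of the player ship */
    uint32_t rng;       /* per-env RNG state, must be non-zero */
} Env;

/* Small xorshift32 generator so each env carries its own state. */
static uint32_t xorshift32(uint32_t *state) {
    uint32_t x = *state;
    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    *state = x;
    return x;
}

/* Place the ship uniformly in [0, SCREEN_W - SHIP_W) instead of the center,
   so "stay still and fire" does not always see the same starting state. */
void reset_player_x(Env *env) {
    /* Top 24 bits give a float in [0, 1) without rounding up to 1. */
    float u = (float)(xorshift32(&env->rng) >> 8) / 16777216.0f;
    env->player_x = u * (SCREEN_W - SHIP_W);
}
```

A per-env RNG state, rather than a shared global one, keeps resets reproducible per environment when many envs step in parallel.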

Known follow-ups (happy to do either in this PR or as a separate one)

  1. Hyperparameters in the ini are sensible defaults, not swept. Breakout ships with proper swept values; mine are ad-hoc. A `puffer sweep space_invaders` run would produce a tuned set.
  2. No demo weights shipped. `breakout.c` plays a trained policy via puffernet; my `space_invaders.c` is keyboard-only. Would need to export weights and add the puffernet forward call to the demo loop.

Let me know which (if either) you want bundled into this PR before merge.

Unrelated issue flagged

While developing this I hit #534 (BF16 `load_weights` produces degenerate eval). That's independent of this env — any env trained in BF16 hits it. Worked around locally with `--float`.

🤖 Generated with Claude Code

New discrete-action Atari-style env, following the three-file ocean
convention (env.h / binding.c / env.c + config/ocean/*.ini).

Mechanics: 5x11 invader grid that moves side-to-side and drops on edge
contact, speeds up as ranks thin. Player ship with one bullet on screen
and up to three enemy bullets. 4 actions: noop, left, right, fire.
Episode ends on lives=0 or invaders reaching the player line; +10 bonus
on clear.

Performance: player-bullet-vs-formation collision uses cached integer
grid bounds plus bullet-x column narrowing, so the hot path is ~1-2
AABB checks rather than 55 per frame. Measured 186 M env-steps/s on a
9950X3D at the config defaults (N=32768, T=16, B=32).

Training: c_reset randomizes the player's initial x so PPO doesn't
collapse to "stay centered and fire"; reaches score ~950, perf ~0.78
in 70 s of training at 10 M SPS.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>