ocean: add space_invaders env #535
Draft
Infatoshi wants to merge 1 commit into PufferAI:4.0 from
New discrete-action Atari-style env, following the three-file ocean convention (env.h / binding.c / env.c + config/ocean/*.ini).

Mechanics: 5x11 invader grid that moves side-to-side and drops on edge contact, speeds up as ranks thin. Player ship with one bullet on screen and up to three enemy bullets. 4 actions: noop, left, right, fire. Episode ends on lives=0 or invaders reaching the player line; +10 bonus on clear.

Performance: player-bullet-vs-formation collision uses cached integer grid bounds plus bullet-x column narrowing, so the hot path is ~1-2 AABB checks rather than 55 per frame. Measured 186 M env-steps/s on a 9950X3D at the config defaults (N=32768, T=16, B=32).

Training: c_reset randomizes the player's initial x so PPO doesn't collapse to "stay centered and fire"; reaches score ~950, perf ~0.78 in 70 s of training at 10 M SPS.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Summary
Adds a new discrete-action ocean env, `space_invaders`, following the three-file convention used by pong/breakout/asteroids.
4 files, 611 lines total. No changes outside the new env.
Game
5x11 invader grid that moves side-to-side, drops on edge contact, and speeds up as ranks thin. Player ship with one bullet on screen and up to three enemy bullets. 4 actions: noop / left / right / fire. Episode ends on lives=0 or invaders reaching the player line; +10 bonus on clear. Reward is shaped as `+pts/10` per kill and `-1` per life lost.
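To make the mechanics concrete, here is a minimal sketch of the formation update and the per-step reward, not the code in this PR. The struct, the function names, the pixel constants and the linear speed-up curve are all assumptions made for illustration; only the action order, the 55-invader count and the `+pts/10` / `-1` shaping come from the description above. The +10 clear bonus is left out because the description does not say whether it is in score or reward units.

```c
#include <assert.h>
#include <stdbool.h>

/* Action indices: order from the PR text, numeric values assumed. */
enum { ACT_NOOP = 0, ACT_LEFT = 1, ACT_RIGHT = 2, ACT_FIRE = 3 };

/* Hypothetical tuning constants, not taken from the PR. */
#define FIELD_W    160.0f
#define DROP_PX    8.0f
#define BASE_SPEED 0.25f
#define TOTAL_INV  55      /* 5 rows x 11 columns */

typedef struct {
    float x, y;  /* top-left corner of the live formation box */
    float w;     /* width of the live formation box */
    int dir;     /* +1 moving right, -1 moving left */
    int alive;   /* invaders remaining */
} Formation;

/* Speed grows as ranks thin. The linear 1x..4x ramp is an assumed curve. */
static float formation_speed(int alive) {
    assert(alive >= 0 && alive <= TOTAL_INV);
    float thinned = (float)(TOTAL_INV - alive) / (float)TOTAL_INV;
    return BASE_SPEED * (1.0f + 3.0f * thinned);
}

/* Move sideways; on edge contact reverse direction and drop one step. */
static void formation_step(Formation *f) {
    float nx = f->x + (float)f->dir * formation_speed(f->alive);
    if (nx < 0.0f || nx + f->w > FIELD_W) {
        f->dir = -f->dir;
        f->y += DROP_PX;
    } else {
        f->x = nx;
    }
}

/* Reward shaping from the description: +pts/10 per kill, -1 per life lost. */
static float step_reward(int kill_points, int lives_lost) {
    return (float)kill_points / 10.0f - (float)lives_lost;
}
```

With these assumed constants, a 30-point kill yields a reward of 3.0 and losing a life yields -1.0.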
Performance
The player-bullet-vs-formation collision is the hot path. A naive implementation scans all 55 invader cells per frame. This impl caches the grid's min/max row and column (updated incrementally on invader death) and additionally narrows the column loop to the 1-2 columns the bullet x actually overlaps — so the hot path is ~1-2 AABB checks rather than 55. The `invaded` check (formation-reached-player) is also O(1) against the cache.
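The idea can be sketched roughly as follows. This is an illustration under assumed geometry, not the implementation in the PR: the `Grid` struct, every function name and all pixel constants are hypothetical. The sketch rejects against the cached bounding box first, then visits only the columns the bullet's x range overlaps, scanning rows bottom-up within the cached row bounds. Bounds are re-tightened only when the killed invader sat on an edge of the box.

```c
#include <assert.h>
#include <stdbool.h>

#define ROWS 5
#define COLS 11
/* Hypothetical integer pixel geometry, not taken from the PR. */
#define CELL_W   12  /* horizontal pitch between invaders */
#define CELL_H   10  /* vertical pitch between invaders */
#define INV_W    8
#define INV_H    6
#define BULLET_W 1
#define BULLET_H 4

typedef struct {
    bool alive[ROWS][COLS];
    int ox, oy;                              /* top-left of cell (0,0) */
    int min_row, max_row, min_col, max_col;  /* cached live bounds */
} Grid;

static void grid_init(Grid *g, int ox, int oy) {
    for (int r = 0; r < ROWS; r++)
        for (int c = 0; c < COLS; c++)
            g->alive[r][c] = true;
    g->ox = ox; g->oy = oy;
    g->min_row = 0; g->max_row = ROWS - 1;
    g->min_col = 0; g->max_col = COLS - 1;
}

static bool row_empty(const Grid *g, int r) {
    for (int c = 0; c < COLS; c++) if (g->alive[r][c]) return false;
    return true;
}

static bool col_empty(const Grid *g, int c) {
    for (int r = 0; r < ROWS; r++) if (g->alive[r][c]) return false;
    return true;
}

/* Tighten the cached bounds. When the grid is fully cleared the box
   collapses to one cell; the caller is assumed to track the alive count. */
static void shrink_bounds(Grid *g) {
    while (g->min_row < g->max_row && row_empty(g, g->min_row)) g->min_row++;
    while (g->max_row > g->min_row && row_empty(g, g->max_row)) g->max_row--;
    while (g->min_col < g->max_col && col_empty(g, g->min_col)) g->min_col++;
    while (g->max_col > g->min_col && col_empty(g, g->max_col)) g->max_col--;
}

/* O(1) "invaded" check: bottom edge of the lowest live row vs player line. */
static bool formation_invaded(const Grid *g, int player_y) {
    return g->oy + g->max_row * CELL_H + INV_H >= player_y;
}

/* Returns true and kills the invader if the bullet AABB overlaps one. */
static bool bullet_hit(Grid *g, int bx, int by, int *hit_r, int *hit_c) {
    assert(hit_r && hit_c);
    int left   = g->ox + g->min_col * CELL_W;
    int right  = g->ox + g->max_col * CELL_W + INV_W;
    int top    = g->oy + g->min_row * CELL_H;
    int bottom = g->oy + g->max_row * CELL_H + INV_H;
    /* Early out: the bullet is outside the live formation box. */
    if (bx + BULLET_W <= left || bx >= right ||
        by + BULLET_H <= top  || by >= bottom) return false;

    /* Column narrowing: only columns overlapped by [bx, bx + BULLET_W). */
    int c0 = (bx - g->ox) / CELL_W;
    int c1 = (bx + BULLET_W - 1 - g->ox) / CELL_W;
    if (c0 < g->min_col) c0 = g->min_col;
    if (c1 > g->max_col) c1 = g->max_col;

    for (int c = c0; c <= c1; c++) {
        for (int r = g->max_row; r >= g->min_row; r--) {
            if (!g->alive[r][c]) continue;
            int ix = g->ox + c * CELL_W, iy = g->oy + r * CELL_H;
            if (bx < ix + INV_W && bx + BULLET_W > ix &&
                by < iy + INV_H && by + BULLET_H > iy) {
                g->alive[r][c] = false;
                /* Incremental update: rescan only if an edge was touched. */
                if (r == g->min_row || r == g->max_row ||
                    c == g->min_col || c == g->max_col) shrink_bounds(g);
                *hit_r = r; *hit_c = c;
                return true;
            }
        }
    }
    return false;
}
```

On most frames the bullet is still below the formation box, so the function returns after the four comparisons of the early out.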
Measured throughput on a 9950X3D (16 physical / 32 SMT):
`num_buffers` scaling matters: optimal `num_buffers ≈ total_agents / 1024`.
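As a worked example of that rule of thumb, the helper below rounds `total_agents / 1024` to the nearest integer with a floor of 1. The function name, the rounding and the floor are assumptions, not part of PufferLib. If N in the commit message is the total agent count and B is `num_buffers`, the defaults agree with the rule: 32768 / 1024 = 32.

```c
#include <assert.h>

/* Hypothetical helper: num_buffers ~= total_agents / 1024,
   rounded to nearest and never below 1. */
static int suggested_num_buffers(int total_agents) {
    assert(total_agents > 0);
    int n = (total_agents + 512) / 1024;
    return n < 1 ? 1 : n;
}
```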
Training sanity check
At config defaults:
One design note: `c_reset` randomizes the player's initial x. Without this, starting always in the center makes "stay still and fire" a local optimum that PPO gets stuck in — the policy converges to exactly the always-fire numbers above and never learns to move.
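A reset with a randomized start might look like the sketch below. It is an assumption-laden illustration rather than the PR's `c_reset`: the `Env` struct, the field width, the ship width and the xorshift32 generator are all hypothetical and chosen only to keep the example self-contained.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical geometry, not taken from the PR. */
#define FIELD_W  160.0f
#define PLAYER_W 12.0f

typedef struct {
    float player_x;  /* left edge of the player ship */
    uint32_t rng;    /* per-env generator state, must be nonzero */
} Env;

/* xorshift32: small per-env generator, so envs do not share global state. */
static uint32_t xorshift32(uint32_t *s) {
    uint32_t x = *s;
    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    *s = x;
    return x;
}

/* Uniform float between lo and hi, built from the top 24 bits. */
static float uniform_f(uint32_t *s, float lo, float hi) {
    float u = (float)(xorshift32(s) >> 8) / 16777216.0f;  /* u in [0, 1) */
    return lo + u * (hi - lo);
}

/* Start the ship at a random x so "stay still and fire" sees varied states. */
static void reset_sketch(Env *env) {
    assert(env->rng != 0);
    env->player_x = uniform_f(&env->rng, 0.0f, FIELD_W - PLAYER_W);
}
```

Seeding each env's `rng` differently (for example from its index plus one) keeps parallel envs from starting at identical positions.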
Known follow-ups (happy to do either in this PR or as a separate one)
Let me know which (if either) you want bundled into this PR before merge.
Unrelated issue flagged
While developing this I hit #534 (BF16 `load_weights` produces degenerate eval). That's independent of this env — any env trained in BF16 hits it. Worked around locally with `--float`.
🤖 Generated with Claude Code