An action-conditioned, video-generating world model trained on gameplay from the videogame CoinRun (Procgen). Adapted from the 500M Oasis Minecraft world model, but scaled down to generate 64×64 frames directly in pixel space, which removes the need for interleaved training with a VAE.
Given a single prompt frame and a sequence of actions, the model autoregressively generates future frames using DDIM diffusion sampling, acting as a "simulator of the game world", i.e. a world model.
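The rollout loop described above can be sketched roughly as follows. This is a minimal NumPy illustration and not this repository's actual code: `dummy_denoiser` is a hypothetical stand-in for the trained network, and the linear beta schedule, the 10 sampling steps, the integer action encoding, and the `(3, 64, 64)` frame layout are all assumptions made for the example.

```python
import numpy as np

def make_alpha_bar(num_train_steps=1000):
    # Cumulative product of (1 - beta_t) for a linear beta schedule (assumed).
    betas = np.linspace(1e-4, 0.02, num_train_steps)
    return np.cumprod(1.0 - betas)

def dummy_denoiser(x_t, t, context, action):
    # Hypothetical placeholder for the trained model: it should predict the
    # noise in x_t given past frames and the current action. Returns zeros.
    return np.zeros_like(x_t)

def ddim_sample_frame(denoiser, context, action, alpha_bar, rng,
                      num_steps=10, shape=(3, 64, 64)):
    # Deterministic DDIM (eta = 0): start from Gaussian noise and denoise
    # over a short, evenly spaced subset of the training timesteps.
    x = rng.standard_normal(shape)
    timesteps = np.linspace(len(alpha_bar) - 1, 0, num_steps).round().astype(int)
    for i, t in enumerate(timesteps):
        a_t = alpha_bar[t]
        # After the final step there is no noise left, so alpha_bar is 1.
        a_prev = alpha_bar[timesteps[i + 1]] if i + 1 < len(timesteps) else 1.0
        eps = denoiser(x, t, context, action)
        # Estimate the clean frame, then step to the previous noise level.
        x0 = (x - np.sqrt(1.0 - a_t) * eps) / np.sqrt(a_t)
        x = np.sqrt(a_prev) * x0 + np.sqrt(1.0 - a_prev) * eps
    return np.clip(x, -1.0, 1.0)  # pixel values assumed to lie in [-1, 1]

def rollout(denoiser, prompt_frame, actions, num_steps=10, seed=0):
    # Autoregressive generation: each new frame is sampled conditioned on
    # all frames so far and the next action, then appended to the context.
    rng = np.random.default_rng(seed)
    alpha_bar = make_alpha_bar()
    frames = [prompt_frame]
    for action in actions:
        context = np.stack(frames)
        frames.append(ddim_sample_frame(denoiser, context, action, alpha_bar,
                                        rng, num_steps, prompt_frame.shape))
    return np.stack(frames)

# One prompt frame plus four actions gives a five-frame video.
video = rollout(dummy_denoiser, np.zeros((3, 64, 64)), actions=[1, 1, 5, 7])
```

A real model would replace `dummy_denoiser` and would likely cap the context length; the point here is only the structure of the loop, where every generated frame is fed back in as conditioning for the next one.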
Videos from the largest trained model (58M parameters):
episode.1777341548.mp4
With limited compute, I scaled across 5 model sizes, from 5M to 58M parameters.
- Based on Oasis by Etched & Decart.
- Dataset: p-doom/coinrun-dataset on HuggingFace.
- CoinRun environment: OpenAI Procgen.
- Agentic Programming :) via Claude



