Osrs env upstream #500
Open
valtterivalo wants to merge 60 commits into PufferAI:4.0 from
added inferno, WIP ofc
a281def to 9a5249c
b941b46 to caa7375
quite a bit of refactoring in the above commit, trying to make adding encounters as easy as possible for other devs. some straight-up inferno work in there too
shared sim headers in src/osrs/, per-encounter bindings in ocean/osrs_*/, visual viewer + asset export pipeline in ocean/osrs/. build.sh wired up with -Isrc/osrs for osrs_* envs.
visual builds (--local/--fast/--web) auto-download pre-exported assets from GitHub releases. training builds need no assets. OSRS visual binary uses shared ocean/osrs/osrs_visual.c source.
bf5bcdc to 4aee0ba
env_name must match build dir name (osrs_inferno not puffer_osrs_inferno) or pufferl.py _resolve_backend assertion fails. test build instructions updated for src/osrs/ header location. removed legacy export_inferno_npcs.py (superseded by tools/export_encounter_npcs.py) and standalone Makefile (build.sh --local handles visual builds).
c_render now calls pvp_render in all three envs so puffer eval --render-mode raylib actually renders. forward-declare pvp_render in osrs_pvp_api.h so bindings can opt into render by including osrs_render.h. fix PvP binding to use current pvp_runtime/ocean_io struct layout (was using flat fields).
render.h and encounter headers define static helpers only called by the standalone viewer. wrap render/encounter includes in GCC diagnostic push/pop to silence unused-function noise in the binding compile.
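the push/pop wrapping described above can be sketched roughly as follows. this is a minimal sketch, not the binding's actual code: the two static helpers stand in for whatever a header like osrs_render.h defines, and every function name here is made up for illustration.

```c
/* Sketch only: in the real binding the region between push and pop would
 * hold #include lines. Two hypothetical static helpers stand in for the
 * header contents so this snippet compiles on its own. */
#pragma GCC diagnostic push
#pragma GCC diagnostic ignored "-Wunused-function"

/* Only the standalone viewer would call this; the binding never does,
 * which is what normally triggers the unused-function warning. */
static int viewer_only_helper(int x) { return x * 2; }

/* The binding does call this one. */
static int shared_helper(int x) { return x + 1; }

#pragma GCC diagnostic pop

/* Hypothetical binding entry point. */
int binding_entry(int x) { return shared_helper(x); }
```

the pop restores the previous warning state, so code after the wrapped region still gets unused-function diagnostics.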
osrs_visual.c loads from relative 'data/' path. binary runs from repo root where build.sh outputs it, so data/ needs to be there too.
keep env-specific code under ocean/ alongside the visual viewer and scripts. build.sh points -Iocean/osrs at them. also always download visual assets, not just for --local builds, since puffer eval renders through the same _C.so and needs them present.
db9c8b6 to 5d96c9a
- c_render now loads encounter terrain/objects/models/anims on first call (same as standalone viewer did in run_visual), so puffer eval renders the 3D world instead of a black screen
- middle-mouse pan was inverted (mouse right → world left); fixed sign
- prayer/spell sprite filenames now numeric (matches what osrs_gui.h loads). also point build.sh at osrs-assets-v3 which includes missing hitmarks and headicons_prayer sprites
c_render now sleeps between frames to match ticks_per_second so eval runs at OSRS speed instead of blazing through. 9/0 keys adjust the speed during viewing (uses the same existing mechanism standalone viewer already had). also fix: clicking ground while a spell is selected now walks to that tile AND cancels targeting, matching OSRS behaviour. previously it just silently cancelled without walking.
- window now 765x503 matching OSRS fixed-client layout, side panel 190px. tile grid decoupled from pixel size (was tied together, forcing wide viewport). inventory cells scaled from 76x65 down to OSRS-native 42x36.
- inferno ammo slot items (dragon darts, dragon arrows) no longer appear as swappable inventory stacks. in real OSRS darts live in the blowpipe and arrows in dizana's quiver, not as separate inventory items.
- add bastion potion and stamina potion inventory slot types with proper OSRS item IDs (22461/22464/22467/22470 and 12625/12627/12629/12631). inferno was previously rendering these as super combat + ranging pot because the generic slot types were reused.
wraps ExportItemSprites.java with auto-download of runelite-cache and transitive deps from Maven Central / RuneLite repo. first run downloads ~13MB of jars into ocean/osrs/build/item_exporter/deps/, subsequent runs reuse them. requires java 11+ and curl. export_all.sh now calls it for the default loadout items (inferno pots, darts, arrows, weapons). skips gracefully if javac not installed. also bump osrs-assets release to v4 which includes the dragon dart, bastion potion, and stamina potion item sprites.
add yellow border on the selected spell cell while in spell-target cursor mode (clicked a spell, waiting for enemy click). record the exact GuiSpellIdx clicked so highlight shows the right cell (was only storing ATTACK_ICE/ATTACK_BLOOD family). also fix ancient spellbook row ordering: real OSRS goes Rush -> Blitz -> Burst -> Barrage left to right (ascending level), and Blood above Ice. was previously Rush/Burst/Blitz/Barrage which is wrong.
set PLAY_REPLAY=path.replay and the first env (env 0) loads the replay, seeds the encounter RNG to match the recording, and overrides the policy's actions with the recorded ones every tick. other envs run the policy as normal. combine with RECORD_REPLAY in training to first save a best episode, then PLAY_REPLAY in eval to watch it back.
prayer book: now 29 entries in 5x6 grid matching real OSRS display order (by level, Prayerbook widget layout). sprite IDs corrected: each prayer now shows its actual sprite (SharpEye was showing as RockSkin etc). added Mystic Lore, Steel Skin, Ultimate Strength, Incredible Reflexes that were missing from the 25-entry grid.
ancient spellbook: now the full 4x4 combat grid + vengeance, sorted by level (Rush/Burst/Blitz/Barrage rows, Smoke/Shadow/Blood/Ice cols). non-castable spells (Smoke/Shadow) render with the greyed 'off' sprite for authenticity; clicks on them are ignored. only Ice/Blood/Vengeance enter the targeting flow.
attack styles: weapon-specific names. bow/blowpipe/zcb shows 3 buttons (Accurate/Rapid/Longrange), scythe shows 3 (Chop/Jab/Block), godsword shows Chop/Slash/Smash/Block, claws shows Chop/Slash/Lunge/Block, staff shows Bash/Pound/Focus/Block. unknown weapons fall back to the classic Accurate/Aggressive/Controlled/Defensive.
export_sprites_modern.py now writes all 32 ancient spell sprites (smoke, shadow, blood, ice × rush/burst/blitz/barrage × on/off) so the full ancient spellbook renders correctly.
…indow
three fixes for puffer eval that the standalone viewer didn't have:
1. c_render bootstrap: lazy-init in c_render now mirrors the standalone's run_visual sequence after first pvp_render: load equipment models + anims, init overlay models, load encounter terrain/objects/cmap/anims, call render_populate_entities to set arena bounds, then re-center the camera on the arena and seed sub-tile positions for entities. without this the camera stayed at default wilderness coords and the 3D world rendered as a black void even though assets loaded.
2. tick pacing: c_step now sleeps to match ticks_per_second (default 1.667 = OSRS native 600ms/tick) so eval doesn't blaze through episodes at Python rollout speed. 9/0 keys still adjust the rate live via render_handle_input. only paces when rendering is active (rc != NULL).
3. window 1.5x: 765x503 → 1148x755 with proportional 285px panel and 63x54 inventory cells. same OSRS aspect ratio, comfortable on modern monitors.
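the tick-pacing arithmetic can be sketched like this. a minimal sketch under assumptions: the function names are hypothetical, and the actual sleep call is left to the caller so the snippet stays portable. note that 1.667 ticks/s works out to roughly 599.88 ms per tick, a close approximation of the native 600 ms.

```c
#include <stddef.h>
#include <stdint.h>

/* Nanoseconds per game tick for a given rate; 0 means "no pacing".
 * Hypothetical helper, not a function from the PR. */
int64_t sketch_tick_period_ns(double ticks_per_second)
{
    if (ticks_per_second <= 0.0) return 0;
    return (int64_t)(1e9 / ticks_per_second + 0.5);
}

/* How long the step should still sleep after `elapsed_ns` of work.
 * `rc` mirrors the "only pace when rendering is active (rc != NULL)" rule;
 * the caller hands the result to whatever sleep primitive it uses. */
int64_t sketch_pace_sleep_ns(const void *rc, double ticks_per_second,
                             int64_t elapsed_ns)
{
    int64_t period;
    if (rc == NULL) return 0;               /* training: never sleep */
    period = sketch_tick_period_ns(ticks_per_second);
    return elapsed_ns >= period ? 0 : period - elapsed_ns;
}
```

subtracting the elapsed work time keeps the wall-clock tick length steady even when a step itself is slow.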
remove separate osrs_inferno_zuk.ini alias — use --env.start-wave 69 instead. update window size comments to reflect the 1.5x scaling.
pathfinding generation-counter fallback now checks BFS_VISITED before reading BFS_COST. eval c_render calls render_post_tick every frame so spawned NPCs get visual positions updated (fixes invisible enemies).
3cbc200 to 1f8abba
- exclude HEALER_JAD from generic NPC-damages-NPC block so its heal branch is reachable. before this, jad healers did 0 damage (magic stats are 0) AND never healed, making jad waves artificially easy.
- HEALER_ZUK stun_on_spawn 2 -> 1 to match JalMejJak.ts SPAWN_DELAY
- INF_MAX_PENDING_SPARKS 16 -> 32 so 4-healer overlapping volleys (up to 24 sparks in flight) don't silently drop
- add hp_restored_this_tick tracked from healer landings + mager resurrection. reward is now 0.01 * max(0, damage - restored) so healers canceling the agent's damage yield zero signal, same shape for zuk healers, jad healers, and mager resurrection
- log.hp_restored surfaces total restoration per episode on dashboard
FightStyle was only the 4 melee stances; ranged/magic stances lived as raw
int style_bonus passed around per-call, losing the why (is 0 rapid or
autocast? one needs -1 attack_speed, the other doesn't). each caller
re-derived the mapping and most forgot the speed piece — which is why
blowpipe was firing every 3 ticks instead of 2 and losing its DPS edge.
- extend FightStyle with RAPID, LONGRANGE, AUTOCAST, DEFENSIVE_AUTOCAST
- add osrs_stance_{att,str,def}_bonus + osrs_stance_{speed,range}_mod helpers
in osrs_combat.h as single source of truth per osrs wiki "Combat Options"
- encounter_compute_loadout_stats takes FightStyle instead of int style_bonus,
derives att/str/def bonuses and applies rapid/longrange speed+range mods
- EncounterLoadoutStats stores the stance so encounter_update_loadout_level
re-derives consistently after brew drain / potion boost
- pvp calculate_effective_{attack,strength,defence} collapse their open-coded
switches to use the shared helpers
- inferno: mage AUTOCAST, tbow/bp RAPID (restores BP 2t, tbow 5t). drop the
local -1 patches
- zulrah: mage ACCURATE (trident/Eye of Ayak are powered staves, +3 magic
per wiki), range RAPID
- fix test assertions that encoded the old behavior where style_bonus=3 was
double-counted on both att and str (aggressive wrongly gave +3 att,
accurate wrongly gave +3 str). real osrs separates them.
ref: osrs wiki "Combat Options", .refs/osrs-dps-calc Equipment.ts:245-270,
.refs/osrs-sdk Blowpipe.ts:79-84, TwistedBow.ts:70-75
tests: combat_math 155/155, item_effects 164/164, player_combat 41/41,
damage/consumables/interaction/inventory/special_attacks/bolt_procs
all pass. test_collision build failure pre-existed (unrelated).
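a rough sketch of the "single source of truth" stance helpers described above. assumptions, loudly: the enum member names, the function names and signatures, and the exact table are mine, not the contents of osrs_combat.h. the att/str split shown is the melee case; the longrange (+3 def, +2 range) and defensive-autocast (+3 def) values are my reading of the wiki "Combat Options" page, and per-combat-type nuances are left out.

```c
/* Hypothetical enum layout; the PR only names the four added members. */
typedef enum {
    SKETCH_ACCURATE,
    SKETCH_AGGRESSIVE,
    SKETCH_CONTROLLED,
    SKETCH_DEFENSIVE,
    SKETCH_RAPID,
    SKETCH_LONGRANGE,
    SKETCH_AUTOCAST,
    SKETCH_DEFENSIVE_AUTOCAST
} SketchFightStyle;

/* Invisible attack-level bonus (melee case). */
int sketch_stance_att_bonus(SketchFightStyle s)
{
    switch (s) {
    case SKETCH_ACCURATE:   return 3;
    case SKETCH_CONTROLLED: return 1;
    default:                return 0;
    }
}

/* Invisible strength-level bonus (melee case). */
int sketch_stance_str_bonus(SketchFightStyle s)
{
    switch (s) {
    case SKETCH_AGGRESSIVE: return 3;
    case SKETCH_CONTROLLED: return 1;
    default:                return 0;
    }
}

/* Invisible defence-level bonus. */
int sketch_stance_def_bonus(SketchFightStyle s)
{
    switch (s) {
    case SKETCH_DEFENSIVE:
    case SKETCH_LONGRANGE:
    case SKETCH_DEFENSIVE_AUTOCAST: return 3;
    case SKETCH_CONTROLLED:         return 1;
    default:                        return 0;
    }
}

/* Attack-speed modifier in ticks: only rapid is faster. */
int sketch_stance_speed_mod(SketchFightStyle s)
{
    return s == SKETCH_RAPID ? -1 : 0;
}

/* Attack-range modifier in tiles: only longrange reaches further. */
int sketch_stance_range_mod(SketchFightStyle s)
{
    return s == SKETCH_LONGRANGE ? 2 : 0;
}

/* Effective weapon speed after the stance is applied. */
int sketch_attack_speed(int base_speed, SketchFightStyle s)
{
    return base_speed + sketch_stance_speed_mod(s);
}
```

with the stance carried as an enum, a base-3 blowpipe on rapid comes out at 2 ticks and a base-6 tbow at 5, while autocast leaves speed alone, which is the distinction the raw int style_bonus couldn't express.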
inf_get_oracle_prayer was picking the correct protection prayer each tick and overwriting s->player.prayer regardless of the agent's INF_HEAD_PRAYER action, collapsing the prayer-switching challenge that's core to Inferno. source: joseph's f7072fc "Wave 33 in 2:17" benchmark scaffold, merged in via 21f54fa and never removed. pretick now applies actions[INF_HEAD_PRAYER] via the shared encounter_apply_prayer_action() helper, so the agent's choice actually takes effect. offensive +24 flat drain left in place for now — that's separate scaffolding to address next. also cleaned up stale "Pre-determine style for oracle" comments on the blob scan paths (behavior unchanged: blob reads player prayer at scan, commits to opposite style for fire) and removed the dead /* removed oracle prayer obs */ marker.
rewrites the prayer action space across all three OSRS envs to match real OSRS click behavior. agent controls both overhead and offensive prayers via toggle actions, enabling prayer flicking for supply-efficient play.
shared (osrs_encounter.h):
- new 4/6-dim toggle encoding (ENCOUNTER_OVERHEAD_* / _OFFENSIVE_*): no "off" action, click-active-prayer-to-disable like the real UI. NO_CHANGE for no-ops; toggle helpers return 1 on OFF→ON for drain skip.
- encounter_drain_all_prayers() handles overhead + offensive in one call with per-slot activation-tick skip (wiki: "the game does not drain prayer for prayers on the tick they are activated", which is what lets 1-tick flicking burn zero pp). pp=0 auto-clears both slots.
- encounter_offensive_prayer_mults() / encounter_offensive_magic_dmg_mult() as single-source-of-truth prayer multiplier helpers.
- EncounterPrayer enum removed; compute_loadout_stats + update_loadout_level now take OffensivePrayer. update is called after every prayer toggle so ls->eff_level / max_hit track runtime state.
player struct (osrs_types.h):
- prayer_just_activated, offensive_prayer_just_activated bits for the activation-tick drain skip mechanic.
inferno:
- new INF_HEAD_OFFENSIVE action head (4 dim). 9 heads total.
- pretick: apply overhead + offensive toggles, recompute loadouts on prayer change, drain both slots. flat +24 scaffolding drain removed.
- offensive prayer one-hot added to obs (3 new features).
zulrah:
- new ZUL_HEAD_OFFENSIVE action head (4 dim). 7 heads total.
- zul_process_prayer takes both head actions, recomputes mage/range_stats on offensive toggle.
- drain now charges both slots (previously charged 0 for offensive).
- offensive prayer one-hot added to obs (3 new features).
- heuristic policy updated to emit toggle actions.
pvp:
- new HEAD_OFFENSIVE action head (4 dim). 8 heads total.
- HEAD_OVERHEAD now uses toggle encoding (6 dim).
- auto-offensive-prayer assignment based on loadout/attack style removed.
- opponent AI funnels overhead emissions through opp_emit_prayer() which maps OverheadAction → toggle given current state.
- observation mask updated for both heads.
tests:
- test_combat_math + test_item_effects migrated to OffensivePrayer enum.
- new test_prayer_flicking.c covers toggle semantics, activation-tick skip, 1-tick flick burning 0 pp, pp=0 auto-clear, lazy flick cost. all 25 cases pass.
- all existing test suites (combat_math, item_effects, bolt_procs, consumables, damage, interaction, inventory, player_combat, special_attacks) continue to pass.
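the toggle semantics can be sketched as below. a minimal sketch, assuming a 4-dim PvE overhead head (NO_CHANGE plus three protection prayers) where action k and prayer id k share the same number; all names are hypothetical. one more assumption: switching directly from one prayer to another is treated as an activation too, which the text above doesn't spell out.

```c
/* Hypothetical 4-dim overhead encoding: index 0 is the no-op, 1..3 click
 * the matching protection prayer. Prayer id 0 means "none active". */
enum {
    SKETCH_NO_CHANGE       = 0,
    SKETCH_PROTECT_MAGIC   = 1,
    SKETCH_PROTECT_RANGED  = 2,
    SKETCH_PROTECT_MELEE   = 3,
    SKETCH_OVERHEAD_DIM    = 4
};

/* Apply one toggle action to the active overhead slot.
 * Returns 1 when a prayer was newly activated (the caller would set the
 * just-activated bit for the drain skip), 0 otherwise. */
int sketch_apply_overhead_toggle(int *active, int action)
{
    if (action <= SKETCH_NO_CHANGE || action >= SKETCH_OVERHEAD_DIM)
        return 0;                       /* no-op or out of range */
    if (*active == action) {
        *active = 0;                    /* clicking the active prayer turns it off */
        return 0;
    }
    *active = action;                   /* off -> on, or switch to another prayer */
    return 1;
}
```

there is deliberately no "off" action: the only way to drop a prayer is to click the one that is already active, same as the real UI.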
three bindings had hardcoded ACT_SIZES arrays that didn't track the new 9-/7-/8-head layouts from the prayer-toggle migration:
- osrs_inferno: add ENCOUNTER_OFFENSIVE_DIM slot, fix PRAYER dim to ENCOUNTER_OVERHEAD_DIM_PVE (4 instead of 5), fix TARGET dim to INF_OBS_NPCS+1 (matches header) instead of INF_MAX_NPCS+1.
- osrs_zulrah: add ZUL_OFFENSIVE_DIM slot.
- osrs_pvp: add OFFENSIVE_DIM slot.
without this the vecenv get_act_sizes() array underreads and metal_pufferlib.mm's mask-width sum overflows negatively, corrupting input_size (the encoder feature count).
under the new toggle-semantic overhead encoding, opp_emit_prayer silently dropped OVERHEAD_NONE intents, so scripted opponents that rolled "none" never deactivated their overhead once one was set. now maps NONE -> toggle matching the currently-active prayer.
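the intent-to-toggle mapping, sketched under assumptions: prayer ids 1..n double as toggle actions 1..n, 0 means "none" on the intent side and "no change" on the action side, and the function name is hypothetical rather than the real opp_emit_prayer signature.

```c
enum {
    SKETCH_INTENT_NONE    = 0,   /* opponent wants no overhead */
    SKETCH_EMIT_NO_CHANGE = 0    /* toggle action that does nothing */
};

/* Translate "which overhead do I want" into a toggle action, given the
 * overhead that is currently active (0 if none). */
int sketch_intent_to_toggle(int intent, int active)
{
    if (intent == active)
        return SKETCH_EMIT_NO_CHANGE;   /* already in the desired state */
    if (intent == SKETCH_INTENT_NONE)
        return active;                  /* click the active prayer to turn it off */
    return intent;                      /* click the wanted prayer */
}
```

the middle branch is the fix: a "none" intent has to be expressed as a click on whatever is currently active, otherwise it gets dropped.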
Y rotation had both sin_y terms sign-inverted vs Model.java:1074-1080 in the deobbed runelite client. Z and X rotations matched; only yaw was wrong. Symptom: bow-draw arms fling outward instead of together, Zuk's attack plays with arms stretched wide behind back. Fixed in both non-interleaved (anim_apply_frame) and interleaved (anim_apply_single_transform) paths.
encounter_drain_all_prayers combined two early-exits (total<=0 and current_prayer<=0), so when pp entered the tick at 0 the prayer enum stayed set indefinitely. triggered by: agent toggles on a prayer at pp=0 (apply_*_action doesn't gate on pp), activation-tick skip zeros this tick's drain, next tick drain bails on pp<=0. same path hits PvP when smite drains defender pp to 0 while overhead is active. split the checks: pp<=0 now clears both overhead and offensive slots before returning; total<=0 early return stays after. shared across inferno, zulrah, pvp — all three encounter paths fixed. tests: 30/30 prayer flicking (25 existing + 5 regression), 155/155 combat, 41/41 player combat.
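the split early-exits can be sketched like this. assumptions: the struct, its field names, and the flat per-tick drain costs are simplifications of mine (real prayer drain goes through a drain counter and prayer bonus), so this only illustrates the control flow, not the real encounter_drain_all_prayers.

```c
/* Simplified player prayer state; field names are hypothetical. */
typedef struct {
    int pp;                         /* current prayer points */
    int overhead;                   /* 0 = none */
    int offensive;                  /* 0 = none */
    int overhead_just_activated;    /* set by the toggle on activation */
    int offensive_just_activated;
} SketchPrayerState;

static void sketch_clear_prayers(SketchPrayerState *p)
{
    p->overhead = 0;
    p->offensive = 0;
    p->overhead_just_activated = 0;
    p->offensive_just_activated = 0;
}

/* One tick of drain for both slots, with flat per-slot costs. */
void sketch_drain_all_prayers(SketchPrayerState *p, int overhead_cost,
                              int offensive_cost)
{
    int total = 0;

    /* Check 1: out of prayer means both slots clear, even if nothing
     * would have been charged this tick. */
    if (p->pp <= 0) {
        p->pp = 0;
        sketch_clear_prayers(p);
        return;
    }

    /* Activation-tick skip: a slot turned on this tick is not charged. */
    if (p->overhead && !p->overhead_just_activated)   total += overhead_cost;
    if (p->offensive && !p->offensive_just_activated) total += offensive_cost;
    p->overhead_just_activated = 0;
    p->offensive_just_activated = 0;

    /* Check 2: nothing to charge. */
    if (total <= 0) return;

    p->pp -= total;
    if (p->pp <= 0) {
        p->pp = 0;
        sketch_clear_prayers(p);
    }
}
```

with the two checks merged, a prayer toggled on at pp=0 would survive forever: the first tick is skipped by the activation rule and every later tick bails before clearing anything.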
drain can clear offensive_prayer at pp<=0 (auto-clear path), but the recompute was only wired to apply-side changes. consequence: player runs out of prayer mid-fight, enum clears to NONE, but cached max_hit and eff_level still have piety/rigour/augury multipliers baked in — subsequent attacks use stale buffed stats. inferno: move prev_offensive capture before apply, recompute after drain. zulrah: drain lives in a later tick phase than zul_process_prayer; add a second prev_offensive+recompute window around the drain call. pvp unaffected — combat math reads p->offensive_prayer live per attack, no cache to stale.
encounter_npc_step_toward short-circuited with early-return when dist<=attack_range, so a ranged NPC in range but with LOS blocked by a pillar would stand still instead of walking around to find a shot. ref: InfernoTrainer Unit.ts:383 'canMove = !hasLOS' — movement has no range or LOS check; the reference stops a mob only once LOS is clear. remove the range-stop from the helper. inferno caller already gates ranged NPCs on LOS (encounter_inferno.h:1155). melee NPCs adjacent to the player naturally no-op because the player tile is blocked. add test_npc_movement.c: 14 cases locking in the no-range-stop behavior and verifying greedy diagonal/cardinal cascade still works.
…tagger
three wave-67/68 bugs, all from the jad waves falling through the generic random-spawn path in INF_WAVES:
1. wave 67 single jad spawned at a shuffled INF_SPAWN_POS slot instead of the fixed position. ref InfernoRegion.ts:441-451: jad at (23, 27) with player at (18, 25).
2. wave 68 triple jads all used stats->stun_on_spawn = 0, so they fired first attack on the same tick, which is impossible to prayer-flick. ref: stunTimers = [1, 4, 7].sort(random), so each jad gets a unique stagger. implement with Fisher-Yates on [1, 4, 7].
3. pillars stayed alive across the 66→67 transition. ref: InfernoRegion.ts:359 adds pillars only when wave < 67 || >= 70, so jad and zuk waves are pillar-free. clear on wave spawn for any wave >= 66 so mid-episode progression also collapses them.
coord transform: pillar positions confirm ours_y = 57 - ref_y across all three pillars. applied same transform to jad + player positions.
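the stagger assignment from point 2 could look roughly like this. a sketch under assumptions: the xorshift32 generator is a stand-in so the example is deterministic (the sim has its own encounter RNG), and all names are hypothetical.

```c
#include <stdint.h>

/* Tiny xorshift32 generator, a stand-in for the encounter RNG.
 * State must be nonzero. */
static uint32_t sketch_rng_next(uint32_t *state)
{
    uint32_t x = *state;
    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    *state = x;
    return x;
}

/* Fisher-Yates: walk from the back, swapping each slot with a random
 * slot at or before it. */
void sketch_shuffle(int *a, int n, uint32_t *rng)
{
    for (int i = n - 1; i > 0; i--) {
        int j = (int)(sketch_rng_next(rng) % (uint32_t)(i + 1));
        int tmp = a[i];
        a[i] = a[j];
        a[j] = tmp;
    }
}

/* Give each of the three jads a distinct stun_on_spawn from {1, 4, 7}. */
void sketch_jad_stun_timers(int out[3], uint32_t *rng)
{
    out[0] = 1;
    out[1] = 4;
    out[2] = 7;
    sketch_shuffle(out, 3, rng);
}
```

whatever order comes out, the three first attacks are always 3 ticks apart, which is what leaves room to flick between them.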
ice barrage (req magic 94) and blood barrage (req 92) were castable at any magic level. sara brew drains magic to ~88, bats chip it down further — agent could cast a spell the real game wouldn't even offer. gate both in the action mask (spell greyed out below req) and at cast time (attack skipped, attack_timer unchanged so agent retries). also fix jad healer stun_on_spawn: overlay had 0, ref YtHurKot.ts:50 sets stunned=1. one tick of stagger before the first heal lands.
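the two gates (mask time and cast time) can share one predicate. a minimal sketch: the level requirements come from the text above, everything else (names, the cooldown parameter) is hypothetical.

```c
enum {
    SKETCH_ICE_BARRAGE_REQ   = 94,
    SKETCH_BLOOD_BARRAGE_REQ = 92
};

/* Used for the action mask: 1 if the spell may be offered. */
int sketch_spell_castable(int magic_level, int req)
{
    return magic_level >= req;
}

/* Used at cast time: returns 1 and starts the cooldown on success.
 * On failure the attack is skipped and attack_timer is left untouched,
 * so the agent can retry on the next tick. */
int sketch_try_cast(int magic_level, int req, int *attack_timer, int cooldown)
{
    if (!sketch_spell_castable(magic_level, req))
        return 0;
    *attack_timer = cooldown;
    return 1;
}
```

checking at cast time as well covers the case where the level drops between the mask being built and the action being applied.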
JalTokJad in InfernoTrainer wraps the damage + prayer check inside a DelayedAction(JAD_PROJECTILE_DELAY=3). animation fires at attack tick T but the prayer match is evaluated at T+3, then super.attack() registers the projectile with reduceDelay=3 so it lands at max(T+3, T+normal_delay). this is the core difficulty knob: agent must commit to the right prayer within 3 ticks of seeing the anim, regardless of distance.
our previous behavior checked prayer at projectile LAND (T+hit_delay), which scales 1-3 ticks by distance. close-range jad gave the agent only 1 tick to flick, far-range gave up to 6; neither matches ref.
add prayer_check_delay field to EncounterPendingHit: the deferred check fires when this counter reaches 0, locking damage (zero if prayer matched) independent of flight time. jad queues hits with prayer_check_delay=3 and ticks_remaining=max(3, hit_delay) so the projectile never lands before the check resolves. all other pending-hit insertions explicitly set delay=0 (pre-checked at attack time).
shared EncounterPendingHit struct used by inferno only for player hits; zulrah/pvp unaffected. tests green (combat 155, prayer 30, player combat 41, npc movement 14).
you were right that jad's hit delay doesn't vary with range in practice. reference JalTokJad.ts uses reduceDelay=3 and the Projectile clamps remainingDelay>=1, so the effective land time is T + max(4, formula(dist)). for every realistic fight distance the formula gives ≤ 4, so it collapses to exactly T+4 regardless of where the player is standing. our previous code was max(3, hit_delay), which produced variable (3-6 tick) land times. swap to a flat 4 to match the reference. other inferno mobs (mager/ranger/bat/blob) still use distance-based delay per SDK — only jad overrides. jad attack range is 50 (effectively the whole arena, ref verified), so there's never a case where the player is out of range and jad needs to walk. if LOS is somehow blocked — which can't happen in waves 67-69 since we clear pillars — inf_npc_move correctly walks jad toward the player via encounter_npc_step_toward.
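putting the deferred prayer check and the flat T+4 landing together, a sketch. assumptions: the struct is a cut-down stand-in for EncounterPendingHit (only prayer_check_delay and ticks_remaining are named in the text, the rest is hypothetical), and styles/prayers are plain ints where equal values mean "this prayer blocks this style".

```c
/* Cut-down pending hit; not the real EncounterPendingHit layout. */
typedef struct {
    int damage;               /* rolled at fire time */
    int attack_style;         /* style the matching prayer must equal */
    int ticks_remaining;      /* ticks until the hit lands */
    int prayer_check_delay;   /* ticks until the prayer check; 0 = pre-checked */
} SketchPendingHit;

/* Queue a jad hit at tick T: check at T+3, land at T+4. */
SketchPendingHit sketch_queue_jad_hit(int damage, int attack_style)
{
    SketchPendingHit h;
    h.damage = damage;
    h.attack_style = attack_style;
    h.ticks_remaining = 4;
    h.prayer_check_delay = 3;
    return h;
}

/* Advance one tick. Returns -1 while the hit is in flight, otherwise the
 * damage that lands this tick (0 if the prayer check blocked it). */
int sketch_pending_hit_tick(SketchPendingHit *h, int active_prayer)
{
    if (h->prayer_check_delay > 0 && --h->prayer_check_delay == 0) {
        /* The check fires exactly once; the outcome is locked in here. */
        if (active_prayer == h->attack_style)
            h->damage = 0;
    }
    if (--h->ticks_remaining > 0)
        return -1;
    return h->damage;
}
```

what the agent prays on the landing tick no longer matters: only the prayer active when the check counter hits 0 decides the outcome.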
old reward was r = damage_dealt - hp_restored per tick, clamped at 0. agent could farm by attacking on ticks where no heal landed (net positive), then losing no reward on separate heal ticks. net episode reward drifted up without real progress. switch to irreversible-progress: each npc tracks min_hp_reached; each tick's reward is the sum of new hp dropped below that floor. heals raise current hp without touching the floor, so re-damaging up to the old floor pays 0. full kill = full max_hp worth of reward. resurrected mobs inherit min_hp=0 (already paid once) — re-kill gives nothing, so agent learns to kill the mager first. zuk phase override kept: while zuk healers are alive, progress is restricted to damage_zuk_healers_this_tick so agent prioritizes them over zuk. once they're down, full progress flows. per-npc min tracking, zero optimizations. accrual runs after all damage/heal has resolved, right before reward compute. tests green (combat 155, prayer 30, npc movement 14).
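the irreversible-progress accrual, as a minimal sketch. the min_hp_reached field name comes from the text above; the struct and function are hypothetical, and the zuk phase override and any reward scaling are left out.

```c
/* Per-NPC state needed for the progress reward. */
typedef struct {
    int hp;
    int max_hp;
    int min_hp_reached;   /* lowest hp ever seen; starts at max_hp */
} SketchNpc;

/* Run once per tick after all damage and heals have resolved.
 * Returns the total hp that dropped below each NPC's previous floor. */
int sketch_accrue_progress(SketchNpc *npcs, int count)
{
    int progress = 0;
    for (int i = 0; i < count; i++) {
        if (npcs[i].hp < npcs[i].min_hp_reached) {
            progress += npcs[i].min_hp_reached - npcs[i].hp;
            npcs[i].min_hp_reached = npcs[i].hp;   /* floor only moves down */
        }
    }
    return progress;
}
```

for a 100 hp mob taken to 60, healed to 90, hit to 70, hit to 50, then killed, the per-tick values are 40, 0, 0, 10, 50, which sums to exactly max_hp. a resurrected mob keeps its floor at 0, so re-killing it pays nothing.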
tagging (one hit) switches the healer's aggro from jad to player — that's all you need to stop the heal chain. killing the healer is pure wasted dps. if healer damage contributed to min-hp progress the agent would learn to finish them off, which is strictly worse play. skip INF_NPC_HEALER_JAD in the accrual loop. the tag still pays off indirectly: tagged healers stop healing jad, so jad's min-hp floor keeps dropping from player damage instead of getting stuck while heals undo progress.
zuk hit still lands at T+4 (projectile still flies visually) but eating between fire and land can't save you — matches in-game rule that zuk is the one inferno mob you can't tick-eat. add hp_at_fire to EncounterPendingHit. zuk stamps the player's HP at fire time when queueing. at resolve, if current HP is above the checkpoint (agent ate while the projectile was airborne), clamp back down to hp_at_fire first — that undoes the heal — then apply damage as normal. the undone HP is booked to damage_received so reward signals it. other damage sources (bat bites, etc.) still bring HP below the checkpoint, since we only clamp as a ceiling. all other pending-hit insertion sites explicitly zero hp_at_fire to keep the clamp disabled.
the hp_at_fire clamp was a speculative guess at real-game mechanics. going back to the simple model: zuk rolls 0..max_hit, queues a pending hit at T that lands unchanged at T+4. eating between fire and land is allowed to save you on non-lethal rolls — if this turns out to matter we can revisit with a better-grounded mechanic. drops the hp_at_fire field from EncounterPendingHit, the clamp logic in encounter_resolve_player_pending_hits, and the per-site set/reset in the npc attack paths.
encounter_damage_player clamps hp at 0 on overkill (so the negative value isn't exposed to downstream code). that's fine on its own, but the tick order is:
1. resolve npc damage to player
2. inf_tick_player (player can eat a brew here)
3. end-of-tick death check
if an npc hit lands for more than the player's current hp, the clamp zeroes hp, then the brew in inf_tick_player lifts it back above zero and the end-of-tick death check sees a live player. observed in training: zuk 140 lands on 115 hp → clamp to 0 → brew +16 → alive at 16. explains 'tick-eating zuk' agent behavior (which shouldn't be possible at all under our ordering).
fix: death check directly after the damage-resolve step, before the player action phase. a corpse can't eat. episode ends cleanly.
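the ordering bug and the fix side by side, as a sketch. the struct and function names are hypothetical; the numbers in the usage note are the ones observed in training above.

```c
typedef struct { int hp; } SketchPlayerHp;

/* Damage with the clamp at 0 on overkill. */
static void sketch_damage_player(SketchPlayerHp *p, int dmg)
{
    p->hp -= dmg;
    if (p->hp < 0) p->hp = 0;
}

/* Brew heal; no upper cap in this sketch. */
static void sketch_eat_brew(SketchPlayerHp *p, int heal)
{
    p->hp += heal;
}

/* Old order: damage, player phase, then death check. Returns 1 on death. */
int sketch_tick_old_order(SketchPlayerHp *p, int incoming, int brew_heal)
{
    sketch_damage_player(p, incoming);
    sketch_eat_brew(p, brew_heal);      /* revives a clamped-to-0 player */
    return p->hp <= 0;
}

/* Fixed order: death check right after damage resolves. Returns 1 on death. */
int sketch_tick_fixed_order(SketchPlayerHp *p, int incoming, int brew_heal)
{
    sketch_damage_player(p, incoming);
    if (p->hp <= 0) return 1;           /* a corpse can't eat */
    sketch_eat_brew(p, brew_heal);
    return 0;
}
```

with hp 115, a 140 hit and a +16 brew, the old order ends the tick alive at 16 while the fixed order ends the episode at 0.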
…nger visual timing
zuk healer aoe was queuing 2 of its 3 sparks with y=14+rand(0..3) as if our coord system matched the InfernoTrainer reference. our y is flipped (ours_y = 57 - ref_y) so those sparks were landing on the opposite side of the arena, visible as projectiles shooting to narnia instead of the ground near the player. translate to y=40..43.
healer projectile model was reusing INFERNO_ZEK_PROJECTILE (mager's orange ball). reference uses tekton_meteor.glb, which is not in our cache exports. switch to TZHAAR_FIRE_SPIT_TRAVEL (GFX 448) as the closest meteor-shaped flight model and the most distinct from mager/zuk projectiles. proper fix would export a tekton meteor spotanim into the manifest.
mager projectile was ~3 ticks too slow visually. reference (JalZek MagicWeapon) has visualDelayTicks=2 and visualHitEarlyTicks=-1: invisible for 2 ticks, then arrives 1 tick after the hit lands. set start_delay to 2 ticks and duration to (hit_delay - 1) ticks to match.
ranger: same structural fix for visualDelayTicks=3 (JalXil RangedWeapon).
flagged but not fixed: ranger's reduceDelay=-2 (hit delay +2 ticks) is still missing sim-side, so damage lands 2 ticks earlier than reference.
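the y flip behind the spark fix is small enough to check by hand. a sketch: the constant 57 comes from the ours_y = 57 - ref_y relation above, the names are hypothetical.

```c
/* Sum of a reference y and our y for the same tile, per the relation above. */
enum { SKETCH_Y_FLIP = 57 };

/* Convert a reference-sim y coordinate into ours. The transform is its
 * own inverse, so the same function also converts back. */
int sketch_ref_to_ours_y(int ref_y)
{
    return SKETCH_Y_FLIP - ref_y;
}
```

the reference range y=14..17 maps to 43..40, i.e. the y=40..43 band the sparks were moved to.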
regression from 7d72438 (range-stop removed from the shared helper): the movement gate in inf_npc_move only fired when aggro_target was the player (aggro_target < 0) and called inf_npc_has_los(s, idx) which is hardcoded to check LOS to the player. for zuk-wave mager/ranger with aggro_target = shield_idx, the gate was skipped entirely — they walked straight into melee of the shield. fix: select the target first, then check LOS+range against that target. use npc_has_line_of_sight directly with (tx, ty) from target selection, which already resolves to player, aggroed NPC, or pillar. drop the aggro_target < 0 condition since the gate is now target-aware. pillar-stuck behavior preserved: when LOS to the target is blocked, the gate doesn't trigger and the greedy helper proceeds. melee NPCs still walk until player-tile blocking stops them. add 4 LOS tests to test_npc_movement.c locking in the generic-target contract of npc_has_line_of_sight: in-range clear (non-player), blocked by pillar, out-of-range, and player-target control. 18/18 passing.
three (currently) OSRS environments (or encounters as i call them inside this broader OSRS env) in pure C. the env itself and its backend already have a ton of OSRS logic that makes building new encounters pretty easy, but the abstractions for this could be way better. we'll see where that goes naturally after we start implementing more encounters. ofc i'd rather have the entire damn playable game as a single functional environment but let's be honest about the amount of work involved.
osrs_pvp: NH PvP with 24 scripted opponents (trivial to onetick+action-reading), PFSP support, 7-head discrete action space (39 logits), 373-dim obs. full combat: gear switching, prayers, eating, specs, movement, action masking. most complete encounter here, but some bugs might remain (or have been introduced lately from abstracting things away to env level).
osrs_zulrah (WIP): money snek solo boss encounter, trying to be as faithful as possible to the real deal but some work remains with more ambiguous mechanics. 3 gear tiers with all special effects etc covered (tbow scaling, sang heal, crystal set bonus, eye of ayak magic drain, confliction double-roll, thralls). basic reward shaping for training, i couldn't get budget setup to win without shaping. bis gear obviously shits on the snake with no problems.
osrs_inferno (WIP): the full inferno challenge encounter with all the waves and the final boss. current dev work has still been basic sim rigor, also prompted me to abstract away a shit ton of stuff so that future encounters are easier to implement. assets are mostly good but some animations and projectiles need to be wired up. NPC mechanics are coming together. RL training isn't yet stable and i didn't run any sweeps yet. probably a couple more days of basic sim work and then back to training models!
binary data assets (sprites, models, collision maps) gitignored, regenerated from OSRS cache via included python scripts that are pretty easy to read and understand. expect there to be quite a bit of coordinate system flipping and offset trial and error when implementing new areas and encounters, but that's part of the deal until we figure out something rigorous for that.