Adding Map Caching and Removing viz dependency #409
Aditya-Gupta26 wants to merge 12 commits into `emerge/temp_training`
Conversation
Pull request overview
This PR introduces a shared map-cache for the Drive C environment to avoid repeatedly loading large .bin map files, and updates the training/eval render pipeline to remove reliance on the legacy visualize/raylib export path.
Changes:
- Add a process-local shared map cache in the Drive C binding and plumb `map_id` + road observation distance parameters through Drive env creation/reset.
- Update training-time inline rendering to use `eval_multi_scenarios_render` (multi-view for EGL) instead of exporting `.bin` files and invoking the legacy renderer build step.
- Add eval-time controls for min/max agents per env (gigaflow) via `drive.ini`.
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| pufferlib/pufferl.py | Switch inline training render path to eval_multi_scenarios_render, remove ensure_drive_binary, and add agent-per-env controls in eval overrides. |
| pufferlib/ocean/drive/rollout.py | Add a shared Drive rollout loop utility with optional rendering context. |
| pufferlib/ocean/drive/drive.py | Plumb new cache-related kwargs into binding.shared, pass map_id into env init, and rename render APIs to env_id terminology + RenderView. |
| pufferlib/ocean/drive/drive.h | Implement shared map data structures, shared-map init path, and conditional cleanup in c_close. |
| pufferlib/ocean/drive/binding.c | Add process-local shared-map cache, cache-keying, lazy population, and map_id-based initialization fast-path. |
| pufferlib/config/ocean/drive.ini | Add eval config knobs for min_agents_per_env / max_agents_per_env. |
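The process-local cache added in `binding.c` could look roughly like the following minimal sketch. All names and struct fields here (`SharedMapData`, `get_or_load_map`, `g_map_cache`) are illustrative stand-ins, not the PR's actual implementation; the real cache also stores parsed agents, road geometry, and the grid map, and guards against fork/config changes.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the cached map payload; the real struct in
 * drive.h would hold parsed road features, template agents, and the grid map. */
typedef struct {
    char path[256];
    /* ... large parsed map data would live here ... */
} SharedMapData;

#define MAX_CACHED_MAPS 64
static SharedMapData *g_map_cache[MAX_CACHED_MAPS];
static int g_cache_len = 0;

/* Return the cached entry for `path`, lazily loading it on first use.
 * Process-local: every env in this process shares one parsed copy, so N
 * envs pointed at the same map do not re-read the .bin file N times. */
SharedMapData *get_or_load_map(const char *path) {
    for (int i = 0; i < g_cache_len; i++) {
        if (strcmp(g_map_cache[i]->path, path) == 0)
            return g_map_cache[i];  /* cache hit */
    }
    if (g_cache_len >= MAX_CACHED_MAPS)
        return NULL;  /* cache full */
    SharedMapData *m = calloc(1, sizeof(SharedMapData));
    if (m == NULL)
        return NULL;
    snprintf(m->path, sizeof(m->path), "%s", path);
    /* real code would parse the .bin file into m here */
    g_map_cache[g_cache_len++] = m;
    return m;
}
```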
```c
temp_env.init_mode = init_mode;
temp_env.control_mode = control_mode;
temp_env.simulation_mode = simulation_mode;
temp_env.init_steps = init_steps;
temp_env.num_max_agents = max_agents_per_env;
temp_env.goal_radius = goal_radius;
temp_env.agents = shared->template_agents;
temp_env.num_total_agents = shared->num_total_agents;
temp_env.grid_map = shared->grid_map;
set_active_agents(&temp_env);
int active_count = temp_env.active_agent_count;
free(temp_env.active_agent_indices);
free(temp_env.static_agent_indices);
free(temp_env.expert_static_agent_indices);
```
In REPLAY mode, the active-agent counting path calls set_active_agents(&temp_env) with temp_env.agents pointing directly at shared->template_agents. set_active_agents mutates Agent fields (e.g., active_agent / mark_as_expert), so this permanently mutates the cached template and will affect subsequent env initializations for that map (agent selection, expert flags, etc.). To avoid corrupting the shared cache, run set_active_agents on a temporary copy of the Agent structs (shallow-copy is enough since only scalar fields are mutated), or implement a non-mutating counting routine.
Suggested change:

```diff
-temp_env.init_mode = init_mode;
-temp_env.control_mode = control_mode;
-temp_env.simulation_mode = simulation_mode;
-temp_env.init_steps = init_steps;
-temp_env.num_max_agents = max_agents_per_env;
-temp_env.goal_radius = goal_radius;
-temp_env.agents = shared->template_agents;
-temp_env.num_total_agents = shared->num_total_agents;
-temp_env.grid_map = shared->grid_map;
-set_active_agents(&temp_env);
-int active_count = temp_env.active_agent_count;
-free(temp_env.active_agent_indices);
-free(temp_env.static_agent_indices);
-free(temp_env.expert_static_agent_indices);
+Agent *temp_agents = NULL;
+temp_env.init_mode = init_mode;
+temp_env.control_mode = control_mode;
+temp_env.simulation_mode = simulation_mode;
+temp_env.init_steps = init_steps;
+temp_env.num_max_agents = max_agents_per_env;
+temp_env.goal_radius = goal_radius;
+temp_env.num_total_agents = shared->num_total_agents;
+temp_env.grid_map = shared->grid_map;
+if (shared->num_total_agents > 0) {
+    temp_agents = malloc(sizeof(Agent) * shared->num_total_agents);
+    if (temp_agents == NULL) {
+        Py_DECREF(agent_offsets);
+        Py_DECREF(map_ids);
+        return PyErr_NoMemory();
+    }
+    for (int i = 0; i < shared->num_total_agents; i++) {
+        temp_agents[i] = shared->template_agents[i];
+    }
+}
+temp_env.agents = temp_agents;
+set_active_agents(&temp_env);
+int active_count = temp_env.active_agent_count;
+free(temp_env.active_agent_indices);
+free(temp_env.static_agent_indices);
+free(temp_env.expert_static_agent_indices);
+free(temp_agents);
```
```c
// We must NOT free them — they belong to the parent's address space.
// Discard them and let the child rebuild its own cache on the next call.
if (g_map_cache_pid != 0 && g_map_cache_pid != getpid()) {
    reset_cache_globals();
    return;
}
```
The fork-handling branch in release_map_cache_internal() resets the cache globals on PID mismatch without freeing g_map_cache / g_cache_map_paths or the SharedMapData entries. In a forked child this drops the only pointers to large allocations, effectively leaking them for the lifetime of the child process (and preventing the OS from reclaiming COW pages if the parent later releases them). Consider either (a) reusing the inherited cache after fork by updating g_map_cache_pid to getpid(), or (b) fully freeing/detaching the inherited cache in the child before resetting the globals.
Suggested change:

```diff
-// We must NOT free them — they belong to the parent's address space.
-// Discard them and let the child rebuild its own cache on the next call.
-if (g_map_cache_pid != 0 && g_map_cache_pid != getpid()) {
-    reset_cache_globals();
-    return;
-}
+// Adopt the inherited cache in the child so it can be released normally
+// instead of dropping the only references to these allocations.
+if (g_map_cache_pid != 0 && g_map_cache_pid != getpid())
+    g_map_cache_pid = getpid();
```
```python
# Wrap rollout + post-processing in try/finally so vecenv.close() is
# guaranteed to run even if an exception fires mid-rollout.
# Without this, a crash inside the step loop leaves Drive C envs alive
# (ref_count > 0), which causes the next binding.shared() call with a
# different map_dir to raise:
#   RuntimeError: Cannot change map cache config while Drive environments
#   are still open. Call close() on all Drive instances first.
_rollout_exc = None
try:
```
eval_multi_scenarios_render is described as being wrapped in try/finally to guarantee vecenv.close(), but the current implementation uses try/except, stores the exception object, and later re-raises it with raise _rollout_exc. Re-raising an exception instance this way loses the original traceback, making render crashes much harder to debug. Prefer try: ... finally: vecenv.close() and let exceptions propagate (or capture/restore exc_info) so the original traceback is preserved.
```python
with torch.no_grad():
    ob_t = torch.as_tensor(obs).to(device)
    logits, _ = policy.forward_eval(ob_t, state)
    action, _, _ = pufferlib.pytorch.sample_logits(logits)
    action_np = action.cpu().numpy().reshape(env.action_space.shape)
```
rollout_loop currently samples actions with pufferlib.pytorch.sample_logits(logits) (stochastic). The existing evaluation / render paths in pufferl.py use deterministic sampling for metrics and reproducible renders. To keep this shared rollout loop consistent and reproducible, consider defaulting to deterministic sampling (or adding a deterministic parameter and threading it through).
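One way to thread a `deterministic` flag through, sketched here with NumPy rather than pufferlib's actual `sample_logits` helper (function name and signature are illustrative assumptions):

```python
import numpy as np

def select_action(logits, deterministic=True, rng=None):
    """Pick actions from a (batch, n_actions) logits array.

    deterministic=True  -> argmax, for reproducible renders and metrics;
    deterministic=False -> sample from the softmax distribution.
    """
    if deterministic:
        return logits.argmax(axis=-1)
    rng = rng or np.random.default_rng()
    # numerically stable softmax before sampling
    z = logits - logits.max(axis=-1, keepdims=True)
    probs = np.exp(z) / np.exp(z).sum(axis=-1, keepdims=True)
    return np.array([rng.choice(p.size, p=p) for p in probs])
```

With a default of `deterministic=True`, the shared rollout loop matches the existing evaluation/render paths; training-style stochastic rollouts opt in explicitly.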
```python
num_eval_scenarios=self.current_num_eval_scenarios,  # Use the dynamic size here
road_obs_front_dist=self.road_obs_front_dist,
road_obs_behind_dist=self.road_obs_behind_dist,
road_obs_side_dist=self.road_obs_side_dist,
```
The resample-time call to binding.shared() inside Drive.step() does not pass goal_radius, but binding.shared/my_shared unpacks goal_radius as a required kwarg. This will raise a TypeError on the first resample. Pass goal_radius=self.goal_radius here (consistent with the initial call in init).
Suggested change:

```diff
 road_obs_side_dist=self.road_obs_side_dist,
+goal_radius=self.goal_radius,
```
This PR implements map caching: sharing common (and memory-heavy) map features across envs without having to load every `.bin` file.

Note: the change in `render` code is meant to remove `ensure_drive_binary()` and any subsequent dependency on `visualize.c`; please continue using `multi_scenario_render` to control render settings while training.

Also adds a small feature to control the number of agents in the renders from `drive.ini`.
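The new knobs in `drive.ini` might look like the fragment below. The key names come from this PR's file summary; the section name and the values shown are placeholders, so check the actual `drive.ini` in the diff:

```ini
; hypothetical sketch - section name and values are illustrative
[eval]
min_agents_per_env = 1
max_agents_per_env = 64
```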