perf improvements: generate_iso_surface_vertices and generate_sparse_density_map #6
Hello! As part of an investigation into possible performance improvements, I've collected some profiling stats and identified two areas that appear to take up the largest share of the meshing budget:
Reconstructed 11790 vertices (indices=64464) from 1000 particlces in 43.492334ms and pushed in 43.679657ms

- reconstruct_surface: 100.00%, 43.49ms/call @ 22.99Hz
  - compute minimum enclosing aabb: 0.01%, 0.01ms/call @ 22.99Hz
  - neighborhood_search: 11.67%, 5.07ms/call @ 22.99Hz
    - parallel_generate_cell_to_particle_map: 26.25%, 1.33ms/call @ 22.99Hz
    - get_cell_neighborhoods_par: 5.06%, 0.26ms/call @ 22.99Hz
    - calculate_particle_neighbors_par: 64.24%, 3.26ms/call @ 22.99Hz
  - parallel_compute_particle_densities: 0.47%, 0.21ms/call @ 22.99Hz
  - parallel_generate_sparse_density_map: 41.18%, 17.91ms/call @ 22.99Hz
  - triangulate_density_map: 46.62%, 20.28ms/call @ 22.99Hz
    - interpolate_points_to_cell_data: 91.94%, 18.64ms/call @ 22.99Hz
      - generate_iso_surface_vertices: 84.61%, 15.77ms/call @ 22.99Hz
      - relative_to_threshold_postprocessing: 15.36%, 2.86ms/call @ 22.99Hz
    - triangulate: 8.04%, 1.63ms/call @ 22.99Hz
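The percentages in this log are relative to the enclosing scope, not to the whole reconstruction (e.g. `parallel_generate_cell_to_particle_map` is 1.33 / 5.07 ≈ 26% of `neighborhood_search`). A minimal sketch of how such a nested report can be derived from per-scope totals, using a hypothetical `ScopeTiming` tree (an illustration only, not splashsurf's actual profiler):

```rust
/// Hypothetical record of one profiled scope: its name, the total time
/// spent in it (milliseconds) and the scopes nested inside it.
pub struct ScopeTiming {
    pub name: &'static str,
    pub total_ms: f64,
    pub children: Vec<ScopeTiming>,
}

impl ScopeTiming {
    pub fn leaf(name: &'static str, total_ms: f64) -> Self {
        ScopeTiming { name, total_ms, children: Vec::new() }
    }

    pub fn with_children(name: &'static str, total_ms: f64, children: Vec<ScopeTiming>) -> Self {
        ScopeTiming { name, total_ms, children }
    }

    /// Renders the tree, printing each scope's share of its *parent* scope
    /// (the root is reported relative to itself, i.e. 100%).
    pub fn report(&self) -> Vec<String> {
        let mut lines = Vec::new();
        self.report_into(self.total_ms, 0, &mut lines);
        lines
    }

    fn report_into(&self, parent_ms: f64, depth: usize, out: &mut Vec<String>) {
        let share = 100.0 * self.total_ms / parent_ms;
        out.push(format!(
            "{}{}: {:.2}%, {:.2}ms",
            "  ".repeat(depth),
            self.name,
            share,
            self.total_ms
        ));
        // Children are measured against this scope, not against the root.
        for child in &self.children {
            child.report_into(self.total_ms, depth + 1, out);
        }
    }
}
```

Recomputing from the rounded millisecond values gives 26.23% rather than the logged 26.25%, since the log presumably works from unrounded times.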
Meshing the 1k particles every frame takes 30-50 ms. Ideally we could get this down to around 16 ms, so that mesh generation adds only one frame of latency to a real-time simulation running at 60 fps.
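For reference, the budget arithmetic behind that target: a 60 fps frame leaves 1000 / 60 ≈ 16.7 ms, so going from the measured 43.49 ms to that budget means roughly a 2.6x overall speedup (3x from the 50 ms end of the range). As a trivial sketch:

```rust
/// Per-frame time budget in milliseconds for a target frame rate.
pub fn frame_budget_ms(fps: f64) -> f64 {
    1000.0 / fps
}

/// Overall speedup needed to bring `current_ms` down to `budget_ms`.
pub fn required_speedup(current_ms: f64, budget_ms: f64) -> f64 {
    current_ms / budget_ms
}
```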
Based on this, `generate_iso_surface_vertices` (15.77 ms) and `parallel_generate_sparse_density_map` (17.91 ms) look like good candidates: together they account for about 33.7 ms of the 43.5 ms total.
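Since only these two stages would be targeted, Amdahl's law gives a rough idea of how much faster they need to become. They sit in disjoint parts of the profile (one is nested under `triangulate_density_map`, the other is a direct child of `reconstruct_surface`), so their times can be summed: 15.77 + 17.91 = 33.68 ms that could be optimized, and 9.81 ms that stays as-is. A sketch, assuming the single-run timings above are representative and the rest of the pipeline is unchanged:

```rust
/// Amdahl's law: new total time when `optimized_ms` out of `total_ms`
/// is made `speedup` times faster and everything else is unchanged.
pub fn amdahl_total_ms(total_ms: f64, optimized_ms: f64, speedup: f64) -> f64 {
    (total_ms - optimized_ms) + optimized_ms / speedup
}

/// Speedup the optimized portion needs so the total fits in `budget_ms`.
/// Returns `None` if the untouched remainder alone already meets or
/// exceeds the budget (no finite speedup would be enough).
pub fn speedup_needed(total_ms: f64, optimized_ms: f64, budget_ms: f64) -> Option<f64> {
    let remainder = total_ms - optimized_ms;
    if remainder >= budget_ms {
        None
    } else {
        Some(optimized_ms / (budget_ms - remainder))
    }
}
```

With a 16.7 ms budget this works out to roughly a 4.9x speedup of the two stages combined. The untouched 9.81 ms fits within the budget on its own, so the other stages do not rule the target out.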
I don't know much about fluid simulations, so I'll defer to you on those matters, but I have done a lot of performance and optimization work. Do you see anything worth attacking here, and if so, could you give me a pointer on where to start looking? :)
I'm also wondering whether there are data structures we don't need to recompute every frame, perhaps the density map? Or, similar to #4, could we reuse container structures across frames to reduce allocation pressure?
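To illustrate the container-reuse idea: the particle positions change every frame, so the density values themselves would still be recomputed, but the map storing them could be kept alive between frames. In Rust, `HashMap::clear` removes all entries while keeping the allocated memory, so a map refilled with a similar number of entries each frame should stop reallocating after the first frame. A minimal sketch with a hypothetical `DensityMapWorkspace` (the `u64` point-index key and `f32` density value are assumptions, not splashsurf's actual density map types):

```rust
use std::collections::HashMap;

/// Hypothetical scratch space that lives across frames.
/// Key: an assumed flat grid-point index; value: accumulated density.
#[derive(Default)]
pub struct DensityMapWorkspace {
    densities: HashMap<u64, f32>,
}

impl DensityMapWorkspace {
    pub fn new() -> Self {
        Self::default()
    }

    /// Rebuilds the map for a new frame. `clear` drops the entries but keeps
    /// the allocated buckets, so inserting a similar number of points again
    /// does not need to grow the table.
    pub fn rebuild(&mut self, samples: &[(u64, f32)]) -> &HashMap<u64, f32> {
        self.densities.clear();
        for &(point, density) in samples {
            // Accumulate contributions that land on the same grid point.
            *self.densities.entry(point).or_insert(0.0) += density;
        }
        &self.densities
    }

    /// Number of entries the map can hold without reallocating.
    pub fn capacity(&self) -> usize {
        self.densities.capacity()
    }
}
```

The same pattern applies to `Vec` buffers (`clear` keeps capacity); the trade-off is that peak memory from the largest frame stays allocated.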
Thanks, and looking forward to your insights here :)