Main merge release/26.06#1270
Conversation
…nds on free variables (NVIDIA#1237) When -inf <= x_j <= inf and we add implied bounds on x_j, we will compute a dual solution (u, w) that satisfies A^T u + w = c + Qx but w_j could be nonzero. When x_j is free, we need to enforce that the corresponding reduced cost is zero. We do this by transforming (u, w) into (y, z) such that A^T y + z = c + Qx with z_j = 0 for all j such that -inf < x_j < inf. We also add a unit test on a small QP to confirm the fix. Fixes NVIDIA#1119 Authors: - Chris Maes (https://github.com/chris-maes) Approvers: - Ramakrishnap (https://github.com/rgsl888prabhu) - Rajesh Gandham (https://github.com/rg20) - Miles Lubin (https://github.com/mlubin) URL: NVIDIA#1237
…IDIA#1227) SonarQube rule **`cpp:S5018`** flags `= default`ed move constructors and move-assignment operators that arent declared `noexcept`. This PR adds `noexcept` to the seven flagged sites on `main`, clearing **7 of 8 BLOCKER** findings. All affected classes already move trivially-noexcept members (`std::vector`, `std::unique_ptr`, `rmm::device_uvector`, raw pointers, scalars), so the specifier compiles cleanly. None of these four types are currently stored by value in an STL container (`mip_node_t` is always behind `unique_ptr` or raw pointer; the other three dont appear in containers), so on the current code this is a runtime no-op — it just satisfies the lint rule. Authors: - Ramakrishnap (https://github.com/rgsl888prabhu) Approvers: - Rajesh Gandham (https://github.com/rg20) - Chris Maes (https://github.com/chris-maes) URL: NVIDIA#1227
This PR includes four performance improvements to dual simplex - Removes an unnecessary BTRAN for computing the steepest edge weight of the entering variable - Rewrites the right-looking LU factorization to use remove linked lists and use a sparse row and column representation of the Schur complement matrix. - Swaps the coefficients in the row representation of A, so that all nonbasic columns appear first. This speeds up computing the change in reduced costs. - Record the amount of work in the solve and refactorization, and trigger refactorization if solve work exceeds factorization work. Accuracy for dual simplex is also improved by - Refactoring the basis when optimal and the primal residual || A x - b ||_inf is large and basis updates are present. There is a 16% (geomean) performance improvement in solve times on the NETLIB LP test set. We are now faster than HSOL (the precursor to HiGHS) on NETLIB. There is a 6% (geomean) performance improvement on time to solve MIPLIB LP relaxations. Authors: - Chris Maes (https://github.com/chris-maes) Approvers: - Nicolas L. Guidotti (https://github.com/nguidotti) URL: NVIDIA#1043
This is a permanent fix to cuda graph capture issue. We add a small RAII wrapper around `cudaStreamBeginCapture` / `cudaStreamEndCapture` that detects `cudaErrorStreamCaptureInvalidated` at EndCapture time, drops the (never-issued) partial graph, re-runs the callable eagerly so the current iteration still produces correct results, and stays uninitialized so the next call retries capture. One extra eager pass instead of a crash. Closing the other PR:NVIDIA#1250 Fixes NVIDIA#1185 Authors: - Akif ÇÖRDÜK (https://github.com/akifcorduk) - Nicolas Blin (https://github.com/Kh4ster) Approvers: - Hugo Linsenmaier (https://github.com/hlinsen) - Ramakrishnap (https://github.com/rgsl888prabhu) URL: NVIDIA#1253
NVIDIA#1257) Add CUOPT_MIP_PROBING setting (default true) to let users disable the probing-cache step of cuOpt's internal MIP presolve without disabling the rest of presolve. Existing CUOPT_PRESOLVE=0 still turns the whole pipeline off, so CUOPT_MIP_PROBING only matters when presolve is otherwise enabled. Authors: - Akif ÇÖRDÜK (https://github.com/akifcorduk) Approvers: - Miles Lubin (https://github.com/mlubin) - Chris Maes (https://github.com/chris-maes) - Ramakrishnap (https://github.com/rgsl888prabhu) URL: NVIDIA#1257
📝 WalkthroughWalkthroughThis pull request refactors core solver infrastructure across dual simplex, branch-and-bound, and CUDA graph execution. It introduces MIP probing parameter configuration, rewires phase 2's linear algebra to use CSR row-based matrices, modernizes pseudo-cost and LU data structures, enhances presolve with bounded-free-variable dual correction, and abstracts CUDA graph capture/launch into a reusable utility. ChangesSolver Optimization and Infrastructure Refactoring
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Possibly related PRs
Suggested labels
Suggested reviewers
🚥 Pre-merge checks | ✅ 2 | ❌ 3❌ Failed checks (1 warning, 2 inconclusive)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Actionable comments posted: 7
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
cpp/src/branch_and_bound/pseudo_costs.hpp (1)
124-133:⚠️ Potential issue | 🟠 Major | ⚡ Quick winCopy
Arowinpseudo_costs_t::operator=.Line 127 still copies the old transpose pointer, but the new dual-pivot paths read
Arow. Any copiedpseudo_costs_t/pseudo_cost_snapshot_tkeeps the default 1x1 row matrix, which can corrupt candidate ranking or trip invalid indexing when the estimate code runs.Proposed fix
if (this != &other) { this->AT = other.AT; + this->Arow = other.Arow; this->pdlp_warm_cache = other.pdlp_warm_cache; this->pseudo_cost_num_down = other.pseudo_cost_num_down; this->pseudo_cost_num_up = other.pseudo_cost_num_up; this->pseudo_cost_sum_down = other.pseudo_cost_sum_down; this->pseudo_cost_sum_up = other.pseudo_cost_sum_up;🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@cpp/src/branch_and_bound/pseudo_costs.hpp` around lines 124 - 133, The operator= implementation for pseudo_costs_t fails to copy the Arow member, leaving copied objects with the default 1x1 row matrix and causing corruption; update pseudo_costs_t::operator= to also assign this->Arow = other.Arow (and if there is a related pseudo_cost_snapshot_t copy/assignment, ensure it likewise copies its Arow) so the transpose/pivot paths and candidate ranking use the correct row matrix after copy.cpp/src/mip_heuristics/feasibility_jump/feasibility_jump.cu (1)
690-711:⚠️ Potential issue | 🟠 Major | ⚡ Quick winWrap the new
cub::DeviceSelect::Flagged(...)calls withRAFT_CUDA_TRYIn
cpp/src/mip_heuristics/feasibility_jump/feasibility_jump.cu, the CUB status from both the temp-storage probe (~690-711) and the actual select/compaction call (~776-783) is ignored; the codebase wrapscub::DeviceSelect::FlaggedwithRAFT_CUDA_TRYelsewhere.Proposed fix
- cub::DeviceSelect::Flagged((void*)nullptr, - compaction_temp_storage_bytes, - thrust::counting_iterator<i_t>(0), - valid_move_iterator, - data.candidate_variables.contents.data(), - data.candidate_variables.set_size.data(), - pb_ptr->n_variables, - climber_stream); + RAFT_CUDA_TRY(cub::DeviceSelect::Flagged((void*)nullptr, + compaction_temp_storage_bytes, + thrust::counting_iterator<i_t>(0), + valid_move_iterator, + data.candidate_variables.contents.data(), + data.candidate_variables.set_size.data(), + pb_ptr->n_variables, + climber_stream)); @@ - cub::DeviceSelect::Flagged((void*)data.cub_storage_bytes.data(), - compaction_temp_storage_bytes, - thrust::counting_iterator<i_t>(0), - valid_move_iterator, - data.candidate_variables.contents.data(), - data.candidate_variables.set_size.data(), - pb_ptr->n_variables, - climber_stream); + RAFT_CUDA_TRY(cub::DeviceSelect::Flagged((void*)data.cub_storage_bytes.data(), + compaction_temp_storage_bytes, + thrust::counting_iterator<i_t>(0), + valid_move_iterator, + data.candidate_variables.contents.data(), + data.candidate_variables.set_size.data(), + pb_ptr->n_variables, + climber_stream));🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@cpp/src/mip_heuristics/feasibility_jump/feasibility_jump.cu` around lines 690 - 711, The cub::DeviceSelect::Flagged calls (the temp-storage probe using compaction_temp_storage_bytes and the later actual select/compaction call that writes into data.candidate_variables.contents/data.candidate_variables.set_size) must be wrapped with RAFT_CUDA_TRY to check and propagate CUB errors; locate the calls to cub::DeviceSelect::Flagged around the valid_move_iterator/compaction_temp_storage_bytes probe and the later select that uses climber_stream, and wrap each invocation with RAFT_CUDA_TRY(...) so any non-success CUB status is handled (keep the existing logic that resizes data.cub_storage_bytes when compaction_temp_storage_bytes grows and preserve use_graph/step_graph_.is_initialized flow).
🧹 Nitpick comments (1)
cpp/src/utilities/manual_cuda_graph.cuh (1)
117-123: ⚡ Quick winCheck CUDA cleanup failures in the capture guard.
capture_guard_t::~capture_guard_t()still uses barecudaStreamEndCapture/cudaGraphDestroy, so a failed unwind path is silently dropped exactly when the stream is already in an exceptional state.RAFT_CUDA_TRY_NO_THROWkeeps the cleanup best-effort while preserving the repo’s CUDA error-checking contract.♻️ Suggested change
~capture_guard_t() noexcept { if (capture_active) { cudaGraph_t dummy = nullptr; // best-effort; we're already unwinding - cudaStreamEndCapture(stream, &dummy); - if (dummy != nullptr) { cudaGraphDestroy(dummy); } + RAFT_CUDA_TRY_NO_THROW(cudaStreamEndCapture(stream, &dummy)); + if (dummy != nullptr) { RAFT_CUDA_TRY_NO_THROW(cudaGraphDestroy(dummy)); } } }As per coding guidelines, ".github/.coderabbit_review_guide.md: flag any bare CUDA calls or unchecked kernel errors; require RAFT_CUDA_TRY for CUDA error checking."
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@cpp/src/utilities/manual_cuda_graph.cuh` around lines 117 - 123, The destructor capture_guard_t::~capture_guard_t currently calls cudaStreamEndCapture and cudaGraphDestroy directly when capture_active is true; wrap those cleanup calls with RAFT_CUDA_TRY_NO_THROW to preserve best-effort cleanup while honoring the repo's CUDA error-checking contract — e.g., call RAFT_CUDA_TRY_NO_THROW(cudaStreamEndCapture(stream, &dummy)) and, if dummy != nullptr, RAFT_CUDA_TRY_NO_THROW(cudaGraphDestroy(dummy)) instead of the bare calls so failures are non-throwing but still checked.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@cpp/src/dual_simplex/presolve.cpp`:
- Around line 991-1006: The code currently appends bounded-free corrections into
presolve_info.bounded_free_variables in the order of current_free_variables but
later replays them in reverse, which breaks derivation-dependent correctness
because presolve() updates problem.lower/upper while scanning constraints; to
fix, record and replay corrections in derivation order: when pushing a
bounded-free record from the loop in presolve (the block using
current_free_variables, bounding_constraint, bounding_coefficient) include its
discovery sequence (or push in the true derivation order) and change the
uncrush/replay logic to iterate presolve_info.bounded_free_variables in that
recorded derivation order (not reversed), or perform an explicit topological
ordering of the recorded corrections before replay to ensure dependent
corrections are applied in the same sequence they were derived.
In `@cpp/src/dual_simplex/right_looking_lu.cpp`:
- Around line 669-685: When relocating a column/row you must account for the
entire abandoned span from the old start up to its old max, not just
current_size; replace the increment unused_col_nz_ += current_size; with
unused_col_nz_ += (col_max_[j] - c_start) for the column relocation (and
similarly replace unused_row_nz_ += current_size with unused_row_nz_ +=
(row_max_[i] - r_start) in the row-relocation block), so the dead storage freed
by moving col_start_[j]/row_start_[i] is fully counted before updating
col_start_/row_start_ and col_max_/row_max_.
In `@cpp/src/grpc/cuopt_remote.proto`:
- Line 205: Change the proto field to have presence and only override the C++
default when present: make the field in cuopt_remote.proto "optional bool
probing = 29;" (so MIPSolverSettings.probing has presence) and update
map_proto_to_mip_settings to check pb_settings.has_probing() before assigning
(i.e., only call settings.probing = pb_settings.probing() when has_probing() is
true) to avoid unconditionally overwriting the default true in
mip_settings.probing.
In `@cpp/src/mip_heuristics/feasibility_jump/feasibility_jump.cuh`:
- Line 271: step_graph_ is a shared graph cache that run_step_device() replays
with climber-specific kernel args and streams, so change the design to scope the
graph per-climber or per-stream: replace the single cuopt::manual_cuda_graph_t
step_graph_ with a keyed cache (e.g., std::vector/cuopt::manual_cuda_graph_t
per_climber_graph_ or an unordered_map keyed by climber_idx/stream) or ensure
step_graph_ is reset and re-recorded on each climber switch; update
run_step_device() to lookup/use the graph for the current climber_idx (or create
it lazily), and protect access with appropriate synchronization (mutex) if
multiple threads can access different climbers concurrently to avoid replaying
on the wrong buffers/stream.
In `@cpp/src/pdlp/pdhg.hpp`:
- Around line 130-136: The comment above the cache members is out of sync with
the actual members: it references a `graph_all_non_major` reflected-path cache
but the type only contains `graph_all` and `graph_prim_proj_gradient_dual`. Fix
by either renaming the comment to accurately describe the two stored caches
(`graph_all` and `graph_prim_proj_gradient_dual`) or add the missing member to
the cache type so the reflected non-major branch actually has its own cache;
update any resize/reset references that assume `graph_all_non_major` to use the
corrected member names. Ensure the comment and the cache type (the member
container holding graph_all and graph_prim_proj_gradient_dual) are consistent so
the resize/reset path clearly maps to the actual members.
In
`@python/cuopt_server/cuopt_server/utils/linear_programming/data_definition.py`:
- Around line 527-534: The Field definition for mip_probing has default=None but
its description states "True (default)"; update the schema so the declared
default matches the documented behavior: either change the Field default to True
or amend the description to explicitly state that None means "unset; backend
default (currently true)". Locate the mip_probing Field in data_definition.py
(the mip_probing Optional[bool] = Field(...) declaration) and apply one clear
fix so OpenAPI/client generation and the docstring are consistent.
---
Outside diff comments:
In `@cpp/src/branch_and_bound/pseudo_costs.hpp`:
- Around line 124-133: The operator= implementation for pseudo_costs_t fails to
copy the Arow member, leaving copied objects with the default 1x1 row matrix and
causing corruption; update pseudo_costs_t::operator= to also assign this->Arow =
other.Arow (and if there is a related pseudo_cost_snapshot_t copy/assignment,
ensure it likewise copies its Arow) so the transpose/pivot paths and candidate
ranking use the correct row matrix after copy.
In `@cpp/src/mip_heuristics/feasibility_jump/feasibility_jump.cu`:
- Around line 690-711: The cub::DeviceSelect::Flagged calls (the temp-storage
probe using compaction_temp_storage_bytes and the later actual select/compaction
call that writes into
data.candidate_variables.contents/data.candidate_variables.set_size) must be
wrapped with RAFT_CUDA_TRY to check and propagate CUB errors; locate the calls
to cub::DeviceSelect::Flagged around the
valid_move_iterator/compaction_temp_storage_bytes probe and the later select
that uses climber_stream, and wrap each invocation with RAFT_CUDA_TRY(...) so
any non-success CUB status is handled (keep the existing logic that resizes
data.cub_storage_bytes when compaction_temp_storage_bytes grows and preserve
use_graph/step_graph_.is_initialized flow).
---
Nitpick comments:
In `@cpp/src/utilities/manual_cuda_graph.cuh`:
- Around line 117-123: The destructor capture_guard_t::~capture_guard_t
currently calls cudaStreamEndCapture and cudaGraphDestroy directly when
capture_active is true; wrap those cleanup calls with RAFT_CUDA_TRY_NO_THROW to
preserve best-effort cleanup while honoring the repo's CUDA error-checking
contract — e.g., call RAFT_CUDA_TRY_NO_THROW(cudaStreamEndCapture(stream,
&dummy)) and, if dummy != nullptr,
RAFT_CUDA_TRY_NO_THROW(cudaGraphDestroy(dummy)) instead of the bare calls so
failures are non-throwing but still checked.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: a712bf6a-6f6e-47a8-931c-18f14a5ec84e
⛔ Files ignored due to path filters (1)
datasets/quadratic_programming/min_x_squared.mpsis excluded by!**/*.mps
📒 Files selected for processing (39)
cpp/include/cuopt/linear_programming/constants.hcpp/include/cuopt/linear_programming/mip/solver_settings.hppcpp/include/cuopt/linear_programming/optimization_problem.hppcpp/include/cuopt/linear_programming/pdlp/pdlp_warm_start_data.hppcpp/src/branch_and_bound/branch_and_bound.cppcpp/src/branch_and_bound/deterministic_workers.hppcpp/src/branch_and_bound/mip_node.hppcpp/src/branch_and_bound/pseudo_costs.cppcpp/src/branch_and_bound/pseudo_costs.hppcpp/src/dual_simplex/phase2.cppcpp/src/dual_simplex/phase2.hppcpp/src/dual_simplex/presolve.cppcpp/src/dual_simplex/presolve.hppcpp/src/dual_simplex/right_looking_lu.cppcpp/src/dual_simplex/solve.cppcpp/src/grpc/cuopt_remote.protocpp/src/grpc/grpc_settings_mapper.cppcpp/src/math_optimization/solver_settings.cucpp/src/mip_heuristics/diversity/diversity_manager.cucpp/src/mip_heuristics/feasibility_jump/feasibility_jump.cucpp/src/mip_heuristics/feasibility_jump/feasibility_jump.cuhcpp/src/pdlp/cpu_optimization_problem.cppcpp/src/pdlp/optimization_problem.cucpp/src/pdlp/pdhg.cucpp/src/pdlp/pdhg.hppcpp/src/pdlp/pdlp.cucpp/src/pdlp/restart_strategy/weighted_average_solution.cucpp/src/pdlp/solve.cucpp/src/pdlp/step_size_strategy/adaptive_step_size_strategy.cucpp/src/pdlp/utilities/ping_pong_graph.cucpp/src/pdlp/utilities/ping_pong_graph.cuhcpp/src/utilities/manual_cuda_graph.cuhcpp/tests/dual_simplex/unit_tests/solve.cppcpp/tests/dual_simplex/unit_tests/solve_barrier.cucpp/tests/mip/incumbent_callback_test.cudocs/cuopt/source/cuopt-c/lp-qp-milp/lp-qp-milp-c-api.rstdocs/cuopt/source/lp-qp-milp-settings.rstpython/cuopt/cuopt/tests/linear_programming/test_incumbent_callbacks.pypython/cuopt_server/cuopt_server/utils/linear_programming/data_definition.py
💤 Files with no reviewable changes (1)
- cpp/src/pdlp/utilities/ping_pong_graph.cu
| // Record bounded free variables for dual correction in uncrush. | ||
| // After the keep-one-bound logic, each bounded variable has exactly one finite bound. | ||
| for (i_t j : current_free_variables) { | ||
| i_t bounding_constraint = -1; | ||
| f_t bounding_coefficient = 0.0; | ||
| if (problem.lower[j] > -inf && lower_bound_constraint[j] != -1) { | ||
| bounding_constraint = lower_bound_constraint[j]; | ||
| bounding_coefficient = lower_bound_coefficient[j]; | ||
| } else if (problem.upper[j] < inf && upper_bound_constraint[j] != -1) { | ||
| bounding_constraint = upper_bound_constraint[j]; | ||
| bounding_coefficient = upper_bound_coefficient[j]; | ||
| } | ||
| if (bounding_constraint != -1) { | ||
| presolve_info.bounded_free_variables.push_back( | ||
| {j, bounding_constraint, bounding_coefficient}); | ||
| } |
There was a problem hiding this comment.
Replay bounded-free dual corrections in derivation order, not reverse variable order.
These fixes are order-dependent because presolve() updates problem.lower/upper as it scans constraints, so a later implied bound can rely on an earlier bounded free variable. bounded_free_variables is recorded in current_free_variables order, then replayed in reverse of that order here; that is not the inverse of the derivation chain. In the dependent case, a later correction can resurrect nonzero z on a free variable you already zeroed. Please record the retained bound's derivation sequence (or otherwise topologically order the corrections) and replay by that order instead.
As per coding guidelines, flag algorithm correctness errors including logic errors in optimization algorithms.
Also applies to: 1771-1795
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@cpp/src/dual_simplex/presolve.cpp` around lines 991 - 1006, The code
currently appends bounded-free corrections into
presolve_info.bounded_free_variables in the order of current_free_variables but
later replays them in reverse, which breaks derivation-dependent correctness
because presolve() updates problem.lower/upper while scanning constraints; to
fix, record and replay corrections in derivation order: when pushing a
bounded-free record from the loop in presolve (the block using
current_free_variables, bounding_constraint, bounding_coefficient) include its
discovery sequence (or push in the true derivation order) and change the
uncrush/replay logic to iterate presolve_info.bounded_free_variables in that
recorded derivation order (not reversed), or perform an explicit topological
ordering of the recorded corrections before replay to ensure dependent
corrections are applied in the same sequence they were derived.
| for (i_t p = r_start; p < r_end; p++) { | ||
| const i_t j = r_j_[p]; | ||
| // Look up the value from the column copy of j | ||
| f_t val = 0; | ||
| const i_t c_start = col_start_[j]; | ||
| const i_t c_end = col_end_[j]; | ||
| for (i_t q = c_start; q < c_end; q++) { | ||
| if (c_i_[q] == i) { | ||
| val = c_x_[q]; | ||
| break; | ||
| } | ||
| } | ||
| work_estimate_ += 2 * (c_end - c_start); |
There was a problem hiding this comment.
Row-side value lookups are now a hot-path complexity regression.
The row path no longer has direct access to coefficients, so markowitz_search() and cache_pivot_row() both rescan the full column copy to recover each a_ij. In the factorization loop that turns row-side access from O(1) per entry into O(column_degree), which can make pivot search/extraction quadratic in the active nnz and wipe out the intended LU speedups on larger bases.
Also applies to: 443-455
| unused_col_nz_ += current_size; | ||
| i_t new_start = c_i_.size(); | ||
| for (i_t p = c_start; p < c_end; p++) { | ||
| c_i_.push_back(c_i_[p]); | ||
| c_x_.push_back(c_x_[p]); | ||
| } | ||
| work_estimate += 3 * col_loop_count; | ||
| max_in_column[j] = max_in_col_j; | ||
| row_loop_count++; | ||
| work_estimate_ += 2 * (c_end - c_start); | ||
| col_start_[j] = new_start; | ||
| col_end_[j] = c_i_.size(); | ||
| // Reserve space using doubling strategy to reduce future relocations | ||
| i_t extra = std::max(current_size, needed); | ||
| for (i_t k = 0; k < extra; k++) { | ||
| c_i_.push_back(kNone); | ||
| c_x_.push_back(0.0); | ||
| } | ||
| work_estimate_ += 2 * extra; | ||
| col_max_[j] = c_i_.size(); |
There was a problem hiding this comment.
Count the full abandoned span when relocating rows and columns.
unused_col_nz_ and unused_row_nz_ only add current_size, but after col_start_[j]/row_start_[i] moves the whole old segment up to col_max_[j]/row_max_[i] is unreachable. That undercounts dead storage, so garbage_collect() can miss large amounts of stranded capacity and let these vectors grow much more than intended on difficult factorizations.
Proposed fix
- i_t current_size = c_end - c_start;
- unused_col_nz_ += current_size;
+ i_t current_size = c_end - c_start;
+ unused_col_nz_ += col_max_[j] - c_start;
@@
- i_t current_size = r_end - r_start;
- unused_row_nz_ += current_size;
+ i_t current_size = r_end - r_start;
+ unused_row_nz_ += row_max_[i] - r_start;Also applies to: 705-719
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@cpp/src/dual_simplex/right_looking_lu.cpp` around lines 669 - 685, When
relocating a column/row you must account for the entire abandoned span from the
old start up to its old max, not just current_size; replace the increment
unused_col_nz_ += current_size; with unused_col_nz_ += (col_max_[j] - c_start)
for the column relocation (and similarly replace unused_row_nz_ += current_size
with unused_row_nz_ += (row_max_[i] - r_start) in the row-relocation block), so
the dead storage freed by moving col_start_[j]/row_start_[i] is fully counted
before updating col_start_/row_start_ and col_max_/row_max_.
| int32 seed = 28; | ||
|
|
||
| // Presolve sub-steps | ||
| bool probing = 29; // default true on the C++ side; see grpc_settings_mapper.cpp |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo "1) Proto field declaration:"
rg -n 'probing\s*=\s*29' cpp/src/grpc/cuopt_remote.proto
echo
echo "2) Mapper assignment (unconditional read):"
rg -n 'settings\.probing\s*=\s*pb_settings\.probing\(\)' cpp/src/grpc/grpc_settings_mapper.cpp
echo
echo "3) Presence check currently absent (expected no matches before fix):"
rg -n 'has_probing\s*\(' cpp/src/grpc/grpc_settings_mapper.cpp || trueRepository: NVIDIA/cuopt
Length of output: 334
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo "== cuopt_remote.proto context around probing = 29 =="
sed -n '180,230p' cpp/src/grpc/cuopt_remote.proto
echo
echo "== grpc_settings_mapper.cpp context around settings.probing = pb_settings.probing() =="
sed -n '250,330p' cpp/src/grpc/grpc_settings_mapper.cppRepository: NVIDIA/cuopt
Length of output: 5318
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo "== Find the mapper function containing settings.probing = pb_settings.probing() =="
rg -n "settings\.probing\s*=\s*pb_settings\.probing\(\)" cpp/src/grpc/grpc_settings_mapper.cpp
echo
echo "== Show a larger context window above that assignment =="
sed -n '240,310p' cpp/src/grpc/grpc_settings_mapper.cpp
echo
echo "== Show the function header / preceding lines =="
# Try to locate the nearest map_proto_to_pdlp_settings definition
rg -n "map_proto_to_pdlp_settings" cpp/src/grpc/grpc_settings_mapper.cppRepository: NVIDIA/cuopt
Length of output: 4076
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo "== Search for C++ default initialization of probing =="
rg -n --hidden --no-ignore-vcs "probing" cpp/src | head -n 50
echo
echo "== Look specifically for probing default true in settings definitions =="
rg -n "probing\s*=\s*(true|false)|bool\s+probing\b|probing\)\s*;\s*$|probing\(" cpp/src | head -n 80
echo
echo "== Search for pdlp_solver_settings_t probing field =="
rg -n "pdlp_solver_settings_t<.*>.*probing|struct\s+.*pdlp_solver_settings_t|PDLPSolverSettings.*probing" cpp/src | head -n 80
echo
echo "== Find definition of pdlp_solver_settings_t =="
rg -n "template <.*pdlp_solver_settings_t|pdlp_solver_settings_t" cpp/src | head -n 80Repository: NVIDIA/cuopt
Length of output: 15863
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo "== Locate pdlp_solver_settings_t definition files =="
fd "solver_settings" cpp/src -t f 2>/dev/null | head
rg -n "struct\s+pdlp_solver_settings_t|class\s+pdlp_solver_settings_t" cpp/src/pdlp | head -n 50
echo
echo "== Search for 'probing' inside pdlp solver settings types =="
rg -n "probing" cpp/src/pdlp/solver_settings* cpp/src/pdlp -S | head -n 50
echo
echo "== Search for member 'probing' initialization/default =="
rg -n "probing\s*=" cpp/src/pdlp -S | head -n 80
echo
echo "== Search for 'probing' inside dual simplex settings (if shared) =="
rg -n "probing" cpp/src/dual_simplex -S | head -n 50Repository: NVIDIA/cuopt
Length of output: 549
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo "== Who declares/initializes the C++ member settings.probing ? =="
rg -n "bool\s+probing\b|probing\s*=\s*true|probing\s*=\s*false" cpp/src -S | head -n 200
echo
echo "== Find all occurrences of '.probing' assignments in grpc mapper =="
rg -n "settings\.probing\s*=" cpp/src/grpc -S
echo
echo "== Locate map_proto_to_pdlp_settings signature + parameter type =="
sed -n '90,160p' cpp/src/grpc/grpc_settings_mapper.cpp
echo
echo "== Inspect solver settings type that has probing (search common member name) =="
rg -n "probing" cpp/src/dual_simplex/simplex_solver_settings.hpp cpp/src/math_optimization/solver_settings.cu cpp/src/mip_heuristics/solver_settings.cu cpp/src/routing/solver_settings.cu 2>/dev/null || trueRepository: NVIDIA/cuopt
Length of output: 4867
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo "== grpc_settings_mapper.cpp occurrences of probing =="
rg -n "probing" cpp/src/grpc/grpc_settings_mapper.cpp
echo
echo "== Find CUOPT_*PROBING constants in settings initialization code =="
rg -n "PROBING" cpp/src/math_optimization/solver_settings.cu cpp/src/mip_heuristics/solver_settings.cu cpp/src/routing/solver_settings.cu cpp/src/pdlp/solver_settings.cu 2>/dev/null || true
echo
echo "== Inspect math_optimization/solver_settings.cu around probing default table entry =="
sed -n '130,230p' cpp/src/math_optimization/solver_settings.cu
echo
echo "== Search for member declaration of probing in solver settings types =="
rg -n "\bprobing\b" cpp/src/dual_simplex/simplex_solver_settings.hpp cpp/src/math_optimization/solver_settings.cu cpp/src/pdlp/solver_settings.cu cpp/src/grpc/grpc_settings_mapper.hpp cpp/src/grpc/grpc_settings_mapper.cpp 2>/dev/null || trueRepository: NVIDIA/cuopt
Length of output: 7596
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo "== Find definition of pdlp_solver_settings_t =="
rg -n --hidden --no-ignore-vcs "struct\s+pdlp_solver_settings_t|class\s+pdlp_solver_settings_t|template\s*<[^>]*>\s*struct\s+pdlp_solver_settings_t|template\s*<[^>]*>\s*class\s+pdlp_solver_settings_t" cpp/src/pdlp cpp/src | head -n 50
echo
echo "== Look for pdlp_settings.probing / CUOPT_*_PROBING defaults for pdlp settings =="
rg -n "pdlp_settings\.probing|CUOPT_.*PROBING.*pdlp_settings|&pdlp_settings\.probing" cpp/src | head -n 50
echo
echo "== Locate pdlp_solver_settings_t member list / any 'probing' field declarations =="
rg -n --hidden --no-ignore-vcs "probing\s*;" cpp/src/pdlp cpp/src | head -n 50
echo
echo "== Inspect type of settings in map_proto_to_pdlp_settings (to see where default might come from) =="
sed -n '200,320p' cpp/src/grpc/grpc_settings_mapper.cppRepository: NVIDIA/cuopt
Length of output: 751
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo "== Search for pdlp_settings.probing in the repo =="
rg -n "pdlp_settings\.probing" cpp/src -S || true
echo
echo "== Search for 'probing' occurrences in math_optimization/solver_settings.cu (defaults table) =="
rg -n "probing" cpp/src/math_optimization/solver_settings.cu -S || true
echo
echo "== Search for 'struct pdlp_solver_settings_t' definition across src (not just instantiations) =="
rg -n --hidden --no-ignore-vcs "struct\s+pdlp_solver_settings_t|class\s+pdlp_solver_settings_t|template\s*<[^>]*>\s*struct\s+pdlp_solver_settings_t|template\s*<[^>]*>\s*class\s+pdlp_solver_settings_t" cpp/src -S || true
echo
echo "== Search for 'probing' member declarations in dual_simplex/simplex_solver_settings.hpp =="
rg -n "\bprobing\b" cpp/src/dual_simplex/simplex_solver_settings.hpp -S || true
echo
echo "== Search for 'probing' in dual_simplex directory =="
rg -n "\bprobing\b" cpp/src/dual_simplex -S || trueRepository: NVIDIA/cuopt
Length of output: 1016
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo "== cuopt_remote.proto: message context around line ~205 =="
sed -n '150,220p' cpp/src/grpc/cuopt_remote.proto
echo
echo "== pdlp: locate solver settings type definition files =="
fd "solver_settings.*" cpp/src/pdlp -t f
echo
echo "== Search for pdlp_solver_settings_t definition (non-instantiation) =="
rg -n --hidden --no-ignore-vcs "pdlp_solver_settings_t" cpp/src/pdlp | head -n 200
echo
echo "== Search for a member named 'probing' in pdlp settings files =="
rg -n --hidden --no-ignore-vcs "\bprobing\b" cpp/src/pdlp | head -n 200
echo
echo "== Broader search for 'struct pdlp_solver_settings_t' (anywhere) =="
rg -n --hidden --no-ignore-vcs "struct\s+pdlp_solver_settings_t" cpp/src | head -n 50Repository: NVIDIA/cuopt
Length of output: 11072
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo "== Locate map_proto_to_mip_settings and map_proto_to_pdlp_settings in grpc_settings_mapper.cpp =="
rg -n "map_proto_to_(pdlp|mip)_settings" cpp/src/grpc/grpc_settings_mapper.cpp
rg -n "MIPSolverSettings|PDLPSolverSettings" cpp/src/grpc/grpc_settings_mapper.cpp
echo
echo "== Show mapper file around the assignment at line ~296 =="
sed -n '200,340p' cpp/src/grpc/grpc_settings_mapper.cpp
echo
echo "== Show more above to capture which function the assignment belongs to =="
sed -n '100,220p' cpp/src/grpc/grpc_settings_mapper.cpp
echo
echo "== Show proto definition for MIPSolverSettings and PDLPSolverSettings around probing field =="
rg -n "message (PDLPSolverSettings|MIPSolverSettings)" cpp/src/grpc/cuopt_remote.proto
sed -n '100,230p' cpp/src/grpc/cuopt_remote.protoRepository: NVIDIA/cuopt
Length of output: 18729
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo "== Call sites for map_proto_to_mip_settings =="
rg -n "map_proto_to_mip_settings\s*\(" cpp/src/grpc -S
echo
echo "== Find where bool_parameters defaults are applied in solver_settings_t =="
rg -n "bool_parameters|set_parameter_from|initialize|default|defaults" cpp/src/math_optimization/solver_settings.cu
echo
echo "== Open solver_settings.cu around where parameter tables are initialized/applied =="
sed -n '1,140p' cpp/src/math_optimization/solver_settings.cu
echo
echo "== Search for constructor / init that sets bool_parameters' default values =="
rg -n "solver_settings_t<|::solver_settings_t|constructor" cpp/src/math_optimization/solver_settings.cu cpp/src/math_optimization/solver_settings.cuh 2>/dev/null || trueRepository: NVIDIA/cuopt
Length of output: 15715
Fix proto3 presence for MIPSolverSettings.probing to avoid defaulting to false when omitted.
File: cpp/src/grpc/cuopt_remote.proto
Lines: 205-205
bool probing = 29; // default true on the C++ side; see grpc_settings_mapper.cpp
In proto3, bool probing has no presence; when omitted by an older/partial client it deserializes as false. map_proto_to_mip_settings then unconditionally assigns settings.probing = pb_settings.probing();, overwriting the intended C++ default (mip_settings.probing is configured with default true in cpp/src/math_optimization/solver_settings.cu).
Use optional bool and gate the assignment on has_probing().
🔧 Proposed fix
--- a/cpp/src/grpc/cuopt_remote.proto
+++ b/cpp/src/grpc/cuopt_remote.proto
@@
- bool probing = 29; // default true on the C++ side; see grpc_settings_mapper.cpp
+ optional bool probing = 29; // default true on the C++ side; see grpc_settings_mapper.cpp--- a/cpp/src/grpc/grpc_settings_mapper.cpp
+++ b/cpp/src/grpc/grpc_settings_mapper.cpp
@@
- settings.probing = pb_settings.probing();
+ if (pb_settings.has_probing()) { settings.probing = pb_settings.probing(); }📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| bool probing = 29; // default true on the C++ side; see grpc_settings_mapper.cpp | |
| optional bool probing = 29; // default true on the C++ side; see grpc_settings_mapper.cpp |
| bool probing = 29; // default true on the C++ side; see grpc_settings_mapper.cpp | |
| if (pb_settings.has_probing()) { settings.probing = pb_settings.probing(); } |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@cpp/src/grpc/cuopt_remote.proto` at line 205, Change the proto field to have
presence and only override the C++ default when present: make the field in
cuopt_remote.proto "optional bool probing = 29;" (so MIPSolverSettings.probing
has presence) and update map_proto_to_mip_settings to check
pb_settings.has_probing() before assigning (i.e., only call settings.probing =
pb_settings.probing() when has_probing() is true) to avoid unconditionally
overwriting the default true in mip_settings.probing.
|
|
||
| cudaGraphExec_t graph_instance; | ||
| bool graph_created = false; | ||
| cuopt::manual_cuda_graph_t step_graph_; |
There was a problem hiding this comment.
Scope the graph cache to a climber or stream.
step_graph_ is shared across the whole fj_t, but run_step_device() captures climber-specific kernel arguments and uses the selected climber stream. If a nonzero climber_idx ever reuses this cache, it will replay kernels against the wrong buffers/stream unless the graph is keyed or reset per climber.
As per coding guidelines, **/*.{cpp,cuh,cu,hpp}: Flag missing synchronization for shared state in concurrent code.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@cpp/src/mip_heuristics/feasibility_jump/feasibility_jump.cuh` at line 271,
step_graph_ is a shared graph cache that run_step_device() replays with
climber-specific kernel args and streams, so change the design to scope the
graph per-climber or per-stream: replace the single cuopt::manual_cuda_graph_t
step_graph_ with a keyed cache (e.g., std::vector/cuopt::manual_cuda_graph_t
per_climber_graph_ or an unordered_map keyed by climber_idx/stream) or ensure
step_graph_ is reset and re-recorded on each climber switch; update
run_step_device() to lookup/use the graph for the current climber_idx (or create
it lazily), and protect access with appropriate synchronization (mutex) if
multiple threads can access different climbers concurrently to avoid replaying
on the wrong buffers/stream.
| // Or skip the SpMV (most cases) if it was done at the previous iteration. | ||
| // The reflected primal/dual path branches on `should_major`, and the two branches build | ||
| // different graph topologies. They get separate ping-pong caches so each branch can key its | ||
| // 2-slot cache on `total_pdlp_iterations` parity (the swap state of the primal/dual buffers | ||
| // baked into the captured graph) without colliding with the other branch's topology. | ||
| // graph_all serves the non-reflected path and the major reflected branch (mutually exclusive | ||
| // at runtime); graph_all_non_major serves the non-major reflected branch. |
There was a problem hiding this comment.
Keep the cache comment aligned with the actual members.
The comment describes a graph_all_non_major cache, but this type only stores graph_all and graph_prim_proj_gradient_dual. Please rename the comment or add the missing member; otherwise the resize/reset path reads as if a second reflected-path cache exists somewhere else.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@cpp/src/pdlp/pdhg.hpp` around lines 130 - 136, The comment above the cache
members is out of sync with the actual members: it references a
`graph_all_non_major` reflected-path cache but the type only contains
`graph_all` and `graph_prim_proj_gradient_dual`. Fix by either renaming the
comment to accurately describe the two stored caches (`graph_all` and
`graph_prim_proj_gradient_dual`) or add the missing member to the cache type so
the reflected non-major branch actually has its own cache; update any
resize/reset references that assume `graph_all_non_major` to use the corrected
member names. Ensure the comment and the cache type (the member container
holding graph_all and graph_prim_proj_gradient_dual) are consistent so the
resize/reset path clearly maps to the actual members.
| mip_probing: Optional[bool] = Field( | ||
| default=None, | ||
| description="Enable or disable the cuOpt-internal probing-cache step of " # noqa | ||
| "MIP presolve. True (default) runs probing as part of presolve; False " # noqa | ||
| "skips probing while leaving the rest of presolve untouched. Has no " # noqa | ||
| "effect if presolve is disabled (presolve=0) or when running in " # noqa | ||
| "deterministic mode (probing is already skipped). LP-only solves " # noqa | ||
| "ignore this setting.", |
There was a problem hiding this comment.
Align mip_probing schema default with the documented default behavior.
Line 528 sets default=None, while the description says True (default). This creates an ambiguous API contract for OpenAPI/client generation. Please either set the field default to True or explicitly document that None means “unset; backend default (currently true)”.
Suggested clarification (non-breaking)
mip_probing: Optional[bool] = Field(
default=None,
- description="Enable or disable the cuOpt-internal probing-cache step of "
- "MIP presolve. True (default) runs probing as part of presolve; False "
+ description="Enable or disable the cuOpt-internal probing-cache step of "
+ "MIP presolve. If unset (None), backend default behavior applies "
+ "(currently equivalent to True). False "
"skips probing while leaving the rest of presolve untouched. Has no "
"effect if presolve is disabled (presolve=0) or when running in "
"deterministic mode (probing is already skipped). LP-only solves "
"ignore this setting.",
)🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@python/cuopt_server/cuopt_server/utils/linear_programming/data_definition.py`
around lines 527 - 534, The Field definition for mip_probing has default=None
but its description states "True (default)"; update the schema so the declared
default matches the documented behavior: either change the Field default to True
or amend the description to explicitly state that None means "unset; backend
default (currently true)". Locate the mip_probing Field in data_definition.py
(the mip_probing Optional[bool] = Field(...) declaration) and apply one clear
fix so OpenAPI/client generation and the docstring are consistent.
For #1266