Skip to content

add a Python interface to the C++ grpc asyc client#1430

Open
tmckayus wants to merge 10 commits into
NVIDIA:mainfrom
tmckayus:feature/cuopt-grpc-async-client
Open

add a Python interface to the C++ grpc asyc client#1430
tmckayus wants to merge 10 commits into
NVIDIA:mainfrom
tmckayus:feature/cuopt-grpc-async-client

Conversation

@tmckayus

Copy link
Copy Markdown
Contributor

This lets Python use an async grpc client with Problem type in additon to existing blocking Problem.solve() with remote execution env vars set (zero-code-change remote solve)

For some scenarios, an async client with polling, or results fetched by a different process/thread, or job cancellation, etc is desirable. It also helps with agents that formulate problems and then want to cancel the job and modify the model.

This lets Python use an async grpc client with Problem type in
additon to existing blocking Problem.solve() with remote execution env
vars set (zero-code-change remote solve)
@tmckayus tmckayus requested review from a team as code owners June 12, 2026 15:40
@tmckayus

Copy link
Copy Markdown
Contributor Author

Check test_grpc_client.py to see usage. Adding user-facing documentation/example as a follow up on the PR

@tmckayus tmckayus added non-breaking Introduces a non-breaking change Feature feature request New feature or request labels Jun 12, 2026
@tmckayus

Copy link
Copy Markdown
Contributor Author

Also, some of the name-spacing around linear_programming should change in a future PR. The grpc client is not necessarily specific to linear programming, although it started there. We should move it later. For the moment, left it alone to reduce churn.

@coderabbitai

coderabbitai Bot commented Jun 12, 2026

Copy link
Copy Markdown

Review Change Stack

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review
📝 Walkthrough

Walkthrough

Adds a cuopt.grpc.numerical.Client Python/Cython gRPC client for remote LP/MIP job submission. Introduces a shared C++ env-config helper (grpc_client_env.cpp), a new Cython-facing grpc_python_client_t wrapper, and refactors solve_remote.cpp to use the shared config. Includes solver-wrapper solution-building helpers, Python package wiring, pytest fixtures for server lifecycle, and integration tests.

Changes

gRPC Python Client Implementation

Layer / File(s) Summary
C++ environment-driven gRPC config
cpp/include/cuopt/grpc/grpc_client_env.hpp, cpp/src/grpc/client/grpc_client_env.cpp
Declares and implements make_grpc_client_config and apply_grpc_client_env_overrides, reading CUOPT_GRPC_*/CUOPT_TLS_*/CUOPT_CHUNK_SIZE env vars to override chunk sizes, debug logging, and TLS/mTLS settings with validation.
solve_remote.cpp refactoring
cpp/src/grpc/client/solve_remote.cpp
Removes file-local parse_env_int64, PEM-reading helpers, and apply_env_overrides; adds #include <grpc_client_env.hpp>; updates both solve_lp_remote and solve_mip_remote to call apply_grpc_client_env_overrides.
C++ Cython-facing gRPC client wrapper
cpp/include/cuopt/grpc/cython_grpc_client.hpp, cpp/src/grpc/client/cython_grpc_client.cpp, cpp/CMakeLists.txt
Defines result structs/enums and grpc_python_client_t interface; implements submit (LP/MIP dispatch), polling wait, cancel/delete, result construction, log fetch/stream, and incumbent pagination; adds both .cpp files to the gRPC build.
Python solution building helpers
python/cuopt/cuopt/linear_programming/solver/solver_wrapper.pxd, python/cuopt/cuopt/linear_programming/solver/solver_wrapper.pyx
Adds _vars_dict with empty-solution and length-mismatch handling; refactors create_solution into create_solution_with_names; exports build_solution_from_unique_ptr and prepare_solver_settings for gRPC result conversion.
Cython bindings and Python Client
python/cuopt/cuopt/grpc/numerical/grpc_client.pxd, python/cuopt/cuopt/grpc/numerical/grpc_client.pyx
Declares all extern C++ types in .pxd; implements Client with lifecycle methods (submit/status/wait/cancel/delete/result), background-threaded log streaming with backfill, incumbent polling/streaming with callback dispatch, and _as_data_model/_is_mip normalization helpers.
Python package build and module integration
python/cuopt/cuopt/CMakeLists.txt, python/cuopt/cuopt/grpc/CMakeLists.txt, python/cuopt/cuopt/grpc/numerical/CMakeLists.txt, python/cuopt/cuopt/__init__.py, python/cuopt/cuopt/grpc/__init__.py, python/cuopt/cuopt/grpc/numerical/__init__.py
Adds CMake subdirectory chain and rapids_cython_create_modules target; registers grpc in _submodules for lazy loading; re-exports Client, GrpcError, JobNotReadyError, JobStatus from cuopt.grpc.numerical.
gRPC test server fixtures
python/cuopt/conftest.py, python/cuopt/cuopt/tests/fixtures/grpc_server_fixtures.py, python/cuopt/cuopt/tests/fixtures/__init__.py, python/cuopt/cuopt/tests/linear_programming/test_cpu_only_execution.py, python/cuopt/pyproject.toml
Adds pytest plugin registration, port offset constants, server binary discovery, TCP/gRPC readiness helpers, environment builders, server start/stop lifecycle, TLS port offset adjustment (+850), and --import-mode=importlib configuration.
gRPC client integration tests
python/cuopt/cuopt/tests/linear_programming/test_grpc_client.py
Tests LP submit/status/result/delete, log streaming and bulk log verification, JobNotReadyError before completion, MIP result, job cancellation with GrpcError on result, and MIP incumbent streaming with GetSolutionCallback.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 43.08% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title accurately describes the main change: adding a Python interface to a C++ gRPC async client.
Description check ✅ Passed The description is directly related to the changeset, explaining the async gRPC client functionality and use cases.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Comment @coderabbitai help to get the list of available commands.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 9

🧹 Nitpick comments (5)
cpp/include/cuopt/grpc/cython_grpc_client.hpp (1)

138-143: ⚡ Quick win

Fix the fetch_incumbents contract comment.

This API is shaped as a single paginated fetch (from_index, max_count, next_index, job_complete), but the comment says it polls until completion. The implementation in cpp/src/grpc/client/cython_grpc_client.cpp also performs just one get_incumbents call. Please document the actual one-page behavior so the Cython layer does not assume it blocks.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@cpp/include/cuopt/grpc/cython_grpc_client.hpp` around lines 138 - 143, The
doc comment for fetch_incumbents is incorrect: it does not poll until job
completion but performs a single paginated fetch using the supplied from_index
and max_count and returns a grpc_incumbents_result_t that includes next_index
and job_complete flags; update the comment on fetch_incumbents to state it
issues one GetIncumbents RPC call that returns up to max_count incumbents
starting at from_index and indicates the next_index to continue pagination and
whether the job is complete so callers (e.g., the Cython layer) must loop if
they want to poll to completion.
python/cuopt/cuopt/tests/linear_programming/test_grpc_client.py (4)

27-38: ⚖️ Poor tradeoff

Polling logic may cause non-deterministic test behavior.

Lines 35-36 check if results are available during polling and return COMPLETED even if status() hasn't transitioned yet. This dual-check pattern can cause race conditions:

  • Result availability and status transitions may not be atomic
  • Different timing on CI vs local runs could produce different test paths
  • The C++ wait() (context snippet 2) polls status only until !is_in_flight

For deterministic tests, rely on client.wait() or poll status() alone, not result availability.

Simplify polling to use only status checks
 def _poll_until_complete(
     client, job_id, names, timeout=120, poll_interval=0.05
 ):
     deadline = time.monotonic() + timeout
     while time.monotonic() < deadline:
         status = client.status(job_id)
         if status not in (JobStatus.QUEUED, JobStatus.PROCESSING):
             return status
-        if client.result(job_id, names) is not None:
-            return JobStatus.COMPLETED
         time.sleep(poll_interval)
     return client.status(job_id)

Or use client.wait() directly instead of a custom helper.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@python/cuopt/cuopt/tests/linear_programming/test_grpc_client.py` around lines
27 - 38, The _poll_until_complete helper mixes status checks and result()
availability which can lead to racey, non-deterministic tests; update it to rely
solely on client.status (or replace its usage with client.wait) so state
transitions are the single source of truth. Specifically, change the logic in
_poll_until_complete to remove the client.result(job_id, names) check and only
return based on client.status(job_id) and JobStatus values (or call
client.wait(job_id, timeout) from tests instead of _poll_until_complete) to
match the C++ wait() behavior and avoid race conditions.

109-124: ⚡ Quick win

Validate that MIP solution variables are actually integers.

Line 121-123 fetch and validate the objective but don't check that x and y are integer-valued, which is the defining characteristic of a MIP solution.

Add integer validation
         solution = client.result(job_id, _MIP_NAMES)
         assert solution is not None
         assert solution.get_primal_objective() == pytest.approx(15.0, rel=1e-3)
+        vars_ = solution.get_vars()
+        assert vars_["x"] % 1 == 0, f"x={vars_['x']} is not integer"
+        assert vars_["y"] % 1 == 0, f"y={vars_['y']} is not integer"
         client.delete(job_id)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@python/cuopt/cuopt/tests/linear_programming/test_grpc_client.py` around lines
109 - 124, In test_mip_submit_and_result, after obtaining solution =
client.result(job_id, _MIP_NAMES) and asserting the objective, also validate
that the decision variables x and y are integer-valued: extract the primal
values for "x" and "y" from the solution (use the existing solution API such as
solution.get_primal_value("x") / solution.get_primal_value("y") or the
equivalent accessor used elsewhere), and assert each is integer within a small
tolerance (e.g., assert abs(val - round(val)) < 1e-6 or compare to
pytest.approx(round(val))). Keep these checks immediately after the objective
assertion and before client.delete(job_id).

70-92: ⚡ Quick win

Verify whether log backfill was needed.

Line 90 asserts length equality but doesn't check if streaming worked or if backfill was required. According to context snippet 4, join_log_stream() returns a state dict with "backfilled" and "live_lines" keys. To catch regressions where streaming silently fails and only backfill works, assert that live streaming actually received lines:

Add assertion on streaming success
         terminal = _poll_until_complete(client, job_id, _DEMO_LP_NAMES)
         assert terminal == JobStatus.COMPLETED
-        client.join_log_stream(job_id)
+        state = client.join_log_stream(job_id)
+        assert state["live_lines"] > 0, "Log streaming failed; only backfill worked"
 
         solution = client.result(job_id, _DEMO_LP_NAMES)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@python/cuopt/cuopt/tests/linear_programming/test_grpc_client.py` around lines
70 - 92, The test currently only compares received vs bulk logs length and
doesn't verify whether live streaming supplied any lines; after calling
client.join_log_stream(job_id) capture its return value (the state dict returned
by join_log_stream) and assert that the "live_lines" count is > 0 (and
optionally that "backfilled" is False if you want to ensure no backfill was
required) to ensure start_log_stream actually streamed messages; modify
test_submit_with_log_stream to store the result of join_log_stream and add an
assertion on state["live_lines"] > 0 (referencing test_submit_with_log_stream,
client.start_log_stream, client.join_log_stream and the
"live_lines"/"backfilled" keys).

126-168: ⚡ Quick win

Validate incumbent solution quality and progression.

Lines 162-164 assert that incumbents exist but don't validate their content or progression:

  • No check that incumbent objectives improve monotonically
  • No validation that the final solution's objective matches the best incumbent
  • No verification of incumbent variable values

Tests should validate numerical correctness, not just that callbacks fired.

Add incumbent validation
         assert collector.entries
+        # Verify objectives are non-decreasing (MAXIMIZE problem)
+        objectives = [e["cost"] for e in collector.entries]
+        for i in range(1, len(objectives)):
+            assert objectives[i] >= objectives[i-1], \
+                f"Incumbent regressed: {objectives[i-1]} -> {objectives[i]}"
+        
         bulk = client.incumbents(job_id)
         assert bulk
+        assert len(bulk) == len(collector.entries)
 
         solution = client.result(job_id, _MIP_NAMES)
         assert solution is not None
+        final_obj = solution.get_primal_objective()
+        best_incumbent = max(e["cost"] for e in collector.entries)
+        assert final_obj == pytest.approx(best_incumbent, rel=1e-9)
         client.delete(job_id)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@python/cuopt/cuopt/tests/linear_programming/test_grpc_client.py` around lines
126 - 168, The test_mip_incumbent_stream currently only asserts that incumbents
were received; update it to validate incumbent quality by (1) inspecting
collector.entries (populated by IncumbentCollector.get_solution) to ensure the
reported costs form a non-increasing sequence (monotonic improvement for
MAXIMIZE or non-decreasing for MINIMIZE), (2) compare the best incumbent cost
from collector.entries to the final solution objective returned by
client.result(job_id, _MIP_NAMES) to ensure they match (within a small numeric
tolerance), and (3) verify that the variable assignments in the final solution
equal the variable values in the best incumbent entry (or are consistent within
tolerance). Use client.incumbents(job_id) and collector.entries to find the best
incumbent and perform numeric comparisons rather than just asserting presence.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@cpp/src/grpc/client/cython_grpc_client.cpp`:
- Around line 124-126: The wait() method (grpc_python_client_t::wait)
incorrectly treats timeout_seconds <= 0 as the blocking sentinel and forwards
negative values into impl_->client.wait_for_completion(job_id); change the logic
to only treat 0 as the blocking sentinel (per cython_grpc_client.hpp) and
validate negative values: if timeout_seconds < 0 return an appropriate error
status via map_status_result (or a mapped grpc_status_result_t indicating
invalid argument) instead of calling wait_for_completion; if timeout_seconds ==
0 call wait_for_completion(job_id) and otherwise call the timed API.

In `@cpp/src/grpc/client/grpc_client_env.cpp`:
- Around line 79-83: The make_grpc_client_config function must validate inputs
before composing server_address: check that host is non-empty and port is within
the valid TCP range (1..65535) and, if invalid, throw a std::invalid_argument
with a clear message (e.g., "invalid host" or "invalid port"). Move the
validation to the top of make_grpc_client_config (before building
config.server_address), then compose server_address as host + ":" +
std::to_string(port) and continue calling
apply_grpc_client_env_overrides(config); reference: make_grpc_client_config,
grpc_client_config_t, server_address, apply_grpc_client_env_overrides.
- Around line 70-75: The code silently skips client cert/key when only one env
var is set; update the logic in the grpc client env initialization (look around
get_env("CUOPT_TLS_CLIENT_CERT") / get_env("CUOPT_TLS_CLIENT_KEY") and the
branch that assigns config.tls_client_cert/config.tls_client_key using
read_pem_file) to validate mTLS input: if exactly one of CUOPT_TLS_CLIENT_CERT
or CUOPT_TLS_CLIENT_KEY is present, return an error (or abort initialization)
indicating the missing counterpart; only call read_pem_file and set
config.tls_client_cert and config.tls_client_key when both are present, and
leave both unset when both are absent.

In `@python/cuopt/cuopt/grpc/grpc_client.pyx`:
- Around line 282-298: The join_log_stream implementation currently pops the
thread handle before joining, losing the reference if join(timeout) times out;
change it to look up (but not pop) the thread from self._log_threads, call
thread.join(timeout), and only remove (pop) the thread from self._log_threads
when join returned (thread is not alive) so the thread can be joined/cleaned
later if the timeout expired; similarly only pop self._log_thread_errors and
self._log_stream_state after confirming the thread has stopped (or propagate
errors immediately if thread is known stopped), using the existing symbols
join_log_stream, _log_threads, _log_thread_errors, and _log_stream_state to
guide where to adjust the lookup/pop ordering and lifecycle checks.
- Around line 62-78: The C callback _invoke_log_callback currently swallows
exceptions; change it to capture the exception object and traceback and save
them into thread-shared state on userdata (e.g., set attributes like
userdata._log_error and userdata._log_traceback or push into a thread-safe
container referenced by userdata) instead of just returning 0 silently; still
stop the stream by returning 0 after storing the error. Then update
join_log_stream() (or the function that waits for the stream) to inspect that
shared userdata state after the stream ends and re-raise the original exception
(or raise a new exception that includes the stored traceback/message) so callers
see the real failure. Ensure the storage and inspection are done while holding
the GIL and use consistent keys/symbols (userdata, _invoke_log_callback,
_call_log_callback, join_log_stream) so the bridge and join code can find and
propagate the error.
- Around line 265-268: The wrapped() helper currently ignores the return value
of _call_log_callback so the user's stop signal is lost; modify wrapped (used by
start_log_stream) to capture the callback result (e.g., should_continue =
_call_log_callback(callback, line, job_complete)) and return that boolean (or
False on explicit stop) instead of discarding it so start_log_stream can honor
"False stops early" and stop the stream promptly.

In `@python/cuopt/cuopt/tests/linear_programming/grpc_server_fixtures.py`:
- Around line 115-122: Update stop_grpc_server to avoid signaling a process that
already exited: check proc.poll() first and if it is not None just call
proc.wait() (or reap exit code) instead of proc.send_signal(signal.SIGTERM);
only call proc.send_signal(signal.SIGTERM) and the subsequent
proc.wait(timeout=5)/proc.kill() logic when proc.poll() returns None (process
still running). Ensure the function references proc.poll, proc.send_signal,
proc.wait, and proc.kill as described so teardown won't convert a test crash
into a teardown error.
- Around line 82-100: The start_grpc_server function launches cuopt_grpc_server
with the inherited environment; call the test helper cpu_only_env() to get the
sanitized environment, pass it into subprocess.Popen via the env=... parameter,
and return the proc and the env tuple as the docstring promises; update the
Popen call in start_grpc_server (which locates the binary via find_grpc_server)
to include env=cpu_only_env() (store it in a variable like env) and return
(proc, env).

In `@python/cuopt/cuopt/tests/linear_programming/test_grpc_client.py`:
- Around line 41-168: The tests only cover happy paths; add non-happy-path unit
tests in python/cuopt/cuopt/tests/linear_programming/test_grpc_client.py that
exercise client.delete, client.result, and client.logs for error scenarios: (1)
submit jobs that produce infeasible/unbounded/empty outcomes (use or create
small problems analogous to _demo_lp_problem/_MIP_NAMES) and assert
result()/logs() return appropriate errors or None and JobStatus reflects
INFEASIBLE/UNBOUNDED/etc, (2) test deleting a job while it is JobStatus.QUEUED
and while JobStatus.PROCESSING then assert subsequent result(job_id, ...) and
logs(job_id) raise the expected errors or return None, (3) call
client.result/client.logs after client.delete and assert proper error behavior,
and (4) add tests for invalid job_id and unreachable server (e.g., use a bad
port or mock Client to simulate connection failure) asserting the client raises
the expected exceptions; reference existing symbols Client, job_id,
client.submit, client.delete, client.result, client.logs,
client.start_log_stream, client.start_incumbent_stream, JobStatus, and
JobNotReadyError to locate where to add these cases.

---

Nitpick comments:
In `@cpp/include/cuopt/grpc/cython_grpc_client.hpp`:
- Around line 138-143: The doc comment for fetch_incumbents is incorrect: it
does not poll until job completion but performs a single paginated fetch using
the supplied from_index and max_count and returns a grpc_incumbents_result_t
that includes next_index and job_complete flags; update the comment on
fetch_incumbents to state it issues one GetIncumbents RPC call that returns up
to max_count incumbents starting at from_index and indicates the next_index to
continue pagination and whether the job is complete so callers (e.g., the Cython
layer) must loop if they want to poll to completion.

In `@python/cuopt/cuopt/tests/linear_programming/test_grpc_client.py`:
- Around line 27-38: The _poll_until_complete helper mixes status checks and
result() availability which can lead to racey, non-deterministic tests; update
it to rely solely on client.status (or replace its usage with client.wait) so
state transitions are the single source of truth. Specifically, change the logic
in _poll_until_complete to remove the client.result(job_id, names) check and
only return based on client.status(job_id) and JobStatus values (or call
client.wait(job_id, timeout) from tests instead of _poll_until_complete) to
match the C++ wait() behavior and avoid race conditions.
- Around line 109-124: In test_mip_submit_and_result, after obtaining solution =
client.result(job_id, _MIP_NAMES) and asserting the objective, also validate
that the decision variables x and y are integer-valued: extract the primal
values for "x" and "y" from the solution (use the existing solution API such as
solution.get_primal_value("x") / solution.get_primal_value("y") or the
equivalent accessor used elsewhere), and assert each is integer within a small
tolerance (e.g., assert abs(val - round(val)) < 1e-6 or compare to
pytest.approx(round(val))). Keep these checks immediately after the objective
assertion and before client.delete(job_id).
- Around line 70-92: The test currently only compares received vs bulk logs
length and doesn't verify whether live streaming supplied any lines; after
calling client.join_log_stream(job_id) capture its return value (the state dict
returned by join_log_stream) and assert that the "live_lines" count is > 0 (and
optionally that "backfilled" is False if you want to ensure no backfill was
required) to ensure start_log_stream actually streamed messages; modify
test_submit_with_log_stream to store the result of join_log_stream and add an
assertion on state["live_lines"] > 0 (referencing test_submit_with_log_stream,
client.start_log_stream, client.join_log_stream and the
"live_lines"/"backfilled" keys).
- Around line 126-168: The test_mip_incumbent_stream currently only asserts that
incumbents were received; update it to validate incumbent quality by (1)
inspecting collector.entries (populated by IncumbentCollector.get_solution) to
ensure the reported costs form a non-increasing sequence (monotonic improvement
for MAXIMIZE or non-decreasing for MINIMIZE), (2) compare the best incumbent
cost from collector.entries to the final solution objective returned by
client.result(job_id, _MIP_NAMES) to ensure they match (within a small numeric
tolerance), and (3) verify that the variable assignments in the final solution
equal the variable values in the best incumbent entry (or are consistent within
tolerance). Use client.incumbents(job_id) and collector.entries to find the best
incumbent and perform numeric comparisons rather than just asserting presence.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 2d3732be-cc10-45f1-96b4-5aba3dad083c

📥 Commits

Reviewing files that changed from the base of the PR and between 1ec39b3 and 22d7297.

📒 Files selected for processing (17)
  • cpp/CMakeLists.txt
  • cpp/include/cuopt/grpc/cython_grpc_client.hpp
  • cpp/include/cuopt/grpc/grpc_client_env.hpp
  • cpp/src/grpc/client/cython_grpc_client.cpp
  • cpp/src/grpc/client/grpc_client_env.cpp
  • cpp/src/grpc/client/solve_remote.cpp
  • python/cuopt/cuopt/CMakeLists.txt
  • python/cuopt/cuopt/__init__.py
  • python/cuopt/cuopt/grpc/CMakeLists.txt
  • python/cuopt/cuopt/grpc/__init__.py
  • python/cuopt/cuopt/grpc/grpc_client.pxd
  • python/cuopt/cuopt/grpc/grpc_client.pyx
  • python/cuopt/cuopt/linear_programming/solver/solver_wrapper.pxd
  • python/cuopt/cuopt/linear_programming/solver/solver_wrapper.pyx
  • python/cuopt/cuopt/tests/linear_programming/conftest.py
  • python/cuopt/cuopt/tests/linear_programming/grpc_server_fixtures.py
  • python/cuopt/cuopt/tests/linear_programming/test_grpc_client.py

Comment thread cpp/src/grpc/client/cython_grpc_client.cpp Outdated
Comment thread cpp/src/grpc/client/grpc_client_env.cpp
Comment thread cpp/src/grpc/client/grpc_client_env.cpp
Comment thread python/cuopt/cuopt/grpc/numerical/grpc_client.pyx
Comment thread python/cuopt/cuopt/grpc/grpc_client.pyx Outdated
Comment thread python/cuopt/cuopt/grpc/numerical/grpc_client.pyx
Comment thread python/cuopt/cuopt/tests/fixtures/grpc_server_fixtures.py
Comment thread python/cuopt/cuopt/tests/fixtures/grpc_server_fixtures.py
Comment thread python/cuopt/cuopt/tests/linear_programming/test_grpc_client.py

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
python/cuopt/cuopt/grpc_server_fixtures.py (1)

35-37: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Use unique port offsets to avoid cross-class server collisions.

Line 35 and Line 36 both resolve to offset 800, which can cause nondeterministic server bind failures when those classes execute concurrently.

Suggested minimal fix
 GRPC_PORT_OFFSET_CLIENT = 800
-GRPC_PORT_OFFSET_TLS = 800
+GRPC_PORT_OFFSET_TLS = 850
 GRPC_PORT_OFFSET_MTLS = 900

As per coding guidelines, **/*test*.{cpp,py} requires preventing flaky tests from race/timing-related behavior; shared fixed ports across classes make this fixture flaky under parallel execution.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@python/cuopt/cuopt/grpc_server_fixtures.py` around lines 35 - 37, The
constants GRPC_PORT_OFFSET_CLIENT and GRPC_PORT_OFFSET_TLS currently share the
same value (800), causing port collisions when fixtures run in parallel; change
GRPC_PORT_OFFSET_TLS to a unique, non-overlapping offset (for example 810 or
another unused value distinct from GRPC_PORT_OFFSET_CLIENT=800 and
GRPC_PORT_OFFSET_MTLS=900) and update any references to GRPC_PORT_OFFSET_TLS so
tests and server fixtures use the new constant.

Source: Coding guidelines

🧹 Nitpick comments (1)
python/cuopt/cuopt/grpc_server_fixtures.py (1)

40-167: ⚡ Quick win

Add type hints and complete API docstrings for the new public helpers.

The newly introduced public functions/fixture are missing type annotations, and current docstrings do not consistently document params/returns/raises. Please annotate signatures and expand docstring content accordingly.

As per coding guidelines, python/**/*.py requires type hints on new public functions/classes and docstring content covering params, returns, and raises.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@python/cuopt/cuopt/grpc_server_fixtures.py` around lines 40 - 167, The public
helpers find_grpc_server, wait_for_port, cpu_only_env, start_grpc_server,
stop_grpc_server and the grpc_server fixture need Python type annotations and
expanded docstrings that document parameters, return values and raised
exceptions; update each function signature to include appropriate types (e.g. ->
Optional[str] for find_grpc_server, port: int, timeout: float for wait_for_port
-> bool, cpu_only_env(port: int) -> Dict[str,str],
start_grpc_server(port_offset: int) -> Tuple[subprocess.Popen, Dict[str,str]],
stop_grpc_server(proc: subprocess.Popen) -> None, and annotate the fixture
return type in its typing comment or use pytest.GeneratorFixture/Iterator
typing), add necessary typing imports (Optional, Dict, Tuple) and expand each
docstring to list :param:, :return:, and :raises: entries for the conditions
already handled (e.g. pytest.skip, pytest.fail, subprocess.TimeoutExpired) so
the API is fully documented per project guidelines.

Source: Coding guidelines

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Outside diff comments:
In `@python/cuopt/cuopt/grpc_server_fixtures.py`:
- Around line 35-37: The constants GRPC_PORT_OFFSET_CLIENT and
GRPC_PORT_OFFSET_TLS currently share the same value (800), causing port
collisions when fixtures run in parallel; change GRPC_PORT_OFFSET_TLS to a
unique, non-overlapping offset (for example 810 or another unused value distinct
from GRPC_PORT_OFFSET_CLIENT=800 and GRPC_PORT_OFFSET_MTLS=900) and update any
references to GRPC_PORT_OFFSET_TLS so tests and server fixtures use the new
constant.

---

Nitpick comments:
In `@python/cuopt/cuopt/grpc_server_fixtures.py`:
- Around line 40-167: The public helpers find_grpc_server, wait_for_port,
cpu_only_env, start_grpc_server, stop_grpc_server and the grpc_server fixture
need Python type annotations and expanded docstrings that document parameters,
return values and raised exceptions; update each function signature to include
appropriate types (e.g. -> Optional[str] for find_grpc_server, port: int,
timeout: float for wait_for_port -> bool, cpu_only_env(port: int) ->
Dict[str,str], start_grpc_server(port_offset: int) -> Tuple[subprocess.Popen,
Dict[str,str]], stop_grpc_server(proc: subprocess.Popen) -> None, and annotate
the fixture return type in its typing comment or use
pytest.GeneratorFixture/Iterator typing), add necessary typing imports
(Optional, Dict, Tuple) and expand each docstring to list :param:, :return:, and
:raises: entries for the conditions already handled (e.g. pytest.skip,
pytest.fail, subprocess.TimeoutExpired) so the API is fully documented per
project guidelines.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 023d9ed7-2291-4319-be01-e85bec5e7230

📥 Commits

Reviewing files that changed from the base of the PR and between 22d7297 and a023c81.

📒 Files selected for processing (6)
  • cpp/src/grpc/client/cython_grpc_client.cpp
  • cpp/src/grpc/client/grpc_client_env.cpp
  • python/cuopt/conftest.py
  • python/cuopt/cuopt/grpc/grpc_client.pyx
  • python/cuopt/cuopt/grpc_server_fixtures.py
  • python/cuopt/cuopt/tests/linear_programming/test_grpc_client.py
🚧 Files skipped from review as they are similar to previous changes (4)
  • python/cuopt/cuopt/tests/linear_programming/test_grpc_client.py
  • cpp/src/grpc/client/grpc_client_env.cpp
  • cpp/src/grpc/client/cython_grpc_client.cpp
  • python/cuopt/cuopt/grpc/grpc_client.pyx

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
python/cuopt/cuopt/grpc_server_fixtures.py (1)

35-36: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Use unique port offsets for each test class configuration.

Line 35 and Line 36 both use 800, which can make grpc_server classes contend for the same port and cause intermittent startup failures under parallel execution.

Suggested fix
 GRPC_PORT_OFFSET_CLIENT = 800
-GRPC_PORT_OFFSET_TLS = 800
+GRPC_PORT_OFFSET_TLS = 850
 GRPC_PORT_OFFSET_MTLS = 900
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@python/cuopt/cuopt/grpc_server_fixtures.py` around lines 35 - 36,
GRPC_PORT_OFFSET_CLIENT and GRPC_PORT_OFFSET_TLS are both set to 800 causing
port collisions when grpc_server fixtures run in parallel; change one constant
(e.g., GRPC_PORT_OFFSET_TLS) to a different unique offset value (or pick
distinct offsets for both) and update any usages so each test class
configuration uses a distinct port range to avoid startup contention.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Outside diff comments:
In `@python/cuopt/cuopt/grpc_server_fixtures.py`:
- Around line 35-36: GRPC_PORT_OFFSET_CLIENT and GRPC_PORT_OFFSET_TLS are both
set to 800 causing port collisions when grpc_server fixtures run in parallel;
change one constant (e.g., GRPC_PORT_OFFSET_TLS) to a different unique offset
value (or pick distinct offsets for both) and update any usages so each test
class configuration uses a distinct port range to avoid startup contention.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 89213f16-bf9e-4b36-aab3-98f7a98d35dd

📥 Commits

Reviewing files that changed from the base of the PR and between a023c81 and 081e193.

📒 Files selected for processing (2)
  • python/cuopt/cuopt/grpc_server_fixtures.py
  • python/cuopt/pyproject.toml

@github-actions

Copy link
Copy Markdown

🔔 Hi @anandhkb, this pull request has had no activity for 7 days. Please update or let us know if it can be closed. Thank you!

If this is an "epic" issue, then please add the "epic" label to this issue.
If it is a PR and not ready for review, then please convert this to draft.
If you just want to switch off this notification, then use the "skip inactivity reminder" label.

Comment thread python/cuopt/cuopt/grpc/numerical/grpc_client.pxd
Comment thread python/cuopt/cuopt/tests/fixtures/grpc_server_fixtures.py
Comment thread python/cuopt/conftest.py Outdated
also move grpc server fixtures under test

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
python/cuopt/cuopt/grpc/numerical/grpc_client.pyx (2)

450-457: 🩺 Stability & Availability | 🟠 Major | ⚡ Quick win

Preserve incumbent thread ownership when join() times out.

At Line 452, join_incumbent_stream() removes the thread from _incumbent_threads before confirming it has stopped. If thread.join(timeout) times out, the thread can continue running but becomes untracked, so later cleanup/error propagation is lost.

Suggested fix
 def join_incumbent_stream(self, str job_id, timeout=None):
     """Wait for a background incumbent poll started by :meth:`start_incumbent_stream`."""
-    thread = self._incumbent_threads.pop(job_id, None)
+    thread = self._incumbent_threads.get(job_id)
     if thread is not None:
         thread.join(timeout)
+        if thread.is_alive():
+            exc = self._incumbent_thread_errors.get(job_id)
+            if exc is not None:
+                raise exc
+            return
+        self._incumbent_threads.pop(job_id, None)
     exc = self._incumbent_thread_errors.pop(job_id, None)
     if exc is not None:
         raise exc
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@python/cuopt/cuopt/grpc/numerical/grpc_client.pyx` around lines 450 - 457,
The join_incumbent_stream method removes the thread from _incumbent_threads
dictionary before verifying that thread.join(timeout) has actually completed. If
join times out, the thread continues running but becomes untracked, causing loss
of cleanup and error propagation. Modify the logic to only pop the thread from
_incumbent_threads after confirming the join completed successfully, or defer
the removal until after checking that the thread has actually stopped. This
ensures the thread remains registered if the join operation times out, allowing
proper tracking and later cleanup.

225-230: 🎯 Functional Correctness | 🔴 Critical

Confirm MIP result retrieval handles fresh Client instances correctly, and add regression test.

The result() method defaults fetch_mip to False for unknown job IDs because _job_is_mip is only populated during submit() on the same Client instance (line 228). When a fresh Client calls result() for a MIP job without explicitly passing is_mip=True, it will call get_lp_result() instead of get_mip_result(). Since the server's ResultResponse uses a oneof between lp_solution and mip_solution, accessing the wrong field will silently return an empty result rather than the actual MIP solution.

Existing tests (e.g., test_mip_submit_and_result) only cover same-Client-instance retrieval where _job_is_mip preserves the flag. Add a test that creates a fresh Client instance to retrieve a previously submitted MIP job's result without passing is_mip=True, verifying that either an error is raised or the method documents that callers must pass is_mip=True for cross-process/thread retrieval.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@python/cuopt/cuopt/grpc/numerical/grpc_client.pyx` around lines 225 - 230,
The result() method in grpc_client.pyx has a limitation where it defaults
fetch_mip to False when a job_id is not found in the _job_is_mip dictionary,
which only gets populated during submit() calls on the same Client instance.
This causes MIP results to be incorrectly retrieved as LP results when calling
result() from a fresh Client instance without explicitly passing is_mip=True.
Add a regression test that demonstrates this scenario by submitting a MIP job
with one Client instance and then attempting to retrieve its result from a
different fresh Client instance without passing is_mip=True, verifying that the
method either raises an appropriate error or documents that callers must
explicitly pass is_mip=True for cross-instance result retrieval. Additionally,
consider adding validation in the result() method to detect this condition and
either raise an error or warn users about the requirement.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@python/cuopt/cuopt/tests/linear_programming/test_grpc_client.py`:
- Around line 54-58: The assertion `assert "numerical" in (grpc_pkg.__doc__ or
"")` in the test_grpc_package_is_namespace_only function is brittle because it
couples the test to documentation wording rather than actual package structure.
Replace this docstring check with a more robust assertion that directly verifies
"numerical" exists as an attribute or subpackage in grpc_pkg (such as checking
if it's in grpc_pkg.__dict__ or using hasattr), so the test validates the actual
API structure instead of documentation content.
- Around line 160-168: The test_cancel_job function is flaky because it relies
on timing assumptions that the job will still be in QUEUED or PROCESSING state
when cancel() is called, but on faster machines the job can complete before the
cancel executes. Mock or stub the client.status() method to return a consistent
QUEUED or PROCESSING state when called before the cancel() operation, rather
than relying on actual job execution timing, to eliminate the race condition and
ensure the test deterministically validates the cancel functionality and the
resulting CANCELLED state.

---

Outside diff comments:
In `@python/cuopt/cuopt/grpc/numerical/grpc_client.pyx`:
- Around line 450-457: The join_incumbent_stream method removes the thread from
_incumbent_threads dictionary before verifying that thread.join(timeout) has
actually completed. If join times out, the thread continues running but becomes
untracked, causing loss of cleanup and error propagation. Modify the logic to
only pop the thread from _incumbent_threads after confirming the join completed
successfully, or defer the removal until after checking that the thread has
actually stopped. This ensures the thread remains registered if the join
operation times out, allowing proper tracking and later cleanup.
- Around line 225-230: The result() method in grpc_client.pyx has a limitation
where it defaults fetch_mip to False when a job_id is not found in the
_job_is_mip dictionary, which only gets populated during submit() calls on the
same Client instance. This causes MIP results to be incorrectly retrieved as LP
results when calling result() from a fresh Client instance without explicitly
passing is_mip=True. Add a regression test that demonstrates this scenario by
submitting a MIP job with one Client instance and then attempting to retrieve
its result from a different fresh Client instance without passing is_mip=True,
verifying that the method either raises an appropriate error or documents that
callers must explicitly pass is_mip=True for cross-instance result retrieval.
Additionally, consider adding validation in the result() method to detect this
condition and either raise an error or warn users about the requirement.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 59352653-b9d7-4cd8-9051-15a064cd6618

📥 Commits

Reviewing files that changed from the base of the PR and between 351e194 and 671f1da.

📒 Files selected for processing (10)
  • python/cuopt/conftest.py
  • python/cuopt/cuopt/grpc/CMakeLists.txt
  • python/cuopt/cuopt/grpc/__init__.py
  • python/cuopt/cuopt/grpc/numerical/CMakeLists.txt
  • python/cuopt/cuopt/grpc/numerical/__init__.py
  • python/cuopt/cuopt/grpc/numerical/grpc_client.pxd
  • python/cuopt/cuopt/grpc/numerical/grpc_client.pyx
  • python/cuopt/cuopt/tests/fixtures/__init__.py
  • python/cuopt/cuopt/tests/fixtures/grpc_server_fixtures.py
  • python/cuopt/cuopt/tests/linear_programming/test_grpc_client.py
💤 Files with no reviewable changes (1)
  • python/cuopt/cuopt/grpc/numerical/grpc_client.pxd
✅ Files skipped from review due to trivial changes (3)
  • python/cuopt/cuopt/grpc/init.py
  • python/cuopt/cuopt/tests/fixtures/init.py
  • python/cuopt/conftest.py

Comment thread python/cuopt/cuopt/tests/linear_programming/test_grpc_client.py
Comment thread python/cuopt/cuopt/tests/linear_programming/test_grpc_client.py Outdated
@tmckayus tmckayus requested a review from ramakrishnap-nv June 23, 2026 19:57

@ramakrishnap-nv ramakrishnap-nv left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rest looks good, minor suggestions. And also lets create an issue for the follow-up PR for docs.

Comment thread cpp/include/cuopt/grpc/cython_grpc_client.hpp Outdated
Comment thread cpp/include/cuopt/grpc/grpc_client_env.hpp Outdated
Comment thread python/cuopt/pyproject.toml
tmckayus and others added 2 commits June 26, 2026 12:21
Co-authored-by: Ramakrishna Prabhu <42624703+ramakrishnap-nv@users.noreply.github.com>
Co-authored-by: Ramakrishna Prabhu <42624703+ramakrishnap-nv@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

feature request New feature or request Feature non-breaking Introduces a non-breaking change

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants