add a Python interface to the C++ grpc asyc client#1430
Conversation
This lets Python use an async grpc client with Problem type in additon to existing blocking Problem.solve() with remote execution env vars set (zero-code-change remote solve)
|
Check test_grpc_client.py to see usage. Adding user-facing documentation/example as a follow up on the PR |
|
Also, some of the name-spacing around linear_programming should change in a future PR. The grpc client is not necessarily specific to linear programming, although it started there. We should move it later. For the moment, left it alone to reduce churn. |
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughAdds a ChangesgRPC Python Client Implementation
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes 🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Actionable comments posted: 9
🧹 Nitpick comments (5)
cpp/include/cuopt/grpc/cython_grpc_client.hpp (1)
138-143: ⚡ Quick winFix the
fetch_incumbentscontract comment.This API is shaped as a single paginated fetch (
from_index,max_count,next_index,job_complete), but the comment says it polls until completion. The implementation incpp/src/grpc/client/cython_grpc_client.cppalso performs just oneget_incumbentscall. Please document the actual one-page behavior so the Cython layer does not assume it blocks.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@cpp/include/cuopt/grpc/cython_grpc_client.hpp` around lines 138 - 143, The doc comment for fetch_incumbents is incorrect: it does not poll until job completion but performs a single paginated fetch using the supplied from_index and max_count and returns a grpc_incumbents_result_t that includes next_index and job_complete flags; update the comment on fetch_incumbents to state it issues one GetIncumbents RPC call that returns up to max_count incumbents starting at from_index and indicates the next_index to continue pagination and whether the job is complete so callers (e.g., the Cython layer) must loop if they want to poll to completion.python/cuopt/cuopt/tests/linear_programming/test_grpc_client.py (4)
27-38: ⚖️ Poor tradeoffPolling logic may cause non-deterministic test behavior.
Lines 35-36 check if results are available during polling and return
COMPLETEDeven ifstatus()hasn't transitioned yet. This dual-check pattern can cause race conditions:
- Result availability and status transitions may not be atomic
- Different timing on CI vs local runs could produce different test paths
- The C++
wait()(context snippet 2) polls status only until!is_in_flightFor deterministic tests, rely on
client.wait()or pollstatus()alone, not result availability.Simplify polling to use only status checks
def _poll_until_complete( client, job_id, names, timeout=120, poll_interval=0.05 ): deadline = time.monotonic() + timeout while time.monotonic() < deadline: status = client.status(job_id) if status not in (JobStatus.QUEUED, JobStatus.PROCESSING): return status - if client.result(job_id, names) is not None: - return JobStatus.COMPLETED time.sleep(poll_interval) return client.status(job_id)Or use
client.wait()directly instead of a custom helper.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@python/cuopt/cuopt/tests/linear_programming/test_grpc_client.py` around lines 27 - 38, The _poll_until_complete helper mixes status checks and result() availability which can lead to racey, non-deterministic tests; update it to rely solely on client.status (or replace its usage with client.wait) so state transitions are the single source of truth. Specifically, change the logic in _poll_until_complete to remove the client.result(job_id, names) check and only return based on client.status(job_id) and JobStatus values (or call client.wait(job_id, timeout) from tests instead of _poll_until_complete) to match the C++ wait() behavior and avoid race conditions.
109-124: ⚡ Quick winValidate that MIP solution variables are actually integers.
Line 121-123 fetch and validate the objective but don't check that
xandyare integer-valued, which is the defining characteristic of a MIP solution.Add integer validation
solution = client.result(job_id, _MIP_NAMES) assert solution is not None assert solution.get_primal_objective() == pytest.approx(15.0, rel=1e-3) + vars_ = solution.get_vars() + assert vars_["x"] % 1 == 0, f"x={vars_['x']} is not integer" + assert vars_["y"] % 1 == 0, f"y={vars_['y']} is not integer" client.delete(job_id)🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@python/cuopt/cuopt/tests/linear_programming/test_grpc_client.py` around lines 109 - 124, In test_mip_submit_and_result, after obtaining solution = client.result(job_id, _MIP_NAMES) and asserting the objective, also validate that the decision variables x and y are integer-valued: extract the primal values for "x" and "y" from the solution (use the existing solution API such as solution.get_primal_value("x") / solution.get_primal_value("y") or the equivalent accessor used elsewhere), and assert each is integer within a small tolerance (e.g., assert abs(val - round(val)) < 1e-6 or compare to pytest.approx(round(val))). Keep these checks immediately after the objective assertion and before client.delete(job_id).
70-92: ⚡ Quick winVerify whether log backfill was needed.
Line 90 asserts length equality but doesn't check if streaming worked or if backfill was required. According to context snippet 4,
join_log_stream()returns a state dict with"backfilled"and"live_lines"keys. To catch regressions where streaming silently fails and only backfill works, assert that live streaming actually received lines:Add assertion on streaming success
terminal = _poll_until_complete(client, job_id, _DEMO_LP_NAMES) assert terminal == JobStatus.COMPLETED - client.join_log_stream(job_id) + state = client.join_log_stream(job_id) + assert state["live_lines"] > 0, "Log streaming failed; only backfill worked" solution = client.result(job_id, _DEMO_LP_NAMES)🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@python/cuopt/cuopt/tests/linear_programming/test_grpc_client.py` around lines 70 - 92, The test currently only compares received vs bulk logs length and doesn't verify whether live streaming supplied any lines; after calling client.join_log_stream(job_id) capture its return value (the state dict returned by join_log_stream) and assert that the "live_lines" count is > 0 (and optionally that "backfilled" is False if you want to ensure no backfill was required) to ensure start_log_stream actually streamed messages; modify test_submit_with_log_stream to store the result of join_log_stream and add an assertion on state["live_lines"] > 0 (referencing test_submit_with_log_stream, client.start_log_stream, client.join_log_stream and the "live_lines"/"backfilled" keys).
126-168: ⚡ Quick winValidate incumbent solution quality and progression.
Lines 162-164 assert that incumbents exist but don't validate their content or progression:
- No check that incumbent objectives improve monotonically
- No validation that the final solution's objective matches the best incumbent
- No verification of incumbent variable values
Tests should validate numerical correctness, not just that callbacks fired.
Add incumbent validation
assert collector.entries + # Verify objectives are non-decreasing (MAXIMIZE problem) + objectives = [e["cost"] for e in collector.entries] + for i in range(1, len(objectives)): + assert objectives[i] >= objectives[i-1], \ + f"Incumbent regressed: {objectives[i-1]} -> {objectives[i]}" + bulk = client.incumbents(job_id) assert bulk + assert len(bulk) == len(collector.entries) solution = client.result(job_id, _MIP_NAMES) assert solution is not None + final_obj = solution.get_primal_objective() + best_incumbent = max(e["cost"] for e in collector.entries) + assert final_obj == pytest.approx(best_incumbent, rel=1e-9) client.delete(job_id)🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@python/cuopt/cuopt/tests/linear_programming/test_grpc_client.py` around lines 126 - 168, The test_mip_incumbent_stream currently only asserts that incumbents were received; update it to validate incumbent quality by (1) inspecting collector.entries (populated by IncumbentCollector.get_solution) to ensure the reported costs form a non-increasing sequence (monotonic improvement for MAXIMIZE or non-decreasing for MINIMIZE), (2) compare the best incumbent cost from collector.entries to the final solution objective returned by client.result(job_id, _MIP_NAMES) to ensure they match (within a small numeric tolerance), and (3) verify that the variable assignments in the final solution equal the variable values in the best incumbent entry (or are consistent within tolerance). Use client.incumbents(job_id) and collector.entries to find the best incumbent and perform numeric comparisons rather than just asserting presence.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@cpp/src/grpc/client/cython_grpc_client.cpp`:
- Around line 124-126: The wait() method (grpc_python_client_t::wait)
incorrectly treats timeout_seconds <= 0 as the blocking sentinel and forwards
negative values into impl_->client.wait_for_completion(job_id); change the logic
to only treat 0 as the blocking sentinel (per cython_grpc_client.hpp) and
validate negative values: if timeout_seconds < 0 return an appropriate error
status via map_status_result (or a mapped grpc_status_result_t indicating
invalid argument) instead of calling wait_for_completion; if timeout_seconds ==
0 call wait_for_completion(job_id) and otherwise call the timed API.
In `@cpp/src/grpc/client/grpc_client_env.cpp`:
- Around line 79-83: The make_grpc_client_config function must validate inputs
before composing server_address: check that host is non-empty and port is within
the valid TCP range (1..65535) and, if invalid, throw a std::invalid_argument
with a clear message (e.g., "invalid host" or "invalid port"). Move the
validation to the top of make_grpc_client_config (before building
config.server_address), then compose server_address as host + ":" +
std::to_string(port) and continue calling
apply_grpc_client_env_overrides(config); reference: make_grpc_client_config,
grpc_client_config_t, server_address, apply_grpc_client_env_overrides.
- Around line 70-75: The code silently skips client cert/key when only one env
var is set; update the logic in the grpc client env initialization (look around
get_env("CUOPT_TLS_CLIENT_CERT") / get_env("CUOPT_TLS_CLIENT_KEY") and the
branch that assigns config.tls_client_cert/config.tls_client_key using
read_pem_file) to validate mTLS input: if exactly one of CUOPT_TLS_CLIENT_CERT
or CUOPT_TLS_CLIENT_KEY is present, return an error (or abort initialization)
indicating the missing counterpart; only call read_pem_file and set
config.tls_client_cert and config.tls_client_key when both are present, and
leave both unset when both are absent.
In `@python/cuopt/cuopt/grpc/grpc_client.pyx`:
- Around line 282-298: The join_log_stream implementation currently pops the
thread handle before joining, losing the reference if join(timeout) times out;
change it to look up (but not pop) the thread from self._log_threads, call
thread.join(timeout), and only remove (pop) the thread from self._log_threads
when join returned (thread is not alive) so the thread can be joined/cleaned
later if the timeout expired; similarly only pop self._log_thread_errors and
self._log_stream_state after confirming the thread has stopped (or propagate
errors immediately if thread is known stopped), using the existing symbols
join_log_stream, _log_threads, _log_thread_errors, and _log_stream_state to
guide where to adjust the lookup/pop ordering and lifecycle checks.
- Around line 62-78: The C callback _invoke_log_callback currently swallows
exceptions; change it to capture the exception object and traceback and save
them into thread-shared state on userdata (e.g., set attributes like
userdata._log_error and userdata._log_traceback or push into a thread-safe
container referenced by userdata) instead of just returning 0 silently; still
stop the stream by returning 0 after storing the error. Then update
join_log_stream() (or the function that waits for the stream) to inspect that
shared userdata state after the stream ends and re-raise the original exception
(or raise a new exception that includes the stored traceback/message) so callers
see the real failure. Ensure the storage and inspection are done while holding
the GIL and use consistent keys/symbols (userdata, _invoke_log_callback,
_call_log_callback, join_log_stream) so the bridge and join code can find and
propagate the error.
- Around line 265-268: The wrapped() helper currently ignores the return value
of _call_log_callback so the user's stop signal is lost; modify wrapped (used by
start_log_stream) to capture the callback result (e.g., should_continue =
_call_log_callback(callback, line, job_complete)) and return that boolean (or
False on explicit stop) instead of discarding it so start_log_stream can honor
"False stops early" and stop the stream promptly.
In `@python/cuopt/cuopt/tests/linear_programming/grpc_server_fixtures.py`:
- Around line 115-122: Update stop_grpc_server to avoid signaling a process that
already exited: check proc.poll() first and if it is not None just call
proc.wait() (or reap exit code) instead of proc.send_signal(signal.SIGTERM);
only call proc.send_signal(signal.SIGTERM) and the subsequent
proc.wait(timeout=5)/proc.kill() logic when proc.poll() returns None (process
still running). Ensure the function references proc.poll, proc.send_signal,
proc.wait, and proc.kill as described so teardown won't convert a test crash
into a teardown error.
- Around line 82-100: The start_grpc_server function launches cuopt_grpc_server
with the inherited environment; call the test helper cpu_only_env() to get the
sanitized environment, pass it into subprocess.Popen via the env=... parameter,
and return the proc and the env tuple as the docstring promises; update the
Popen call in start_grpc_server (which locates the binary via find_grpc_server)
to include env=cpu_only_env() (store it in a variable like env) and return
(proc, env).
In `@python/cuopt/cuopt/tests/linear_programming/test_grpc_client.py`:
- Around line 41-168: The tests only cover happy paths; add non-happy-path unit
tests in python/cuopt/cuopt/tests/linear_programming/test_grpc_client.py that
exercise client.delete, client.result, and client.logs for error scenarios: (1)
submit jobs that produce infeasible/unbounded/empty outcomes (use or create
small problems analogous to _demo_lp_problem/_MIP_NAMES) and assert
result()/logs() return appropriate errors or None and JobStatus reflects
INFEASIBLE/UNBOUNDED/etc, (2) test deleting a job while it is JobStatus.QUEUED
and while JobStatus.PROCESSING then assert subsequent result(job_id, ...) and
logs(job_id) raise the expected errors or return None, (3) call
client.result/client.logs after client.delete and assert proper error behavior,
and (4) add tests for invalid job_id and unreachable server (e.g., use a bad
port or mock Client to simulate connection failure) asserting the client raises
the expected exceptions; reference existing symbols Client, job_id,
client.submit, client.delete, client.result, client.logs,
client.start_log_stream, client.start_incumbent_stream, JobStatus, and
JobNotReadyError to locate where to add these cases.
---
Nitpick comments:
In `@cpp/include/cuopt/grpc/cython_grpc_client.hpp`:
- Around line 138-143: The doc comment for fetch_incumbents is incorrect: it
does not poll until job completion but performs a single paginated fetch using
the supplied from_index and max_count and returns a grpc_incumbents_result_t
that includes next_index and job_complete flags; update the comment on
fetch_incumbents to state it issues one GetIncumbents RPC call that returns up
to max_count incumbents starting at from_index and indicates the next_index to
continue pagination and whether the job is complete so callers (e.g., the Cython
layer) must loop if they want to poll to completion.
In `@python/cuopt/cuopt/tests/linear_programming/test_grpc_client.py`:
- Around line 27-38: The _poll_until_complete helper mixes status checks and
result() availability which can lead to racey, non-deterministic tests; update
it to rely solely on client.status (or replace its usage with client.wait) so
state transitions are the single source of truth. Specifically, change the logic
in _poll_until_complete to remove the client.result(job_id, names) check and
only return based on client.status(job_id) and JobStatus values (or call
client.wait(job_id, timeout) from tests instead of _poll_until_complete) to
match the C++ wait() behavior and avoid race conditions.
- Around line 109-124: In test_mip_submit_and_result, after obtaining solution =
client.result(job_id, _MIP_NAMES) and asserting the objective, also validate
that the decision variables x and y are integer-valued: extract the primal
values for "x" and "y" from the solution (use the existing solution API such as
solution.get_primal_value("x") / solution.get_primal_value("y") or the
equivalent accessor used elsewhere), and assert each is integer within a small
tolerance (e.g., assert abs(val - round(val)) < 1e-6 or compare to
pytest.approx(round(val))). Keep these checks immediately after the objective
assertion and before client.delete(job_id).
- Around line 70-92: The test currently only compares received vs bulk logs
length and doesn't verify whether live streaming supplied any lines; after
calling client.join_log_stream(job_id) capture its return value (the state dict
returned by join_log_stream) and assert that the "live_lines" count is > 0 (and
optionally that "backfilled" is False if you want to ensure no backfill was
required) to ensure start_log_stream actually streamed messages; modify
test_submit_with_log_stream to store the result of join_log_stream and add an
assertion on state["live_lines"] > 0 (referencing test_submit_with_log_stream,
client.start_log_stream, client.join_log_stream and the
"live_lines"/"backfilled" keys).
- Around line 126-168: The test_mip_incumbent_stream currently only asserts that
incumbents were received; update it to validate incumbent quality by (1)
inspecting collector.entries (populated by IncumbentCollector.get_solution) to
ensure the reported costs form a non-increasing sequence (monotonic improvement
for MAXIMIZE or non-decreasing for MINIMIZE), (2) compare the best incumbent
cost from collector.entries to the final solution objective returned by
client.result(job_id, _MIP_NAMES) to ensure they match (within a small numeric
tolerance), and (3) verify that the variable assignments in the final solution
equal the variable values in the best incumbent entry (or are consistent within
tolerance). Use client.incumbents(job_id) and collector.entries to find the best
incumbent and perform numeric comparisons rather than just asserting presence.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 2d3732be-cc10-45f1-96b4-5aba3dad083c
📒 Files selected for processing (17)
cpp/CMakeLists.txtcpp/include/cuopt/grpc/cython_grpc_client.hppcpp/include/cuopt/grpc/grpc_client_env.hppcpp/src/grpc/client/cython_grpc_client.cppcpp/src/grpc/client/grpc_client_env.cppcpp/src/grpc/client/solve_remote.cpppython/cuopt/cuopt/CMakeLists.txtpython/cuopt/cuopt/__init__.pypython/cuopt/cuopt/grpc/CMakeLists.txtpython/cuopt/cuopt/grpc/__init__.pypython/cuopt/cuopt/grpc/grpc_client.pxdpython/cuopt/cuopt/grpc/grpc_client.pyxpython/cuopt/cuopt/linear_programming/solver/solver_wrapper.pxdpython/cuopt/cuopt/linear_programming/solver/solver_wrapper.pyxpython/cuopt/cuopt/tests/linear_programming/conftest.pypython/cuopt/cuopt/tests/linear_programming/grpc_server_fixtures.pypython/cuopt/cuopt/tests/linear_programming/test_grpc_client.py
There was a problem hiding this comment.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
python/cuopt/cuopt/grpc_server_fixtures.py (1)
35-37:⚠️ Potential issue | 🟠 Major | ⚡ Quick winUse unique port offsets to avoid cross-class server collisions.
Line 35 and Line 36 both resolve to offset
800, which can cause nondeterministic server bind failures when those classes execute concurrently.Suggested minimal fix
GRPC_PORT_OFFSET_CLIENT = 800 -GRPC_PORT_OFFSET_TLS = 800 +GRPC_PORT_OFFSET_TLS = 850 GRPC_PORT_OFFSET_MTLS = 900As per coding guidelines,
**/*test*.{cpp,py}requires preventing flaky tests from race/timing-related behavior; shared fixed ports across classes make this fixture flaky under parallel execution.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@python/cuopt/cuopt/grpc_server_fixtures.py` around lines 35 - 37, The constants GRPC_PORT_OFFSET_CLIENT and GRPC_PORT_OFFSET_TLS currently share the same value (800), causing port collisions when fixtures run in parallel; change GRPC_PORT_OFFSET_TLS to a unique, non-overlapping offset (for example 810 or another unused value distinct from GRPC_PORT_OFFSET_CLIENT=800 and GRPC_PORT_OFFSET_MTLS=900) and update any references to GRPC_PORT_OFFSET_TLS so tests and server fixtures use the new constant.Source: Coding guidelines
🧹 Nitpick comments (1)
python/cuopt/cuopt/grpc_server_fixtures.py (1)
40-167: ⚡ Quick winAdd type hints and complete API docstrings for the new public helpers.
The newly introduced public functions/fixture are missing type annotations, and current docstrings do not consistently document params/returns/raises. Please annotate signatures and expand docstring content accordingly.
As per coding guidelines,
python/**/*.pyrequires type hints on new public functions/classes and docstring content covering params, returns, and raises.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@python/cuopt/cuopt/grpc_server_fixtures.py` around lines 40 - 167, The public helpers find_grpc_server, wait_for_port, cpu_only_env, start_grpc_server, stop_grpc_server and the grpc_server fixture need Python type annotations and expanded docstrings that document parameters, return values and raised exceptions; update each function signature to include appropriate types (e.g. -> Optional[str] for find_grpc_server, port: int, timeout: float for wait_for_port -> bool, cpu_only_env(port: int) -> Dict[str,str], start_grpc_server(port_offset: int) -> Tuple[subprocess.Popen, Dict[str,str]], stop_grpc_server(proc: subprocess.Popen) -> None, and annotate the fixture return type in its typing comment or use pytest.GeneratorFixture/Iterator typing), add necessary typing imports (Optional, Dict, Tuple) and expand each docstring to list :param:, :return:, and :raises: entries for the conditions already handled (e.g. pytest.skip, pytest.fail, subprocess.TimeoutExpired) so the API is fully documented per project guidelines.Source: Coding guidelines
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Outside diff comments:
In `@python/cuopt/cuopt/grpc_server_fixtures.py`:
- Around line 35-37: The constants GRPC_PORT_OFFSET_CLIENT and
GRPC_PORT_OFFSET_TLS currently share the same value (800), causing port
collisions when fixtures run in parallel; change GRPC_PORT_OFFSET_TLS to a
unique, non-overlapping offset (for example 810 or another unused value distinct
from GRPC_PORT_OFFSET_CLIENT=800 and GRPC_PORT_OFFSET_MTLS=900) and update any
references to GRPC_PORT_OFFSET_TLS so tests and server fixtures use the new
constant.
---
Nitpick comments:
In `@python/cuopt/cuopt/grpc_server_fixtures.py`:
- Around line 40-167: The public helpers find_grpc_server, wait_for_port,
cpu_only_env, start_grpc_server, stop_grpc_server and the grpc_server fixture
need Python type annotations and expanded docstrings that document parameters,
return values and raised exceptions; update each function signature to include
appropriate types (e.g. -> Optional[str] for find_grpc_server, port: int,
timeout: float for wait_for_port -> bool, cpu_only_env(port: int) ->
Dict[str,str], start_grpc_server(port_offset: int) -> Tuple[subprocess.Popen,
Dict[str,str]], stop_grpc_server(proc: subprocess.Popen) -> None, and annotate
the fixture return type in its typing comment or use
pytest.GeneratorFixture/Iterator typing), add necessary typing imports
(Optional, Dict, Tuple) and expand each docstring to list :param:, :return:, and
:raises: entries for the conditions already handled (e.g. pytest.skip,
pytest.fail, subprocess.TimeoutExpired) so the API is fully documented per
project guidelines.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 023d9ed7-2291-4319-be01-e85bec5e7230
📒 Files selected for processing (6)
cpp/src/grpc/client/cython_grpc_client.cppcpp/src/grpc/client/grpc_client_env.cpppython/cuopt/conftest.pypython/cuopt/cuopt/grpc/grpc_client.pyxpython/cuopt/cuopt/grpc_server_fixtures.pypython/cuopt/cuopt/tests/linear_programming/test_grpc_client.py
🚧 Files skipped from review as they are similar to previous changes (4)
- python/cuopt/cuopt/tests/linear_programming/test_grpc_client.py
- cpp/src/grpc/client/grpc_client_env.cpp
- cpp/src/grpc/client/cython_grpc_client.cpp
- python/cuopt/cuopt/grpc/grpc_client.pyx
There was a problem hiding this comment.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
python/cuopt/cuopt/grpc_server_fixtures.py (1)
35-36:⚠️ Potential issue | 🟠 Major | ⚡ Quick winUse unique port offsets for each test class configuration.
Line 35 and Line 36 both use
800, which can makegrpc_serverclasses contend for the same port and cause intermittent startup failures under parallel execution.Suggested fix
GRPC_PORT_OFFSET_CLIENT = 800 -GRPC_PORT_OFFSET_TLS = 800 +GRPC_PORT_OFFSET_TLS = 850 GRPC_PORT_OFFSET_MTLS = 900🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@python/cuopt/cuopt/grpc_server_fixtures.py` around lines 35 - 36, GRPC_PORT_OFFSET_CLIENT and GRPC_PORT_OFFSET_TLS are both set to 800 causing port collisions when grpc_server fixtures run in parallel; change one constant (e.g., GRPC_PORT_OFFSET_TLS) to a different unique offset value (or pick distinct offsets for both) and update any usages so each test class configuration uses a distinct port range to avoid startup contention.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Outside diff comments:
In `@python/cuopt/cuopt/grpc_server_fixtures.py`:
- Around line 35-36: GRPC_PORT_OFFSET_CLIENT and GRPC_PORT_OFFSET_TLS are both
set to 800 causing port collisions when grpc_server fixtures run in parallel;
change one constant (e.g., GRPC_PORT_OFFSET_TLS) to a different unique offset
value (or pick distinct offsets for both) and update any usages so each test
class configuration uses a distinct port range to avoid startup contention.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 89213f16-bf9e-4b36-aab3-98f7a98d35dd
📒 Files selected for processing (2)
python/cuopt/cuopt/grpc_server_fixtures.pypython/cuopt/pyproject.toml
|
🔔 Hi @anandhkb, this pull request has had no activity for 7 days. Please update or let us know if it can be closed. Thank you! If this is an "epic" issue, then please add the "epic" label to this issue. |
also move grpc server fixtures under test
There was a problem hiding this comment.
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
python/cuopt/cuopt/grpc/numerical/grpc_client.pyx (2)
450-457: 🩺 Stability & Availability | 🟠 Major | ⚡ Quick winPreserve incumbent thread ownership when
join()times out.At Line 452,
join_incumbent_stream()removes the thread from_incumbent_threadsbefore confirming it has stopped. Ifthread.join(timeout)times out, the thread can continue running but becomes untracked, so later cleanup/error propagation is lost.Suggested fix
def join_incumbent_stream(self, str job_id, timeout=None): """Wait for a background incumbent poll started by :meth:`start_incumbent_stream`.""" - thread = self._incumbent_threads.pop(job_id, None) + thread = self._incumbent_threads.get(job_id) if thread is not None: thread.join(timeout) + if thread.is_alive(): + exc = self._incumbent_thread_errors.get(job_id) + if exc is not None: + raise exc + return + self._incumbent_threads.pop(job_id, None) exc = self._incumbent_thread_errors.pop(job_id, None) if exc is not None: raise exc🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@python/cuopt/cuopt/grpc/numerical/grpc_client.pyx` around lines 450 - 457, The join_incumbent_stream method removes the thread from _incumbent_threads dictionary before verifying that thread.join(timeout) has actually completed. If join times out, the thread continues running but becomes untracked, causing loss of cleanup and error propagation. Modify the logic to only pop the thread from _incumbent_threads after confirming the join completed successfully, or defer the removal until after checking that the thread has actually stopped. This ensures the thread remains registered if the join operation times out, allowing proper tracking and later cleanup.
225-230: 🎯 Functional Correctness | 🔴 CriticalConfirm MIP result retrieval handles fresh Client instances correctly, and add regression test.
The
result()method defaultsfetch_miptoFalsefor unknown job IDs because_job_is_mipis only populated duringsubmit()on the same Client instance (line 228). When a fresh Client callsresult()for a MIP job without explicitly passingis_mip=True, it will callget_lp_result()instead ofget_mip_result(). Since the server'sResultResponseuses aoneofbetweenlp_solutionandmip_solution, accessing the wrong field will silently return an empty result rather than the actual MIP solution.Existing tests (e.g.,
test_mip_submit_and_result) only cover same-Client-instance retrieval where_job_is_mippreserves the flag. Add a test that creates a fresh Client instance to retrieve a previously submitted MIP job's result without passingis_mip=True, verifying that either an error is raised or the method documents that callers must passis_mip=Truefor cross-process/thread retrieval.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@python/cuopt/cuopt/grpc/numerical/grpc_client.pyx` around lines 225 - 230, The result() method in grpc_client.pyx has a limitation where it defaults fetch_mip to False when a job_id is not found in the _job_is_mip dictionary, which only gets populated during submit() calls on the same Client instance. This causes MIP results to be incorrectly retrieved as LP results when calling result() from a fresh Client instance without explicitly passing is_mip=True. Add a regression test that demonstrates this scenario by submitting a MIP job with one Client instance and then attempting to retrieve its result from a different fresh Client instance without passing is_mip=True, verifying that the method either raises an appropriate error or documents that callers must explicitly pass is_mip=True for cross-instance result retrieval. Additionally, consider adding validation in the result() method to detect this condition and either raise an error or warn users about the requirement.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@python/cuopt/cuopt/tests/linear_programming/test_grpc_client.py`:
- Around line 54-58: The assertion `assert "numerical" in (grpc_pkg.__doc__ or
"")` in the test_grpc_package_is_namespace_only function is brittle because it
couples the test to documentation wording rather than actual package structure.
Replace this docstring check with a more robust assertion that directly verifies
"numerical" exists as an attribute or subpackage in grpc_pkg (such as checking
if it's in grpc_pkg.__dict__ or using hasattr), so the test validates the actual
API structure instead of documentation content.
- Around line 160-168: The test_cancel_job function is flaky because it relies
on timing assumptions that the job will still be in QUEUED or PROCESSING state
when cancel() is called, but on faster machines the job can complete before the
cancel executes. Mock or stub the client.status() method to return a consistent
QUEUED or PROCESSING state when called before the cancel() operation, rather
than relying on actual job execution timing, to eliminate the race condition and
ensure the test deterministically validates the cancel functionality and the
resulting CANCELLED state.
---
Outside diff comments:
In `@python/cuopt/cuopt/grpc/numerical/grpc_client.pyx`:
- Around line 450-457: The join_incumbent_stream method removes the thread from
_incumbent_threads dictionary before verifying that thread.join(timeout) has
actually completed. If join times out, the thread continues running but becomes
untracked, causing loss of cleanup and error propagation. Modify the logic to
only pop the thread from _incumbent_threads after confirming the join completed
successfully, or defer the removal until after checking that the thread has
actually stopped. This ensures the thread remains registered if the join
operation times out, allowing proper tracking and later cleanup.
- Around line 225-230: The result() method in grpc_client.pyx has a limitation
where it defaults fetch_mip to False when a job_id is not found in the
_job_is_mip dictionary, which only gets populated during submit() calls on the
same Client instance. This causes MIP results to be incorrectly retrieved as LP
results when calling result() from a fresh Client instance without explicitly
passing is_mip=True. Add a regression test that demonstrates this scenario by
submitting a MIP job with one Client instance and then attempting to retrieve
its result from a different fresh Client instance without passing is_mip=True,
verifying that the method either raises an appropriate error or documents that
callers must explicitly pass is_mip=True for cross-instance result retrieval.
Additionally, consider adding validation in the result() method to detect this
condition and either raise an error or warn users about the requirement.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 59352653-b9d7-4cd8-9051-15a064cd6618
📒 Files selected for processing (10)
python/cuopt/conftest.pypython/cuopt/cuopt/grpc/CMakeLists.txtpython/cuopt/cuopt/grpc/__init__.pypython/cuopt/cuopt/grpc/numerical/CMakeLists.txtpython/cuopt/cuopt/grpc/numerical/__init__.pypython/cuopt/cuopt/grpc/numerical/grpc_client.pxdpython/cuopt/cuopt/grpc/numerical/grpc_client.pyxpython/cuopt/cuopt/tests/fixtures/__init__.pypython/cuopt/cuopt/tests/fixtures/grpc_server_fixtures.pypython/cuopt/cuopt/tests/linear_programming/test_grpc_client.py
💤 Files with no reviewable changes (1)
- python/cuopt/cuopt/grpc/numerical/grpc_client.pxd
✅ Files skipped from review due to trivial changes (3)
- python/cuopt/cuopt/grpc/init.py
- python/cuopt/cuopt/tests/fixtures/init.py
- python/cuopt/conftest.py
ramakrishnap-nv
left a comment
There was a problem hiding this comment.
Rest looks good, minor suggestions. And also lets create an issue for the follow-up PR for docs.
Co-authored-by: Ramakrishna Prabhu <42624703+ramakrishnap-nv@users.noreply.github.com>
Co-authored-by: Ramakrishna Prabhu <42624703+ramakrishnap-nv@users.noreply.github.com>
This lets Python use an async grpc client with Problem type in additon to existing blocking Problem.solve() with remote execution env vars set (zero-code-change remote solve)
For some scenarios, an async client with polling, or results fetched by a different process/thread, or job cancellation, etc is desirable. It also helps with agents that formulate problems and then want to cancel the job and modify the model.