feat: add coding-agent sidecar integrations by willkill07 · Pull Request #63 · NVIDIA/NeMo-Flow

willkill07 · 2026-05-05T19:57:31Z

Overview

Adds the Rust nemo-flow-sidecar binary and coding-agent integration surface for observing Claude Code, Codex, Cursor, and Hermes agent activity through hook ingestion, transparent sidecar execution, passthrough LLM gateway routes, ATIF export, and OpenInference export. The branch now covers the sidecar runtime, installer behavior, agent-specific hook bundles/docs, Hermes-specific shell hook support, best-effort LLM/tool correlation, token metrics extraction, and CI/release plumbing for publishing the sidecar crate and binary artifacts.

I confirm this contribution is my own work, or I have the right to submit it under this project's license.
I searched existing issues and open pull requests, and this does not duplicate existing work.

Details

Adds the nemo-flow-sidecar workspace crate with CLI entry points for serve, run, install, and hook-forward.
Adds hook endpoints and adapters for Claude Code, Codex, Cursor, and Hermes, including Hermes YAML config merge support and runtime NEMO_FLOW_SIDECAR_URL handling for ephemeral sidecar runs.
Adds sidecar session management for agent, subagent, tool, LLM, ATIF, and OpenInference lifecycle events, with shared extraction of task/session IDs, parent relationships, tool call IDs, results, and sidecar metadata.
Adds passthrough gateway routes for OpenAI Responses, OpenAI Chat Completions, Anthropic Messages, Anthropic token counting, and model listing, plus best-effort correlation between gateway LLM activity and hook-reported tool calls.
Updates ATIF export behavior to preserve the unwrapped LLM request payload, extract provider-native usage/token metrics, include cached-token variants, and improve source call correlation.
Adds installable Claude Code and Codex hook/plugin bundles plus a Cursor hook bundle under integrations/coding-agents/; Hermes is configured through the sidecar installer rather than a static bundle.
Adds end-to-end sidecar documentation, per-agent setup guides, Rust API index notes, install docs, and integration overview links.
Wires the sidecar into workspace metadata, release guidance, local dependency publishing helpers, attribution data, Codecov configuration, path filters, GitHub Actions, GitLab CI, and sidecar release-binary artifact publishing.
Adds sidecar CLI tests plus coverage for adapters, config, gateway behavior, installer merge/planning, transparent launcher behavior, server responses, session/export behavior, and ATIF exporter changes.
Follow-up CI fixes remove the stale sidecar generated-doc toctree entry and make the transparent launcher env/exit-code test use a platform-valid fake agent on Windows.

Validation recorded for this PR:

cargo test -p nemo-flow-sidecar
PATH="/opt/homebrew/bin:$PATH" /opt/homebrew/bin/just docs
cargo fmt --all
cargo test -p nemo-flow-sidecar launcher::tests::run_starts_sidecar_injects_env_and_returns_agent_exit_code
Previous branch validation included just test-rust, just docs, and cargo clippy -p nemo-flow-sidecar --all-targets -- -D warnings.
Commit pre-checks for the follow-up fix passed with lychee skipped because it still fails on unrelated generated Rust API doc links in docs/reference/api/rust/_generated/...; cargo fmt, cargo clippy, and cargo check passed with PyO3 pointed at Python 3.12.
Latest PR head is f5f9a385, which includes the follow-up fix commit 23315c09 plus the current main merge; GitHub CI is rerunning for this head.

Where should the reviewer start?

Start with crates/sidecar/src/session.rs for lifecycle normalization and export behavior, then crates/sidecar/src/gateway.rs and crates/sidecar/src/server.rs for the HTTP/gateway surface. For installation and user-facing behavior, review crates/sidecar/src/installer.rs, docs/integrate-frameworks/coding-agent-sidecar.md, and integrations/coding-agents/README.md.

Related Issues: (use one of the action keywords Closes / Fixes / Resolves / Relates to)

Closes NMF-57
Closes NMF-58
Closes NMF-59
Closes NMF-60

Summary by CodeRabbit

New Features
- Added a coding-agent gateway sidecar (nemo-flow-sidecar) with adapters, CLI/launcher/installer, server, and workspace/crate publication.
- Python LLM API: optional annotated_response and response_codec for richer end-event annotations.
Improvements
- Enhanced observability: request-level LLM payload capture and broader provider token-usage extraction; normalized response decoding behavior.
Documentation
- New sidecar guides, integration READMEs, install instructions, and README badge.
Tests
- Extensive unit/integration coverage for sidecar components.
Chores
- CI/workflow updates to build/publish sidecar and Codecov coverage component.

Signed-off-by: Will Killian <wkillian@nvidia.com>

coderabbitai · 2026-05-05T19:57:39Z

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

@coderabbitai resume to resume automatic reviews.
@coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

▶️ Resume reviews
🔍 Trigger review

Walkthrough

Introduces a new nemo-flow-sidecar crate (server, adapters, gateway, session manager), installer/launcher CLI, Python llm_call_end annotation/codec support, CI/publishing and Codecov updates, ATIF exporter improvements, extensive tests, docs, and packaged coding-agent integration bundles.

Changes

Sidecar crate, API, CI, tests, and docs

Layer / File(s)	Summary
Core contracts and wiring `.github/...`, `.gitlab-ci.yml`, `Cargo.toml`, `codecov.yml`, `README.md`, `RELEASING.md`, `justfile`	Adds `crates/sidecar` to the workspace and workspace.dependencies; includes sidecar in GitHub/GitLab publish loops and adds a CI packaging job; updates Codecov components/ignore; extends CI path filters for `integrations/coding-agents/**`; updates README badges and RELEASING notes; updates justfile helper.
Core ATIF exporter `crates/core/src/observability/atif.rs`, `crates/core/tests/unit/atif_tests.rs`	Adds `llm_request` to `AtifStepExtra`; records request payload at LLM start and emits `None` for non-user finalized agent steps; refactors `extract_metrics` to handle provider `usage` shapes and adds helper extractors; tests updated.
Core LLM API / codecs `crates/core/src/api/llm.rs`, `crates/core/src/codec/traits.rs`	Adds `response_codec` to `LlmCallEndParams`; `llm_call_end` sanitizes the provider `response`, derives `annotated_response` from the codec when omitted, emits the end event even if decode fails, and returns a decode error afterward; codec docs clarify fallible decode semantics.
Python API extension `crates/python/src/py_api/mod.rs`, `python/nemo_flow/_native.pyi`, `python/nemo_flow/llm.py`, `crates/python/tests/`, `python/tests/`	Extends Python `llm_call_end` to accept optional `annotated_response` and `response_codec`; adds helpers to resolve/validate codec and annotated-response inputs; refactors execute/stream execute paths to use shared helper; updates stubs and tests to cover decoding, sanitized-payload decoding, and error-with-emitted-end-event semantics.
Sidecar manifest & models `crates/sidecar/Cargo.toml`, `crates/sidecar/src/model.rs`	Adds `crates/sidecar` Cargo manifest and workspace dependency; defines `AgentKind`, `NormalizedEvent`, and typed event payload structs (SessionEvent, SubagentEvent, LlmHintEvent, LlmEvent, ToolEvent).
Adapters & normalization `crates/sidecar/src/adapters/*`, `crates/sidecar/src/adapters/mod.rs`	Implements vendor adapters (Claude Code, Codex, Cursor, Hermes) and shared adapter utilities: `AdapterOutcome`, classification rules, session-id extraction, metadata builders, normalization helpers; adapters return vendor-shaped responses while producing normalized events.
Config, CLI, and error mapping `crates/sidecar/src/config.rs`, `crates/sidecar/src/error.rs`	Implements CLI models (server/run/install/hook-forward), TOML discovery + deep-merge, environment overlays, header-derived per-session config parsing, and `SidecarError` with `IntoResponse` mapping to structured JSON and HTTP status codes.
Server and gateway `crates/sidecar/src/server.rs`, `crates/sidecar/src/gateway.rs`	Adds Axum router with `/healthz`, per-agent `/hooks/*` endpoints that adapt payloads and apply normalized events; gateway passthrough for OpenAI-style routes with buffered and streaming handling, streaming preview/truncation, `StreamingLlmGuard` for client-drop cleanup, and `/v1/models` proxy; header filtering and provider route mapping included.
Session manager, correlation, and ATIF `crates/sidecar/src/session.rs`	Implements `SessionManager` for in-memory session lifecycle management, lazy root-scope creation, LLM/tool/subagent ownership resolution via headers/pending hints/sticky ownership, hint scoring and matching, tool-hint extraction from provider responses, ATIF trajectory writing, and observer flush/cleanup.
Installer, launcher, and CLI entrypoint `crates/sidecar/src/installer.rs`, `crates/sidecar/src/launcher.rs`, `crates/sidecar/src/main.rs`	Adds `install` planning/merge/write with idempotent merging and backups; `hook-forward` stdin->HTTP forwarding with fail-open/closed modes; `run` launcher that injects sidecar wiring for various agents (temporary plugin/hook patching, env/argv injection, health polling); adds async CLI entrypoint and dispatcher.
Tests `crates/sidecar/tests/*/`, `crates/core/tests/`, `crates/python/tests/`, `python/tests/*`	Adds wide test coverage: adapters, config resolution, gateway streaming/headers/models, server endpoints, session lifecycles and correlation, installer merge/idempotency, launcher prepare/restore/health, CLI hook-forward behaviors, Python codec decode/sanitization/failure semantics, and ATIF content validation.
Docs & integrations `docs/`, `integrations/coding-agents/`	Adds sidecar getting-started and per-agent guides, integration bundles (Claude Code, Codex, Cursor, Hermes) with hooks and READMEs, updates Rust API reference to list `nemo-flow-sidecar`, and documents install-from-source and index navigation links.

Sequence Diagram(s)

sequenceDiagram
  participant Agent as Coding Agent
  participant Sidecar as Sidecar Server
  participant Upstream as Provider API
  participant Sess as SessionManager
  Agent->>Sidecar: POST /hooks/{agent} (hook JSON)
  Sidecar->>Sess: apply_events(normalized)
  Agent->>Sidecar: POST /v1/chat/completions
  Sidecar->>Sess: start_llm(request, headers)
  Sidecar->>Upstream: forward request
  Upstream-->>Sidecar: response/stream
  Sidecar->>Sess: end_llm(response/meta)
  Sidecar-->>Agent: proxied response/vendor hook reply

Estimated code review effort

🎯 5 (Critical) | ⏱️ ~120 minutes

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests

Signed-off-by: Will Killian <wkillian@nvidia.com>

- Added Hermes as a first-class sidecar agent: --agent hermes, config support, command inference for hermes / hermes-agent, and AgentKind::Hermes. - Added a dedicated Hermes hook endpoint: /hooks/hermes now routes Hermes shell-hook payloads into the sidecar session manager. - Added a Hermes adapter: Maps Hermes lifecycle, tool, and subagent shell hooks into normalized NeMo Flow sidecar events. - Added Hermes installer support: nemo-flow-sidecar install hermes writes/merges .hermes/config.yaml hook config using YAML instead of JSON/TOML. - Added dynamic sidecar URL handling for Hermes: Hermes installed hooks prefer runtime NEMO_FLOW_SIDECAR_URL, so nemo-flow-sidecar run --agent hermes can use an ephemeral sidecar URL without reinstalling hooks. - Preserved Hermes hook consent semantics: The runner exports the dynamic sidecar URL but does not auto-enable HERMES_ACCEPT_HOOKS. - Improved ATIF LLM request fidelity: ATIF user steps now preserve the full unwrapped LLM request payload under extra.llm_request. - Improved ATIF token metrics extraction: Supports provider-native usage objects in addition to NeMo Flow token_usage, including cached-token variants from OpenAI/Anthropic-style payloads. - Expanded shared adapter extraction: Adds support for Hermes-friendly payload fields like task_id, parent_session_id, extra.tool_call_id, and extra.result. - Added test coverage: New/updated tests cover Hermes adapter mapping, config parsing, installer generation/merge behavior, runtime launcher behavior, server hook response shape, and ATIF request/usage export. Signed-off-by: Bryan Bednarski <bbednarski@nvidia.com> Signed-off-by: Will Killian <wkillian@nvidia.com>

Signed-off-by: GSD Agent <bbednarski@nvidia.com>

Signed-off-by: Will Killian <wkillian@nvidia.com>

Signed-off-by: GSD Agent <bbednarski@nvidia.com>

Signed-off-by: Will Killian <wkillian@nvidia.com>

Signed-off-by: Will Killian <2007799+willkill07@users.noreply.github.com>

copy-pr-bot · 2026-05-07T20:13:48Z

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

…egrations' into wkk_coding-agent-sidecar-integrations # Conflicts: # crates/sidecar/src/model.rs # crates/sidecar/src/session.rs # crates/sidecar/tests/coverage/installer_tests.rs # crates/sidecar/tests/coverage/session_tests.rs

Signed-off-by: Bryan Bednarski <bbednarski@nvidia.com>

…decar-integrations Signed-off-by: Will Killian <wkillian@nvidia.com> # Conflicts: # ATTRIBUTIONS-Rust.md # Cargo.lock

coderabbitai

Actionable comments posted: 6

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@crates/sidecar/src/launcher.rs`:
- Around line 168-179: The current configured_command function uses
command.split_whitespace(), which drops quotes and corrupts quoted paths/args
(e.g., "C:\Program Files\..."), so replace the whitespace-only split with a
quote-aware parser: call a shell-style splitter (e.g., shell_words::split or
shlex::split) on the command string inside configured_command (refer to the
configured_command function and CodingAgent/AgentConfigs lookup), handle the
Result (return None or log/error on parse failure), and return the parsed
Vec<String> only when non-empty; this preserves quoted executables and arguments
on Windows and wrapper commands.
- Around line 482-496: The current insert_after_agent implementation can place
agent flags after argv[0] when the agent was specified via an explicit --agent
flag but no argv element matches CodingAgent::infer, causing wrappers to be
configured instead of the agent; update insert_after_agent (referencing argv,
agent, and CodingAgent::infer) to first search argv for an explicit --agent
occurrence (both "--agent" followed by a value and "--agent=VALUE" forms) and,
if found, insert the generated args immediately after the agent value token (or
the same token for --agent=VALUE); only if no explicit --agent token is present
fall back to finding the last infer-able agent index (as currently done) and
finally fall back to index 0.
- Around line 108-111: The health probe timeout currently discards any actual
server error by ignoring the result of running_server.stop().await; change the
error handling so that when wait_for_health(sidecar_url).await returns
Err(health_err), you call running_server.stop().await and if that returns
Err(server_err) return server_err (or otherwise wrap/preserve both, preferring
server_err) instead of always returning health_err; reference
RunningSidecar::start, wait_for_health, and running_server.stop().await to
locate where to capture and propagate the server-side error.

In `@crates/sidecar/tests/coverage/launcher_tests.rs`:
- Around line 223-252: The test mutates process cwd and currently restores it
with a trailing std::env::set_current_dir(previous) which fails to run on panic;
introduce a small RAII guard (e.g., CwdGuard) that saves the previous cwd on
creation and restores it in Drop, then replace the manual previous/save/restore
pattern in tests like cursor_patching_can_be_disabled (and the other listed
tests) by creating a CwdGuard after setting temp dir (use current_dir_lock() as
before) so the cwd is always restored even if the test panics or returns early;
ensure the guard is used in the scope where PreparedRun::new and assertions run.

In `@docs/integrate-frameworks/coding-agent-codex.md`:
- Around line 89-90: The provider display name currently uses "Nemo Flow"
casing; update the value for the TOML table [model_providers.nemo-flow-openai]
so the name key uses the official product capitalization "NeMo Flow" (e.g.,
change name = "Nemo Flow OpenAI" to name = "NeMo Flow OpenAI") to ensure
consistent product naming across docs.
- Around line 111-114: The docs currently reference deprecated keys
features.codex_hooks and codex_hooks; update the text so the transparent wrapper
sets the CLI override features.hooks=true (instead of features.codex_hooks) and
the persistent install writes .codex/config.toml with hooks = true (instead of
codex_hooks = true), while still describing that generated hook entries are
merged into .codex/hooks.json; update both occurrences to use the canonical
features.hooks and hooks configuration names and leave the hooks.json behavior
unchanged.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Enterprise

Run ID: 7d4ca10f-58e2-4750-9c45-2c77d6195b69

📥 Commits

Reviewing files that changed from the base of the PR and between d336a47 and b35f658.

📒 Files selected for processing (4)

crates/sidecar/src/launcher.rs
crates/sidecar/tests/coverage/launcher_tests.rs
docs/integrate-frameworks/coding-agent-codex.md
integrations/coding-agents/codex/README.md

📜 Review details

🧰 Additional context used

📓 Path-based instructions (30)

**/*.{md,rst,html,txt}

📄 CodeRabbit inference engine (.agents/skills/review-doc-style/assets/nvidia-style-brand-terminology.md)

**/*.{md,rst,html,txt}: Always spell NVIDIA in all caps. Do not use Nvidia, nvidia, nVidia, nVIDIA, or NV.
Use an NVIDIA before a noun because the name starts with an 'en' sound.
Do not add a registered trademark symbol after NVIDIA when referring to the company.
Use trademark symbols with product names only when the document type or legal guidance requires them.
Verify official capitalization, spacing, and hyphenation for product names.
Precede NVIDIA product names with NVIDIA on first mention when it is natural and accurate.
Do not rewrite product names for grammar or title-case rules.
Preserve third-party product names according to the owner's spelling.
Include the company name and full model qualifier on first use when it helps identify the model.
Preserve the official capitalization and punctuation of model names.
Use shorter family names only after the full name is established.
Spell out a term on first use and put the acronym in parentheses unless the acronym is widely understood by the intended audience.
Use the acronym on later mentions after it has been defined.
For long documents, reintroduce the full term if readers might lose context.
Form plurals of acronyms with s, not an apostrophe, such as GPUs.
In headings, common acronyms can remain abbreviated. Spell out the term in the first or second sentence of the body.
Common terms such as CPU, GPU, PC, API, and UI usually do not need to be spelled out for developer audiences.

Files:

integrations/coding-agents/codex/README.md
docs/integrate-frameworks/coding-agent-codex.md

**/*.{md,rst,html}

📄 CodeRabbit inference engine (.agents/skills/review-doc-style/assets/nvidia-style-brand-terminology.md)

Link the first mention of a product name when the destination helps the reader.

Files:

integrations/coding-agents/codex/README.md
docs/integrate-frameworks/coding-agent-codex.md

**/*.{md,rst,txt}

📄 CodeRabbit inference engine (.agents/skills/review-doc-style/assets/nvidia-style-guide.md)

**/*.{md,rst,txt}: Spell NVIDIA in all caps. Do not use Nvidia, nvidia, or NV.
Format commands, code elements, expressions, package names, file names, and paths as inline code.
Use descriptive link text. Avoid raw URLs and weak anchors such as 'here' or 'read more.'
Use title case consistently for technical documentation headings.
Introduce code blocks, lists, tables, and images with complete sentences.
Write procedures as imperative steps. Keep steps parallel and split long procedures into smaller tasks.
Prefer active voice, present tense, short sentences, contractions, and plain English.
Use can for possibility and reserve may for permission.
Use after for temporal relationships instead of once.
Prefer refer to over see when the wording points readers to another resource.
Avoid culture-specific idioms, unnecessary Latinisms, jokes, and marketing exaggeration in technical documentation.
Spell out months in body text, avoid ordinal dates, and use clear time zones.
Spell out whole numbers from zero through nine unless they are technical values, parameters, versions, or UI values.
Use numerals for 10 or greater and include commas in thousands.
Do not add trademark symbols to learning-oriented documentation unless the source, platform, or legal guidance explicitly requires them.
Do not add trademark symbols to NeMo Flow learning documentation by default.
Do not rewrite API names, package names, command flags, or code literals for style reasons.

Files:

integrations/coding-agents/codex/README.md
docs/integrate-frameworks/coding-agent-codex.md

**/*.{md,markdown,rst}

📄 CodeRabbit inference engine (.agents/skills/review-doc-style/assets/nvidia-style-technical-docs.md)

**/*.{md,markdown,rst}: Use title case consistently in technical documentation headings
Avoid quotation marks, ampersands, and exclamation marks in headings
Keep product, event, research, and whitepaper names in their official title case
Use title case for table headers
Do not force social-media sentence case into technical docs
Use monospace formatting for code elements, commands, parameters, package names, and expressions
Use monospace formatting for directories, file names, and paths
Use angle brackets inside monospace for variables inside paths, such as /home/<username>/.login
Use quotation marks for error messages and strings in documentation
Use bold formatting for UI buttons, menus, fields, and labels in documentation
Use angle brackets between UI labels for menu paths, such as File > Save As
Use italics for new terms on first use in documentation
Use italics for publication titles in documentation
Use plain text formatting for keyboard shortcuts in documentation
Prefer [NVIDIA/NeMo](link) format for GitHub repository references over generic phrases like 'the GitHub repo'
Introduce every code block with a complete sentence
Do not make a code block complete the grammar of the previous sentence
Do not continue a sentence after a code block
Use syntax highlighting when the format supports it for code blocks
Avoid the word 'snippet' unless the surrounding docs already use it as a term of art
Keep inline method, function, and class references consistent with nearby docs, omitting empty parentheses for prose readability when no call is shown
Use descriptive anchor text that matches the destination title when possible for links
Avoid raw URLs in running text in documentation
Avoid generic link anchors such as 'here,' 'this page,' and 'read more' in documentation
Include the acronym in link text if a linked term includes an acronym
Do not link long sentences or multiple sentences in documentation
Avoid links that pull readers away from a procedure unles...

Files:

integrations/coding-agents/codex/README.md
docs/integrate-frameworks/coding-agent-codex.md

**/*.{html,md}

📄 CodeRabbit inference engine (CONTRIBUTING.md)

Include SPDX license headers in HTML and Markdown files using HTML comment syntax

Files:

integrations/coding-agents/codex/README.md
docs/integrate-frameworks/coding-agent-codex.md

**/README.md

📄 CodeRabbit inference engine (CONTRIBUTING.md)

Update relevant crate or package README when that surface changed

Relevant package or crate README.md files must be updated when examples or binding guidance changes

Files:

integrations/coding-agents/codex/README.md

**/*.md

📄 CodeRabbit inference engine (CONTRIBUTING.md)

Run Markdown link checking via lychee for README.md, CONTRIBUTING.md, and docs/ through pre-commit hooks

Files:

integrations/coding-agents/codex/README.md
docs/integrate-frameworks/coding-agent-codex.md

**/*.{md,markdown,py,sh,bash,js,ts,java,cpp,go,rust}

📄 CodeRabbit inference engine (.agents/skills/contribute-docs/SKILL.md)

Keep package names, repo references, and build commands current in documentation

Files:

integrations/coding-agents/codex/README.md
docs/integrate-frameworks/coding-agent-codex.md

**/*.{md,markdown,py,sh,bash}

📄 CodeRabbit inference engine (.agents/skills/contribute-docs/SKILL.md)

Keep stable user-facing wrappers at scripts/ root in docs and examples; only point at namespaced helper paths when documenting internal maintenance work

Files:

integrations/coding-agents/codex/README.md
docs/integrate-frameworks/coding-agent-codex.md

**/*.{md,markdown,py,sh,bash,js,ts,example}

📄 CodeRabbit inference engine (.agents/skills/contribute-docs/SKILL.md)

Example commands must match current package names and paths

Files:

integrations/coding-agents/codex/README.md
docs/integrate-frameworks/coding-agent-codex.md

**/{integrations,integration,*-integration}/**

📄 CodeRabbit inference engine (.agents/skills/contribute-integration/SKILL.md)

**/{integrations,integration,*-integration}/**: Keep NeMo Flow optional in framework integrations
Preserve the framework's original behavior when NeMo Flow is absent
Wrap tool and LLM paths at the correct framework boundary
Integration pattern must follow docs/integrate-frameworks/adding-scopes.md

Files:

integrations/coding-agents/codex/README.md

**/*.{md,txt,rst}

📄 CodeRabbit inference engine (.agents/skills/review-doc-style/SKILL.md)

**/*.{md,txt,rst}: Ensure commands, package names, file paths, and APIs in documentation are correct and not stale; flag incorrect or outdated information as blocking issues
Ensure examples and procedures in documentation will execute successfully with current APIs and commands
Use consistent user-facing terminology throughout documentation that matches current repo terminology
Capitalize NVIDIA correctly in all documentation and public-facing text
Format code, commands, paths, and filenames as inline code (monospace) in documentation
Use descriptive anchor text for links instead of bare URLs or weak labels like 'here' in documentation
Prefer active voice, present tense, short sentences, and plain English in documentation
Structure documentation procedures as imperative steps that are easy to scan and not too long for a single sequence
Prefer 'after' instead of 'once' for temporal references in documentation
Use 'can' instead of 'may' when describing possibility (rather than permission) in documentation
Avoid ambiguous numeric dates and ordinal dates in documentation body text

Files:

integrations/coding-agents/codex/README.md
docs/integrate-frameworks/coding-agent-codex.md

{README.md,docs/index.md,**/README.md}

📄 CodeRabbit inference engine (.agents/skills/review-doc-style/SKILL.md)

Update entry-point documentation (README.md, docs/index.md, package READMEs, binding-level source READMEs) whenever public behavior changes

Files:

integrations/coding-agents/codex/README.md

**/*.{rs,py,js,ts,go,sh,toml,yaml,yml,json,md,rst}

📄 CodeRabbit inference engine (AGENTS.md)

Keep SPDX headers on source, docs, scripts, and configuration files. The project is Apache-2.0.

Files:

integrations/coding-agents/codex/README.md
docs/integrate-frameworks/coding-agent-codex.md
crates/sidecar/src/launcher.rs
crates/sidecar/tests/coverage/launcher_tests.rs

**/*.{py,js,ts,tsx,go,rs,md}

📄 CodeRabbit inference engine (.agents/skills/validate-change/SKILL.md)

Format changed files with the language-native formatter before the final lint/test pass

Files:

integrations/coding-agents/codex/README.md
docs/integrate-frameworks/coding-agent-codex.md
crates/sidecar/src/launcher.rs
crates/sidecar/tests/coverage/launcher_tests.rs

docs/**/*.md

📄 CodeRabbit inference engine (CONTRIBUTING.md)

Run ./scripts/build-docs.sh for documentation site changes

docs/**/*.md: Relevant getting-started or reference docs must be updated when examples change
Release-policy docs must point to GitHub Releases as the only release-history source of truth

docs/**/*.md: Use title case for headings in technical documentation
Introduce code blocks, tables, and lists with complete lead-in sentences in documentation

Files:

docs/integrate-frameworks/coding-agent-codex.md

{RELEASING.md,CHANGELOG.md,docs/**/*.md}

📄 CodeRabbit inference engine (.agents/skills/contribute-docs/SKILL.md)

Keep release-process and release-notes guidance in repo-maintainer docs such as RELEASING.md, not as user-facing docs pages or CHANGELOG.md

Files:

docs/integrate-frameworks/coding-agent-codex.md

docs/integrate-frameworks/**/*.md

📄 CodeRabbit inference engine (.agents/skills/contribute-integration/SKILL.md)

Documentation must be updated if activation or usage of the integration changed

Files:

docs/integrate-frameworks/coding-agent-codex.md

{scripts/*.sh,docs/**/*.md}

📄 CodeRabbit inference engine (.agents/skills/contribute-integration/SKILL.md)

Use root ./scripts/*.sh commands in docs and contributor guidance as documented, with implementations under scripts/third-party/

Files:

docs/integrate-frameworks/coding-agent-codex.md

{docs/**,examples/**,crates/adaptive/**,python/nemo_flow/**,go/nemo_flow/**,**/{example,component}.{ts,tsx,js,rs,py,go}}

📄 CodeRabbit inference engine (.agents/skills/maintain-optimizer/SKILL.md)

Any new adaptive component kind must have documentation, examples, and binding coverage across all supported languages

Files:

docs/integrate-frameworks/coding-agent-codex.md

{README*,CHANGELOG*,docs/**/*.{md,rst,txt},examples/**/*,*.md}

📄 CodeRabbit inference engine (.agents/skills/rename-surfaces/SKILL.md)

Update documentation, examples, and getting-started guides with new package/module/crate names after rename operations

Files:

docs/integrate-frameworks/coding-agent-codex.md

{README.md,docs/**/*.md,examples/**/*.{js,ts,py,go,rs}}

📄 CodeRabbit inference engine (.agents/skills/maintain-packaging/SKILL.md)

Keep documentation and examples synchronized with current install, import, and build commands

Files:

docs/integrate-frameworks/coding-agent-codex.md

{README.md,CONTRIBUTING.md,docs/**/*.md}

📄 CodeRabbit inference engine (.agents/skills/validate-change/SKILL.md)

For docs-only changes, run targeted checks only if commands, package names, or examples changed. Use just docs for docs-site builds and just docs-linkcheck when links changed

Files:

docs/integrate-frameworks/coding-agent-codex.md

{docs/**,README.md,CONTRIBUTING.md,RELEASING.md,SECURITY.md}

⚙️ CodeRabbit configuration file

{docs/**,README.md,CONTRIBUTING.md,RELEASING.md,SECURITY.md}: Review documentation for technical accuracy against the current API, command correctness, and consistency across language bindings.
Flag stale examples, missing SPDX headers where required, and instructions that no longer match CI or pre-commit behavior.

Files:

docs/integrate-frameworks/coding-agent-codex.md

**/*.rs

📄 CodeRabbit inference engine (.agents/skills/test-ffi-surface/SKILL.md)

**/*.rs: Run cargo fmt --all for FFI work as it is Rust work
Run just test-rust for FFI validation
Run cargo clippy --workspace --all-targets -- -D warnings to enforce warnings-as-errors linting

When Rust files changed as part of Python work, run cargo fmt --all, just test-rust, and cargo clippy --workspace --all-targets -- -D warnings

**/*.rs: Run cargo fmt --all for Rust code formatting
Run cargo clippy --workspace --all-targets -- -D warnings to enforce Rust linting with no warnings
Run just test-rust as the shared-runtime build/test wrapper for Rust changes

Use Rust snake_case naming convention for Rust code

**/*.rs: Any Rust change must run just test-rust
Any Rust change must run cargo fmt --all
Any Rust change must run cargo clippy --workspace --all-targets -- -D warnings

**/*.rs: Run cargo fmt --all for Rust code formatting when Node changes touch Rust files
Run cargo clippy --workspace --all-targets -- -D warnings to enforce strict linting when Rust files changed as part of Node work

**/*.rs: Always run just test-rust when any Rust code changes
Always run cargo fmt --all when any Rust code changes
Always run cargo clippy --workspace --all-targets -- -D warnings when any Rust code changes

Files:

crates/sidecar/src/launcher.rs
crates/sidecar/tests/coverage/launcher_tests.rs

**/*.{rs,go,js,ts,tsx}

📄 CodeRabbit inference engine (CONTRIBUTING.md)

Include SPDX license headers in all Rust, Go, JavaScript, and TypeScript source files using C-style comment syntax

Files:

crates/sidecar/src/launcher.rs
crates/sidecar/tests/coverage/launcher_tests.rs

**/*.{rs,py,go,js,ts,tsx}

📄 CodeRabbit inference engine (CONTRIBUTING.md)

Use SONAR_IGNORE_START / SONAR_IGNORE_END markers only for documented false positives that cannot be resolved in code; keep ignored blocks small, add explanatory comments, and require reviewer sign-off

Files:

crates/sidecar/src/launcher.rs
crates/sidecar/tests/coverage/launcher_tests.rs

**/*.{js,ts,tsx,jsx,py,rs,go,java,c,cpp,h,cc,cxx,cs,rb,php,swift,kt}

📄 CodeRabbit inference engine (.agents/skills/prepare-pr/SKILL.md)

Changed files must be formatted with the language-native formatter

Files:

crates/sidecar/src/launcher.rs
crates/sidecar/tests/coverage/launcher_tests.rs

**/*.{rs,py,js,ts,tsx,go}

📄 CodeRabbit inference engine (.agents/skills/validate-change/SKILL.md)

During iteration, prefer uv run pre-commit run --files <changed files...> for targeted validation

Files:

crates/sidecar/src/launcher.rs
crates/sidecar/tests/coverage/launcher_tests.rs

{crates/**/tests/**,python/tests/**,go/nemo_flow/**/*_test.go}

⚙️ CodeRabbit configuration file

{crates/**/tests/**,python/tests/**,go/nemo_flow/**/*_test.go}: Tests should cover the behavior promised by the changed API surface, including error paths and cross-request isolation where relevant.
Prefer assertions on lifecycle events, scope stacks, middleware ordering, and binding parity over shallow smoke tests.

Files:

crates/sidecar/tests/coverage/launcher_tests.rs

🔇 Additional comments (2)

integrations/coding-agents/codex/README.md (2)
103-104: Use Official Product Capitalization in Provider Name

The user-facing provider label should use NeMo Flow, not Nemo Flow.
Suggested doc fix
-name = "Nemo Flow OpenAI"
+name = "NeMo Flow OpenAI"
As per coding guidelines, "Verify official capitalization, spacing, and hyphenation for product names."

33-35: ⚡ Quick win

Documentation is correct as written.

The configuration key documented in lines 34–35 is codex_hooks = true under the [features] table in .codex/config.toml, which matches the canonical syntax for codex-cli >= 0.129.0. The suggested change to features.hooks = true is incorrect; that key does not exist.

coderabbitai

Actionable comments posted: 1

♻️ Duplicate comments (2)

crates/sidecar/src/gateway.rs (2)

545-549: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Stop recording cookie and generic API-key aliases.

should_record_header still allows secrets like cookie, api-key, and similar aliases into LlmRequest.headers, which then flow into downstream exports.

Proposed fix

 fn should_record_header(name: &HeaderName) -> bool {
     should_forward_request_header(name)
         && name != http::header::AUTHORIZATION
+        && name != http::header::COOKIE
         && name.as_str() != "x-api-key"
+        && name.as_str() != "api-key"
+        && name.as_str() != "apikey"
         && name.as_str() != "anthropic-api-key"
 }

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@crates/sidecar/src/gateway.rs` around lines 545 - 549, The
should_record_header function currently permits secret headers like "cookie" and
generic API-key aliases; update should_record_header (the function taking
&HeaderName) to explicitly exclude cookie-related headers ("cookie" and
"set-cookie") and common API-key variants ("api-key", "apikey", and any other
vendor aliases such as "x-api-key" and "anthropic-api-key") and perform
comparisons case-insensitively (e.g., use name.as_str().eq_ignore_ascii_case)
while keeping the existing should_forward_request_header and the authorization
check.

503-565: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Honor Connection header tokens when filtering headers.

The proxy still strips only the fixed hop-by-hop names. If a request or response carries Connection: x-foo, x-foo is still eligible for forwarding and recording because is_hop_by_hop never inspects the current header map. That breaks proxy semantics and can leak per-connection metadata.

Parse the message’s Connection header into a per-request/per-response exclusion set and apply it in both the request and response filters.

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@crates/sidecar/src/gateway.rs` around lines 503 - 565, The code currently
strips only fixed hop-by-hop header names; you must honor per-message Connection
tokens by parsing the request/response HeaderMap's "connection" header into a
set of token names and treating those tokens as hop-by-hop too. Update
response_headers to first collect connection tokens from the provided HeaderMap
and exclude any header whose name matches those tokens (in addition to calling
is_hop_by_hop), and change the request-side predicates
should_forward_request_header and should_record_header to accept either the
HeaderMap or a precomputed set of connection token names so they can exclude
those tokens as well; keep the existing fixed-name checks (HOST, CONTENT_LENGTH,
ACCEPT_ENCODING, AUTHORIZATION, x-api-key, anthropic-api-key) and reuse or
refactor is_hop_by_hop into a helper that only tests fixed names while combining
it with the per-message token set when filtering.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@crates/sidecar/src/gateway.rs`:
- Around line 245-259: The code currently calls upstream_response.bytes() and
fully buffers the upstream body (see upstream_response.bytes(), the bytes
variable and response_json usage), which can OOM; replace this with a shared
bounded reader that enforces a maximum buffer size (e.g., a configured
MAX_BODY_BYTES) and choose an explicit overflow policy: either fail-fast by
returning a SidecarError::UpstreamOverflow and emitting end_llm(active, {
"error": "body_size_limit" }, { "http_status": status.as_u16(), "streaming":
false, "gateway_error": true, "stage": "body" }), or switch to passthrough
streaming by stopping buffering once the cap is hit and proxying the remaining
bytes directly to the client; implement the cap logic where bytes() is called
and also apply the same bounded-reader behavior to the other occurrence (around
lines referenced 375-378) so both non-streaming paths share the same
bounded-read helper and overflow handling.

---

Duplicate comments:
In `@crates/sidecar/src/gateway.rs`:
- Around line 545-549: The should_record_header function currently permits
secret headers like "cookie" and generic API-key aliases; update
should_record_header (the function taking &HeaderName) to explicitly exclude
cookie-related headers ("cookie" and "set-cookie") and common API-key variants
("api-key", "apikey", and any other vendor aliases such as "x-api-key" and
"anthropic-api-key") and perform comparisons case-insensitively (e.g., use
name.as_str().eq_ignore_ascii_case) while keeping the existing
should_forward_request_header and the authorization check.
- Around line 503-565: The code currently strips only fixed hop-by-hop header
names; you must honor per-message Connection tokens by parsing the
request/response HeaderMap's "connection" header into a set of token names and
treating those tokens as hop-by-hop too. Update response_headers to first
collect connection tokens from the provided HeaderMap and exclude any header
whose name matches those tokens (in addition to calling is_hop_by_hop), and
change the request-side predicates should_forward_request_header and
should_record_header to accept either the HeaderMap or a precomputed set of
connection token names so they can exclude those tokens as well; keep the
existing fixed-name checks (HOST, CONTENT_LENGTH, ACCEPT_ENCODING,
AUTHORIZATION, x-api-key, anthropic-api-key) and reuse or refactor is_hop_by_hop
into a helper that only tests fixed names while combining it with the
per-message token set when filtering.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Enterprise

Run ID: a870b963-f006-4ee5-8568-231bf1fa4f3e

📥 Commits

Reviewing files that changed from the base of the PR and between b35f658 and 74189e8.

📒 Files selected for processing (1)

crates/sidecar/src/gateway.rs

📜 Review details

🧰 Additional context used

📓 Path-based instructions (7)

**/*.rs

📄 CodeRabbit inference engine (.agents/skills/test-ffi-surface/SKILL.md)

**/*.rs: Run cargo fmt --all for FFI work as it is Rust work
Run just test-rust for FFI validation
Run cargo clippy --workspace --all-targets -- -D warnings to enforce warnings-as-errors linting

When Rust files changed as part of Python work, run cargo fmt --all, just test-rust, and cargo clippy --workspace --all-targets -- -D warnings

**/*.rs: Run cargo fmt --all for Rust code formatting
Run cargo clippy --workspace --all-targets -- -D warnings to enforce Rust linting with no warnings
Run just test-rust as the shared-runtime build/test wrapper for Rust changes

Use Rust snake_case naming convention for Rust code

**/*.rs: Any Rust change must run just test-rust
Any Rust change must run cargo fmt --all
Any Rust change must run cargo clippy --workspace --all-targets -- -D warnings

**/*.rs: Run cargo fmt --all for Rust code formatting when Node changes touch Rust files
Run cargo clippy --workspace --all-targets -- -D warnings to enforce strict linting when Rust files changed as part of Node work

**/*.rs: Always run just test-rust when any Rust code changes
Always run cargo fmt --all when any Rust code changes
Always run cargo clippy --workspace --all-targets -- -D warnings when any Rust code changes

Files:

crates/sidecar/src/gateway.rs

**/*.{rs,go,js,ts,tsx}

📄 CodeRabbit inference engine (CONTRIBUTING.md)

Include SPDX license headers in all Rust, Go, JavaScript, and TypeScript source files using C-style comment syntax

Files:

crates/sidecar/src/gateway.rs

**/*.{rs,py,go,js,ts,tsx}

📄 CodeRabbit inference engine (CONTRIBUTING.md)

Use SONAR_IGNORE_START / SONAR_IGNORE_END markers only for documented false positives that cannot be resolved in code; keep ignored blocks small, add explanatory comments, and require reviewer sign-off

Files:

crates/sidecar/src/gateway.rs

**/*.{js,ts,tsx,jsx,py,rs,go,java,c,cpp,h,cc,cxx,cs,rb,php,swift,kt}

📄 CodeRabbit inference engine (.agents/skills/prepare-pr/SKILL.md)

Changed files must be formatted with the language-native formatter

Files:

crates/sidecar/src/gateway.rs

**/*.{rs,py,js,ts,go,sh,toml,yaml,yml,json,md,rst}

📄 CodeRabbit inference engine (AGENTS.md)

Keep SPDX headers on source, docs, scripts, and configuration files. The project is Apache-2.0.

Files:

crates/sidecar/src/gateway.rs

**/*.{py,js,ts,tsx,go,rs,md}

📄 CodeRabbit inference engine (.agents/skills/validate-change/SKILL.md)

Format changed files with the language-native formatter before the final lint/test pass

Files:

crates/sidecar/src/gateway.rs

**/*.{rs,py,js,ts,tsx,go}

📄 CodeRabbit inference engine (.agents/skills/validate-change/SKILL.md)

During iteration, prefer uv run pre-commit run --files <changed files...> for targeted validation

Files:

crates/sidecar/src/gateway.rs

Updated the Codex configuration to replace the deprecated `features.codex_hooks` with `features.hooks`, requiring `codex-cli >= 0.129.0`. Adjusted related tests and documentation to reflect this change. Signed-off-by: Ajay Thorve <athorve@nvidia.com>

Without this, upstream providers honored the client's Accept-Encoding and returned gzip/br/zstd-encoded bodies. The bytes were forwarded to the client unchanged (which decoded normally), but the sidecar's observability capture stored the still-compressed bytes in `output.value` on LLM spans and in ATIF trajectory bodies, leaving both unreadable for any downstream consumer (Phoenix, ATIF, ATOF). Stripping the header forces upstreams to return identity-encoded bodies. The bandwidth cost is paid only on the sidecar-upstream hop; the client never asked for the encoding it would have received, so its decoders never trigger. Closes NMF-84. Signed-off-by: Ajay Thorve <athorve@nvidia.com>

coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@crates/sidecar/src/launcher.rs`:
- Around line 559-563: The toml_string function only escapes backslashes and
quotes, which can produce invalid TOML for values containing control characters
(newlines, tabs, etc.); update toml_string to produce correct TOML basic-string
escaping by either (1) replacing it with
toml::Value::String(value.to_string()).to_string() if the toml crate is
available, or (2) extend the escaping logic in toml_string to handle all
TOML-required escapes (e.g., \n, \t, \r, \uXXXX for control characters) so that
values like base_url always serialize into valid TOML.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Enterprise

Run ID: d340a09b-2be8-4446-841d-36be1689170c

📥 Commits

Reviewing files that changed from the base of the PR and between 74189e8 and eb16758.

📒 Files selected for processing (5)

crates/sidecar/src/gateway.rs
crates/sidecar/src/launcher.rs
crates/sidecar/tests/coverage/launcher_tests.rs
docs/integrate-frameworks/coding-agent-codex.md
integrations/coding-agents/codex/README.md

📜 Review details

🧰 Additional context used

📓 Path-based instructions (30)

**/*.{md,rst,html,txt}

📄 CodeRabbit inference engine (.agents/skills/review-doc-style/assets/nvidia-style-brand-terminology.md)

**/*.{md,rst,html,txt}: Always spell NVIDIA in all caps. Do not use Nvidia, nvidia, nVidia, nVIDIA, or NV.
Use an NVIDIA before a noun because the name starts with an 'en' sound.
Do not add a registered trademark symbol after NVIDIA when referring to the company.
Use trademark symbols with product names only when the document type or legal guidance requires them.
Verify official capitalization, spacing, and hyphenation for product names.
Precede NVIDIA product names with NVIDIA on first mention when it is natural and accurate.
Do not rewrite product names for grammar or title-case rules.
Preserve third-party product names according to the owner's spelling.
Include the company name and full model qualifier on first use when it helps identify the model.
Preserve the official capitalization and punctuation of model names.
Use shorter family names only after the full name is established.
Spell out a term on first use and put the acronym in parentheses unless the acronym is widely understood by the intended audience.
Use the acronym on later mentions after it has been defined.
For long documents, reintroduce the full term if readers might lose context.
Form plurals of acronyms with s, not an apostrophe, such as GPUs.
In headings, common acronyms can remain abbreviated. Spell out the term in the first or second sentence of the body.
Common terms such as CPU, GPU, PC, API, and UI usually do not need to be spelled out for developer audiences.

Files:

integrations/coding-agents/codex/README.md
docs/integrate-frameworks/coding-agent-codex.md

**/*.{md,rst,html}

📄 CodeRabbit inference engine (.agents/skills/review-doc-style/assets/nvidia-style-brand-terminology.md)

Link the first mention of a product name when the destination helps the reader.

Files:

integrations/coding-agents/codex/README.md
docs/integrate-frameworks/coding-agent-codex.md

**/*.{md,rst,txt}

📄 CodeRabbit inference engine (.agents/skills/review-doc-style/assets/nvidia-style-guide.md)

**/*.{md,rst,txt}: Spell NVIDIA in all caps. Do not use Nvidia, nvidia, or NV.
Format commands, code elements, expressions, package names, file names, and paths as inline code.
Use descriptive link text. Avoid raw URLs and weak anchors such as 'here' or 'read more.'
Use title case consistently for technical documentation headings.
Introduce code blocks, lists, tables, and images with complete sentences.
Write procedures as imperative steps. Keep steps parallel and split long procedures into smaller tasks.
Prefer active voice, present tense, short sentences, contractions, and plain English.
Use can for possibility and reserve may for permission.
Use after for temporal relationships instead of once.
Prefer refer to over see when the wording points readers to another resource.
Avoid culture-specific idioms, unnecessary Latinisms, jokes, and marketing exaggeration in technical documentation.
Spell out months in body text, avoid ordinal dates, and use clear time zones.
Spell out whole numbers from zero through nine unless they are technical values, parameters, versions, or UI values.
Use numerals for 10 or greater and include commas in thousands.
Do not add trademark symbols to learning-oriented documentation unless the source, platform, or legal guidance explicitly requires them.
Do not add trademark symbols to NeMo Flow learning documentation by default.
Do not rewrite API names, package names, command flags, or code literals for style reasons.

Files:

integrations/coding-agents/codex/README.md
docs/integrate-frameworks/coding-agent-codex.md

**/*.{md,markdown,rst}

📄 CodeRabbit inference engine (.agents/skills/review-doc-style/assets/nvidia-style-technical-docs.md)

**/*.{md,markdown,rst}: Use title case consistently in technical documentation headings
Avoid quotation marks, ampersands, and exclamation marks in headings
Keep product, event, research, and whitepaper names in their official title case
Use title case for table headers
Do not force social-media sentence case into technical docs
Use monospace formatting for code elements, commands, parameters, package names, and expressions
Use monospace formatting for directories, file names, and paths
Use angle brackets inside monospace for variables inside paths, such as /home/<username>/.login
Use quotation marks for error messages and strings in documentation
Use bold formatting for UI buttons, menus, fields, and labels in documentation
Use angle brackets between UI labels for menu paths, such as File > Save As
Use italics for new terms on first use in documentation
Use italics for publication titles in documentation
Use plain text formatting for keyboard shortcuts in documentation
Prefer [NVIDIA/NeMo](link) format for GitHub repository references over generic phrases like 'the GitHub repo'
Introduce every code block with a complete sentence
Do not make a code block complete the grammar of the previous sentence
Do not continue a sentence after a code block
Use syntax highlighting when the format supports it for code blocks
Avoid the word 'snippet' unless the surrounding docs already use it as a term of art
Keep inline method, function, and class references consistent with nearby docs, omitting empty parentheses for prose readability when no call is shown
Use descriptive anchor text that matches the destination title when possible for links
Avoid raw URLs in running text in documentation
Avoid generic link anchors such as 'here,' 'this page,' and 'read more' in documentation
Include the acronym in link text if a linked term includes an acronym
Do not link long sentences or multiple sentences in documentation
Avoid links that pull readers away from a procedure unles...

Files:

integrations/coding-agents/codex/README.md
docs/integrate-frameworks/coding-agent-codex.md

**/*.{html,md}

📄 CodeRabbit inference engine (CONTRIBUTING.md)

Include SPDX license headers in HTML and Markdown files using HTML comment syntax

Files:

integrations/coding-agents/codex/README.md
docs/integrate-frameworks/coding-agent-codex.md

**/README.md

📄 CodeRabbit inference engine (CONTRIBUTING.md)

Update relevant crate or package README when that surface changed

Relevant package or crate README.md files must be updated when examples or binding guidance changes

Files:

integrations/coding-agents/codex/README.md

**/*.md

📄 CodeRabbit inference engine (CONTRIBUTING.md)

Run Markdown link checking via lychee for README.md, CONTRIBUTING.md, and docs/ through pre-commit hooks

Files:

integrations/coding-agents/codex/README.md
docs/integrate-frameworks/coding-agent-codex.md

**/*.{md,markdown,py,sh,bash,js,ts,java,cpp,go,rust}

📄 CodeRabbit inference engine (.agents/skills/contribute-docs/SKILL.md)

Keep package names, repo references, and build commands current in documentation

Files:

integrations/coding-agents/codex/README.md
docs/integrate-frameworks/coding-agent-codex.md

**/*.{md,markdown,py,sh,bash}

📄 CodeRabbit inference engine (.agents/skills/contribute-docs/SKILL.md)

Keep stable user-facing wrappers at scripts/ root in docs and examples; only point at namespaced helper paths when documenting internal maintenance work

Files:

integrations/coding-agents/codex/README.md
docs/integrate-frameworks/coding-agent-codex.md

**/*.{md,markdown,py,sh,bash,js,ts,example}

📄 CodeRabbit inference engine (.agents/skills/contribute-docs/SKILL.md)

Example commands must match current package names and paths

Files:

integrations/coding-agents/codex/README.md
docs/integrate-frameworks/coding-agent-codex.md

**/{integrations,integration,*-integration}/**

📄 CodeRabbit inference engine (.agents/skills/contribute-integration/SKILL.md)

**/{integrations,integration,*-integration}/**: Keep NeMo Flow optional in framework integrations
Preserve the framework's original behavior when NeMo Flow is absent
Wrap tool and LLM paths at the correct framework boundary
Integration pattern must follow docs/integrate-frameworks/adding-scopes.md

Files:

integrations/coding-agents/codex/README.md

**/*.{md,txt,rst}

📄 CodeRabbit inference engine (.agents/skills/review-doc-style/SKILL.md)

**/*.{md,txt,rst}: Ensure commands, package names, file paths, and APIs in documentation are correct and not stale; flag incorrect or outdated information as blocking issues
Ensure examples and procedures in documentation will execute successfully with current APIs and commands
Use consistent user-facing terminology throughout documentation that matches current repo terminology
Capitalize NVIDIA correctly in all documentation and public-facing text
Format code, commands, paths, and filenames as inline code (monospace) in documentation
Use descriptive anchor text for links instead of bare URLs or weak labels like 'here' in documentation
Prefer active voice, present tense, short sentences, and plain English in documentation
Structure documentation procedures as imperative steps that are easy to scan and not too long for a single sequence
Prefer 'after' instead of 'once' for temporal references in documentation
Use 'can' instead of 'may' when describing possibility (rather than permission) in documentation
Avoid ambiguous numeric dates and ordinal dates in documentation body text

Files:

integrations/coding-agents/codex/README.md
docs/integrate-frameworks/coding-agent-codex.md

{README.md,docs/index.md,**/README.md}

📄 CodeRabbit inference engine (.agents/skills/review-doc-style/SKILL.md)

Update entry-point documentation (README.md, docs/index.md, package READMEs, binding-level source READMEs) whenever public behavior changes

Files:

integrations/coding-agents/codex/README.md

**/*.{rs,py,js,ts,go,sh,toml,yaml,yml,json,md,rst}

📄 CodeRabbit inference engine (AGENTS.md)

Keep SPDX headers on source, docs, scripts, and configuration files. The project is Apache-2.0.

Files:

integrations/coding-agents/codex/README.md
docs/integrate-frameworks/coding-agent-codex.md
crates/sidecar/src/gateway.rs
crates/sidecar/tests/coverage/launcher_tests.rs
crates/sidecar/src/launcher.rs

**/*.{py,js,ts,tsx,go,rs,md}

📄 CodeRabbit inference engine (.agents/skills/validate-change/SKILL.md)

Format changed files with the language-native formatter before the final lint/test pass

Files:

integrations/coding-agents/codex/README.md
docs/integrate-frameworks/coding-agent-codex.md
crates/sidecar/src/gateway.rs
crates/sidecar/tests/coverage/launcher_tests.rs
crates/sidecar/src/launcher.rs

docs/**/*.md

📄 CodeRabbit inference engine (CONTRIBUTING.md)

Run ./scripts/build-docs.sh for documentation site changes

docs/**/*.md: Relevant getting-started or reference docs must be updated when examples change
Release-policy docs must point to GitHub Releases as the only release-history source of truth

docs/**/*.md: Use title case for headings in technical documentation
Introduce code blocks, tables, and lists with complete lead-in sentences in documentation

Files:

docs/integrate-frameworks/coding-agent-codex.md

{RELEASING.md,CHANGELOG.md,docs/**/*.md}

📄 CodeRabbit inference engine (.agents/skills/contribute-docs/SKILL.md)

Keep release-process and release-notes guidance in repo-maintainer docs such as RELEASING.md, not as user-facing docs pages or CHANGELOG.md

Files:

docs/integrate-frameworks/coding-agent-codex.md

docs/integrate-frameworks/**/*.md

📄 CodeRabbit inference engine (.agents/skills/contribute-integration/SKILL.md)

Documentation must be updated if activation or usage of the integration changed

Files:

docs/integrate-frameworks/coding-agent-codex.md

{scripts/*.sh,docs/**/*.md}

📄 CodeRabbit inference engine (.agents/skills/contribute-integration/SKILL.md)

Use root ./scripts/*.sh commands in docs and contributor guidance as documented, with implementations under scripts/third-party/

Files:

docs/integrate-frameworks/coding-agent-codex.md

{docs/**,examples/**,crates/adaptive/**,python/nemo_flow/**,go/nemo_flow/**,**/{example,component}.{ts,tsx,js,rs,py,go}}

📄 CodeRabbit inference engine (.agents/skills/maintain-optimizer/SKILL.md)

Any new adaptive component kind must have documentation, examples, and binding coverage across all supported languages

Files:

docs/integrate-frameworks/coding-agent-codex.md

{README*,CHANGELOG*,docs/**/*.{md,rst,txt},examples/**/*,*.md}

📄 CodeRabbit inference engine (.agents/skills/rename-surfaces/SKILL.md)

Update documentation, examples, and getting-started guides with new package/module/crate names after rename operations

Files:

docs/integrate-frameworks/coding-agent-codex.md

{README.md,docs/**/*.md,examples/**/*.{js,ts,py,go,rs}}

📄 CodeRabbit inference engine (.agents/skills/maintain-packaging/SKILL.md)

Keep documentation and examples synchronized with current install, import, and build commands

Files:

docs/integrate-frameworks/coding-agent-codex.md

{README.md,CONTRIBUTING.md,docs/**/*.md}

📄 CodeRabbit inference engine (.agents/skills/validate-change/SKILL.md)

For docs-only changes, run targeted checks only if commands, package names, or examples changed. Use just docs for docs-site builds and just docs-linkcheck when links changed

Files:

docs/integrate-frameworks/coding-agent-codex.md

{docs/**,README.md,CONTRIBUTING.md,RELEASING.md,SECURITY.md}

⚙️ CodeRabbit configuration file

{docs/**,README.md,CONTRIBUTING.md,RELEASING.md,SECURITY.md}: Review documentation for technical accuracy against the current API, command correctness, and consistency across language bindings.
Flag stale examples, missing SPDX headers where required, and instructions that no longer match CI or pre-commit behavior.

Files:

docs/integrate-frameworks/coding-agent-codex.md

**/*.rs

📄 CodeRabbit inference engine (.agents/skills/test-ffi-surface/SKILL.md)

**/*.rs: Run cargo fmt --all for FFI work as it is Rust work
Run just test-rust for FFI validation
Run cargo clippy --workspace --all-targets -- -D warnings to enforce warnings-as-errors linting

When Rust files changed as part of Python work, run cargo fmt --all, just test-rust, and cargo clippy --workspace --all-targets -- -D warnings

**/*.rs: Run cargo fmt --all for Rust code formatting
Run cargo clippy --workspace --all-targets -- -D warnings to enforce Rust linting with no warnings
Run just test-rust as the shared-runtime build/test wrapper for Rust changes

Use Rust snake_case naming convention for Rust code

**/*.rs: Any Rust change must run just test-rust
Any Rust change must run cargo fmt --all
Any Rust change must run cargo clippy --workspace --all-targets -- -D warnings

**/*.rs: Run cargo fmt --all for Rust code formatting when Node changes touch Rust files
Run cargo clippy --workspace --all-targets -- -D warnings to enforce strict linting when Rust files changed as part of Node work

**/*.rs: Always run just test-rust when any Rust code changes
Always run cargo fmt --all when any Rust code changes
Always run cargo clippy --workspace --all-targets -- -D warnings when any Rust code changes

Files:

crates/sidecar/src/gateway.rs
crates/sidecar/tests/coverage/launcher_tests.rs
crates/sidecar/src/launcher.rs

**/*.{rs,go,js,ts,tsx}

📄 CodeRabbit inference engine (CONTRIBUTING.md)

Include SPDX license headers in all Rust, Go, JavaScript, and TypeScript source files using C-style comment syntax

Files:

crates/sidecar/src/gateway.rs
crates/sidecar/tests/coverage/launcher_tests.rs
crates/sidecar/src/launcher.rs

**/*.{rs,py,go,js,ts,tsx}

📄 CodeRabbit inference engine (CONTRIBUTING.md)

Use SONAR_IGNORE_START / SONAR_IGNORE_END markers only for documented false positives that cannot be resolved in code; keep ignored blocks small, add explanatory comments, and require reviewer sign-off

Files:

crates/sidecar/src/gateway.rs
crates/sidecar/tests/coverage/launcher_tests.rs
crates/sidecar/src/launcher.rs

**/*.{js,ts,tsx,jsx,py,rs,go,java,c,cpp,h,cc,cxx,cs,rb,php,swift,kt}

📄 CodeRabbit inference engine (.agents/skills/prepare-pr/SKILL.md)

Changed files must be formatted with the language-native formatter

Files:

crates/sidecar/src/gateway.rs
crates/sidecar/tests/coverage/launcher_tests.rs
crates/sidecar/src/launcher.rs

**/*.{rs,py,js,ts,tsx,go}

📄 CodeRabbit inference engine (.agents/skills/validate-change/SKILL.md)

During iteration, prefer uv run pre-commit run --files <changed files...> for targeted validation

Files:

crates/sidecar/src/gateway.rs
crates/sidecar/tests/coverage/launcher_tests.rs
crates/sidecar/src/launcher.rs

{crates/**/tests/**,python/tests/**,go/nemo_flow/**/*_test.go}

⚙️ CodeRabbit configuration file

{crates/**/tests/**,python/tests/**,go/nemo_flow/**/*_test.go}: Tests should cover the behavior promised by the changed API surface, including error paths and cross-request isolation where relevant.
Prefer assertions on lifecycle events, scope stacks, middleware ordering, and binding parity over shallow smoke tests.

Files:

crates/sidecar/tests/coverage/launcher_tests.rs

🔇 Additional comments (12)

crates/sidecar/src/gateway.rs (4)

245-246: Upstream response bodies are still fully buffered without a cap.

Line 245 and Line 378 still call upstream_response.bytes().await, which can OOM on large/malicious upstream payloads. This was previously raised and remains unresolved.

Also applies to: 378-378

503-510: Connection-token hop-by-hop headers are not stripped.

Filtering only fixed hop-by-hop names misses headers listed dynamically in Connection, so those can still be forwarded/returned. This was previously raised and still applies.

Also applies to: 530-532, 554-565

545-550: Sensitive header recording filter is still incomplete.

should_record_header still allows values like cookie and generic API-key aliases to be recorded. This was already flagged and is still unresolved.

16-17: Good fix: request-body reads are now bounded.

Using MAX_BODY_BYTES with to_bytes closes the prior unbounded-request-body risk cleanly.

Also applies to: 68-70
crates/sidecar/src/launcher.rs (3)
475-480: Product name miscapitalized in user-facing Codex provider config.

Line 477 uses "Nemo Flow OpenAI" but the product name is "NeMo Flow OpenAI" (capital M).
-        "model_providers.nemo-flow-openai={{name=\"Nemo Flow OpenAI\",base_url={},wire_api=\"responses\",requires_openai_auth=true,supports_websockets=false}}",
+        "model_providers.nemo-flow-openai={{name=\"NeMo Flow OpenAI\",base_url={},wire_api=\"responses\",requires_openai_auth=true,supports_websockets=false}}",
108-111: Server startup error discarded on health probe timeout.

When wait_for_health fails, let _ = running_server.stop().await discards any real server error, turning startup/config failures into a misleading readiness timeout message.
Suggested fix
     let running_server = RunningSidecar::start(listener, sidecar_config);
     if let Err(error) = wait_for_health(sidecar_url).await {
-        let _ = running_server.stop().await;
-        return Err(error);
+        return match running_server.stop().await {
+            Ok(()) => Err(error),
+            Err(server_error) => Err(SidecarError::Launch(format!(
+                "{server_error}; health check also failed: {error}"
+            ))),
+        };
     }
168-180: split_whitespace() breaks quoted paths in configured commands.

Paths with spaces (e.g., "C:\Program Files\Codex\codex.exe" --full-auto) will be corrupted. The comment acknowledges this as a convenience tradeoff, but consider using a shell-style splitter like shlex::split for cross-platform robustness.
integrations/coding-agents/codex/README.md (1)
103-104: Product name miscapitalized.

Line 104 uses "Nemo Flow OpenAI" but the product name is "NeMo Flow OpenAI".
-name = "Nemo Flow OpenAI"
+name = "NeMo Flow OpenAI"
docs/integrate-frameworks/coding-agent-codex.md (2)
89-90: Product name miscapitalized.

Line 90 uses "Nemo Flow OpenAI" but the product name is "NeMo Flow OpenAI".
-name = "Nemo Flow OpenAI"
+name = "NeMo Flow OpenAI"
111-114: Documentation references deprecated configuration keys.

Lines 112-113 document features.codex_hooks=true and codex_hooks = true, but the implementation at launcher.rs:321 uses the canonical features.hooks=true. This inconsistency will confuse users.
Suggested fix
 The transparent wrapper passes hook entries as Codex CLI config overrides and
-sets `features.codex_hooks=true` for that launched process. Persistent install
-writes `.codex/config.toml` with `codex_hooks = true` and merges generated hook
+sets `features.hooks=true` for that launched process. Persistent install
+writes `.codex/config.toml` with `hooks = true` and merges generated hook
 entries into `.codex/hooks.json`.
crates/sidecar/tests/coverage/launcher_tests.rs (2)
223-252: Process cwd restoration is not panic-safe.

Tests that mutate the process working directory (lines 227-228, 251) restore it with a trailing set_current_dir(previous). If the test panics before reaching line 251, subsequent tests run in the wrong directory. Use a RAII guard pattern.
Suggested pattern
struct CwdGuard(PathBuf);

impl Drop for CwdGuard {
    fn drop(&mut self) {
        let _ = std::env::set_current_dir(&self.0);
    }
}

#[test]
fn cursor_patching_can_be_disabled() {
    let _guard = current_dir_lock().lock().unwrap();
    let temp = tempfile::tempdir().unwrap();
    let _cwd_guard = CwdGuard(std::env::current_dir().unwrap());
    std::env::set_current_dir(temp.path()).unwrap();
    // ... test body ...
    // No manual restore needed; CwdGuard::drop handles it
}
This pattern should be applied to all affected tests: cursor_patching_can_be_disabled, cursor_patch_restore_restores_original_file, cursor_patch_restore_uses_nearest_project_cursor_dir, cursor_patch_restore_removes_temporary_file, and cursor_dry_run_does_not_write_hooks.

1-593: LGTM on test coverage.

The test suite comprehensively covers agent inference, configured command resolution, agent-specific preparation (Codex, Claude, Hermes, Cursor), cursor hook patch/restore scenarios, dry-run behavior, and async runtime behavior including sidecar URL injection and exit code propagation. The OS-specific fake agent commands are well-structured for cross-platform testing.

…odex) Three independent issues silently disabled transparent-run hooks for Claude Code; this commit fixes all three: 1. Hook subprocesses couldn't find the sidecar binary on PATH. `hook_forward_command` returned the bare name `nemo-flow-sidecar`, which Claude/Codex/Cursor passed to their hook subprocesses. Those subprocesses inherit the agent's environment, where the sidecar's install location (e.g. target/debug) typically isn't on PATH, so every hook exited with status 127 (command not found). Add an `executable` parameter so transparent-run callers can pass `current_exe()`; persistent install keeps the bare name. 2. Claude Code 2.1.x's hook loader strict-rejects unknown event names — any single unknown event causes the ENTIRE hooks file to be rejected silently, disabling all hooks. We were injecting `AfterAgentResponse` and `AfterAgentThought`, neither in Claude's whitelist. Drop both. Also add `PermissionRequest` and `PostCompact`, which are in Claude's whitelist and in Codex 0.129's event surface. 3. Claude's hook output validator rejects null for optional string fields. The Claude adapter's Stop response carried `stopReason: null` and was rejected with a runtime validation error. Omit the field entirely; the response collapses to the generic `{ continue: true }`. Also regenerate the static integration bundles for claude-code and codex with the corrected event list. The same whitelist mismatch broke persistent install for Claude users. Verified end-to-end against Claude Code 2.1.137: - SessionStart, UserPromptSubmit, PreToolUse:WebSearch, PostToolUse:WebSearch, Stop, SubagentStop all fire successfully - Phoenix shows [AGENT] claude-code root span with tool spans - ATIF written on session-end (1.7 MB / 1.3 MB across two sessions) Codex 0.129 lifecycle (SessionStart through PreToolUse/PostToolUse) now also fires correctly — Phoenix shows [AGENT] codex root with Bash tool spans. Codex transparent-run ATIF flush still pending: codex 0.129 has no SessionEnd-equivalent hook, so end_agent never runs without a wrapper-driven flush. Tracking that piece separately. Signed-off-by: Ajay Thorve <athorve@nvidia.com>

…gateway" Previously, when a real agent's gateway request beat its SessionStart hook, `SessionManager::start_llm` created the session with `AgentKind::Gateway`. Observer identities (ATIF agent name, Phoenix scope name) are baked at scope-open time inside `ensure_agent_started`, so the session stayed labeled "gateway" for its entire lifetime even after SessionStart arrived. Result: Phoenix root spans for real Claude/Codex sessions read as `[AGENT] gateway` instead of `[AGENT] claude-code` / `[AGENT] codex`. Two targeted fixes: 1. In `SessionManager::start_llm`, infer the agent kind from the gateway provider when creating a new session: `anthropic.messages` and `anthropic.count_tokens` → `ClaudeCode`; `openai.responses` → `Codex`. Unknown providers (e.g., `openai.chat_completions`, `openai.models`) keep the legacy `Gateway` label so synthetic gateway-only sessions for unattributed proxy traffic remain semantically intact. 2. In `SessionManager::apply_events`, when an `AgentStarted` event arrives for an existing session whose `agent_kind` is still `Gateway` (i.e., the gateway path created the session before the hook), upgrade the session's `agent_kind` to the event's real kind. This fixes session-level metadata for the rare case where provider inference fell back to `Gateway` (e.g., chat-completions traffic) but a SessionStart hook later identifies the agent. Tests cover all three branches: anthropic gateway → claude-code label, openai.responses gateway → codex label, ambiguous provider → gateway label preserved. Signed-off-by: Ajay Thorve <athorve@nvidia.com>

…nd hook flag Three small fixes from CodeRabbit's pass on PR NVIDIA#63: 1. `Nemo Flow OpenAI` → `NeMo Flow OpenAI` in the codex transparent-run provider config (`launcher.rs:477`) and matching docs in `docs/integrate-frameworks/coding-agent-codex.md` and `integrations/coding-agents/codex/README.md`. Use the official NeMo Flow capitalization in user-facing TOML config and docs. 2. Update codex hook configuration documentation to use the canonical `features.hooks` / `hooks = true` keys instead of the legacy `features.codex_hooks` / `codex_hooks = true`. The legacy spelling is still accepted as an alias on codex 0.129+ but the canonical form is what new users should configure. Retain a one-line note that the old spelling is the legacy alias so users on older codex still understand the deprecation path. Resolves CodeRabbit comments 3211156569, 3211760910, 3211760919, 3211156599. No code behavior changes; 128 sidecar tests still pass. Signed-off-by: Ajay Thorve <athorve@nvidia.com>

Codex 0.129's hook surface has no `SessionEnd`-equivalent event — its closest session-termination signal, `Stop`, fires per-conversation-turn rather than at session end. As a result, codex transparent runs through `nemo-flow-sidecar run -- codex` produced empty `.nemo-flow/atif/` directories: the existing `flush_observers` path (`end_agent` → `write_atif`) never fired. Add a new `NormalizedEvent::TurnEnded` variant. When `classify` sees a `Stop` hook AND the primary classification is not already terminal, it emits both the existing `LlmHint` (preserving correlation behavior for subsequent LLM calls) AND a `TurnEnded` event. `Session::apply` routes `TurnEnded` to a new `snapshot_atif` method that writes the cumulative ATIF without closing the agent scope or shutting down the OpenInference subscriber. The next turn keeps adding events; the final Stop's snapshot is the complete trajectory. The `is_terminal()` guard is critical: Cursor's adapter classifies `stop` as `AgentEnded`, so without the guard we'd emit `[AgentEnded, TurnEnded]` — the first call tears down the session and writes ATIF, the second creates an empty synthetic session and overwrites the freshly-written ATIF with an empty trajectory. Safety verified by reading `AtifExporter`: `export()` takes `&self`, documented as "does not clear the buffered events." Multiple snapshot writes are cumulative supersets, last-write-wins yields the complete trajectory. E2E verified against codex 0.129: - Real codex session writes a 1.6 MB ATIF on Ctrl+D exit (was 0 bytes before) - Multi-turn captured (5 steps: 2 user, 2 agent, 1 system) - `agent.name = "codex"` (from earlier provider-inference fix) - Phoenix root remains `[AGENT] codex` with proper nested LLM/tool spans Hermes is unaffected — it has no `Stop` hook event. Partially closes NMF-91. The remaining cross-process subagent unification gap in NMF-91 stays upstream-blocked: codex spawns subagents as separate codex processes with no `SubagentStart` hook, so each subagent's gateway-only synthetic session is still its own Phoenix trace and its own ATIF (or no ATIF when no Stop fires for it). Tests: - adapter: codex Stop emits TurnEnded - adapter: claude Stop emits TurnEnded - adapter: cursor stop does NOT also emit TurnEnded (regression: would wipe ATIF) - session: TurnEnded snapshots ATIF without closing scope; cumulative across turns - session: TurnEnded is a no-op when the session has no agent scope Signed-off-by: Ajay Thorve <athorve@nvidia.com>

Defines the streaming counterpart to LlmResponseCodec, used with llm_stream_call_execute. Each provider's StreamingCodec impl owns the incremental state needed to assemble per-chunk events into a single non-streaming-shape JSON payload, which the matching LlmResponseCodec then decodes into AnnotatedLlmResponse. Per Will's design suggestion in NMF-90: trait StreamingCodec { fn collector(&self) -> LlmCollectorFn; fn finalizer(&self) -> LlmFinalizerFn; } Includes an SseEventDecoder utility that incrementally splits a text/event-stream byte stream into parsed SSE events. Anthropic Messages, OpenAI Responses, and OpenAI Chat Completions all emit one JSON object per data: line, so the decoder is shared infrastructure for the per-provider StreamingCodec impls landing in follow-up commits. Refs NMF-90. Signed-off-by: Ajay Thorve <athorve@nvidia.com>

Implements StreamingCodec for the Anthropic Messages SSE event sequence. The collector observes message_start, content_block_start/delta/stop, message_delta, and message_stop events, accumulating per-block state (text, tool input partial_json, citations) keyed by content_block index. The finalizer produces a JSON object wire-compatible with RawAnthropicResponse — the same shape Anthropic returns for a non-streaming Messages request — so the assembled output can be fed back through AnthropicMessagesCodec::decode_response to produce the canonical AnnotatedLlmResponse. Streaming and non-streaming Anthropic requests now converge on the same observability output without per-route shape duplication. Internal state lives behind Arc<Mutex<...>> so the &self-produced collector and finalizer closures share access. Each instance is single-use because LlmFinalizerFn consumes the finalize step. Tests cover: - text-only response with usage and stop_reason mapping - tool_use input assembled from partial_json fragments - web_search_tool_result blocks (full content delivered at start) - text blocks with citations_delta accumulation - truncated streams falling back to raw partial_json string Refs NMF-90. Signed-off-by: Ajay Thorve <athorve@nvidia.com>

Implements StreamingCodec for the OpenAI Responses SSE event sequence. The Responses API streams either complete `response` snapshots (response.created, response.in_progress, response.completed, response.failed, response.incomplete) or final-state output items (response.output_item.done), so the codec: 1. Tracks the latest response snapshot — terminal events typically carry the complete state including `output`. 2. Tracks output items by output_index from output_item.done events, used as a fallback when the terminal snapshot lacks `output`. 3. Ignores per-token deltas (output_text.delta, function_call_arguments.delta) because their content is redelivered in the matching output_item.done. The finalizer produces a JSON object wire-compatible with RawResponsesResponse — feed it back through OpenAIResponsesCodec:: decode_response to produce the canonical AnnotatedLlmResponse. Tests cover: - terminal response.completed carries full state (common case) - terminal completed lacks output → fall back to output_item.done items - response.incomplete with reason maps to FinishReason::Length - per-token deltas don't double-accumulate Refs NMF-90. Signed-off-by: Ajay Thorve <athorve@nvidia.com>

Implements StreamingCodec for the OpenAI Chat Completions SSE chunk sequence. Each delta chunk may carry role (typically only the first), incremental content fragments, partial tool_calls whose function.arguments stream as a string, or a finish_reason terminator. Top-level fields (id, object, created, model) are repeated on every chunk; we capture them once. Final-chunk usage is preserved when emitted (only sent when stream_options.include_usage is set). The finalizer produces a JSON object wire-compatible with a non-streaming Chat Completions response — feed it back through OpenAIChatCodec::decode_response to produce the canonical AnnotatedLlmResponse. The `object` field is normalized from `chat.completion.chunk` to `chat.completion` so the assembled JSON matches what a non-streaming response would carry. Also: SseEventDecoder now drops the `data: [DONE]` end-of-stream sentinel that OpenAI Chat Completions emits. It's a wire-level marker, not a JSON payload, and would otherwise fail JSON parsing. Tests cover: - text-only response with usage and finish_reason mapping - tool_call arguments assembled from string fragments - content: null preserved when only tool calls streamed - multiple choices (n > 1 requests) tracked separately - intermediate null usage chunks don't overwrite real usage Refs NMF-90. Signed-off-by: Ajay Thorve <athorve@nvidia.com>

Match the project's existing test layout: every other codec file in crates/core/src/codec/ pulls its tests from crates/core/tests/unit/codec/<name>_tests.rs via #[path]. The streaming codec tests were inlined in src/codec/streaming.rs in the initial commit; move them out to follow the same convention so source files stay slim and all codec test files cluster under tests/unit/codec/. No behavior change. 283 core lib tests still passing. Refs NMF-90. Signed-off-by: Ajay Thorve <athorve@nvidia.com>

Refactors the gateway proxy to use llm_call_execute and llm_stream_call_execute so the runtime owns LLM start/end events, codec annotation, and stream lifecycle. Per-route dispatch wires Anthropic Messages, OpenAI Responses, and OpenAI Chat Completions to their existing typed codecs and the new streaming codecs landed earlier in this PR. A new SessionManager::prepare_gateway_call helper resolves the right session, opens the agent scope, and computes correlation metadata without holding the session lock during the upstream HTTP work. The buffered path returns the captured upstream bytes verbatim; the streaming path re-encodes parsed events back into SSE frames for the client (Option B). Upstream connection failures are still surfaced as 502 Bad Gateway via a side channel that captures the original reqwest error before the runtime collapses it to FlowError::Internal. Trade-off A vs B for streaming response forwarding: A. Byte tee: split the raw upstream byte stream between the client and a side capture that feeds the runtime collector. Bytes pass through unchanged, but observability hooks see the literal wire bytes (gzip/identity) rather than parsed events, and the collector code has to duplicate per-provider SSE framing logic just for capture. B. Re-encode (chosen): parse SSE on the upstream side, feed parsed JSON events to the runtime collector, then re-encode events as SSE for the client. Adds one parse + one serialize per chunk, but the runtime sees identical event shapes for streaming and non-streaming requests, and chunk-level observability matches the codec's annotated response. The legacy SessionManager::start_llm / end_llm path is retained behind cfg(test) so existing correlation unit tests continue to exercise resolve_llm_owner. Streaming tool-hint extraction is intentionally deferred because the runtime synthesizes the aggregate response inside LlmStreamWrapper and does not surface it back to the gateway. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Signed-off-by: Ajay Thorve <athorve@nvidia.com>

…on-lifecycle

Salonijain27

Approved from a dependency point of view

…aders `should_record_header` previously allowed `Cookie` and the generic `api-key` alias into the observability metadata (`LlmRequest.headers`) on every gateway request. Those values then propagate to ATIF trajectory headers and OpenInference span attributes, leaking session credentials and provider API keys to anyone with access to the trace store. Filter both alongside the existing `Authorization`, `x-api-key`, and `anthropic-api-key` excludes. `HeaderName::as_str()` returns the canonical lowercase form so the string comparisons remain case-insensitive by construction. `api-key` covers the generic alias used by Azure OpenAI and similar proxied deployments; we already block `x-api-key` (OpenAI SDK default) and `anthropic-api-key`. `Cookie` is added explicitly even though it has a dedicated header constant, because it carries session credentials with the same operational risk profile as bearer tokens. Test `removes_hop_by_hop_headers` extended to assert both new excludes. 122 sidecar unit + 9 CLI tests still pass. Resolves CodeRabbit comment #3211156538. Signed-off-by: Ajay Thorve <athorve@nvidia.com>

Signed-off-by: Will Killian <2007799+willkill07@users.noreply.github.com>

…e AgentEnded hermes-agent fires `on_session_end` at every user-turn boundary, distinct from `on_session_finalize` which marks the real session close. Sessions terminated before finalize (e.g., Ctrl+D, abrupt quit) left their ATIF un-flushed because nothing emitted a `TurnEnded` at the per-turn boundary. The hermes adapter now appends a `TurnEnded` after each `on_session_end`, mirroring the per-turn snapshot pattern that codex already uses for `stop`. Also harden `Session::end_agent` against duplicate AgentEnded events: today `ensure_agent_started` would re-open an empty agent scope after the real one had already closed, and `flush_observers` would overwrite the just-written ATIF with an empty trajectory. The new guard returns early when no agent scope is open. No behavioral change on the happy path; the existing `close_agent_scope` idempotency comment already anticipated this case. Tests: - `maps_hermes_real_session_boundary_without_closing_per_turn_end` extended to assert the per-turn TurnEnded emission alongside the HookMark. - `duplicate_agent_end_does_not_overwrite_atif_with_empty_session` added to pin the idempotency contract. Found while running hermes end-to-end through nemo-flow-sidecar against inference-api.nvidia.com with Phoenix and ATIF observers. Signed-off-by: Ajay Thorve <athorve@nvidia.com>

AjayThorve

Tested e2e with Codex, claude-code and hermes.

Claude-code: Works really well
Codex: Works well, but sub-agent hooks are missing (tracking in NMF-88 on linear)
Hermes: Works well, but sub-agent-start hook missing (tracking in NMF-93)

willkill07 · 2026-05-11T17:02:09Z

/merge

Add coding agent sidecar integrations

5f35146

Signed-off-by: Will Killian <wkillian@nvidia.com>

github-actions Bot added size:XXL PR is very large feat lang:rust PR changes/introduces Rust code labels May 5, 2026

willkill07 self-assigned this May 5, 2026

willkill07 added this to the 0.2.0 milestone May 5, 2026

willkill07 and others added 5 commits May 5, 2026 16:06

chore: publish sidecar crate in release helpers

c508ae0

Signed-off-by: Will Killian <wkillian@nvidia.com>

test: increase sidecar coverage

8d95966

Signed-off-by: Will Killian <wkillian@nvidia.com>

Add transparent sidecar execution

afe8a88

Signed-off-by: Will Killian <wkillian@nvidia.com>

Fix sidecar review findings

c087437

Signed-off-by: Will Killian <wkillian@nvidia.com>

willkill07 force-pushed the wkk_coding-agent-sidecar-integrations branch from 9594050 to 841d583 Compare May 6, 2026 18:53

willkill07 and others added 5 commits May 6, 2026 14:54

Merge branch 'main' into wkk_coding-agent-sidecar-integrations

aaf8c8c

feat(sidecar): capture Hermes API hook token metrics

173bca9

Signed-off-by: GSD Agent <bbednarski@nvidia.com>

feat: add LLM and Tool best-effort correlation; Hermes docs

c1be4ee

Signed-off-by: Will Killian <wkillian@nvidia.com>

feat(sidecar): capture Hermes API hook token metrics

bfd9f73

Signed-off-by: GSD Agent <bbednarski@nvidia.com>

ci: add sidecar binary publishing

1d896bf

Signed-off-by: Will Killian <wkillian@nvidia.com>

willkill07 force-pushed the wkk_coding-agent-sidecar-integrations branch from 173bca9 to 1d896bf Compare May 6, 2026 23:14

willkill07 and others added 4 commits May 6, 2026 19:29

test: make sidecar launcher test portable

23315c0

Signed-off-by: Will Killian <wkillian@nvidia.com>

Merge branch 'main' into wkk_coding-agent-sidecar-integrations

f5f9a38

ci(windows): get build to pass

2201962

Signed-off-by: Will Killian <wkillian@nvidia.com>

Merge branch 'main' into wkk_coding-agent-sidecar-integrations

b83d514

Signed-off-by: Will Killian <2007799+willkill07@users.noreply.github.com>

github-actions Bot added the lang:python PR changes/introduces Python code label May 7, 2026

bbednarski9 added 2 commits May 7, 2026 13:16

feat(python): support response annotations on manual LLM span end

7879885

Signed-off-by: Bryan Bednarski <bbednarski@nvidia.com>

bbednarski9 force-pushed the wkk_coding-agent-sidecar-integrations branch from bf50036 to 7879885 Compare May 7, 2026 20:16

Merge remote-tracking branch 'upstream/main' into wkk_coding-agent-si…

dec9bfd

…decar-integrations Signed-off-by: Will Killian <wkillian@nvidia.com> # Conflicts: # ATTRIBUTIONS-Rust.md # Cargo.lock

willkill07 marked this pull request as ready for review May 8, 2026 17:50

AjayThorve reviewed May 8, 2026

View reviewed changes

Comment thread crates/sidecar/src/launcher.rs Outdated

coderabbitai Bot reviewed May 8, 2026

View reviewed changes

coderabbitai Bot reviewed May 9, 2026

View reviewed changes

Comment thread crates/sidecar/src/gateway.rs Outdated

AjayThorve added 2 commits May 8, 2026 19:38

AjayThorve force-pushed the wkk_coding-agent-sidecar-integrations branch from 74189e8 to eb16758 Compare May 9, 2026 02:38

coderabbitai Bot reviewed May 9, 2026

View reviewed changes

Comment thread crates/sidecar/src/launcher.rs

AjayThorve and others added 11 commits May 8, 2026 19:51

Merge pull request #1 from AjayThorve/ajay/nmf-90-managed-llm-executi…

5a193ea

…on-lifecycle

Salonijain27 previously approved these changes May 9, 2026

View reviewed changes

AjayThorve dismissed Salonijain27’s stale review via 82f5fa9 May 11, 2026 06:03

AjayThorve reviewed May 11, 2026

View reviewed changes

Comment thread .gitlab-ci.yml

willkill07 and others added 2 commits May 11, 2026 11:12

Merge branch 'main' into wkk_coding-agent-sidecar-integrations

000cca6

Signed-off-by: Will Killian <2007799+willkill07@users.noreply.github.com>

AjayThorve approved these changes May 11, 2026

View reviewed changes

dagardner-nv approved these changes May 11, 2026

View reviewed changes

rapids-bot Bot merged commit 2b7f565 into NVIDIA:main May 11, 2026
116 of 117 checks passed

coderabbitai Bot mentioned this pull request May 11, 2026

refactor: rename sidecar crate to CLI #77

Merged

2 tasks

willkill07 added the Feature a new feature label May 12, 2026

Conversation

willkill07 commented May 5, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Overview

Details

Where should the reviewer start?

Related Issues: (use one of the action keywords Closes / Fixes / Resolves / Relates to)

Summary by CodeRabbit

Uh oh!

coderabbitai Bot commented May 5, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Reviews paused

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

Uh oh!

copy-pr-bot Bot commented May 7, 2026

Uh oh!

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Salonijain27 left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

AjayThorve left a comment

Choose a reason for hiding this comment

Uh oh!

willkill07 commented May 11, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

willkill07 commented May 5, 2026 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented May 5, 2026 •

edited

Loading