Skip to content

feat: add coding-agent sidecar integrations#63

Merged
rapids-bot[bot] merged 36 commits into
NVIDIA:mainfrom
willkill07:wkk_coding-agent-sidecar-integrations
May 11, 2026
Merged

feat: add coding-agent sidecar integrations#63
rapids-bot[bot] merged 36 commits into
NVIDIA:mainfrom
willkill07:wkk_coding-agent-sidecar-integrations

Conversation

@willkill07
Copy link
Copy Markdown
Member

@willkill07 willkill07 commented May 5, 2026

Overview

Adds the Rust nemo-flow-sidecar binary and coding-agent integration surface for observing Claude Code, Codex, Cursor, and Hermes agent activity through hook ingestion, transparent sidecar execution, passthrough LLM gateway routes, ATIF export, and OpenInference export. The branch now covers the sidecar runtime, installer behavior, agent-specific hook bundles/docs, Hermes-specific shell hook support, best-effort LLM/tool correlation, token metrics extraction, and CI/release plumbing for publishing the sidecar crate and binary artifacts.

  • I confirm this contribution is my own work, or I have the right to submit it under this project's license.
  • I searched existing issues and open pull requests, and this does not duplicate existing work.

Details

  • Adds the nemo-flow-sidecar workspace crate with CLI entry points for serve, run, install, and hook-forward.
  • Adds hook endpoints and adapters for Claude Code, Codex, Cursor, and Hermes, including Hermes YAML config merge support and runtime NEMO_FLOW_SIDECAR_URL handling for ephemeral sidecar runs.
  • Adds sidecar session management for agent, subagent, tool, LLM, ATIF, and OpenInference lifecycle events, with shared extraction of task/session IDs, parent relationships, tool call IDs, results, and sidecar metadata.
  • Adds passthrough gateway routes for OpenAI Responses, OpenAI Chat Completions, Anthropic Messages, Anthropic token counting, and model listing, plus best-effort correlation between gateway LLM activity and hook-reported tool calls.
  • Updates ATIF export behavior to preserve the unwrapped LLM request payload, extract provider-native usage/token metrics, include cached-token variants, and improve source call correlation.
  • Adds installable Claude Code and Codex hook/plugin bundles plus a Cursor hook bundle under integrations/coding-agents/; Hermes is configured through the sidecar installer rather than a static bundle.
  • Adds end-to-end sidecar documentation, per-agent setup guides, Rust API index notes, install docs, and integration overview links.
  • Wires the sidecar into workspace metadata, release guidance, local dependency publishing helpers, attribution data, Codecov configuration, path filters, GitHub Actions, GitLab CI, and sidecar release-binary artifact publishing.
  • Adds sidecar CLI tests plus coverage for adapters, config, gateway behavior, installer merge/planning, transparent launcher behavior, server responses, session/export behavior, and ATIF exporter changes.
  • Follow-up CI fixes remove the stale sidecar generated-doc toctree entry and make the transparent launcher env/exit-code test use a platform-valid fake agent on Windows.

Validation recorded for this PR:

  • cargo test -p nemo-flow-sidecar
  • PATH="/opt/homebrew/bin:$PATH" /opt/homebrew/bin/just docs
  • cargo fmt --all
  • cargo test -p nemo-flow-sidecar launcher::tests::run_starts_sidecar_injects_env_and_returns_agent_exit_code
  • Previous branch validation included just test-rust, just docs, and cargo clippy -p nemo-flow-sidecar --all-targets -- -D warnings.
  • Commit pre-checks for the follow-up fix passed with lychee skipped because it still fails on unrelated generated Rust API doc links in docs/reference/api/rust/_generated/...; cargo fmt, cargo clippy, and cargo check passed with PyO3 pointed at Python 3.12.
  • Latest PR head is f5f9a385, which includes the follow-up fix commit 23315c09 plus the current main merge; GitHub CI is rerunning for this head.

Where should the reviewer start?

Start with crates/sidecar/src/session.rs for lifecycle normalization and export behavior, then crates/sidecar/src/gateway.rs and crates/sidecar/src/server.rs for the HTTP/gateway surface. For installation and user-facing behavior, review crates/sidecar/src/installer.rs, docs/integrate-frameworks/coding-agent-sidecar.md, and integrations/coding-agents/README.md.

Related Issues: (use one of the action keywords Closes / Fixes / Resolves / Relates to)

  • Closes NMF-57
  • Closes NMF-58
  • Closes NMF-59
  • Closes NMF-60

Summary by CodeRabbit

  • New Features

    • Added a coding-agent gateway sidecar (nemo-flow-sidecar) with adapters, CLI/launcher/installer, server, and workspace/crate publication.
    • Python LLM API: optional annotated_response and response_codec for richer end-event annotations.
  • Improvements

    • Enhanced observability: request-level LLM payload capture and broader provider token-usage extraction; normalized response decoding behavior.
  • Documentation

    • New sidecar guides, integration READMEs, install instructions, and README badge.
  • Tests

    • Extensive unit/integration coverage for sidecar components.
  • Chores

    • CI/workflow updates to build/publish sidecar and Codecov coverage component.

Signed-off-by: Will Killian <wkillian@nvidia.com>
@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented May 5, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review

Walkthrough

Introduces a new nemo-flow-sidecar crate (server, adapters, gateway, session manager), installer/launcher CLI, Python llm_call_end annotation/codec support, CI/publishing and Codecov updates, ATIF exporter improvements, extensive tests, docs, and packaged coding-agent integration bundles.

Changes

Sidecar crate, API, CI, tests, and docs

Layer / File(s) Summary
Core contracts and wiring
.github/..., .gitlab-ci.yml, Cargo.toml, codecov.yml, README.md, RELEASING.md, justfile
Adds crates/sidecar to the workspace and workspace.dependencies; includes sidecar in GitHub/GitLab publish loops and adds a CI packaging job; updates Codecov components/ignore; extends CI path filters for integrations/coding-agents/**; updates README badges and RELEASING notes; updates justfile helper.
Core ATIF exporter
crates/core/src/observability/atif.rs, crates/core/tests/unit/atif_tests.rs
Adds llm_request to AtifStepExtra; records request payload at LLM start and emits None for non-user finalized agent steps; refactors extract_metrics to handle provider usage shapes and adds helper extractors; tests updated.
Core LLM API / codecs
crates/core/src/api/llm.rs, crates/core/src/codec/traits.rs
Adds response_codec to LlmCallEndParams; llm_call_end sanitizes the provider response, derives annotated_response from the codec when omitted, emits the end event even if decode fails, and returns a decode error afterward; codec docs clarify fallible decode semantics.
Python API extension
crates/python/src/py_api/mod.rs, python/nemo_flow/_native.pyi, python/nemo_flow/llm.py, crates/python/tests/*, python/tests/*
Extends Python llm_call_end to accept optional annotated_response and response_codec; adds helpers to resolve/validate codec and annotated-response inputs; refactors execute/stream execute paths to use shared helper; updates stubs and tests to cover decoding, sanitized-payload decoding, and error-with-emitted-end-event semantics.
Sidecar manifest & models
crates/sidecar/Cargo.toml, crates/sidecar/src/model.rs
Adds crates/sidecar Cargo manifest and workspace dependency; defines AgentKind, NormalizedEvent, and typed event payload structs (SessionEvent, SubagentEvent, LlmHintEvent, LlmEvent, ToolEvent).
Adapters & normalization
crates/sidecar/src/adapters/*, crates/sidecar/src/adapters/mod.rs
Implements vendor adapters (Claude Code, Codex, Cursor, Hermes) and shared adapter utilities: AdapterOutcome, classification rules, session-id extraction, metadata builders, normalization helpers; adapters return vendor-shaped responses while producing normalized events.
Config, CLI, and error mapping
crates/sidecar/src/config.rs, crates/sidecar/src/error.rs
Implements CLI models (server/run/install/hook-forward), TOML discovery + deep-merge, environment overlays, header-derived per-session config parsing, and SidecarError with IntoResponse mapping to structured JSON and HTTP status codes.
Server and gateway
crates/sidecar/src/server.rs, crates/sidecar/src/gateway.rs
Adds Axum router with /healthz, per-agent /hooks/* endpoints that adapt payloads and apply normalized events; gateway passthrough for OpenAI-style routes with buffered and streaming handling, streaming preview/truncation, StreamingLlmGuard for client-drop cleanup, and /v1/models proxy; header filtering and provider route mapping included.
Session manager, correlation, and ATIF
crates/sidecar/src/session.rs
Implements SessionManager for in-memory session lifecycle management, lazy root-scope creation, LLM/tool/subagent ownership resolution via headers/pending hints/sticky ownership, hint scoring and matching, tool-hint extraction from provider responses, ATIF trajectory writing, and observer flush/cleanup.
Installer, launcher, and CLI entrypoint
crates/sidecar/src/installer.rs, crates/sidecar/src/launcher.rs, crates/sidecar/src/main.rs
Adds install planning/merge/write with idempotent merging and backups; hook-forward stdin->HTTP forwarding with fail-open/closed modes; run launcher that injects sidecar wiring for various agents (temporary plugin/hook patching, env/argv injection, health polling); adds async CLI entrypoint and dispatcher.
Tests
crates/sidecar/tests/**/*, crates/core/tests/*, crates/python/tests/*, python/tests/*
Adds wide test coverage: adapters, config resolution, gateway streaming/headers/models, server endpoints, session lifecycles and correlation, installer merge/idempotency, launcher prepare/restore/health, CLI hook-forward behaviors, Python codec decode/sanitization/failure semantics, and ATIF content validation.
Docs & integrations
docs/**, integrations/coding-agents/**
Adds sidecar getting-started and per-agent guides, integration bundles (Claude Code, Codex, Cursor, Hermes) with hooks and READMEs, updates Rust API reference to list nemo-flow-sidecar, and documents install-from-source and index navigation links.

Sequence Diagram(s)

sequenceDiagram
  participant Agent as Coding Agent
  participant Sidecar as Sidecar Server
  participant Upstream as Provider API
  participant Sess as SessionManager
  Agent->>Sidecar: POST /hooks/{agent} (hook JSON)
  Sidecar->>Sess: apply_events(normalized)
  Agent->>Sidecar: POST /v1/chat/completions
  Sidecar->>Sess: start_llm(request, headers)
  Sidecar->>Upstream: forward request
  Upstream-->>Sidecar: response/stream
  Sidecar->>Sess: end_llm(response/meta)
  Sidecar-->>Agent: proxied response/vendor hook reply
Loading

Estimated code review effort

🎯 5 (Critical) | ⏱️ ~120 minutes

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

@github-actions github-actions Bot added size:XXL PR is very large feat lang:rust PR changes/introduces Rust code labels May 5, 2026
@willkill07 willkill07 self-assigned this May 5, 2026
@willkill07 willkill07 added this to the 0.2.0 milestone May 5, 2026
willkill07 and others added 5 commits May 5, 2026 16:06
Signed-off-by: Will Killian <wkillian@nvidia.com>
Signed-off-by: Will Killian <wkillian@nvidia.com>
Signed-off-by: Will Killian <wkillian@nvidia.com>
Signed-off-by: Will Killian <wkillian@nvidia.com>
  - Added Hermes as a first-class sidecar agent:
    --agent hermes, config support, command inference for hermes / hermes-agent, and AgentKind::Hermes.
  - Added a dedicated Hermes hook endpoint:
    /hooks/hermes now routes Hermes shell-hook payloads into the sidecar session manager.
  - Added a Hermes adapter:
    Maps Hermes lifecycle, tool, and subagent shell hooks into normalized NeMo Flow sidecar events.
  - Added Hermes installer support:
    nemo-flow-sidecar install hermes writes/merges .hermes/config.yaml hook config using YAML instead of JSON/TOML.
  - Added dynamic sidecar URL handling for Hermes:
    Hermes installed hooks prefer runtime NEMO_FLOW_SIDECAR_URL, so nemo-flow-sidecar run --agent hermes can use an ephemeral sidecar URL without reinstalling hooks.
  - Preserved Hermes hook consent semantics:
    The runner exports the dynamic sidecar URL but does not auto-enable HERMES_ACCEPT_HOOKS.
  - Improved ATIF LLM request fidelity:
    ATIF user steps now preserve the full unwrapped LLM request payload under extra.llm_request.
  - Improved ATIF token metrics extraction:
    Supports provider-native usage objects in addition to NeMo Flow token_usage, including cached-token variants from OpenAI/Anthropic-style payloads.
  - Expanded shared adapter extraction:
    Adds support for Hermes-friendly payload fields like task_id, parent_session_id, extra.tool_call_id, and extra.result.
  - Added test coverage:
    New/updated tests cover Hermes adapter mapping, config parsing, installer generation/merge behavior, runtime launcher behavior, server hook response shape, and ATIF request/usage export.

Signed-off-by: Bryan Bednarski <bbednarski@nvidia.com>
Signed-off-by: Will Killian <wkillian@nvidia.com>
@willkill07 willkill07 force-pushed the wkk_coding-agent-sidecar-integrations branch from 9594050 to 841d583 Compare May 6, 2026 18:53
willkill07 and others added 5 commits May 6, 2026 14:54
Signed-off-by: GSD Agent <bbednarski@nvidia.com>
Signed-off-by: Will Killian <wkillian@nvidia.com>
Signed-off-by: GSD Agent <bbednarski@nvidia.com>
Signed-off-by: Will Killian <wkillian@nvidia.com>
@willkill07 willkill07 force-pushed the wkk_coding-agent-sidecar-integrations branch from 173bca9 to 1d896bf Compare May 6, 2026 23:14
willkill07 and others added 4 commits May 6, 2026 19:29
Signed-off-by: Will Killian <wkillian@nvidia.com>
Signed-off-by: Will Killian <wkillian@nvidia.com>
Signed-off-by: Will Killian <2007799+willkill07@users.noreply.github.com>
@copy-pr-bot
Copy link
Copy Markdown

copy-pr-bot Bot commented May 7, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@github-actions github-actions Bot added the lang:python PR changes/introduces Python code label May 7, 2026
…egrations' into wkk_coding-agent-sidecar-integrations

# Conflicts:
#	crates/sidecar/src/model.rs
#	crates/sidecar/src/session.rs
#	crates/sidecar/tests/coverage/installer_tests.rs
#	crates/sidecar/tests/coverage/session_tests.rs
Signed-off-by: Bryan Bednarski <bbednarski@nvidia.com>
@bbednarski9 bbednarski9 force-pushed the wkk_coding-agent-sidecar-integrations branch from bf50036 to 7879885 Compare May 7, 2026 20:16
…decar-integrations

Signed-off-by: Will Killian <wkillian@nvidia.com>

# Conflicts:
#	ATTRIBUTIONS-Rust.md
#	Cargo.lock
@willkill07 willkill07 marked this pull request as ready for review May 8, 2026 17:50
Comment thread crates/sidecar/src/launcher.rs Outdated
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 6

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@crates/sidecar/src/launcher.rs`:
- Around line 168-179: The current configured_command function uses
command.split_whitespace(), which drops quotes and corrupts quoted paths/args
(e.g., "C:\Program Files\..."), so replace the whitespace-only split with a
quote-aware parser: call a shell-style splitter (e.g., shell_words::split or
shlex::split) on the command string inside configured_command (refer to the
configured_command function and CodingAgent/AgentConfigs lookup), handle the
Result (return None or log/error on parse failure), and return the parsed
Vec<String> only when non-empty; this preserves quoted executables and arguments
on Windows and wrapper commands.
- Around line 482-496: The current insert_after_agent implementation can place
agent flags after argv[0] when the agent was specified via an explicit --agent
flag but no argv element matches CodingAgent::infer, causing wrappers to be
configured instead of the agent; update insert_after_agent (referencing argv,
agent, and CodingAgent::infer) to first search argv for an explicit --agent
occurrence (both "--agent" followed by a value and "--agent=VALUE" forms) and,
if found, insert the generated args immediately after the agent value token (or
the same token for --agent=VALUE); only if no explicit --agent token is present
fall back to finding the last infer-able agent index (as currently done) and
finally fall back to index 0.
- Around line 108-111: The health probe timeout currently discards any actual
server error by ignoring the result of running_server.stop().await; change the
error handling so that when wait_for_health(sidecar_url).await returns
Err(health_err), you call running_server.stop().await and if that returns
Err(server_err) return server_err (or otherwise wrap/preserve both, preferring
server_err) instead of always returning health_err; reference
RunningSidecar::start, wait_for_health, and running_server.stop().await to
locate where to capture and propagate the server-side error.

In `@crates/sidecar/tests/coverage/launcher_tests.rs`:
- Around line 223-252: The test mutates process cwd and currently restores it
with a trailing std::env::set_current_dir(previous) which fails to run on panic;
introduce a small RAII guard (e.g., CwdGuard) that saves the previous cwd on
creation and restores it in Drop, then replace the manual previous/save/restore
pattern in tests like cursor_patching_can_be_disabled (and the other listed
tests) by creating a CwdGuard after setting temp dir (use current_dir_lock() as
before) so the cwd is always restored even if the test panics or returns early;
ensure the guard is used in the scope where PreparedRun::new and assertions run.

In `@docs/integrate-frameworks/coding-agent-codex.md`:
- Around line 89-90: The provider display name currently uses "Nemo Flow"
casing; update the value for the TOML table [model_providers.nemo-flow-openai]
so the name key uses the official product capitalization "NeMo Flow" (e.g.,
change name = "Nemo Flow OpenAI" to name = "NeMo Flow OpenAI") to ensure
consistent product naming across docs.
- Around line 111-114: The docs currently reference deprecated keys
features.codex_hooks and codex_hooks; update the text so the transparent wrapper
sets the CLI override features.hooks=true (instead of features.codex_hooks) and
the persistent install writes .codex/config.toml with hooks = true (instead of
codex_hooks = true), while still describing that generated hook entries are
merged into .codex/hooks.json; update both occurrences to use the canonical
features.hooks and hooks configuration names and leave the hooks.json behavior
unchanged.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Enterprise

Run ID: 7d4ca10f-58e2-4750-9c45-2c77d6195b69

📥 Commits

Reviewing files that changed from the base of the PR and between d336a47 and b35f658.

📒 Files selected for processing (4)
  • crates/sidecar/src/launcher.rs
  • crates/sidecar/tests/coverage/launcher_tests.rs
  • docs/integrate-frameworks/coding-agent-codex.md
  • integrations/coding-agents/codex/README.md
📜 Review details
🧰 Additional context used
📓 Path-based instructions (30)
**/*.{md,rst,html,txt}

📄 CodeRabbit inference engine (.agents/skills/review-doc-style/assets/nvidia-style-brand-terminology.md)

**/*.{md,rst,html,txt}: Always spell NVIDIA in all caps. Do not use Nvidia, nvidia, nVidia, nVIDIA, or NV.
Use an NVIDIA before a noun because the name starts with an 'en' sound.
Do not add a registered trademark symbol after NVIDIA when referring to the company.
Use trademark symbols with product names only when the document type or legal guidance requires them.
Verify official capitalization, spacing, and hyphenation for product names.
Precede NVIDIA product names with NVIDIA on first mention when it is natural and accurate.
Do not rewrite product names for grammar or title-case rules.
Preserve third-party product names according to the owner's spelling.
Include the company name and full model qualifier on first use when it helps identify the model.
Preserve the official capitalization and punctuation of model names.
Use shorter family names only after the full name is established.
Spell out a term on first use and put the acronym in parentheses unless the acronym is widely understood by the intended audience.
Use the acronym on later mentions after it has been defined.
For long documents, reintroduce the full term if readers might lose context.
Form plurals of acronyms with s, not an apostrophe, such as GPUs.
In headings, common acronyms can remain abbreviated. Spell out the term in the first or second sentence of the body.
Common terms such as CPU, GPU, PC, API, and UI usually do not need to be spelled out for developer audiences.

Files:

  • integrations/coding-agents/codex/README.md
  • docs/integrate-frameworks/coding-agent-codex.md
**/*.{md,rst,html}

📄 CodeRabbit inference engine (.agents/skills/review-doc-style/assets/nvidia-style-brand-terminology.md)

Link the first mention of a product name when the destination helps the reader.

Files:

  • integrations/coding-agents/codex/README.md
  • docs/integrate-frameworks/coding-agent-codex.md
**/*.{md,rst,txt}

📄 CodeRabbit inference engine (.agents/skills/review-doc-style/assets/nvidia-style-guide.md)

**/*.{md,rst,txt}: Spell NVIDIA in all caps. Do not use Nvidia, nvidia, or NV.
Format commands, code elements, expressions, package names, file names, and paths as inline code.
Use descriptive link text. Avoid raw URLs and weak anchors such as 'here' or 'read more.'
Use title case consistently for technical documentation headings.
Introduce code blocks, lists, tables, and images with complete sentences.
Write procedures as imperative steps. Keep steps parallel and split long procedures into smaller tasks.
Prefer active voice, present tense, short sentences, contractions, and plain English.
Use can for possibility and reserve may for permission.
Use after for temporal relationships instead of once.
Prefer refer to over see when the wording points readers to another resource.
Avoid culture-specific idioms, unnecessary Latinisms, jokes, and marketing exaggeration in technical documentation.
Spell out months in body text, avoid ordinal dates, and use clear time zones.
Spell out whole numbers from zero through nine unless they are technical values, parameters, versions, or UI values.
Use numerals for 10 or greater and include commas in thousands.
Do not add trademark symbols to learning-oriented documentation unless the source, platform, or legal guidance explicitly requires them.
Do not add trademark symbols to NeMo Flow learning documentation by default.
Do not rewrite API names, package names, command flags, or code literals for style reasons.

Files:

  • integrations/coding-agents/codex/README.md
  • docs/integrate-frameworks/coding-agent-codex.md
**/*.{md,markdown,rst}

📄 CodeRabbit inference engine (.agents/skills/review-doc-style/assets/nvidia-style-technical-docs.md)

**/*.{md,markdown,rst}: Use title case consistently in technical documentation headings
Avoid quotation marks, ampersands, and exclamation marks in headings
Keep product, event, research, and whitepaper names in their official title case
Use title case for table headers
Do not force social-media sentence case into technical docs
Use monospace formatting for code elements, commands, parameters, package names, and expressions
Use monospace formatting for directories, file names, and paths
Use angle brackets inside monospace for variables inside paths, such as /home/<username>/.login
Use quotation marks for error messages and strings in documentation
Use bold formatting for UI buttons, menus, fields, and labels in documentation
Use angle brackets between UI labels for menu paths, such as File > Save As
Use italics for new terms on first use in documentation
Use italics for publication titles in documentation
Use plain text formatting for keyboard shortcuts in documentation
Prefer [NVIDIA/NeMo](link) format for GitHub repository references over generic phrases like 'the GitHub repo'
Introduce every code block with a complete sentence
Do not make a code block complete the grammar of the previous sentence
Do not continue a sentence after a code block
Use syntax highlighting when the format supports it for code blocks
Avoid the word 'snippet' unless the surrounding docs already use it as a term of art
Keep inline method, function, and class references consistent with nearby docs, omitting empty parentheses for prose readability when no call is shown
Use descriptive anchor text that matches the destination title when possible for links
Avoid raw URLs in running text in documentation
Avoid generic link anchors such as 'here,' 'this page,' and 'read more' in documentation
Include the acronym in link text if a linked term includes an acronym
Do not link long sentences or multiple sentences in documentation
Avoid links that pull readers away from a procedure unles...

Files:

  • integrations/coding-agents/codex/README.md
  • docs/integrate-frameworks/coding-agent-codex.md
**/*.{html,md}

📄 CodeRabbit inference engine (CONTRIBUTING.md)

Include SPDX license headers in HTML and Markdown files using HTML comment syntax

Files:

  • integrations/coding-agents/codex/README.md
  • docs/integrate-frameworks/coding-agent-codex.md
**/README.md

📄 CodeRabbit inference engine (CONTRIBUTING.md)

Update relevant crate or package README when that surface changed

Relevant package or crate README.md files must be updated when examples or binding guidance changes

Files:

  • integrations/coding-agents/codex/README.md
**/*.md

📄 CodeRabbit inference engine (CONTRIBUTING.md)

Run Markdown link checking via lychee for README.md, CONTRIBUTING.md, and docs/ through pre-commit hooks

Files:

  • integrations/coding-agents/codex/README.md
  • docs/integrate-frameworks/coding-agent-codex.md
**/*.{md,markdown,py,sh,bash,js,ts,java,cpp,go,rust}

📄 CodeRabbit inference engine (.agents/skills/contribute-docs/SKILL.md)

Keep package names, repo references, and build commands current in documentation

Files:

  • integrations/coding-agents/codex/README.md
  • docs/integrate-frameworks/coding-agent-codex.md
**/*.{md,markdown,py,sh,bash}

📄 CodeRabbit inference engine (.agents/skills/contribute-docs/SKILL.md)

Keep stable user-facing wrappers at scripts/ root in docs and examples; only point at namespaced helper paths when documenting internal maintenance work

Files:

  • integrations/coding-agents/codex/README.md
  • docs/integrate-frameworks/coding-agent-codex.md
**/*.{md,markdown,py,sh,bash,js,ts,example}

📄 CodeRabbit inference engine (.agents/skills/contribute-docs/SKILL.md)

Example commands must match current package names and paths

Files:

  • integrations/coding-agents/codex/README.md
  • docs/integrate-frameworks/coding-agent-codex.md
**/{integrations,integration,*-integration}/**

📄 CodeRabbit inference engine (.agents/skills/contribute-integration/SKILL.md)

**/{integrations,integration,*-integration}/**: Keep NeMo Flow optional in framework integrations
Preserve the framework's original behavior when NeMo Flow is absent
Wrap tool and LLM paths at the correct framework boundary
Integration pattern must follow docs/integrate-frameworks/adding-scopes.md

Files:

  • integrations/coding-agents/codex/README.md
**/*.{md,txt,rst}

📄 CodeRabbit inference engine (.agents/skills/review-doc-style/SKILL.md)

**/*.{md,txt,rst}: Ensure commands, package names, file paths, and APIs in documentation are correct and not stale; flag incorrect or outdated information as blocking issues
Ensure examples and procedures in documentation will execute successfully with current APIs and commands
Use consistent user-facing terminology throughout documentation that matches current repo terminology
Capitalize NVIDIA correctly in all documentation and public-facing text
Format code, commands, paths, and filenames as inline code (monospace) in documentation
Use descriptive anchor text for links instead of bare URLs or weak labels like 'here' in documentation
Prefer active voice, present tense, short sentences, and plain English in documentation
Structure documentation procedures as imperative steps that are easy to scan and not too long for a single sequence
Prefer 'after' instead of 'once' for temporal references in documentation
Use 'can' instead of 'may' when describing possibility (rather than permission) in documentation
Avoid ambiguous numeric dates and ordinal dates in documentation body text

Files:

  • integrations/coding-agents/codex/README.md
  • docs/integrate-frameworks/coding-agent-codex.md
{README.md,docs/index.md,**/README.md}

📄 CodeRabbit inference engine (.agents/skills/review-doc-style/SKILL.md)

Update entry-point documentation (README.md, docs/index.md, package READMEs, binding-level source READMEs) whenever public behavior changes

Files:

  • integrations/coding-agents/codex/README.md
**/*.{rs,py,js,ts,go,sh,toml,yaml,yml,json,md,rst}

📄 CodeRabbit inference engine (AGENTS.md)

Keep SPDX headers on source, docs, scripts, and configuration files. The project is Apache-2.0.

Files:

  • integrations/coding-agents/codex/README.md
  • docs/integrate-frameworks/coding-agent-codex.md
  • crates/sidecar/src/launcher.rs
  • crates/sidecar/tests/coverage/launcher_tests.rs
**/*.{py,js,ts,tsx,go,rs,md}

📄 CodeRabbit inference engine (.agents/skills/validate-change/SKILL.md)

Format changed files with the language-native formatter before the final lint/test pass

Files:

  • integrations/coding-agents/codex/README.md
  • docs/integrate-frameworks/coding-agent-codex.md
  • crates/sidecar/src/launcher.rs
  • crates/sidecar/tests/coverage/launcher_tests.rs
docs/**/*.md

📄 CodeRabbit inference engine (CONTRIBUTING.md)

Run ./scripts/build-docs.sh for documentation site changes

docs/**/*.md: Relevant getting-started or reference docs must be updated when examples change
Release-policy docs must point to GitHub Releases as the only release-history source of truth

docs/**/*.md: Use title case for headings in technical documentation
Introduce code blocks, tables, and lists with complete lead-in sentences in documentation

Files:

  • docs/integrate-frameworks/coding-agent-codex.md
{RELEASING.md,CHANGELOG.md,docs/**/*.md}

📄 CodeRabbit inference engine (.agents/skills/contribute-docs/SKILL.md)

Keep release-process and release-notes guidance in repo-maintainer docs such as RELEASING.md, not as user-facing docs pages or CHANGELOG.md

Files:

  • docs/integrate-frameworks/coding-agent-codex.md
docs/integrate-frameworks/**/*.md

📄 CodeRabbit inference engine (.agents/skills/contribute-integration/SKILL.md)

Documentation must be updated if activation or usage of the integration changed

Files:

  • docs/integrate-frameworks/coding-agent-codex.md
{scripts/*.sh,docs/**/*.md}

📄 CodeRabbit inference engine (.agents/skills/contribute-integration/SKILL.md)

Use root ./scripts/*.sh commands in docs and contributor guidance as documented, with implementations under scripts/third-party/

Files:

  • docs/integrate-frameworks/coding-agent-codex.md
{docs/**,examples/**,crates/adaptive/**,python/nemo_flow/**,go/nemo_flow/**,**/{example,component}.{ts,tsx,js,rs,py,go}}

📄 CodeRabbit inference engine (.agents/skills/maintain-optimizer/SKILL.md)

Any new adaptive component kind must have documentation, examples, and binding coverage across all supported languages

Files:

  • docs/integrate-frameworks/coding-agent-codex.md
{README*,CHANGELOG*,docs/**/*.{md,rst,txt},examples/**/*,*.md}

📄 CodeRabbit inference engine (.agents/skills/rename-surfaces/SKILL.md)

Update documentation, examples, and getting-started guides with new package/module/crate names after rename operations

Files:

  • docs/integrate-frameworks/coding-agent-codex.md
{README.md,docs/**/*.md,examples/**/*.{js,ts,py,go,rs}}

📄 CodeRabbit inference engine (.agents/skills/maintain-packaging/SKILL.md)

Keep documentation and examples synchronized with current install, import, and build commands

Files:

  • docs/integrate-frameworks/coding-agent-codex.md
{README.md,CONTRIBUTING.md,docs/**/*.md}

📄 CodeRabbit inference engine (.agents/skills/validate-change/SKILL.md)

For docs-only changes, run targeted checks only if commands, package names, or examples changed. Use just docs for docs-site builds and just docs-linkcheck when links changed

Files:

  • docs/integrate-frameworks/coding-agent-codex.md
{docs/**,README.md,CONTRIBUTING.md,RELEASING.md,SECURITY.md}

⚙️ CodeRabbit configuration file

{docs/**,README.md,CONTRIBUTING.md,RELEASING.md,SECURITY.md}: Review documentation for technical accuracy against the current API, command correctness, and consistency across language bindings.
Flag stale examples, missing SPDX headers where required, and instructions that no longer match CI or pre-commit behavior.

Files:

  • docs/integrate-frameworks/coding-agent-codex.md
**/*.rs

📄 CodeRabbit inference engine (.agents/skills/test-ffi-surface/SKILL.md)

**/*.rs: Run cargo fmt --all for FFI work as it is Rust work
Run just test-rust for FFI validation
Run cargo clippy --workspace --all-targets -- -D warnings to enforce warnings-as-errors linting

When Rust files changed as part of Python work, run cargo fmt --all, just test-rust, and cargo clippy --workspace --all-targets -- -D warnings

**/*.rs: Run cargo fmt --all for Rust code formatting
Run cargo clippy --workspace --all-targets -- -D warnings to enforce Rust linting with no warnings
Run just test-rust as the shared-runtime build/test wrapper for Rust changes

Use Rust snake_case naming convention for Rust code

**/*.rs: Any Rust change must run just test-rust
Any Rust change must run cargo fmt --all
Any Rust change must run cargo clippy --workspace --all-targets -- -D warnings

**/*.rs: Run cargo fmt --all for Rust code formatting when Node changes touch Rust files
Run cargo clippy --workspace --all-targets -- -D warnings to enforce strict linting when Rust files changed as part of Node work

**/*.rs: Always run just test-rust when any Rust code changes
Always run cargo fmt --all when any Rust code changes
Always run cargo clippy --workspace --all-targets -- -D warnings when any Rust code changes

Files:

  • crates/sidecar/src/launcher.rs
  • crates/sidecar/tests/coverage/launcher_tests.rs
**/*.{rs,go,js,ts,tsx}

📄 CodeRabbit inference engine (CONTRIBUTING.md)

Include SPDX license headers in all Rust, Go, JavaScript, and TypeScript source files using C-style comment syntax

Files:

  • crates/sidecar/src/launcher.rs
  • crates/sidecar/tests/coverage/launcher_tests.rs
**/*.{rs,py,go,js,ts,tsx}

📄 CodeRabbit inference engine (CONTRIBUTING.md)

Use SONAR_IGNORE_START / SONAR_IGNORE_END markers only for documented false positives that cannot be resolved in code; keep ignored blocks small, add explanatory comments, and require reviewer sign-off

Files:

  • crates/sidecar/src/launcher.rs
  • crates/sidecar/tests/coverage/launcher_tests.rs
**/*.{js,ts,tsx,jsx,py,rs,go,java,c,cpp,h,cc,cxx,cs,rb,php,swift,kt}

📄 CodeRabbit inference engine (.agents/skills/prepare-pr/SKILL.md)

Changed files must be formatted with the language-native formatter

Files:

  • crates/sidecar/src/launcher.rs
  • crates/sidecar/tests/coverage/launcher_tests.rs
**/*.{rs,py,js,ts,tsx,go}

📄 CodeRabbit inference engine (.agents/skills/validate-change/SKILL.md)

During iteration, prefer uv run pre-commit run --files <changed files...> for targeted validation

Files:

  • crates/sidecar/src/launcher.rs
  • crates/sidecar/tests/coverage/launcher_tests.rs
{crates/**/tests/**,python/tests/**,go/nemo_flow/**/*_test.go}

⚙️ CodeRabbit configuration file

{crates/**/tests/**,python/tests/**,go/nemo_flow/**/*_test.go}: Tests should cover the behavior promised by the changed API surface, including error paths and cross-request isolation where relevant.
Prefer assertions on lifecycle events, scope stacks, middleware ordering, and binding parity over shallow smoke tests.

Files:

  • crates/sidecar/tests/coverage/launcher_tests.rs
🔇 Additional comments (2)
integrations/coding-agents/codex/README.md (2)

103-104: Use Official Product Capitalization in Provider Name

The user-facing provider label should use NeMo Flow, not Nemo Flow.

Suggested doc fix
-name = "Nemo Flow OpenAI"
+name = "NeMo Flow OpenAI"

As per coding guidelines, "Verify official capitalization, spacing, and hyphenation for product names."


33-35: ⚡ Quick win

Documentation is correct as written.

The configuration key documented in lines 34–35 is codex_hooks = true under the [features] table in .codex/config.toml, which matches the canonical syntax for codex-cli >= 0.129.0. The suggested change to features.hooks = true is incorrect; that key does not exist.

Comment thread crates/sidecar/src/launcher.rs
Comment thread crates/sidecar/src/launcher.rs
Comment thread crates/sidecar/src/launcher.rs
Comment thread crates/sidecar/tests/coverage/launcher_tests.rs
Comment thread docs/integrate-frameworks/coding-agent-codex.md Outdated
Comment thread docs/integrate-frameworks/coding-agent-codex.md Outdated
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (2)
crates/sidecar/src/gateway.rs (2)

545-549: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Stop recording cookie and generic API-key aliases.

should_record_header still allows secrets like cookie, api-key, and similar aliases into LlmRequest.headers, which then flow into downstream exports.

Proposed fix
 fn should_record_header(name: &HeaderName) -> bool {
     should_forward_request_header(name)
         && name != http::header::AUTHORIZATION
+        && name != http::header::COOKIE
         && name.as_str() != "x-api-key"
+        && name.as_str() != "api-key"
+        && name.as_str() != "apikey"
         && name.as_str() != "anthropic-api-key"
 }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@crates/sidecar/src/gateway.rs` around lines 545 - 549, The
should_record_header function currently permits secret headers like "cookie" and
generic API-key aliases; update should_record_header (the function taking
&HeaderName) to explicitly exclude cookie-related headers ("cookie" and
"set-cookie") and common API-key variants ("api-key", "apikey", and any other
vendor aliases such as "x-api-key" and "anthropic-api-key") and perform
comparisons case-insensitively (e.g., use name.as_str().eq_ignore_ascii_case)
while keeping the existing should_forward_request_header and the authorization
check.

503-565: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Honor Connection header tokens when filtering headers.

The proxy still strips only the fixed hop-by-hop names. If a request or response carries Connection: x-foo, x-foo is still eligible for forwarding and recording because is_hop_by_hop never inspects the current header map. That breaks proxy semantics and can leak per-connection metadata.

Parse the message’s Connection header into a per-request/per-response exclusion set and apply it in both the request and response filters.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@crates/sidecar/src/gateway.rs` around lines 503 - 565, The code currently
strips only fixed hop-by-hop header names; you must honor per-message Connection
tokens by parsing the request/response HeaderMap's "connection" header into a
set of token names and treating those tokens as hop-by-hop too. Update
response_headers to first collect connection tokens from the provided HeaderMap
and exclude any header whose name matches those tokens (in addition to calling
is_hop_by_hop), and change the request-side predicates
should_forward_request_header and should_record_header to accept either the
HeaderMap or a precomputed set of connection token names so they can exclude
those tokens as well; keep the existing fixed-name checks (HOST, CONTENT_LENGTH,
ACCEPT_ENCODING, AUTHORIZATION, x-api-key, anthropic-api-key) and reuse or
refactor is_hop_by_hop into a helper that only tests fixed names while combining
it with the per-message token set when filtering.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@crates/sidecar/src/gateway.rs`:
- Around line 245-259: The code currently calls upstream_response.bytes() and
fully buffers the upstream body (see upstream_response.bytes(), the bytes
variable and response_json usage), which can OOM; replace this with a shared
bounded reader that enforces a maximum buffer size (e.g., a configured
MAX_BODY_BYTES) and choose an explicit overflow policy: either fail-fast by
returning a SidecarError::UpstreamOverflow and emitting end_llm(active, {
"error": "body_size_limit" }, { "http_status": status.as_u16(), "streaming":
false, "gateway_error": true, "stage": "body" }), or switch to passthrough
streaming by stopping buffering once the cap is hit and proxying the remaining
bytes directly to the client; implement the cap logic where bytes() is called
and also apply the same bounded-reader behavior to the other occurrence (around
lines referenced 375-378) so both non-streaming paths share the same
bounded-read helper and overflow handling.

---

Duplicate comments:
In `@crates/sidecar/src/gateway.rs`:
- Around line 545-549: The should_record_header function currently permits
secret headers like "cookie" and generic API-key aliases; update
should_record_header (the function taking &HeaderName) to explicitly exclude
cookie-related headers ("cookie" and "set-cookie") and common API-key variants
("api-key", "apikey", and any other vendor aliases such as "x-api-key" and
"anthropic-api-key") and perform comparisons case-insensitively (e.g., use
name.as_str().eq_ignore_ascii_case) while keeping the existing
should_forward_request_header and the authorization check.
- Around line 503-565: The code currently strips only fixed hop-by-hop header
names; you must honor per-message Connection tokens by parsing the
request/response HeaderMap's "connection" header into a set of token names and
treating those tokens as hop-by-hop too. Update response_headers to first
collect connection tokens from the provided HeaderMap and exclude any header
whose name matches those tokens (in addition to calling is_hop_by_hop), and
change the request-side predicates should_forward_request_header and
should_record_header to accept either the HeaderMap or a precomputed set of
connection token names so they can exclude those tokens as well; keep the
existing fixed-name checks (HOST, CONTENT_LENGTH, ACCEPT_ENCODING,
AUTHORIZATION, x-api-key, anthropic-api-key) and reuse or refactor is_hop_by_hop
into a helper that only tests fixed names while combining it with the
per-message token set when filtering.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Enterprise

Run ID: a870b963-f006-4ee5-8568-231bf1fa4f3e

📥 Commits

Reviewing files that changed from the base of the PR and between b35f658 and 74189e8.

📒 Files selected for processing (1)
  • crates/sidecar/src/gateway.rs
📜 Review details
🧰 Additional context used
📓 Path-based instructions (7)
**/*.rs

📄 CodeRabbit inference engine (.agents/skills/test-ffi-surface/SKILL.md)

**/*.rs: Run cargo fmt --all for FFI work as it is Rust work
Run just test-rust for FFI validation
Run cargo clippy --workspace --all-targets -- -D warnings to enforce warnings-as-errors linting

When Rust files changed as part of Python work, run cargo fmt --all, just test-rust, and cargo clippy --workspace --all-targets -- -D warnings

**/*.rs: Run cargo fmt --all for Rust code formatting
Run cargo clippy --workspace --all-targets -- -D warnings to enforce Rust linting with no warnings
Run just test-rust as the shared-runtime build/test wrapper for Rust changes

Use Rust snake_case naming convention for Rust code

**/*.rs: Any Rust change must run just test-rust
Any Rust change must run cargo fmt --all
Any Rust change must run cargo clippy --workspace --all-targets -- -D warnings

**/*.rs: Run cargo fmt --all for Rust code formatting when Node changes touch Rust files
Run cargo clippy --workspace --all-targets -- -D warnings to enforce strict linting when Rust files changed as part of Node work

**/*.rs: Always run just test-rust when any Rust code changes
Always run cargo fmt --all when any Rust code changes
Always run cargo clippy --workspace --all-targets -- -D warnings when any Rust code changes

Files:

  • crates/sidecar/src/gateway.rs
**/*.{rs,go,js,ts,tsx}

📄 CodeRabbit inference engine (CONTRIBUTING.md)

Include SPDX license headers in all Rust, Go, JavaScript, and TypeScript source files using C-style comment syntax

Files:

  • crates/sidecar/src/gateway.rs
**/*.{rs,py,go,js,ts,tsx}

📄 CodeRabbit inference engine (CONTRIBUTING.md)

Use SONAR_IGNORE_START / SONAR_IGNORE_END markers only for documented false positives that cannot be resolved in code; keep ignored blocks small, add explanatory comments, and require reviewer sign-off

Files:

  • crates/sidecar/src/gateway.rs
**/*.{js,ts,tsx,jsx,py,rs,go,java,c,cpp,h,cc,cxx,cs,rb,php,swift,kt}

📄 CodeRabbit inference engine (.agents/skills/prepare-pr/SKILL.md)

Changed files must be formatted with the language-native formatter

Files:

  • crates/sidecar/src/gateway.rs
**/*.{rs,py,js,ts,go,sh,toml,yaml,yml,json,md,rst}

📄 CodeRabbit inference engine (AGENTS.md)

Keep SPDX headers on source, docs, scripts, and configuration files. The project is Apache-2.0.

Files:

  • crates/sidecar/src/gateway.rs
**/*.{py,js,ts,tsx,go,rs,md}

📄 CodeRabbit inference engine (.agents/skills/validate-change/SKILL.md)

Format changed files with the language-native formatter before the final lint/test pass

Files:

  • crates/sidecar/src/gateway.rs
**/*.{rs,py,js,ts,tsx,go}

📄 CodeRabbit inference engine (.agents/skills/validate-change/SKILL.md)

During iteration, prefer uv run pre-commit run --files <changed files...> for targeted validation

Files:

  • crates/sidecar/src/gateway.rs

Comment thread crates/sidecar/src/gateway.rs Outdated
AjayThorve added 2 commits May 8, 2026 19:38
Updated the Codex configuration to replace the deprecated `features.codex_hooks` with `features.hooks`, requiring `codex-cli >= 0.129.0`. Adjusted related tests and documentation to reflect this change.

Signed-off-by: Ajay Thorve <athorve@nvidia.com>
Without this, upstream providers honored the client's Accept-Encoding
and returned gzip/br/zstd-encoded bodies. The bytes were forwarded to
the client unchanged (which decoded normally), but the sidecar's
observability capture stored the still-compressed bytes in
`output.value` on LLM spans and in ATIF trajectory bodies, leaving
both unreadable for any downstream consumer (Phoenix, ATIF, ATOF).

Stripping the header forces upstreams to return identity-encoded
bodies. The bandwidth cost is paid only on the sidecar-upstream hop;
the client never asked for the encoding it would have received, so
its decoders never trigger.

Closes NMF-84.

Signed-off-by: Ajay Thorve <athorve@nvidia.com>
@AjayThorve AjayThorve force-pushed the wkk_coding-agent-sidecar-integrations branch from 74189e8 to eb16758 Compare May 9, 2026 02:38
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@crates/sidecar/src/launcher.rs`:
- Around line 559-563: The toml_string function only escapes backslashes and
quotes, which can produce invalid TOML for values containing control characters
(newlines, tabs, etc.); update toml_string to produce correct TOML basic-string
escaping by either (1) replacing it with
toml::Value::String(value.to_string()).to_string() if the toml crate is
available, or (2) extend the escaping logic in toml_string to handle all
TOML-required escapes (e.g., \n, \t, \r, \uXXXX for control characters) so that
values like base_url always serialize into valid TOML.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Enterprise

Run ID: d340a09b-2be8-4446-841d-36be1689170c

📥 Commits

Reviewing files that changed from the base of the PR and between 74189e8 and eb16758.

📒 Files selected for processing (5)
  • crates/sidecar/src/gateway.rs
  • crates/sidecar/src/launcher.rs
  • crates/sidecar/tests/coverage/launcher_tests.rs
  • docs/integrate-frameworks/coding-agent-codex.md
  • integrations/coding-agents/codex/README.md
📜 Review details
🧰 Additional context used
📓 Path-based instructions (30)
**/*.{md,rst,html,txt}

📄 CodeRabbit inference engine (.agents/skills/review-doc-style/assets/nvidia-style-brand-terminology.md)

**/*.{md,rst,html,txt}: Always spell NVIDIA in all caps. Do not use Nvidia, nvidia, nVidia, nVIDIA, or NV.
Use an NVIDIA before a noun because the name starts with an 'en' sound.
Do not add a registered trademark symbol after NVIDIA when referring to the company.
Use trademark symbols with product names only when the document type or legal guidance requires them.
Verify official capitalization, spacing, and hyphenation for product names.
Precede NVIDIA product names with NVIDIA on first mention when it is natural and accurate.
Do not rewrite product names for grammar or title-case rules.
Preserve third-party product names according to the owner's spelling.
Include the company name and full model qualifier on first use when it helps identify the model.
Preserve the official capitalization and punctuation of model names.
Use shorter family names only after the full name is established.
Spell out a term on first use and put the acronym in parentheses unless the acronym is widely understood by the intended audience.
Use the acronym on later mentions after it has been defined.
For long documents, reintroduce the full term if readers might lose context.
Form plurals of acronyms with s, not an apostrophe, such as GPUs.
In headings, common acronyms can remain abbreviated. Spell out the term in the first or second sentence of the body.
Common terms such as CPU, GPU, PC, API, and UI usually do not need to be spelled out for developer audiences.

Files:

  • integrations/coding-agents/codex/README.md
  • docs/integrate-frameworks/coding-agent-codex.md
**/*.{md,rst,html}

📄 CodeRabbit inference engine (.agents/skills/review-doc-style/assets/nvidia-style-brand-terminology.md)

Link the first mention of a product name when the destination helps the reader.

Files:

  • integrations/coding-agents/codex/README.md
  • docs/integrate-frameworks/coding-agent-codex.md
**/*.{md,rst,txt}

📄 CodeRabbit inference engine (.agents/skills/review-doc-style/assets/nvidia-style-guide.md)

**/*.{md,rst,txt}: Spell NVIDIA in all caps. Do not use Nvidia, nvidia, or NV.
Format commands, code elements, expressions, package names, file names, and paths as inline code.
Use descriptive link text. Avoid raw URLs and weak anchors such as 'here' or 'read more.'
Use title case consistently for technical documentation headings.
Introduce code blocks, lists, tables, and images with complete sentences.
Write procedures as imperative steps. Keep steps parallel and split long procedures into smaller tasks.
Prefer active voice, present tense, short sentences, contractions, and plain English.
Use can for possibility and reserve may for permission.
Use after for temporal relationships instead of once.
Prefer refer to over see when the wording points readers to another resource.
Avoid culture-specific idioms, unnecessary Latinisms, jokes, and marketing exaggeration in technical documentation.
Spell out months in body text, avoid ordinal dates, and use clear time zones.
Spell out whole numbers from zero through nine unless they are technical values, parameters, versions, or UI values.
Use numerals for 10 or greater and include commas in thousands.
Do not add trademark symbols to learning-oriented documentation unless the source, platform, or legal guidance explicitly requires them.
Do not add trademark symbols to NeMo Flow learning documentation by default.
Do not rewrite API names, package names, command flags, or code literals for style reasons.

Files:

  • integrations/coding-agents/codex/README.md
  • docs/integrate-frameworks/coding-agent-codex.md
**/*.{md,markdown,rst}

📄 CodeRabbit inference engine (.agents/skills/review-doc-style/assets/nvidia-style-technical-docs.md)

**/*.{md,markdown,rst}: Use title case consistently in technical documentation headings
Avoid quotation marks, ampersands, and exclamation marks in headings
Keep product, event, research, and whitepaper names in their official title case
Use title case for table headers
Do not force social-media sentence case into technical docs
Use monospace formatting for code elements, commands, parameters, package names, and expressions
Use monospace formatting for directories, file names, and paths
Use angle brackets inside monospace for variables inside paths, such as /home/<username>/.login
Use quotation marks for error messages and strings in documentation
Use bold formatting for UI buttons, menus, fields, and labels in documentation
Use angle brackets between UI labels for menu paths, such as File > Save As
Use italics for new terms on first use in documentation
Use italics for publication titles in documentation
Use plain text formatting for keyboard shortcuts in documentation
Prefer [NVIDIA/NeMo](link) format for GitHub repository references over generic phrases like 'the GitHub repo'
Introduce every code block with a complete sentence
Do not make a code block complete the grammar of the previous sentence
Do not continue a sentence after a code block
Use syntax highlighting when the format supports it for code blocks
Avoid the word 'snippet' unless the surrounding docs already use it as a term of art
Keep inline method, function, and class references consistent with nearby docs, omitting empty parentheses for prose readability when no call is shown
Use descriptive anchor text that matches the destination title when possible for links
Avoid raw URLs in running text in documentation
Avoid generic link anchors such as 'here,' 'this page,' and 'read more' in documentation
Include the acronym in link text if a linked term includes an acronym
Do not link long sentences or multiple sentences in documentation
Avoid links that pull readers away from a procedure unles...

Files:

  • integrations/coding-agents/codex/README.md
  • docs/integrate-frameworks/coding-agent-codex.md
**/*.{html,md}

📄 CodeRabbit inference engine (CONTRIBUTING.md)

Include SPDX license headers in HTML and Markdown files using HTML comment syntax

Files:

  • integrations/coding-agents/codex/README.md
  • docs/integrate-frameworks/coding-agent-codex.md
**/README.md

📄 CodeRabbit inference engine (CONTRIBUTING.md)

Update relevant crate or package README when that surface changed

Relevant package or crate README.md files must be updated when examples or binding guidance changes

Files:

  • integrations/coding-agents/codex/README.md
**/*.md

📄 CodeRabbit inference engine (CONTRIBUTING.md)

Run Markdown link checking via lychee for README.md, CONTRIBUTING.md, and docs/ through pre-commit hooks

Files:

  • integrations/coding-agents/codex/README.md
  • docs/integrate-frameworks/coding-agent-codex.md
**/*.{md,markdown,py,sh,bash,js,ts,java,cpp,go,rust}

📄 CodeRabbit inference engine (.agents/skills/contribute-docs/SKILL.md)

Keep package names, repo references, and build commands current in documentation

Files:

  • integrations/coding-agents/codex/README.md
  • docs/integrate-frameworks/coding-agent-codex.md
**/*.{md,markdown,py,sh,bash}

📄 CodeRabbit inference engine (.agents/skills/contribute-docs/SKILL.md)

Keep stable user-facing wrappers at scripts/ root in docs and examples; only point at namespaced helper paths when documenting internal maintenance work

Files:

  • integrations/coding-agents/codex/README.md
  • docs/integrate-frameworks/coding-agent-codex.md
**/*.{md,markdown,py,sh,bash,js,ts,example}

📄 CodeRabbit inference engine (.agents/skills/contribute-docs/SKILL.md)

Example commands must match current package names and paths

Files:

  • integrations/coding-agents/codex/README.md
  • docs/integrate-frameworks/coding-agent-codex.md
**/{integrations,integration,*-integration}/**

📄 CodeRabbit inference engine (.agents/skills/contribute-integration/SKILL.md)

**/{integrations,integration,*-integration}/**: Keep NeMo Flow optional in framework integrations
Preserve the framework's original behavior when NeMo Flow is absent
Wrap tool and LLM paths at the correct framework boundary
Integration pattern must follow docs/integrate-frameworks/adding-scopes.md

Files:

  • integrations/coding-agents/codex/README.md
**/*.{md,txt,rst}

📄 CodeRabbit inference engine (.agents/skills/review-doc-style/SKILL.md)

**/*.{md,txt,rst}: Ensure commands, package names, file paths, and APIs in documentation are correct and not stale; flag incorrect or outdated information as blocking issues
Ensure examples and procedures in documentation will execute successfully with current APIs and commands
Use consistent user-facing terminology throughout documentation that matches current repo terminology
Capitalize NVIDIA correctly in all documentation and public-facing text
Format code, commands, paths, and filenames as inline code (monospace) in documentation
Use descriptive anchor text for links instead of bare URLs or weak labels like 'here' in documentation
Prefer active voice, present tense, short sentences, and plain English in documentation
Structure documentation procedures as imperative steps that are easy to scan and not too long for a single sequence
Prefer 'after' instead of 'once' for temporal references in documentation
Use 'can' instead of 'may' when describing possibility (rather than permission) in documentation
Avoid ambiguous numeric dates and ordinal dates in documentation body text

Files:

  • integrations/coding-agents/codex/README.md
  • docs/integrate-frameworks/coding-agent-codex.md
{README.md,docs/index.md,**/README.md}

📄 CodeRabbit inference engine (.agents/skills/review-doc-style/SKILL.md)

Update entry-point documentation (README.md, docs/index.md, package READMEs, binding-level source READMEs) whenever public behavior changes

Files:

  • integrations/coding-agents/codex/README.md
**/*.{rs,py,js,ts,go,sh,toml,yaml,yml,json,md,rst}

📄 CodeRabbit inference engine (AGENTS.md)

Keep SPDX headers on source, docs, scripts, and configuration files. The project is Apache-2.0.

Files:

  • integrations/coding-agents/codex/README.md
  • docs/integrate-frameworks/coding-agent-codex.md
  • crates/sidecar/src/gateway.rs
  • crates/sidecar/tests/coverage/launcher_tests.rs
  • crates/sidecar/src/launcher.rs
**/*.{py,js,ts,tsx,go,rs,md}

📄 CodeRabbit inference engine (.agents/skills/validate-change/SKILL.md)

Format changed files with the language-native formatter before the final lint/test pass

Files:

  • integrations/coding-agents/codex/README.md
  • docs/integrate-frameworks/coding-agent-codex.md
  • crates/sidecar/src/gateway.rs
  • crates/sidecar/tests/coverage/launcher_tests.rs
  • crates/sidecar/src/launcher.rs
docs/**/*.md

📄 CodeRabbit inference engine (CONTRIBUTING.md)

Run ./scripts/build-docs.sh for documentation site changes

docs/**/*.md: Relevant getting-started or reference docs must be updated when examples change
Release-policy docs must point to GitHub Releases as the only release-history source of truth

docs/**/*.md: Use title case for headings in technical documentation
Introduce code blocks, tables, and lists with complete lead-in sentences in documentation

Files:

  • docs/integrate-frameworks/coding-agent-codex.md
{RELEASING.md,CHANGELOG.md,docs/**/*.md}

📄 CodeRabbit inference engine (.agents/skills/contribute-docs/SKILL.md)

Keep release-process and release-notes guidance in repo-maintainer docs such as RELEASING.md, not as user-facing docs pages or CHANGELOG.md

Files:

  • docs/integrate-frameworks/coding-agent-codex.md
docs/integrate-frameworks/**/*.md

📄 CodeRabbit inference engine (.agents/skills/contribute-integration/SKILL.md)

Documentation must be updated if activation or usage of the integration changed

Files:

  • docs/integrate-frameworks/coding-agent-codex.md
{scripts/*.sh,docs/**/*.md}

📄 CodeRabbit inference engine (.agents/skills/contribute-integration/SKILL.md)

Use root ./scripts/*.sh commands in docs and contributor guidance as documented, with implementations under scripts/third-party/

Files:

  • docs/integrate-frameworks/coding-agent-codex.md
{docs/**,examples/**,crates/adaptive/**,python/nemo_flow/**,go/nemo_flow/**,**/{example,component}.{ts,tsx,js,rs,py,go}}

📄 CodeRabbit inference engine (.agents/skills/maintain-optimizer/SKILL.md)

Any new adaptive component kind must have documentation, examples, and binding coverage across all supported languages

Files:

  • docs/integrate-frameworks/coding-agent-codex.md
{README*,CHANGELOG*,docs/**/*.{md,rst,txt},examples/**/*,*.md}

📄 CodeRabbit inference engine (.agents/skills/rename-surfaces/SKILL.md)

Update documentation, examples, and getting-started guides with new package/module/crate names after rename operations

Files:

  • docs/integrate-frameworks/coding-agent-codex.md
{README.md,docs/**/*.md,examples/**/*.{js,ts,py,go,rs}}

📄 CodeRabbit inference engine (.agents/skills/maintain-packaging/SKILL.md)

Keep documentation and examples synchronized with current install, import, and build commands

Files:

  • docs/integrate-frameworks/coding-agent-codex.md
{README.md,CONTRIBUTING.md,docs/**/*.md}

📄 CodeRabbit inference engine (.agents/skills/validate-change/SKILL.md)

For docs-only changes, run targeted checks only if commands, package names, or examples changed. Use just docs for docs-site builds and just docs-linkcheck when links changed

Files:

  • docs/integrate-frameworks/coding-agent-codex.md
{docs/**,README.md,CONTRIBUTING.md,RELEASING.md,SECURITY.md}

⚙️ CodeRabbit configuration file

{docs/**,README.md,CONTRIBUTING.md,RELEASING.md,SECURITY.md}: Review documentation for technical accuracy against the current API, command correctness, and consistency across language bindings.
Flag stale examples, missing SPDX headers where required, and instructions that no longer match CI or pre-commit behavior.

Files:

  • docs/integrate-frameworks/coding-agent-codex.md
**/*.rs

📄 CodeRabbit inference engine (.agents/skills/test-ffi-surface/SKILL.md)

**/*.rs: Run cargo fmt --all for FFI work as it is Rust work
Run just test-rust for FFI validation
Run cargo clippy --workspace --all-targets -- -D warnings to enforce warnings-as-errors linting

When Rust files changed as part of Python work, run cargo fmt --all, just test-rust, and cargo clippy --workspace --all-targets -- -D warnings

**/*.rs: Run cargo fmt --all for Rust code formatting
Run cargo clippy --workspace --all-targets -- -D warnings to enforce Rust linting with no warnings
Run just test-rust as the shared-runtime build/test wrapper for Rust changes

Use Rust snake_case naming convention for Rust code

**/*.rs: Any Rust change must run just test-rust
Any Rust change must run cargo fmt --all
Any Rust change must run cargo clippy --workspace --all-targets -- -D warnings

**/*.rs: Run cargo fmt --all for Rust code formatting when Node changes touch Rust files
Run cargo clippy --workspace --all-targets -- -D warnings to enforce strict linting when Rust files changed as part of Node work

**/*.rs: Always run just test-rust when any Rust code changes
Always run cargo fmt --all when any Rust code changes
Always run cargo clippy --workspace --all-targets -- -D warnings when any Rust code changes

Files:

  • crates/sidecar/src/gateway.rs
  • crates/sidecar/tests/coverage/launcher_tests.rs
  • crates/sidecar/src/launcher.rs
**/*.{rs,go,js,ts,tsx}

📄 CodeRabbit inference engine (CONTRIBUTING.md)

Include SPDX license headers in all Rust, Go, JavaScript, and TypeScript source files using C-style comment syntax

Files:

  • crates/sidecar/src/gateway.rs
  • crates/sidecar/tests/coverage/launcher_tests.rs
  • crates/sidecar/src/launcher.rs
**/*.{rs,py,go,js,ts,tsx}

📄 CodeRabbit inference engine (CONTRIBUTING.md)

Use SONAR_IGNORE_START / SONAR_IGNORE_END markers only for documented false positives that cannot be resolved in code; keep ignored blocks small, add explanatory comments, and require reviewer sign-off

Files:

  • crates/sidecar/src/gateway.rs
  • crates/sidecar/tests/coverage/launcher_tests.rs
  • crates/sidecar/src/launcher.rs
**/*.{js,ts,tsx,jsx,py,rs,go,java,c,cpp,h,cc,cxx,cs,rb,php,swift,kt}

📄 CodeRabbit inference engine (.agents/skills/prepare-pr/SKILL.md)

Changed files must be formatted with the language-native formatter

Files:

  • crates/sidecar/src/gateway.rs
  • crates/sidecar/tests/coverage/launcher_tests.rs
  • crates/sidecar/src/launcher.rs
**/*.{rs,py,js,ts,tsx,go}

📄 CodeRabbit inference engine (.agents/skills/validate-change/SKILL.md)

During iteration, prefer uv run pre-commit run --files <changed files...> for targeted validation

Files:

  • crates/sidecar/src/gateway.rs
  • crates/sidecar/tests/coverage/launcher_tests.rs
  • crates/sidecar/src/launcher.rs
{crates/**/tests/**,python/tests/**,go/nemo_flow/**/*_test.go}

⚙️ CodeRabbit configuration file

{crates/**/tests/**,python/tests/**,go/nemo_flow/**/*_test.go}: Tests should cover the behavior promised by the changed API surface, including error paths and cross-request isolation where relevant.
Prefer assertions on lifecycle events, scope stacks, middleware ordering, and binding parity over shallow smoke tests.

Files:

  • crates/sidecar/tests/coverage/launcher_tests.rs
🔇 Additional comments (12)
crates/sidecar/src/gateway.rs (4)

245-246: Upstream response bodies are still fully buffered without a cap.

Line 245 and Line 378 still call upstream_response.bytes().await, which can OOM on large/malicious upstream payloads. This was previously raised and remains unresolved.

Also applies to: 378-378


503-510: Connection-token hop-by-hop headers are not stripped.

Filtering only fixed hop-by-hop names misses headers listed dynamically in Connection, so those can still be forwarded/returned. This was previously raised and still applies.

Also applies to: 530-532, 554-565


545-550: Sensitive header recording filter is still incomplete.

should_record_header still allows values like cookie and generic API-key aliases to be recorded. This was already flagged and is still unresolved.


16-17: Good fix: request-body reads are now bounded.

Using MAX_BODY_BYTES with to_bytes closes the prior unbounded-request-body risk cleanly.

Also applies to: 68-70

crates/sidecar/src/launcher.rs (3)

475-480: Product name miscapitalized in user-facing Codex provider config.

Line 477 uses "Nemo Flow OpenAI" but the product name is "NeMo Flow OpenAI" (capital M).

-        "model_providers.nemo-flow-openai={{name=\"Nemo Flow OpenAI\",base_url={},wire_api=\"responses\",requires_openai_auth=true,supports_websockets=false}}",
+        "model_providers.nemo-flow-openai={{name=\"NeMo Flow OpenAI\",base_url={},wire_api=\"responses\",requires_openai_auth=true,supports_websockets=false}}",

108-111: Server startup error discarded on health probe timeout.

When wait_for_health fails, let _ = running_server.stop().await discards any real server error, turning startup/config failures into a misleading readiness timeout message.

Suggested fix
     let running_server = RunningSidecar::start(listener, sidecar_config);
     if let Err(error) = wait_for_health(sidecar_url).await {
-        let _ = running_server.stop().await;
-        return Err(error);
+        return match running_server.stop().await {
+            Ok(()) => Err(error),
+            Err(server_error) => Err(SidecarError::Launch(format!(
+                "{server_error}; health check also failed: {error}"
+            ))),
+        };
     }

168-180: split_whitespace() breaks quoted paths in configured commands.

Paths with spaces (e.g., "C:\Program Files\Codex\codex.exe" --full-auto) will be corrupted. The comment acknowledges this as a convenience tradeoff, but consider using a shell-style splitter like shlex::split for cross-platform robustness.

integrations/coding-agents/codex/README.md (1)

103-104: Product name miscapitalized.

Line 104 uses "Nemo Flow OpenAI" but the product name is "NeMo Flow OpenAI".

-name = "Nemo Flow OpenAI"
+name = "NeMo Flow OpenAI"
docs/integrate-frameworks/coding-agent-codex.md (2)

89-90: Product name miscapitalized.

Line 90 uses "Nemo Flow OpenAI" but the product name is "NeMo Flow OpenAI".

-name = "Nemo Flow OpenAI"
+name = "NeMo Flow OpenAI"

111-114: Documentation references deprecated configuration keys.

Lines 112-113 document features.codex_hooks=true and codex_hooks = true, but the implementation at launcher.rs:321 uses the canonical features.hooks=true. This inconsistency will confuse users.

Suggested fix
 The transparent wrapper passes hook entries as Codex CLI config overrides and
-sets `features.codex_hooks=true` for that launched process. Persistent install
-writes `.codex/config.toml` with `codex_hooks = true` and merges generated hook
+sets `features.hooks=true` for that launched process. Persistent install
+writes `.codex/config.toml` with `hooks = true` and merges generated hook
 entries into `.codex/hooks.json`.
crates/sidecar/tests/coverage/launcher_tests.rs (2)

223-252: Process cwd restoration is not panic-safe.

Tests that mutate the process working directory (lines 227-228, 251) restore it with a trailing set_current_dir(previous). If the test panics before reaching line 251, subsequent tests run in the wrong directory. Use a RAII guard pattern.

Suggested pattern
struct CwdGuard(PathBuf);

impl Drop for CwdGuard {
    fn drop(&mut self) {
        let _ = std::env::set_current_dir(&self.0);
    }
}

#[test]
fn cursor_patching_can_be_disabled() {
    let _guard = current_dir_lock().lock().unwrap();
    let temp = tempfile::tempdir().unwrap();
    let _cwd_guard = CwdGuard(std::env::current_dir().unwrap());
    std::env::set_current_dir(temp.path()).unwrap();
    // ... test body ...
    // No manual restore needed; CwdGuard::drop handles it
}

This pattern should be applied to all affected tests: cursor_patching_can_be_disabled, cursor_patch_restore_restores_original_file, cursor_patch_restore_uses_nearest_project_cursor_dir, cursor_patch_restore_removes_temporary_file, and cursor_dry_run_does_not_write_hooks.


1-593: LGTM on test coverage.

The test suite comprehensively covers agent inference, configured command resolution, agent-specific preparation (Codex, Claude, Hermes, Cursor), cursor hook patch/restore scenarios, dry-run behavior, and async runtime behavior including sidecar URL injection and exit code propagation. The OS-specific fake agent commands are well-structured for cross-platform testing.

Comment thread crates/sidecar/src/launcher.rs
AjayThorve and others added 11 commits May 8, 2026 19:51
…odex)

Three independent issues silently disabled transparent-run hooks for
Claude Code; this commit fixes all three:

1. Hook subprocesses couldn't find the sidecar binary on PATH.
   `hook_forward_command` returned the bare name `nemo-flow-sidecar`,
   which Claude/Codex/Cursor passed to their hook subprocesses. Those
   subprocesses inherit the agent's environment, where the sidecar's
   install location (e.g. target/debug) typically isn't on PATH, so
   every hook exited with status 127 (command not found). Add an
   `executable` parameter so transparent-run callers can pass
   `current_exe()`; persistent install keeps the bare name.

2. Claude Code 2.1.x's hook loader strict-rejects unknown event names —
   any single unknown event causes the ENTIRE hooks file to be rejected
   silently, disabling all hooks. We were injecting
   `AfterAgentResponse` and `AfterAgentThought`, neither in Claude's
   whitelist. Drop both. Also add `PermissionRequest` and `PostCompact`,
   which are in Claude's whitelist and in Codex 0.129's event surface.

3. Claude's hook output validator rejects null for optional string
   fields. The Claude adapter's Stop response carried `stopReason: null`
   and was rejected with a runtime validation error. Omit the field
   entirely; the response collapses to the generic `{ continue: true }`.

Also regenerate the static integration bundles for claude-code and
codex with the corrected event list. The same whitelist mismatch broke
persistent install for Claude users.

Verified end-to-end against Claude Code 2.1.137:
- SessionStart, UserPromptSubmit, PreToolUse:WebSearch,
  PostToolUse:WebSearch, Stop, SubagentStop all fire successfully
- Phoenix shows [AGENT] claude-code root span with tool spans
- ATIF written on session-end (1.7 MB / 1.3 MB across two sessions)

Codex 0.129 lifecycle (SessionStart through PreToolUse/PostToolUse) now
also fires correctly — Phoenix shows [AGENT] codex root with Bash tool
spans. Codex transparent-run ATIF flush still pending: codex 0.129 has
no SessionEnd-equivalent hook, so end_agent never runs without a
wrapper-driven flush. Tracking that piece separately.

Signed-off-by: Ajay Thorve <athorve@nvidia.com>
…gateway"

Previously, when a real agent's gateway request beat its SessionStart hook,
`SessionManager::start_llm` created the session with `AgentKind::Gateway`.
Observer identities (ATIF agent name, Phoenix scope name) are baked at
scope-open time inside `ensure_agent_started`, so the session stayed
labeled "gateway" for its entire lifetime even after SessionStart arrived.
Result: Phoenix root spans for real Claude/Codex sessions read as
`[AGENT] gateway` instead of `[AGENT] claude-code` / `[AGENT] codex`.

Two targeted fixes:

1. In `SessionManager::start_llm`, infer the agent kind from the gateway
   provider when creating a new session: `anthropic.messages` and
   `anthropic.count_tokens` → `ClaudeCode`; `openai.responses` → `Codex`.
   Unknown providers (e.g., `openai.chat_completions`, `openai.models`)
   keep the legacy `Gateway` label so synthetic gateway-only sessions for
   unattributed proxy traffic remain semantically intact.

2. In `SessionManager::apply_events`, when an `AgentStarted` event arrives
   for an existing session whose `agent_kind` is still `Gateway` (i.e., the
   gateway path created the session before the hook), upgrade the session's
   `agent_kind` to the event's real kind. This fixes session-level metadata
   for the rare case where provider inference fell back to `Gateway` (e.g.,
   chat-completions traffic) but a SessionStart hook later identifies the
   agent.

Tests cover all three branches: anthropic gateway → claude-code label,
openai.responses gateway → codex label, ambiguous provider → gateway label
preserved.

Signed-off-by: Ajay Thorve <athorve@nvidia.com>
…nd hook flag

Three small fixes from CodeRabbit's pass on PR NVIDIA#63:

1. `Nemo Flow OpenAI` → `NeMo Flow OpenAI` in the codex transparent-run
   provider config (`launcher.rs:477`) and matching docs in
   `docs/integrate-frameworks/coding-agent-codex.md` and
   `integrations/coding-agents/codex/README.md`. Use the official NeMo
   Flow capitalization in user-facing TOML config and docs.

2. Update codex hook configuration documentation to use the canonical
   `features.hooks` / `hooks = true` keys instead of the legacy
   `features.codex_hooks` / `codex_hooks = true`. The legacy spelling is
   still accepted as an alias on codex 0.129+ but the canonical form is
   what new users should configure. Retain a one-line note that the old
   spelling is the legacy alias so users on older codex still understand
   the deprecation path.

Resolves CodeRabbit comments 3211156569, 3211760910, 3211760919, 3211156599.
No code behavior changes; 128 sidecar tests still pass.

Signed-off-by: Ajay Thorve <athorve@nvidia.com>
Codex 0.129's hook surface has no `SessionEnd`-equivalent event — its closest
session-termination signal, `Stop`, fires per-conversation-turn rather than at
session end. As a result, codex transparent runs through `nemo-flow-sidecar
run -- codex` produced empty `.nemo-flow/atif/` directories: the existing
`flush_observers` path (`end_agent` → `write_atif`) never fired.

Add a new `NormalizedEvent::TurnEnded` variant. When `classify` sees a `Stop`
hook AND the primary classification is not already terminal, it emits both the
existing `LlmHint` (preserving correlation behavior for subsequent LLM calls)
AND a `TurnEnded` event. `Session::apply` routes `TurnEnded` to a new
`snapshot_atif` method that writes the cumulative ATIF without closing the
agent scope or shutting down the OpenInference subscriber. The next turn keeps
adding events; the final Stop's snapshot is the complete trajectory.

The `is_terminal()` guard is critical: Cursor's adapter classifies `stop` as
`AgentEnded`, so without the guard we'd emit `[AgentEnded, TurnEnded]` — the
first call tears down the session and writes ATIF, the second creates an empty
synthetic session and overwrites the freshly-written ATIF with an empty
trajectory.

Safety verified by reading `AtifExporter`: `export()` takes `&self`,
documented as "does not clear the buffered events." Multiple snapshot writes
are cumulative supersets, last-write-wins yields the complete trajectory.

E2E verified against codex 0.129:
- Real codex session writes a 1.6 MB ATIF on Ctrl+D exit (was 0 bytes before)
- Multi-turn captured (5 steps: 2 user, 2 agent, 1 system)
- `agent.name = "codex"` (from earlier provider-inference fix)
- Phoenix root remains `[AGENT] codex` with proper nested LLM/tool spans

Hermes is unaffected — it has no `Stop` hook event.

Partially closes NMF-91. The remaining cross-process subagent unification gap
in NMF-91 stays upstream-blocked: codex spawns subagents as separate codex
processes with no `SubagentStart` hook, so each subagent's gateway-only
synthetic session is still its own Phoenix trace and its own ATIF (or no ATIF
when no Stop fires for it).

Tests:
- adapter: codex Stop emits TurnEnded
- adapter: claude Stop emits TurnEnded
- adapter: cursor stop does NOT also emit TurnEnded (regression: would wipe ATIF)
- session: TurnEnded snapshots ATIF without closing scope; cumulative across turns
- session: TurnEnded is a no-op when the session has no agent scope

Signed-off-by: Ajay Thorve <athorve@nvidia.com>
Defines the streaming counterpart to LlmResponseCodec, used with
llm_stream_call_execute. Each provider's StreamingCodec impl owns the
incremental state needed to assemble per-chunk events into a single
non-streaming-shape JSON payload, which the matching LlmResponseCodec
then decodes into AnnotatedLlmResponse.

Per Will's design suggestion in NMF-90:

  trait StreamingCodec {
      fn collector(&self) -> LlmCollectorFn;
      fn finalizer(&self) -> LlmFinalizerFn;
  }

Includes an SseEventDecoder utility that incrementally splits a
text/event-stream byte stream into parsed SSE events. Anthropic
Messages, OpenAI Responses, and OpenAI Chat Completions all emit one
JSON object per data: line, so the decoder is shared infrastructure
for the per-provider StreamingCodec impls landing in follow-up commits.

Refs NMF-90.

Signed-off-by: Ajay Thorve <athorve@nvidia.com>
Implements StreamingCodec for the Anthropic Messages SSE event sequence.
The collector observes message_start, content_block_start/delta/stop,
message_delta, and message_stop events, accumulating per-block state
(text, tool input partial_json, citations) keyed by content_block index.

The finalizer produces a JSON object wire-compatible with
RawAnthropicResponse — the same shape Anthropic returns for a
non-streaming Messages request — so the assembled output can be fed
back through AnthropicMessagesCodec::decode_response to produce the
canonical AnnotatedLlmResponse. Streaming and non-streaming Anthropic
requests now converge on the same observability output without per-route
shape duplication.

Internal state lives behind Arc<Mutex<...>> so the &self-produced
collector and finalizer closures share access. Each instance is
single-use because LlmFinalizerFn consumes the finalize step.

Tests cover:
- text-only response with usage and stop_reason mapping
- tool_use input assembled from partial_json fragments
- web_search_tool_result blocks (full content delivered at start)
- text blocks with citations_delta accumulation
- truncated streams falling back to raw partial_json string

Refs NMF-90.

Signed-off-by: Ajay Thorve <athorve@nvidia.com>
Implements StreamingCodec for the OpenAI Responses SSE event sequence.
The Responses API streams either complete `response` snapshots
(response.created, response.in_progress, response.completed,
response.failed, response.incomplete) or final-state output items
(response.output_item.done), so the codec:

1. Tracks the latest response snapshot — terminal events typically
   carry the complete state including `output`.
2. Tracks output items by output_index from output_item.done events,
   used as a fallback when the terminal snapshot lacks `output`.
3. Ignores per-token deltas (output_text.delta, function_call_arguments.delta)
   because their content is redelivered in the matching output_item.done.

The finalizer produces a JSON object wire-compatible with
RawResponsesResponse — feed it back through OpenAIResponsesCodec::
decode_response to produce the canonical AnnotatedLlmResponse.

Tests cover:
- terminal response.completed carries full state (common case)
- terminal completed lacks output → fall back to output_item.done items
- response.incomplete with reason maps to FinishReason::Length
- per-token deltas don't double-accumulate

Refs NMF-90.

Signed-off-by: Ajay Thorve <athorve@nvidia.com>
Implements StreamingCodec for the OpenAI Chat Completions SSE chunk
sequence. Each delta chunk may carry role (typically only the first),
incremental content fragments, partial tool_calls whose
function.arguments stream as a string, or a finish_reason terminator.
Top-level fields (id, object, created, model) are repeated on every
chunk; we capture them once. Final-chunk usage is preserved when
emitted (only sent when stream_options.include_usage is set).

The finalizer produces a JSON object wire-compatible with a
non-streaming Chat Completions response — feed it back through
OpenAIChatCodec::decode_response to produce the canonical
AnnotatedLlmResponse. The `object` field is normalized from
`chat.completion.chunk` to `chat.completion` so the assembled JSON
matches what a non-streaming response would carry.

Also: SseEventDecoder now drops the `data: [DONE]` end-of-stream
sentinel that OpenAI Chat Completions emits. It's a wire-level marker,
not a JSON payload, and would otherwise fail JSON parsing.

Tests cover:
- text-only response with usage and finish_reason mapping
- tool_call arguments assembled from string fragments
- content: null preserved when only tool calls streamed
- multiple choices (n > 1 requests) tracked separately
- intermediate null usage chunks don't overwrite real usage

Refs NMF-90.

Signed-off-by: Ajay Thorve <athorve@nvidia.com>
Match the project's existing test layout: every other codec file in
crates/core/src/codec/ pulls its tests from
crates/core/tests/unit/codec/<name>_tests.rs via #[path]. The streaming
codec tests were inlined in src/codec/streaming.rs in the initial
commit; move them out to follow the same convention so source files
stay slim and all codec test files cluster under tests/unit/codec/.

No behavior change. 283 core lib tests still passing.

Refs NMF-90.

Signed-off-by: Ajay Thorve <athorve@nvidia.com>
Refactors the gateway proxy to use llm_call_execute and
llm_stream_call_execute so the runtime owns LLM start/end events,
codec annotation, and stream lifecycle. Per-route dispatch wires
Anthropic Messages, OpenAI Responses, and OpenAI Chat Completions to
their existing typed codecs and the new streaming codecs landed
earlier in this PR.

A new SessionManager::prepare_gateway_call helper resolves the right
session, opens the agent scope, and computes correlation metadata
without holding the session lock during the upstream HTTP work. The
buffered path returns the captured upstream bytes verbatim; the
streaming path re-encodes parsed events back into SSE frames for the
client (Option B). Upstream connection failures are still surfaced as
502 Bad Gateway via a side channel that captures the original
reqwest error before the runtime collapses it to FlowError::Internal.

Trade-off A vs B for streaming response forwarding:

  A. Byte tee: split the raw upstream byte stream between the client
     and a side capture that feeds the runtime collector. Bytes pass
     through unchanged, but observability hooks see the literal wire
     bytes (gzip/identity) rather than parsed events, and the
     collector code has to duplicate per-provider SSE framing logic
     just for capture.
  B. Re-encode (chosen): parse SSE on the upstream side, feed
     parsed JSON events to the runtime collector, then re-encode
     events as SSE for the client. Adds one parse + one serialize
     per chunk, but the runtime sees identical event shapes for
     streaming and non-streaming requests, and chunk-level
     observability matches the codec's annotated response.

The legacy SessionManager::start_llm / end_llm path is retained behind
cfg(test) so existing correlation unit tests continue to exercise
resolve_llm_owner. Streaming tool-hint extraction is intentionally
deferred because the runtime synthesizes the aggregate response inside
LlmStreamWrapper and does not surface it back to the gateway.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: Ajay Thorve <athorve@nvidia.com>
Salonijain27
Salonijain27 previously approved these changes May 9, 2026
Copy link
Copy Markdown
Collaborator

@Salonijain27 Salonijain27 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approved from a dependency point of view

…aders

`should_record_header` previously allowed `Cookie` and the generic `api-key`
alias into the observability metadata (`LlmRequest.headers`) on every gateway
request. Those values then propagate to ATIF trajectory headers and
OpenInference span attributes, leaking session credentials and provider API
keys to anyone with access to the trace store.

Filter both alongside the existing `Authorization`, `x-api-key`, and
`anthropic-api-key` excludes. `HeaderName::as_str()` returns the canonical
lowercase form so the string comparisons remain case-insensitive by
construction.

`api-key` covers the generic alias used by Azure OpenAI and similar proxied
deployments; we already block `x-api-key` (OpenAI SDK default) and
`anthropic-api-key`. `Cookie` is added explicitly even though it has a
dedicated header constant, because it carries session credentials with the
same operational risk profile as bearer tokens.

Test `removes_hop_by_hop_headers` extended to assert both new excludes.
122 sidecar unit + 9 CLI tests still pass.

Resolves CodeRabbit comment #3211156538.

Signed-off-by: Ajay Thorve <athorve@nvidia.com>
Comment thread .gitlab-ci.yml
willkill07 and others added 2 commits May 11, 2026 11:12
Signed-off-by: Will Killian <2007799+willkill07@users.noreply.github.com>
…e AgentEnded

hermes-agent fires `on_session_end` at every user-turn boundary, distinct from
`on_session_finalize` which marks the real session close. Sessions terminated
before finalize (e.g., Ctrl+D, abrupt quit) left their ATIF un-flushed because
nothing emitted a `TurnEnded` at the per-turn boundary. The hermes adapter now
appends a `TurnEnded` after each `on_session_end`, mirroring the per-turn
snapshot pattern that codex already uses for `stop`.

Also harden `Session::end_agent` against duplicate AgentEnded events: today
`ensure_agent_started` would re-open an empty agent scope after the real one
had already closed, and `flush_observers` would overwrite the just-written
ATIF with an empty trajectory. The new guard returns early when no agent
scope is open. No behavioral change on the happy path; the existing
`close_agent_scope` idempotency comment already anticipated this case.

Tests:
- `maps_hermes_real_session_boundary_without_closing_per_turn_end` extended
  to assert the per-turn TurnEnded emission alongside the HookMark.
- `duplicate_agent_end_does_not_overwrite_atif_with_empty_session` added to
  pin the idempotency contract.

Found while running hermes end-to-end through nemo-flow-sidecar against
inference-api.nvidia.com with Phoenix and ATIF observers.

Signed-off-by: Ajay Thorve <athorve@nvidia.com>
Copy link
Copy Markdown

@AjayThorve AjayThorve left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tested e2e with Codex, claude-code and hermes.

  1. Claude-code: Works really well
  2. Codex: Works well, but sub-agent hooks are missing (tracking in NMF-88 on linear)
  3. Hermes: Works well, but sub-agent-start hook missing (tracking in NMF-93)

@willkill07
Copy link
Copy Markdown
Member Author

/merge

@rapids-bot rapids-bot Bot merged commit 2b7f565 into NVIDIA:main May 11, 2026
116 of 117 checks passed
@coderabbitai coderabbitai Bot mentioned this pull request May 11, 2026
2 tasks
@willkill07 willkill07 added the Feature a new feature label May 12, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Feature a new feature lang:python PR changes/introduces Python code lang:rust PR changes/introduces Rust code size:XXL PR is very large

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants