fix(openai-agents): emit cache_read.input_tokens and reasoning_tokens by dvirski · Pull Request #4130 · traceloop/openllmetry

dvirski · 2026-05-12T13:03:18Z

Problem:
gen_ai.usage.reasoning_tokens was the wrong attribute name — the OTel GenAI spec defines it as gen_ai.usage.reasoning.output_tokens. Similarly, the LLM_USAGE_CACHE_READ_INPUT_TOKENS constant resolved to gen_ai.usage.cache_read_input_tokens (underscore) instead of
the spec-correct gen_ai.usage.cache_read.input_tokens (dot). The openai-agents instrumentor also wasn't recording cache read or reasoning tokens at all. The opentelemetry-semantic-conventions floor was too old (0.60b1) to provide the official upstream cache token constants.

Fix:
Bumped opentelemetry-semantic-conventions to >=0.62b1 so cache token constants come from the upstream OTel package. Added GEN_AI_USAGE_REASONING_OUTPUT_TOKENS = "gen_ai.usage.reasoning.output_tokens" to semconv_ai and removed the old incorrect constant. Updated the
openai and openai-agents instrumentors to use the corrected constants. Added two new tests with recorded cassettes for cache read and reasoning token recording in openai-agents.

Summary by CodeRabbit

New Features
- Instrumentation now records cached input token counts and reasoning token counts for OpenAI agent calls.
Dependencies
- Minimum required opentelemetry-semantic-conventions updated to 0.62b1.
Tests
- Added unit tests to verify the new token-tracking attributes are emitted.

CLAassistant · 2026-05-12T13:03:49Z

All committers have signed the CLA.

coderabbitai · 2026-05-12T13:11:20Z

📝 Walkthrough

Walkthrough

Extended OpenAI Agents instrumentation to extract cached input and reasoning tokens from response usage details. Updated dependency constraint to opentelemetry-semantic-conventions>=0.62b1. Added unit tests validating both token attributes are emitted to spans.

Changes

Cached and Reasoning Token Extraction for OpenAI Agents

Layer / File(s)	Summary
Hook implementation and dependency update `opentelemetry/instrumentation/openai_agents/_hooks.py`, `pyproject.toml`	Added conditional extraction of `input_tokens_details.cached_tokens` and `output_tokens_details.reasoning_tokens` in `_extract_response_attributes` to emit `GEN_AI_USAGE_CACHE_READ_INPUT_TOKENS` and `GEN_AI_USAGE_REASONING_TOKENS`; bumped `opentelemetry-semantic-conventions` lower bound to `>=0.62b1`.
Unit tests for token attribute extraction `tests/test_openai_agents.py`	Imported `SimpleNamespace` and internal `_extract_response_attributes`, added `_make_fake_response` helper to construct minimal duck-typed responses with usage details, and added two tests asserting cached input token and reasoning token attributes are set on spans.

Sequence Diagram

sequenceDiagram
  participant ResponseUsage as Response.usage
  participant ExtractHook as _extract_response_attributes
  participant OTel_Span as OTel_Span
  ResponseUsage->>ExtractHook: pass usage object
  ExtractHook->>ExtractHook: read input_tokens_details.cached_tokens
  ExtractHook->>OTel_Span: set_attribute(GEN_AI_USAGE_CACHE_READ_INPUT_TOKENS, cached)
  ExtractHook->>ExtractHook: read output_tokens_details.reasoning_tokens
  ExtractHook->>OTel_Span: set_attribute(GEN_AI_USAGE_REASONING_TOKENS, reasoning)

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Suggested reviewers

galzilber
nina-kollman
doronkopit5
max-deygin-traceloop
netanel-tl

Poem

🐰 I nibble on spans in the quiet night,
I find cached tokens tucked out of sight,
I count the reasoning, I hop and I trace,
Now spans glow with truth in their metadata space,
A tiny hop brings token counts to place.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 33.33% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Linked Issues check	✅ Passed	The PR fully implements the primary objectives from `#4060`: extracting cached_tokens from usage.input_tokens_details and reasoning_tokens from output_tokens_details, plus updating the dependency floor to >=0.62b1 to access upstream cache token constants.
Out of Scope Changes check	✅ Passed	All changes are directly scoped to `#4060`: hook implementation updates for cache/reasoning tokens, dependency bump for upstream constants, tests for the new functionality, and no unrelated modifications.
Title check	✅ Passed	The pull request title directly and clearly reflects the main changes: adding emission of cache_read.input_tokens and reasoning_tokens attributes in the openai-agents instrumentation.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

📝 Generate docstrings

Create stacked PR
Commit on current branch

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch dr/feat(semconv)-align-reasoning-and-cache-token-attributes-with-OTel-GenAI-spec

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 5

🧹 Nitpick comments (1)

packages/opentelemetry-instrumentation-openai-agents/tests/test_openai_agents.py (1)

554-557: ⚡ Quick win

Select the target span by attribute instead of positional index.

Using response_spans[0] is brittle when multiple openai.response spans are emitted; pick the span that actually has the asserted attribute.

More deterministic span selection

-    response_span = response_spans[0]
+    response_span = next(
+        (s for s in response_spans if GenAIAttributes.GEN_AI_USAGE_CACHE_READ_INPUT_TOKENS in s.attributes),
+        None,
+    )
+    assert response_span is not None

-    response_span = response_spans[0]
+    response_span = next(
+        (s for s in response_spans if SpanAttributes.GEN_AI_USAGE_REASONING_OUTPUT_TOKENS in s.attributes),
+        None,
+    )
+    assert response_span is not None

Also applies to: 574-577

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In
`@packages/opentelemetry-instrumentation-openai-agents/tests/test_openai_agents.py`
around lines 554 - 557, The test currently picks a span by position
(response_spans[0]) which is brittle; replace that with a deterministic
selection that finds the span by the attribute you assert later (use
response_spans and the specific attribute key used in the assertions). For
example, replace uses of response_spans[0] in the test with code that selects
the span via next(s for s in response_spans if '<ATTRIBUTE_KEY>' in s.attributes
or s.attributes.get('<ATTRIBUTE_KEY>') == '<EXPECTED_VALUE>'), where
<ATTRIBUTE_KEY> is the attribute name asserted later in the test; do this for
both occurrences (around the response_spans usage at the two ranges noted) so
the test picks the span by attribute, not index.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In
`@packages/opentelemetry-instrumentation-openai-agents/tests/cassettes/test_openai_agents/test_cache_read_input_tokens_recorded.yaml`:
- Around line 90-93: Remove all Cookie and set-cookie header values from the
cassette test_openai_agents/test_cache_read_input_tokens_recorded.yaml (and the
other occurrences noted) by redacting those header lines and re-recording the
cassette; ensure the test harness that generates cassettes filters or normalizes
"Cookie" request headers and "set-cookie" response headers before saving (update
the cassette recording/filtering logic to strip or replace these headers with a
stable placeholder) so future recordings do not include session cookies and
re-run the tests to commit the sanitized cassette.

In
`@packages/opentelemetry-instrumentation-openai-agents/tests/cassettes/test_openai_agents/test_reasoning_output_tokens_recorded.yaml`:
- Around line 16-17: The cassette
test_openai_agents/test_reasoning_output_tokens_recorded.yaml contains
sensitive/volatile headers (e.g., Cookie and request/infra identifiers) that
must be filtered from VCR recordings; update the test recording configuration
used for this suite to strip or replace Cookie and any request/infra ID headers
(e.g., CF identifiers, X-Request-Id) during capture, re-record the cassette, and
commit the regenerated YAML so the cassette no longer contains those sensitive
values.

In
`@packages/opentelemetry-instrumentation-openai-agents/tests/test_openai_agents.py`:
- Around line 546-547: The test's cache-hit assertion currently uses the
expression ">= 0" which is too weak; locate the assertion that checks for a
cache hit (the line asserting ">= 0") in tests/test_openai_agents.py and change
it to use "> 0" so the second call is enforced as a cache read per the test
docstring; update any variable names involved (e.g., cache_hit_count or similar)
accordingly and run the test to confirm it now fails when there is no cache hit.
- Around line 544-545: The VCR configuration in conftest.py only filters
["authorization", "api-key"] and misses CloudFlare cookies, which are present in
cassettes like test_reasoning_output_tokens_recorded.yaml; update the VCR
config's filter_headers list to include "cookie" (i.e., modify the
filter_headers variable/setting in the VCR setup in conftest.py to add "cookie")
and then re-run the test suite to re-record the affected cassettes so
session/tracking cookies are scrubbed from the YAML files.

In
`@packages/opentelemetry-semantic-conventions-ai/opentelemetry/semconv_ai/__init__.py`:
- Line 82: Add a deprecated alias for the renamed constant so existing imports
still work: in opentelemetry.semconv_ai.__init__.py (next to
GEN_AI_USAGE_REASONING_OUTPUT_TOKENS) define the old symbol
SpanAttributes.GEN_AI_USAGE_REASONING_TOKENS (or simply
GEN_AI_USAGE_REASONING_TOKENS if constants are module-level) and assign it the
same string value as GEN_AI_USAGE_REASONING_OUTPUT_TOKENS, include a brief
comment noting it is deprecated and optionally emit a DeprecationWarning when
accessed to guide users to the new constant name.

---

Nitpick comments:
In
`@packages/opentelemetry-instrumentation-openai-agents/tests/test_openai_agents.py`:
- Around line 554-557: The test currently picks a span by position
(response_spans[0]) which is brittle; replace that with a deterministic
selection that finds the span by the attribute you assert later (use
response_spans and the specific attribute key used in the assertions). For
example, replace uses of response_spans[0] in the test with code that selects
the span via next(s for s in response_spans if '<ATTRIBUTE_KEY>' in s.attributes
or s.attributes.get('<ATTRIBUTE_KEY>') == '<EXPECTED_VALUE>'), where
<ATTRIBUTE_KEY> is the attribute name asserted later in the test; do this for
both occurrences (around the response_spans usage at the two ranges noted) so
the test picks the span by attribute, not index.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 21e8e29b-0fa9-4aff-8aaf-b153db91fba1

📥 Commits

Reviewing files that changed from the base of the PR and between 6d3e696 and 5306120.

⛔ Files ignored due to path filters (28)

packages/opentelemetry-instrumentation-agno/uv.lock is excluded by !**/*.lock
packages/opentelemetry-instrumentation-anthropic/uv.lock is excluded by !**/*.lock
packages/opentelemetry-instrumentation-bedrock/uv.lock is excluded by !**/*.lock
packages/opentelemetry-instrumentation-chromadb/uv.lock is excluded by !**/*.lock
packages/opentelemetry-instrumentation-cohere/uv.lock is excluded by !**/*.lock
packages/opentelemetry-instrumentation-crewai/uv.lock is excluded by !**/*.lock
packages/opentelemetry-instrumentation-groq/uv.lock is excluded by !**/*.lock
packages/opentelemetry-instrumentation-haystack/uv.lock is excluded by !**/*.lock
packages/opentelemetry-instrumentation-lancedb/uv.lock is excluded by !**/*.lock
packages/opentelemetry-instrumentation-marqo/uv.lock is excluded by !**/*.lock
packages/opentelemetry-instrumentation-mcp/uv.lock is excluded by !**/*.lock
packages/opentelemetry-instrumentation-milvus/uv.lock is excluded by !**/*.lock
packages/opentelemetry-instrumentation-mistralai/uv.lock is excluded by !**/*.lock
packages/opentelemetry-instrumentation-ollama/uv.lock is excluded by !**/*.lock
packages/opentelemetry-instrumentation-openai-agents/uv.lock is excluded by !**/*.lock
packages/opentelemetry-instrumentation-openai/uv.lock is excluded by !**/*.lock
packages/opentelemetry-instrumentation-pinecone/uv.lock is excluded by !**/*.lock
packages/opentelemetry-instrumentation-qdrant/uv.lock is excluded by !**/*.lock
packages/opentelemetry-instrumentation-replicate/uv.lock is excluded by !**/*.lock
packages/opentelemetry-instrumentation-sagemaker/uv.lock is excluded by !**/*.lock
packages/opentelemetry-instrumentation-together/uv.lock is excluded by !**/*.lock
packages/opentelemetry-instrumentation-transformers/uv.lock is excluded by !**/*.lock
packages/opentelemetry-instrumentation-vertexai/uv.lock is excluded by !**/*.lock
packages/opentelemetry-instrumentation-voyageai/uv.lock is excluded by !**/*.lock
packages/opentelemetry-instrumentation-watsonx/uv.lock is excluded by !**/*.lock
packages/opentelemetry-instrumentation-writer/uv.lock is excluded by !**/*.lock
packages/sample-app/uv.lock is excluded by !**/*.lock
packages/traceloop-sdk/uv.lock is excluded by !**/*.lock

📒 Files selected for processing (13)

packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_hooks.py
packages/opentelemetry-instrumentation-openai-agents/pyproject.toml
packages/opentelemetry-instrumentation-openai-agents/tests/cassettes/test_openai_agents/test_cache_read_input_tokens_recorded.yaml
packages/opentelemetry-instrumentation-openai-agents/tests/cassettes/test_openai_agents/test_reasoning_output_tokens_recorded.yaml
packages/opentelemetry-instrumentation-openai-agents/tests/test_openai_agents.py
packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/shared/chat_wrappers.py
packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/responses_wrappers.py
packages/opentelemetry-instrumentation-openai/tests/traces/test_azure.py
packages/opentelemetry-instrumentation-openai/tests/traces/test_chat.py
packages/opentelemetry-semantic-conventions-ai/opentelemetry/semconv_ai/__init__.py
packages/opentelemetry-semantic-conventions-ai/opentelemetry/semconv_ai/_testing.py
packages/traceloop-sdk/tests/test_manual.py
packages/traceloop-sdk/traceloop/sdk/tracing/manual.py

coderabbitai · 2026-05-12T13:11:24Z

+      Cookie:
+      - __cf_bm=GqvMTTwxqvHlYO_zMGr15vhngiHBKowc4W9cytloB1U-1778513729.5012145-1.0.1.1-TuWmsWma1J3OWFenDizlhbf_5qCYt89CjxNLLDDm.zLzVxhm2ep.2mEgG6rExy5xHOM0Quwrvd09cIn.yYTv.TO2jCa5bit4sQJ6ZIeHevWLiejQyIXZYqEUyXQItPzY


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Scrub sensitive/volatile headers from cassette recordings.

Cookie and request/infra identifiers should not be committed in cassettes; they are sensitive/ephemeral and make fixtures less stable. Please filter these headers during VCR recording and regenerate this cassette.

Also applies to: 60-64, 104-105

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@packages/opentelemetry-instrumentation-openai-agents/tests/cassettes/test_openai_agents/test_reasoning_output_tokens_recorded.yaml` around lines 16 - 17, The cassette test_openai_agents/test_reasoning_output_tokens_recorded.yaml contains sensitive/volatile headers (e.g., Cookie and request/infra identifiers) that must be filtered from VCR recordings; update the test recording configuration used for this suite to strip or replace Cookie and any request/infra ID headers (e.g., CF identifiers, X-Request-Id) during capture, re-record the cassette, and commit the regenerated YAML so the cassette no longer contains those sensitive values.

coderabbitai · 2026-05-12T13:11:24Z

+@pytest.mark.vcr
+def test_cache_read_input_tokens_recorded(exporter, test_agent):


⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash set -euo pipefail # Locate the new cassettes referenced by these tests fd -a 'test_(cache_read_input_tokens_recorded|reasoning_output_tokens_recorded)\.yaml' \ packages/opentelemetry-instrumentation-openai-agents/tests/cassettes # Scan cassettes for common secret/PII indicators (review any hits; placeholders/masked values are acceptable) rg -n --iglob '*.yaml' -e 'Authorization|Bearer\s+[A-Za-z0-9\-_\.]+|api[_-]?key|sk-[A-Za-z0-9]|@|email' \ packages/opentelemetry-instrumentation-openai-agents/tests/cassettes

Repository: traceloop/openllmetry

Length of output: 365

🏁 Script executed:

#!/bin/bash # Check the cassette files for any sensitive content cd packages/opentelemetry-instrumentation-openai-agents/tests/cassettes/test_openai_agents # Show file sizes and examine key sections of the cassettes for cassette in test_cache_read_input_tokens_recorded.yaml test_reasoning_output_tokens_recorded.yaml; do if [ -f "$cassette" ]; then echo "=== $cassette ===" wc -l "$cassette" echo "--- First 50 lines ---" head -50 "$cassette" echo "--- Looking for potential secrets (detailed check) ---" rg -n 'sk_|Bearer|Authorization|api.?key|token|secret|password' "$cassette" || echo "No matches found" fi done

Repository: traceloop/openllmetry

Length of output: 6564

🏁 Script executed:

#!/bin/bash # Check the test file to see if VCR filters are configured cd packages/opentelemetry-instrumentation-openai-agents/tests # Search for VCR decorator and filter configuration around the test methods rg -B5 -A10 'def test_(cache_read_input_tokens_recorded|reasoning_output_tokens_recorded)' test_openai_agents.py

Repository: traceloop/openllmetry

Length of output: 1274

🏁 Script executed:

#!/bin/bash # Look for VCR configuration in conftest files fd conftest.py packages/opentelemetry-instrumentation-openai-agents/tests | head -5 # Check the conftest for VCR filter setup if [ -f packages/opentelemetry-instrumentation-openai-agents/tests/conftest.py ]; then echo "=== conftest.py content ===" cat -n packages/opentelemetry-instrumentation-openai-agents/tests/conftest.py fi

Repository: traceloop/openllmetry

Length of output: 11337

🏁 Script executed:

#!/bin/bash # Also check for pytest.ini or setup.cfg with VCR config find packages/opentelemetry-instrumentation-openai-agents/tests -maxdepth 1 $ -name "pytest.ini" -o -name "setup.cfg" -o -name "pyproject.toml" $ -exec echo "=== {} ===" \; -exec cat {} \;

Repository: traceloop/openllmetry

Length of output: 47

VCR cassettes need improved filtering configuration and re-recording.

The conftest.py VCR config (lines 284–286) only filters ["authorization", "api-key"] headers, but the cassettes contain unfiltered CloudFlare bot management cookies (e.g., __cf_bm=... in test_reasoning_output_tokens_recorded.yaml). Add "cookie" to the filter_headers list and re-record the cassettes to ensure all session/tracking data is properly scrubbed.

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@packages/opentelemetry-instrumentation-openai-agents/tests/test_openai_agents.py` around lines 544 - 545, The VCR configuration in conftest.py only filters ["authorization", "api-key"] and misses CloudFlare cookies, which are present in cassettes like test_reasoning_output_tokens_recorded.yaml; update the VCR config's filter_headers list to include "cookie" (i.e., modify the filter_headers variable/setting in the VCR setup in conftest.py to add "cookie") and then re-run the test suite to re-record the affected cassettes so session/tracking cookies are scrubbed from the YAML files.

coderabbitai · 2026-05-12T13:11:24Z

    GEN_AI_CONTENT_COMPLETION_CHUNK = "gen_ai.content.completion.chunk"
    GEN_AI_REQUEST_REASONING_EFFORT = "gen_ai.request.reasoning_effort"
-    GEN_AI_USAGE_REASONING_TOKENS = "gen_ai.usage.reasoning_tokens"
+    GEN_AI_USAGE_REASONING_OUTPUT_TOKENS = "gen_ai.usage.reasoning.output_tokens"


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Preserve backward compatibility for the renamed reasoning constant.

Dropping SpanAttributes.GEN_AI_USAGE_REASONING_TOKENS can break consumers importing that symbol. Keep a deprecated alias pointing to the new key to avoid runtime import breaks.

Proposed compatibility patch

class SpanAttributes: @@ GEN_AI_USAGE_REASONING_OUTPUT_TOKENS = "gen_ai.usage.reasoning.output_tokens" + # Backward-compatible alias (deprecated): remove in next major version + GEN_AI_USAGE_REASONING_TOKENS = GEN_AI_USAGE_REASONING_OUTPUT_TOKENS

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

GEN_AI_USAGE_REASONING_OUTPUT_TOKENS = "gen_ai.usage.reasoning.output_tokens"

GEN_AI_USAGE_REASONING_OUTPUT_TOKENS = "gen_ai.usage.reasoning.output_tokens"

# Backward-compatible alias (deprecated): remove in next major version

GEN_AI_USAGE_REASONING_TOKENS = GEN_AI_USAGE_REASONING_OUTPUT_TOKENS

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@packages/opentelemetry-semantic-conventions-ai/opentelemetry/semconv_ai/__init__.py` at line 82, Add a deprecated alias for the renamed constant so existing imports still work: in opentelemetry.semconv_ai.__init__.py (next to GEN_AI_USAGE_REASONING_OUTPUT_TOKENS) define the old symbol SpanAttributes.GEN_AI_USAGE_REASONING_TOKENS (or simply GEN_AI_USAGE_REASONING_TOKENS if constants are module-level) and assign it the same string value as GEN_AI_USAGE_REASONING_OUTPUT_TOKENS, include a brief comment noting it is deprecated and optionally emit a DeprecationWarning when accessed to guide users to the new constant name.

doronkopit5 · 2026-05-12T13:44:56Z

Reviewed the change — direction is right (aligning attribute names with the OTel GenAI spec). A few things worth addressing:

1. Two sources of truth for the cache constants

The same constants exist in both places, resolving to the same string today:

opentelemetry.semconv._incubating.attributes.gen_ai_attributes.GEN_AI_USAGE_CACHE_*_INPUT_TOKENS (upstream OTel)
opentelemetry.semconv_ai.SpanAttributes.GEN_AI_USAGE_CACHE_*_INPUT_TOKENS (this repo)

The PR mixes both: _hooks.py and manual.py use GenAIAttributes (upstream), while pre-existing code in chat_wrappers.py, responses_wrappers.py, anthropic, langchain uses SpanAttributes from semconv_ai. Both work, but it's nicer to have one source of truth.

I'd lean toward standardizing on the upstream OTel-native GenAIAttributes here — that's the spec source — but happy to leave the call to you. Whatever you pick, please apply it consistently across the touched files so callers don't have to know which path leads where.

2. `traceloop-sdk` needs an explicit upstream semconv floor

manual.py now uses GenAIAttributes.GEN_AI_USAGE_CACHE_*_INPUT_TOKENS, but packages/traceloop-sdk/pyproject.toml doesn't pin opentelemetry-semantic-conventions. With <0.62b1 installed transitively, that attribute either doesn't exist (→ AttributeError) or has the old underscore string (silently wrong telemetry).

Please add opentelemetry-semantic-conventions>=0.62b1 to packages/traceloop-sdk/pyproject.toml (same bump you did for openai-agents).

3. Removing `GEN_AI_USAGE_REASONING_TOKENS` is a breaking change with no alias

Deleting the class attribute from SpanAttributes will break any downstream code importing SpanAttributes.GEN_AI_USAGE_REASONING_TOKENS with AttributeError. The repo already has prior art for graceful renames (see the "Legacy LLM_* aliases" block in semconv_ai/__init__.py). Suggest keeping a deprecation alias:

GEN_AI_USAGE_REASONING_OUTPUT_TOKENS = "gen_ai.usage.reasoning.output_tokens"
# Deprecated alias; use GEN_AI_USAGE_REASONING_OUTPUT_TOKENS
GEN_AI_USAGE_REASONING_TOKENS = GEN_AI_USAGE_REASONING_OUTPUT_TOKENS

Also note that the string value changes too (gen_ai.usage.reasoning_tokens → gen_ai.usage.reasoning.output_tokens), which is a wire-level break for dashboards. Worth a line in MIGRATION.md under the "Changed string values" section.

5. New test assertions don't actually verify the feature

assert response_span.attributes[GenAIAttributes.GEN_AI_USAGE_CACHE_READ_INPUT_TOKENS] >= 0
assert response_span.attributes[SpanAttributes.GEN_AI_USAGE_REASONING_OUTPUT_TOKENS] >= 0

Both >= 0 assertions are trivially true once the attribute is recorded. The cache test is intentionally structured (first call writes cache → clear() → second call reads) precisely to exercise caching — so it should assert > 0 to prove the second call actually got a cache hit. Same for o3-mini reasoning tokens, which per the docstring always reports > 0. Tightening to > 0 makes these tests actually catch regressions.

dvirski · 2026-05-17T16:35:32Z

@doronkopit5

Changes in response to review feedback

Reverted out-of-scope changes — Removed edits to traceloop-sdk/manual.py, openai instrumentor (chat_wrappers.py, responses_wrappers.py, test_azure.py, test_chat.py), and the
GEN_AI_USAGE_REASONING_TOKENS rename in semconv_ai. PR now contains only changes required for 🐛 Bug Report: openai-agents instrumentation doesn't record cached input tokens #4060.
Resolves CR feat: sdk code + openai instrumentation #4 / Doron feat: Pinecone Instrumentation #3 (reasoning rename breaking change) — Reverted; constant name and string value unchanged from main. No alias needed.
Resolves Doron chore: nx migration #2 (traceloop-sdk semconv floor) — Moot after reverting manual.py.
Resolves CR chore: add instrumentations and sample app boilerplate #5 / Doron chore: add instrumentations and sample app boilerplate #5 (weak >= 0 assertions) — Replaced both VCR tests with two deterministic unit tests that call _extract_response_attributes directly with a documented fake Response
and assert exact emitted values (1024, 256). Fake-response shape is documented inline against the openai-agents SDK source (v0.14.2 pinned floor), a real Responses API payload in an
existing cassette in this repo, and the OpenAI API reference.
Resolves CR initial monorepo structure #1, chore: nx migration #2, feat: Pinecone Instrumentation #3 (cookie/sensitive headers in cassettes) — Deleted both new cassettes. Unit tests need no VCR, no API key.
Resolves CR nitpick fix: wrong imports and missing code components #6 (brittle response_spans[0]) — Gone with the VCR tests; new tests use a MagicMock span directly.
Partially addresses Doron initial monorepo structure #1 (single source of truth) — _hooks.py uses upstream GenAIAttributes.GEN_AI_USAGE_CACHE_READ_INPUT_TOKENS for cache. Reasoning continues to use
SpanAttributes.GEN_AI_USAG

coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In
`@packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_hooks.py`:
- Around line 591-595: The current code sets the
GenAIAttributes.GEN_AI_USAGE_CACHE_READ_INPUT_TOKENS attribute even when
cached_tokens == 0; change the guard so you only set the attribute when
cached_tokens > 0. Locate the block in _hooks.py where cached_tokens =
getattr(input_details, "cached_tokens", None) and replace the if cached_tokens
is not None check with an explicit positive check (e.g., cached_tokens is not
None and cached_tokens > 0) before calling otel_span.set_attribute to ensure
only real cache hits are recorded.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: da513cee-a70f-4b55-983f-b65d5991a3f9

📥 Commits

Reviewing files that changed from the base of the PR and between 5306120 and 895aa3c.

⛔ Files ignored due to path filters (1)

packages/traceloop-sdk/uv.lock is excluded by !**/*.lock

📒 Files selected for processing (2)

packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_hooks.py
packages/opentelemetry-instrumentation-openai-agents/tests/test_openai_agents.py

coderabbitai · 2026-05-17T16:36:08Z

+            cached_tokens = getattr(input_details, "cached_tokens", None)
+            if cached_tokens is not None:
+                otel_span.set_attribute(
+                    GenAIAttributes.GEN_AI_USAGE_CACHE_READ_INPUT_TOKENS, cached_tokens
+                )


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Only set cache-read usage on actual cache hits.

Line 592 currently records cache-read tokens even when cached_tokens == 0. Gate this to > 0 to match cache-hit semantics and avoid noisy attributes.

Suggested fix

- if cached_tokens is not None: + if cached_tokens is not None and cached_tokens > 0: otel_span.set_attribute( GenAIAttributes.GEN_AI_USAGE_CACHE_READ_INPUT_TOKENS, cached_tokens )

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_hooks.py` around lines 591 - 595, The current code sets the GenAIAttributes.GEN_AI_USAGE_CACHE_READ_INPUT_TOKENS attribute even when cached_tokens == 0; change the guard so you only set the attribute when cached_tokens > 0. Locate the block in _hooks.py where cached_tokens = getattr(input_details, "cached_tokens", None) and replace the if cached_tokens is not None check with an explicit positive check (e.g., cached_tokens is not None and cached_tokens > 0) before calling otel_span.set_attribute to ensure only real cache hits are recorded.

…enAI spec

…anuel instrumentation

doronkopit5 · 2026-05-17T19:13:32Z

Re-reviewed after the latest push (86e2d74 + d77ffe4e). Major improvements — happy with the direction:

✅ Resolved

Scope tightened: PR is now 82 lines of code change in 3 files (excluding lock-file churn), focused squarely on 🐛 Bug Report: openai-agents instrumentation doesn't record cached input tokens #4060. The manual.py change, chat_wrappers/responses_wrappers rename, and SpanAttributes rename are all out — that addresses my chore: nx migration #2, feat: Pinecone Instrumentation #3, and feat: sdk code + openai instrumentation #4 cleanly.
Tests are now meaningful: SimpleNamespace fakes with exact value assertions (1024, 256) are much better than the >= 0 VCR tests. The docstring linking the fake shape to the SDK source, the live cassette, and the OpenAI docs is nice — easy to keep in sync. Also drops the cassettes with Cookie / set-cookie headers, which is a separate win.
Cache fix using upstream constant: GenAIAttributes.GEN_AI_USAGE_CACHE_READ_INPUT_TOKENS is the right pick — matches the OTel-native preference and the >=0.62b1 pyproject floor backs it correctly (verified: that constant only exists in upstream from 0.62b1 onward, with the dot-form string).

⚠️ Worth a small follow-up

PR title and description are now stale. The title still says "align reasoning and cache token attributes with OTel GenAI spec" and the description still claims "Added GEN_AI_USAGE_REASONING_OUTPUT_TOKENS …" — but the reasoning rename was reverted. After this PR, _hooks.py will emit gen_ai.usage.reasoning_tokens (the non-spec string from SpanAttributes.GEN_AI_USAGE_REASONING_TOKENS), not gen_ai.usage.reasoning.output_tokens. Please retitle to something like fix(openai-agents): record cached input + reasoning tokens (#4060) and trim the description to match.
Reasoning attribute string is still non-spec — for awareness, not blocking this PR. The OTel GenAI spec calls for gen_ai.usage.reasoning.output_tokens, and the upstream 0.62b1 package doesn't yet define a constant for it (verified). Once the semconv_ai constant is corrected (or the upstream catches up), _hooks.py will pick up the right string automatically since it uses the named constant. Worth tracking as a follow-up issue so it's not forgotten.

Tests don't exercise the None-guard paths — minor. Both input_tokens_details is None and cached_tokens is None short-circuits go untested. Two extra one-liners would lock in the guard behavior:

def test_extract_response_attributes_no_cache_when_details_missing():
    response = _make_fake_response()  # both None
    span = MagicMock()
    _extract_response_attributes(span, response, trace_content=False)
    for call in span.set_attribute.call_args_list:
        assert call.args[0] != GenAIAttributes.GEN_AI_USAGE_CACHE_READ_INPUT_TOKENS
        assert call.args[0] != SpanAttributes.GEN_AI_USAGE_REASONING_TOKENS

Not blocking — the existing tests cover the positive paths well.

LGTM once the title/description are updated. Nice cleanup.

doronkopit5

just fix the last comment form code rabbit about the non zero attribute

dvirski · 2026-05-18T07:06:14Z

just fix the last comment form code rabbit about the non zero attribute

@doronkopit5

isn't zero cache token also meaningful? like it doesn't mean "no cache hit".

Adding a check summary from otel spec:

Spec check (semconv reference (https://opentelemetry.io/docs/specs/semconv/registry/attributes/gen-ai/)): gen_ai.usage.cache_read.input_tokens is classified as Recommended with no "when applicable" qualifier — the same level as gen_ai.usage.input_tokens and gen_ai.usage.output_tokens. The spec
doesn't suggest omitting it when zero. Suppressing 0 would make "no cache hit" indistinguishable from "instrumentor doesn't capture it," and break aggregations like cache-hit-rate. Matching how the existing openai instrumentor emits this attribute is also the safer choice for trace consistency.
Keeping is not None as the guard.

dvirski requested review from doronkopit5, galzilber, max-deygin-traceloop, netanel-tl and nina-kollman May 12, 2026 13:03

coderabbitai Bot reviewed May 12, 2026

View reviewed changes

coderabbitai Bot reviewed May 17, 2026

View reviewed changes

dvirski force-pushed the dr/feat(semconv)-align-reasoning-and-cache-token-attributes-with-OTel-GenAI-spec branch from 895aa3c to 8d827c4 Compare May 17, 2026 16:36

dvirski and others added 4 commits May 17, 2026 21:26

feat(semconv): align reasoning and cache token attributes with OTel G…

cff5a0a

…enAI spec

Using GEN_AI_USAGE_CACHE_CREATION_INPUT_TOKENS corrent attribute in m…

56fc0e7

…anuel instrumentation

Reverted changes unrelated to the #4060 issue

86e2d74

CR Fixes

d77ffe4

dvirski force-pushed the dr/feat(semconv)-align-reasoning-and-cache-token-attributes-with-OTel-GenAI-spec branch from 8d827c4 to d77ffe4 Compare May 17, 2026 18:26

doronkopit5 approved these changes May 17, 2026

View reviewed changes

dvirski changed the title ~~fix(openai): align reasoning and cache token attributes with OTel GenAI spec~~ fix(openai-agents): emit cache_read.input_tokens and reasoning_tokens May 17, 2026

		Cookie:
		- __cf_bm=GqvMTTwxqvHlYO_zMGr15vhngiHBKowc4W9cytloB1U-1778513729.5012145-1.0.1.1-TuWmsWma1J3OWFenDizlhbf_5qCYt89CjxNLLDDm.zLzVxhm2ep.2mEgG6rExy5xHOM0Quwrvd09cIn.yYTv.TO2jCa5bit4sQJ6ZIeHevWLiejQyIXZYqEUyXQItPzY

		@pytest.mark.vcr
		def test_cache_read_input_tokens_recorded(exporter, test_agent):

Conversation

dvirski commented May 12, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary by CodeRabbit

Uh oh!

CLAassistant commented May 12, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

coderabbitai Bot commented May 12, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Sequence Diagram

Estimated code review effort

Suggested reviewers

Poem

❌ Failed checks (1 warning)

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

coderabbitai Bot May 12, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot May 12, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

coderabbitai Bot May 12, 2026

Choose a reason for hiding this comment

Uh oh!

doronkopit5 commented May 12, 2026

1. Two sources of truth for the cache constants

2. traceloop-sdk needs an explicit upstream semconv floor

3. Removing GEN_AI_USAGE_REASONING_TOKENS is a breaking change with no alias

5. New test assertions don't actually verify the feature

Uh oh!

dvirski commented May 17, 2026

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot May 17, 2026

Choose a reason for hiding this comment

Uh oh!

doronkopit5 commented May 17, 2026

✅ Resolved

⚠️ Worth a small follow-up

Uh oh!

doronkopit5 left a comment

Choose a reason for hiding this comment

Uh oh!

dvirski commented May 18, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

dvirski commented May 12, 2026 •

edited by coderabbitai Bot

Loading

CLAassistant commented May 12, 2026 •

edited

Loading

coderabbitai Bot commented May 12, 2026 •

edited

Loading

2. `traceloop-sdk` needs an explicit upstream semconv floor

3. Removing `GEN_AI_USAGE_REASONING_TOKENS` is a breaking change with no alias