Skip to content

fix(openai-agents): emit cache_read.input_tokens and reasoning_tokens#4130

Open
dvirski wants to merge 4 commits into
mainfrom
dr/feat(semconv)-align-reasoning-and-cache-token-attributes-with-OTel-GenAI-spec
Open

fix(openai-agents): emit cache_read.input_tokens and reasoning_tokens#4130
dvirski wants to merge 4 commits into
mainfrom
dr/feat(semconv)-align-reasoning-and-cache-token-attributes-with-OTel-GenAI-spec

Conversation

@dvirski
Copy link
Copy Markdown
Contributor

@dvirski dvirski commented May 12, 2026

Fixes #4060

Problem:
gen_ai.usage.reasoning_tokens was the wrong attribute name — the OTel GenAI spec defines it as gen_ai.usage.reasoning.output_tokens. Similarly, the LLM_USAGE_CACHE_READ_INPUT_TOKENS constant resolved to gen_ai.usage.cache_read_input_tokens (underscore) instead of
the spec-correct gen_ai.usage.cache_read.input_tokens (dot). The openai-agents instrumentor also wasn't recording cache read or reasoning tokens at all. The opentelemetry-semantic-conventions floor was too old (0.60b1) to provide the official upstream cache token constants.

Fix:
Bumped opentelemetry-semantic-conventions to >=0.62b1 so cache token constants come from the upstream OTel package. Added GEN_AI_USAGE_REASONING_OUTPUT_TOKENS = "gen_ai.usage.reasoning.output_tokens" to semconv_ai and removed the old incorrect constant. Updated the
openai and openai-agents instrumentors to use the corrected constants. Added two new tests with recorded cassettes for cache read and reasoning token recording in openai-agents.

Summary by CodeRabbit

  • New Features

    • Instrumentation now records cached input token counts and reasoning token counts for OpenAI agent calls.
  • Dependencies

    • Minimum required opentelemetry-semantic-conventions updated to 0.62b1.
  • Tests

    • Added unit tests to verify the new token-tracking attributes are emitted.

Review Change Stack

@CLAassistant
Copy link
Copy Markdown

CLAassistant commented May 12, 2026

CLA assistant check
All committers have signed the CLA.

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented May 12, 2026

📝 Walkthrough

Walkthrough

Extended OpenAI Agents instrumentation to extract cached input and reasoning tokens from response usage details. Updated dependency constraint to opentelemetry-semantic-conventions>=0.62b1. Added unit tests validating both token attributes are emitted to spans.

Changes

Cached and Reasoning Token Extraction for OpenAI Agents

Layer / File(s) Summary
Hook implementation and dependency update
opentelemetry/instrumentation/openai_agents/_hooks.py, pyproject.toml
Added conditional extraction of input_tokens_details.cached_tokens and output_tokens_details.reasoning_tokens in _extract_response_attributes to emit GEN_AI_USAGE_CACHE_READ_INPUT_TOKENS and GEN_AI_USAGE_REASONING_TOKENS; bumped opentelemetry-semantic-conventions lower bound to >=0.62b1.
Unit tests for token attribute extraction
tests/test_openai_agents.py
Imported SimpleNamespace and internal _extract_response_attributes, added _make_fake_response helper to construct minimal duck-typed responses with usage details, and added two tests asserting cached input token and reasoning token attributes are set on spans.

Sequence Diagram

sequenceDiagram
  participant ResponseUsage as Response.usage
  participant ExtractHook as _extract_response_attributes
  participant OTel_Span as OTel_Span
  ResponseUsage->>ExtractHook: pass usage object
  ExtractHook->>ExtractHook: read input_tokens_details.cached_tokens
  ExtractHook->>OTel_Span: set_attribute(GEN_AI_USAGE_CACHE_READ_INPUT_TOKENS, cached)
  ExtractHook->>ExtractHook: read output_tokens_details.reasoning_tokens
  ExtractHook->>OTel_Span: set_attribute(GEN_AI_USAGE_REASONING_TOKENS, reasoning)
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Suggested reviewers

  • galzilber
  • nina-kollman
  • doronkopit5
  • max-deygin-traceloop
  • netanel-tl

Poem

🐰 I nibble on spans in the quiet night,
I find cached tokens tucked out of sight,
I count the reasoning, I hop and I trace,
Now spans glow with truth in their metadata space,
A tiny hop brings token counts to place.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 33.33% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Linked Issues check ✅ Passed The PR fully implements the primary objectives from #4060: extracting cached_tokens from usage.input_tokens_details and reasoning_tokens from output_tokens_details, plus updating the dependency floor to >=0.62b1 to access upstream cache token constants.
Out of Scope Changes check ✅ Passed All changes are directly scoped to #4060: hook implementation updates for cache/reasoning tokens, dependency bump for upstream constants, tests for the new functionality, and no unrelated modifications.
Title check ✅ Passed The pull request title directly and clearly reflects the main changes: adding emission of cache_read.input_tokens and reasoning_tokens attributes in the openai-agents instrumentation.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch dr/feat(semconv)-align-reasoning-and-cache-token-attributes-with-OTel-GenAI-spec

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 5

🧹 Nitpick comments (1)
packages/opentelemetry-instrumentation-openai-agents/tests/test_openai_agents.py (1)

554-557: ⚡ Quick win

Select the target span by attribute instead of positional index.

Using response_spans[0] is brittle when multiple openai.response spans are emitted; pick the span that actually has the asserted attribute.

More deterministic span selection
-    response_span = response_spans[0]
+    response_span = next(
+        (s for s in response_spans if GenAIAttributes.GEN_AI_USAGE_CACHE_READ_INPUT_TOKENS in s.attributes),
+        None,
+    )
+    assert response_span is not None
-    response_span = response_spans[0]
+    response_span = next(
+        (s for s in response_spans if SpanAttributes.GEN_AI_USAGE_REASONING_OUTPUT_TOKENS in s.attributes),
+        None,
+    )
+    assert response_span is not None

Also applies to: 574-577

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In
`@packages/opentelemetry-instrumentation-openai-agents/tests/test_openai_agents.py`
around lines 554 - 557, The test currently picks a span by position
(response_spans[0]) which is brittle; replace that with a deterministic
selection that finds the span by the attribute you assert later (use
response_spans and the specific attribute key used in the assertions). For
example, replace uses of response_spans[0] in the test with code that selects
the span via next(s for s in response_spans if '<ATTRIBUTE_KEY>' in s.attributes
or s.attributes.get('<ATTRIBUTE_KEY>') == '<EXPECTED_VALUE>'), where
<ATTRIBUTE_KEY> is the attribute name asserted later in the test; do this for
both occurrences (around the response_spans usage at the two ranges noted) so
the test picks the span by attribute, not index.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In
`@packages/opentelemetry-instrumentation-openai-agents/tests/cassettes/test_openai_agents/test_cache_read_input_tokens_recorded.yaml`:
- Around line 90-93: Remove all Cookie and set-cookie header values from the
cassette test_openai_agents/test_cache_read_input_tokens_recorded.yaml (and the
other occurrences noted) by redacting those header lines and re-recording the
cassette; ensure the test harness that generates cassettes filters or normalizes
"Cookie" request headers and "set-cookie" response headers before saving (update
the cassette recording/filtering logic to strip or replace these headers with a
stable placeholder) so future recordings do not include session cookies and
re-run the tests to commit the sanitized cassette.

In
`@packages/opentelemetry-instrumentation-openai-agents/tests/cassettes/test_openai_agents/test_reasoning_output_tokens_recorded.yaml`:
- Around line 16-17: The cassette
test_openai_agents/test_reasoning_output_tokens_recorded.yaml contains
sensitive/volatile headers (e.g., Cookie and request/infra identifiers) that
must be filtered from VCR recordings; update the test recording configuration
used for this suite to strip or replace Cookie and any request/infra ID headers
(e.g., CF identifiers, X-Request-Id) during capture, re-record the cassette, and
commit the regenerated YAML so the cassette no longer contains those sensitive
values.

In
`@packages/opentelemetry-instrumentation-openai-agents/tests/test_openai_agents.py`:
- Around line 546-547: The test's cache-hit assertion currently uses the
expression ">= 0" which is too weak; locate the assertion that checks for a
cache hit (the line asserting ">= 0") in tests/test_openai_agents.py and change
it to use "> 0" so the second call is enforced as a cache read per the test
docstring; update any variable names involved (e.g., cache_hit_count or similar)
accordingly and run the test to confirm it now fails when there is no cache hit.
- Around line 544-545: The VCR configuration in conftest.py only filters
["authorization", "api-key"] and misses CloudFlare cookies, which are present in
cassettes like test_reasoning_output_tokens_recorded.yaml; update the VCR
config's filter_headers list to include "cookie" (i.e., modify the
filter_headers variable/setting in the VCR setup in conftest.py to add "cookie")
and then re-run the test suite to re-record the affected cassettes so
session/tracking cookies are scrubbed from the YAML files.

In
`@packages/opentelemetry-semantic-conventions-ai/opentelemetry/semconv_ai/__init__.py`:
- Line 82: Add a deprecated alias for the renamed constant so existing imports
still work: in opentelemetry.semconv_ai.__init__.py (next to
GEN_AI_USAGE_REASONING_OUTPUT_TOKENS) define the old symbol
SpanAttributes.GEN_AI_USAGE_REASONING_TOKENS (or simply
GEN_AI_USAGE_REASONING_TOKENS if constants are module-level) and assign it the
same string value as GEN_AI_USAGE_REASONING_OUTPUT_TOKENS, include a brief
comment noting it is deprecated and optionally emit a DeprecationWarning when
accessed to guide users to the new constant name.

---

Nitpick comments:
In
`@packages/opentelemetry-instrumentation-openai-agents/tests/test_openai_agents.py`:
- Around line 554-557: The test currently picks a span by position
(response_spans[0]) which is brittle; replace that with a deterministic
selection that finds the span by the attribute you assert later (use
response_spans and the specific attribute key used in the assertions). For
example, replace uses of response_spans[0] in the test with code that selects
the span via next(s for s in response_spans if '<ATTRIBUTE_KEY>' in s.attributes
or s.attributes.get('<ATTRIBUTE_KEY>') == '<EXPECTED_VALUE>'), where
<ATTRIBUTE_KEY> is the attribute name asserted later in the test; do this for
both occurrences (around the response_spans usage at the two ranges noted) so
the test picks the span by attribute, not index.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 21e8e29b-0fa9-4aff-8aaf-b153db91fba1

📥 Commits

Reviewing files that changed from the base of the PR and between 6d3e696 and 5306120.

⛔ Files ignored due to path filters (28)
  • packages/opentelemetry-instrumentation-agno/uv.lock is excluded by !**/*.lock
  • packages/opentelemetry-instrumentation-anthropic/uv.lock is excluded by !**/*.lock
  • packages/opentelemetry-instrumentation-bedrock/uv.lock is excluded by !**/*.lock
  • packages/opentelemetry-instrumentation-chromadb/uv.lock is excluded by !**/*.lock
  • packages/opentelemetry-instrumentation-cohere/uv.lock is excluded by !**/*.lock
  • packages/opentelemetry-instrumentation-crewai/uv.lock is excluded by !**/*.lock
  • packages/opentelemetry-instrumentation-groq/uv.lock is excluded by !**/*.lock
  • packages/opentelemetry-instrumentation-haystack/uv.lock is excluded by !**/*.lock
  • packages/opentelemetry-instrumentation-lancedb/uv.lock is excluded by !**/*.lock
  • packages/opentelemetry-instrumentation-marqo/uv.lock is excluded by !**/*.lock
  • packages/opentelemetry-instrumentation-mcp/uv.lock is excluded by !**/*.lock
  • packages/opentelemetry-instrumentation-milvus/uv.lock is excluded by !**/*.lock
  • packages/opentelemetry-instrumentation-mistralai/uv.lock is excluded by !**/*.lock
  • packages/opentelemetry-instrumentation-ollama/uv.lock is excluded by !**/*.lock
  • packages/opentelemetry-instrumentation-openai-agents/uv.lock is excluded by !**/*.lock
  • packages/opentelemetry-instrumentation-openai/uv.lock is excluded by !**/*.lock
  • packages/opentelemetry-instrumentation-pinecone/uv.lock is excluded by !**/*.lock
  • packages/opentelemetry-instrumentation-qdrant/uv.lock is excluded by !**/*.lock
  • packages/opentelemetry-instrumentation-replicate/uv.lock is excluded by !**/*.lock
  • packages/opentelemetry-instrumentation-sagemaker/uv.lock is excluded by !**/*.lock
  • packages/opentelemetry-instrumentation-together/uv.lock is excluded by !**/*.lock
  • packages/opentelemetry-instrumentation-transformers/uv.lock is excluded by !**/*.lock
  • packages/opentelemetry-instrumentation-vertexai/uv.lock is excluded by !**/*.lock
  • packages/opentelemetry-instrumentation-voyageai/uv.lock is excluded by !**/*.lock
  • packages/opentelemetry-instrumentation-watsonx/uv.lock is excluded by !**/*.lock
  • packages/opentelemetry-instrumentation-writer/uv.lock is excluded by !**/*.lock
  • packages/sample-app/uv.lock is excluded by !**/*.lock
  • packages/traceloop-sdk/uv.lock is excluded by !**/*.lock
📒 Files selected for processing (13)
  • packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_hooks.py
  • packages/opentelemetry-instrumentation-openai-agents/pyproject.toml
  • packages/opentelemetry-instrumentation-openai-agents/tests/cassettes/test_openai_agents/test_cache_read_input_tokens_recorded.yaml
  • packages/opentelemetry-instrumentation-openai-agents/tests/cassettes/test_openai_agents/test_reasoning_output_tokens_recorded.yaml
  • packages/opentelemetry-instrumentation-openai-agents/tests/test_openai_agents.py
  • packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/shared/chat_wrappers.py
  • packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/responses_wrappers.py
  • packages/opentelemetry-instrumentation-openai/tests/traces/test_azure.py
  • packages/opentelemetry-instrumentation-openai/tests/traces/test_chat.py
  • packages/opentelemetry-semantic-conventions-ai/opentelemetry/semconv_ai/__init__.py
  • packages/opentelemetry-semantic-conventions-ai/opentelemetry/semconv_ai/_testing.py
  • packages/traceloop-sdk/tests/test_manual.py
  • packages/traceloop-sdk/traceloop/sdk/tracing/manual.py

Comment on lines +16 to +17
Cookie:
- __cf_bm=GqvMTTwxqvHlYO_zMGr15vhngiHBKowc4W9cytloB1U-1778513729.5012145-1.0.1.1-TuWmsWma1J3OWFenDizlhbf_5qCYt89CjxNLLDDm.zLzVxhm2ep.2mEgG6rExy5xHOM0Quwrvd09cIn.yYTv.TO2jCa5bit4sQJ6ZIeHevWLiejQyIXZYqEUyXQItPzY
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Scrub sensitive/volatile headers from cassette recordings.

Cookie and request/infra identifiers should not be committed in cassettes; they are sensitive/ephemeral and make fixtures less stable. Please filter these headers during VCR recording and regenerate this cassette.

Also applies to: 60-64, 104-105

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In
`@packages/opentelemetry-instrumentation-openai-agents/tests/cassettes/test_openai_agents/test_reasoning_output_tokens_recorded.yaml`
around lines 16 - 17, The cassette
test_openai_agents/test_reasoning_output_tokens_recorded.yaml contains
sensitive/volatile headers (e.g., Cookie and request/infra identifiers) that
must be filtered from VCR recordings; update the test recording configuration
used for this suite to strip or replace Cookie and any request/infra ID headers
(e.g., CF identifiers, X-Request-Id) during capture, re-record the cassette, and
commit the regenerated YAML so the cassette no longer contains those sensitive
values.

Comment on lines +544 to +545
@pytest.mark.vcr
def test_cache_read_input_tokens_recorded(exporter, test_agent):
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
set -euo pipefail

# Locate the new cassettes referenced by these tests
fd -a 'test_(cache_read_input_tokens_recorded|reasoning_output_tokens_recorded)\.yaml' \
  packages/opentelemetry-instrumentation-openai-agents/tests/cassettes

# Scan cassettes for common secret/PII indicators (review any hits; placeholders/masked values are acceptable)
rg -n --iglob '*.yaml' -e 'Authorization|Bearer\s+[A-Za-z0-9\-_\.]+|api[_-]?key|sk-[A-Za-z0-9]|@|email' \
  packages/opentelemetry-instrumentation-openai-agents/tests/cassettes

Repository: traceloop/openllmetry

Length of output: 365


🏁 Script executed:

#!/bin/bash
# Check the cassette files for any sensitive content
cd packages/opentelemetry-instrumentation-openai-agents/tests/cassettes/test_openai_agents

# Show file sizes and examine key sections of the cassettes
for cassette in test_cache_read_input_tokens_recorded.yaml test_reasoning_output_tokens_recorded.yaml; do
  if [ -f "$cassette" ]; then
    echo "=== $cassette ==="
    wc -l "$cassette"
    echo "--- First 50 lines ---"
    head -50 "$cassette"
    echo "--- Looking for potential secrets (detailed check) ---"
    rg -n 'sk_|Bearer|Authorization|api.?key|token|secret|password' "$cassette" || echo "No matches found"
  fi
done

Repository: traceloop/openllmetry

Length of output: 6564


🏁 Script executed:

#!/bin/bash
# Check the test file to see if VCR filters are configured
cd packages/opentelemetry-instrumentation-openai-agents/tests

# Search for VCR decorator and filter configuration around the test methods
rg -B5 -A10 'def test_(cache_read_input_tokens_recorded|reasoning_output_tokens_recorded)' test_openai_agents.py

Repository: traceloop/openllmetry

Length of output: 1274


🏁 Script executed:

#!/bin/bash
# Look for VCR configuration in conftest files
fd conftest.py packages/opentelemetry-instrumentation-openai-agents/tests | head -5

# Check the conftest for VCR filter setup
if [ -f packages/opentelemetry-instrumentation-openai-agents/tests/conftest.py ]; then
  echo "=== conftest.py content ==="
  cat -n packages/opentelemetry-instrumentation-openai-agents/tests/conftest.py
fi

Repository: traceloop/openllmetry

Length of output: 11337


🏁 Script executed:

#!/bin/bash
# Also check for pytest.ini or setup.cfg with VCR config
find packages/opentelemetry-instrumentation-openai-agents/tests -maxdepth 1 \( -name "pytest.ini" -o -name "setup.cfg" -o -name "pyproject.toml" \) -exec echo "=== {} ===" \; -exec cat {} \;

Repository: traceloop/openllmetry

Length of output: 47


VCR cassettes need improved filtering configuration and re-recording.

The conftest.py VCR config (lines 284–286) only filters ["authorization", "api-key"] headers, but the cassettes contain unfiltered CloudFlare bot management cookies (e.g., __cf_bm=... in test_reasoning_output_tokens_recorded.yaml). Add "cookie" to the filter_headers list and re-record the cassettes to ensure all session/tracking data is properly scrubbed.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In
`@packages/opentelemetry-instrumentation-openai-agents/tests/test_openai_agents.py`
around lines 544 - 545, The VCR configuration in conftest.py only filters
["authorization", "api-key"] and misses CloudFlare cookies, which are present in
cassettes like test_reasoning_output_tokens_recorded.yaml; update the VCR
config's filter_headers list to include "cookie" (i.e., modify the
filter_headers variable/setting in the VCR setup in conftest.py to add "cookie")
and then re-run the test suite to re-record the affected cassettes so
session/tracking cookies are scrubbed from the YAML files.

Comment thread packages/opentelemetry-instrumentation-openai-agents/tests/test_openai_agents.py Outdated
GEN_AI_CONTENT_COMPLETION_CHUNK = "gen_ai.content.completion.chunk"
GEN_AI_REQUEST_REASONING_EFFORT = "gen_ai.request.reasoning_effort"
GEN_AI_USAGE_REASONING_TOKENS = "gen_ai.usage.reasoning_tokens"
GEN_AI_USAGE_REASONING_OUTPUT_TOKENS = "gen_ai.usage.reasoning.output_tokens"
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Preserve backward compatibility for the renamed reasoning constant.

Dropping SpanAttributes.GEN_AI_USAGE_REASONING_TOKENS can break consumers importing that symbol. Keep a deprecated alias pointing to the new key to avoid runtime import breaks.

Proposed compatibility patch
 class SpanAttributes:
@@
     GEN_AI_USAGE_REASONING_OUTPUT_TOKENS = "gen_ai.usage.reasoning.output_tokens"
+    # Backward-compatible alias (deprecated): remove in next major version
+    GEN_AI_USAGE_REASONING_TOKENS = GEN_AI_USAGE_REASONING_OUTPUT_TOKENS
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
GEN_AI_USAGE_REASONING_OUTPUT_TOKENS = "gen_ai.usage.reasoning.output_tokens"
GEN_AI_USAGE_REASONING_OUTPUT_TOKENS = "gen_ai.usage.reasoning.output_tokens"
# Backward-compatible alias (deprecated): remove in next major version
GEN_AI_USAGE_REASONING_TOKENS = GEN_AI_USAGE_REASONING_OUTPUT_TOKENS
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In
`@packages/opentelemetry-semantic-conventions-ai/opentelemetry/semconv_ai/__init__.py`
at line 82, Add a deprecated alias for the renamed constant so existing imports
still work: in opentelemetry.semconv_ai.__init__.py (next to
GEN_AI_USAGE_REASONING_OUTPUT_TOKENS) define the old symbol
SpanAttributes.GEN_AI_USAGE_REASONING_TOKENS (or simply
GEN_AI_USAGE_REASONING_TOKENS if constants are module-level) and assign it the
same string value as GEN_AI_USAGE_REASONING_OUTPUT_TOKENS, include a brief
comment noting it is deprecated and optionally emit a DeprecationWarning when
accessed to guide users to the new constant name.

@doronkopit5
Copy link
Copy Markdown
Member

Reviewed the change — direction is right (aligning attribute names with the OTel GenAI spec). A few things worth addressing:

1. Two sources of truth for the cache constants

The same constants exist in both places, resolving to the same string today:

  • opentelemetry.semconv._incubating.attributes.gen_ai_attributes.GEN_AI_USAGE_CACHE_*_INPUT_TOKENS (upstream OTel)
  • opentelemetry.semconv_ai.SpanAttributes.GEN_AI_USAGE_CACHE_*_INPUT_TOKENS (this repo)

The PR mixes both: _hooks.py and manual.py use GenAIAttributes (upstream), while pre-existing code in chat_wrappers.py, responses_wrappers.py, anthropic, langchain uses SpanAttributes from semconv_ai. Both work, but it's nicer to have one source of truth.

I'd lean toward standardizing on the upstream OTel-native GenAIAttributes here — that's the spec source — but happy to leave the call to you. Whatever you pick, please apply it consistently across the touched files so callers don't have to know which path leads where.

2. traceloop-sdk needs an explicit upstream semconv floor

manual.py now uses GenAIAttributes.GEN_AI_USAGE_CACHE_*_INPUT_TOKENS, but packages/traceloop-sdk/pyproject.toml doesn't pin opentelemetry-semantic-conventions. With <0.62b1 installed transitively, that attribute either doesn't exist (→ AttributeError) or has the old underscore string (silently wrong telemetry).

Please add opentelemetry-semantic-conventions>=0.62b1 to packages/traceloop-sdk/pyproject.toml (same bump you did for openai-agents).

3. Removing GEN_AI_USAGE_REASONING_TOKENS is a breaking change with no alias

Deleting the class attribute from SpanAttributes will break any downstream code importing SpanAttributes.GEN_AI_USAGE_REASONING_TOKENS with AttributeError. The repo already has prior art for graceful renames (see the "Legacy LLM_* aliases" block in semconv_ai/__init__.py). Suggest keeping a deprecation alias:

GEN_AI_USAGE_REASONING_OUTPUT_TOKENS = "gen_ai.usage.reasoning.output_tokens"
# Deprecated alias; use GEN_AI_USAGE_REASONING_OUTPUT_TOKENS
GEN_AI_USAGE_REASONING_TOKENS = GEN_AI_USAGE_REASONING_OUTPUT_TOKENS

Also note that the string value changes too (gen_ai.usage.reasoning_tokensgen_ai.usage.reasoning.output_tokens), which is a wire-level break for dashboards. Worth a line in MIGRATION.md under the "Changed string values" section.

5. New test assertions don't actually verify the feature

assert response_span.attributes[GenAIAttributes.GEN_AI_USAGE_CACHE_READ_INPUT_TOKENS] >= 0
assert response_span.attributes[SpanAttributes.GEN_AI_USAGE_REASONING_OUTPUT_TOKENS] >= 0

Both >= 0 assertions are trivially true once the attribute is recorded. The cache test is intentionally structured (first call writes cache → clear() → second call reads) precisely to exercise caching — so it should assert > 0 to prove the second call actually got a cache hit. Same for o3-mini reasoning tokens, which per the docstring always reports > 0. Tightening to > 0 makes these tests actually catch regressions.

@dvirski
Copy link
Copy Markdown
Contributor Author

dvirski commented May 17, 2026

@doronkopit5

Changes in response to review feedback

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In
`@packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_hooks.py`:
- Around line 591-595: The current code sets the
GenAIAttributes.GEN_AI_USAGE_CACHE_READ_INPUT_TOKENS attribute even when
cached_tokens == 0; change the guard so you only set the attribute when
cached_tokens > 0. Locate the block in _hooks.py where cached_tokens =
getattr(input_details, "cached_tokens", None) and replace the if cached_tokens
is not None check with an explicit positive check (e.g., cached_tokens is not
None and cached_tokens > 0) before calling otel_span.set_attribute to ensure
only real cache hits are recorded.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: da513cee-a70f-4b55-983f-b65d5991a3f9

📥 Commits

Reviewing files that changed from the base of the PR and between 5306120 and 895aa3c.

⛔ Files ignored due to path filters (1)
  • packages/traceloop-sdk/uv.lock is excluded by !**/*.lock
📒 Files selected for processing (2)
  • packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_hooks.py
  • packages/opentelemetry-instrumentation-openai-agents/tests/test_openai_agents.py

Comment on lines +591 to +595
cached_tokens = getattr(input_details, "cached_tokens", None)
if cached_tokens is not None:
otel_span.set_attribute(
GenAIAttributes.GEN_AI_USAGE_CACHE_READ_INPUT_TOKENS, cached_tokens
)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Only set cache-read usage on actual cache hits.

Line 592 currently records cache-read tokens even when cached_tokens == 0. Gate this to > 0 to match cache-hit semantics and avoid noisy attributes.

Suggested fix
-            if cached_tokens is not None:
+            if cached_tokens is not None and cached_tokens > 0:
                 otel_span.set_attribute(
                     GenAIAttributes.GEN_AI_USAGE_CACHE_READ_INPUT_TOKENS, cached_tokens
                 )
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In
`@packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_hooks.py`
around lines 591 - 595, The current code sets the
GenAIAttributes.GEN_AI_USAGE_CACHE_READ_INPUT_TOKENS attribute even when
cached_tokens == 0; change the guard so you only set the attribute when
cached_tokens > 0. Locate the block in _hooks.py where cached_tokens =
getattr(input_details, "cached_tokens", None) and replace the if cached_tokens
is not None check with an explicit positive check (e.g., cached_tokens is not
None and cached_tokens > 0) before calling otel_span.set_attribute to ensure
only real cache hits are recorded.

@dvirski dvirski force-pushed the dr/feat(semconv)-align-reasoning-and-cache-token-attributes-with-OTel-GenAI-spec branch from 895aa3c to 8d827c4 Compare May 17, 2026 16:36
@dvirski dvirski force-pushed the dr/feat(semconv)-align-reasoning-and-cache-token-attributes-with-OTel-GenAI-spec branch from 8d827c4 to d77ffe4 Compare May 17, 2026 18:26
@doronkopit5
Copy link
Copy Markdown
Member

Re-reviewed after the latest push (86e2d74 + d77ffe4e). Major improvements — happy with the direction:

✅ Resolved

  • Scope tightened: PR is now 82 lines of code change in 3 files (excluding lock-file churn), focused squarely on 🐛 Bug Report: openai-agents instrumentation doesn't record cached input tokens #4060. The manual.py change, chat_wrappers/responses_wrappers rename, and SpanAttributes rename are all out — that addresses my chore: nx migration #2, feat: Pinecone Instrumentation #3, and feat: sdk code + openai instrumentation #4 cleanly.
  • Tests are now meaningful: SimpleNamespace fakes with exact value assertions (1024, 256) are much better than the >= 0 VCR tests. The docstring linking the fake shape to the SDK source, the live cassette, and the OpenAI docs is nice — easy to keep in sync. Also drops the cassettes with Cookie / set-cookie headers, which is a separate win.
  • Cache fix using upstream constant: GenAIAttributes.GEN_AI_USAGE_CACHE_READ_INPUT_TOKENS is the right pick — matches the OTel-native preference and the >=0.62b1 pyproject floor backs it correctly (verified: that constant only exists in upstream from 0.62b1 onward, with the dot-form string).

⚠️ Worth a small follow-up

  1. PR title and description are now stale. The title still says "align reasoning and cache token attributes with OTel GenAI spec" and the description still claims "Added GEN_AI_USAGE_REASONING_OUTPUT_TOKENS …" — but the reasoning rename was reverted. After this PR, _hooks.py will emit gen_ai.usage.reasoning_tokens (the non-spec string from SpanAttributes.GEN_AI_USAGE_REASONING_TOKENS), not gen_ai.usage.reasoning.output_tokens. Please retitle to something like fix(openai-agents): record cached input + reasoning tokens (#4060) and trim the description to match.

  2. Reasoning attribute string is still non-spec — for awareness, not blocking this PR. The OTel GenAI spec calls for gen_ai.usage.reasoning.output_tokens, and the upstream 0.62b1 package doesn't yet define a constant for it (verified). Once the semconv_ai constant is corrected (or the upstream catches up), _hooks.py will pick up the right string automatically since it uses the named constant. Worth tracking as a follow-up issue so it's not forgotten.

  3. Tests don't exercise the None-guard paths — minor. Both input_tokens_details is None and cached_tokens is None short-circuits go untested. Two extra one-liners would lock in the guard behavior:

    def test_extract_response_attributes_no_cache_when_details_missing():
        response = _make_fake_response()  # both None
        span = MagicMock()
        _extract_response_attributes(span, response, trace_content=False)
        for call in span.set_attribute.call_args_list:
            assert call.args[0] != GenAIAttributes.GEN_AI_USAGE_CACHE_READ_INPUT_TOKENS
            assert call.args[0] != SpanAttributes.GEN_AI_USAGE_REASONING_TOKENS

    Not blocking — the existing tests cover the positive paths well.

LGTM once the title/description are updated. Nice cleanup.

Copy link
Copy Markdown
Member

@doronkopit5 doronkopit5 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just fix the last comment form code rabbit about the non zero attribute

@dvirski dvirski changed the title fix(openai): align reasoning and cache token attributes with OTel GenAI spec fix(openai-agents): emit cache_read.input_tokens and reasoning_tokens May 17, 2026
@dvirski
Copy link
Copy Markdown
Contributor Author

dvirski commented May 18, 2026

just fix the last comment form code rabbit about the non zero attribute

@doronkopit5

isn't zero cache token also meaningful? like it doesn't mean "no cache hit".

Adding a check summary from otel spec:

Spec check (semconv reference (https://opentelemetry.io/docs/specs/semconv/registry/attributes/gen-ai/)): gen_ai.usage.cache_read.input_tokens is classified as Recommended with no "when applicable" qualifier — the same level as gen_ai.usage.input_tokens and gen_ai.usage.output_tokens. The spec
doesn't suggest omitting it when zero. Suppressing 0 would make "no cache hit" indistinguishable from "instrumentor doesn't capture it," and break aggregations like cache-hit-rate. Matching how the existing openai instrumentor emits this attribute is also the safer choice for trace consistency.
Keeping is not None as the guard.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

🐛 Bug Report: openai-agents instrumentation doesn't record cached input tokens

3 participants