fix(langchain,anthropic,groq,mistralai,bedrock,ollama,sagemaker,together): record exceptions and set ERROR status on failed spans#4101
Conversation
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughAdds explicit span error handling across instrumentations: import ERROR_TYPE, set error.type on exceptions, record_exception, set StatusCode.ERROR with message, ensure spans end on stream failures; tests added/updated to assert a single errored span, error.type, and one exception event. ChangesCross‑package span error instrumentation and tests
Estimated code review effort: Possibly related PRs:
Suggested reviewers:
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 6
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
packages/opentelemetry-instrumentation-sagemaker/opentelemetry/instrumentation/sagemaker/__init__.py (1)
114-129:⚠️ Potential issue | 🟠 Major | 🏗️ Heavy liftStreaming endpoint invocations lack error handling.
The
_instrumented_endpoint_invoke_with_response_streamfunction does not implement the same error handling pattern as the non-streaming version. Iffn(*args, **kwargs)raises an exception on line 122:
- The exception will not be recorded on the span
- The span status will not be set to ERROR
- The span will remain open (never ended)
This creates inconsistent error visibility between streaming and non-streaming SageMaker invocations.
🔧 Proposed fix to add error handling to streaming invocations
def _instrumented_endpoint_invoke_with_response_stream(fn, tracer, event_logger): `@wraps`(fn) def with_instrumentation(*args, **kwargs): if context_api.get_value(SUPPRESS_LANGUAGE_MODEL_INSTRUMENTATION_KEY): return fn(*args, **kwargs) span = tracer.start_span("sagemaker.completion", kind=SpanKind.CLIENT) - response = fn(*args, **kwargs) + try: + response = fn(*args, **kwargs) + except Exception as e: + span.record_exception(e) + span.set_status(Status(StatusCode.ERROR, str(e))) + span.end() + raise if span.is_recording(): _handle_stream_call(span, event_logger, kwargs, response) return response return with_instrumentation🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@packages/opentelemetry-instrumentation-sagemaker/opentelemetry/instrumentation/sagemaker/__init__.py` around lines 114 - 129, The streaming wrapper _instrumented_endpoint_invoke_with_response_stream should mirror the non-streaming error handling: wrap the call to fn(*args, **kwargs) in a try/except/finally inside with_instrumentation (located in _instrumented_endpoint_invoke_with_response_stream) so that any exception is recorded on the span, the span status is set to ERROR, and the span is ended; on success still call _handle_stream_call when span.is_recording() and end the span in finally. Ensure you still respect context_api.get_value(SUPPRESS_LANGUAGE_MODEL_INSTRUMENTATION_KEY) and use the provided tracer/span created via tracer.start_span("sagemaker.completion", kind=SpanKind.CLIENT).
🧹 Nitpick comments (2)
packages/opentelemetry-instrumentation-sagemaker/tests/test_error_handling.py (1)
15-30: ⚡ Quick winConsider adding a test for streaming endpoint error handling.
Once error handling is implemented for
invoke_endpoint_with_response_stream(see the issue flagged in__init__.py), add a corresponding test to ensure streaming invocations also properly record exceptions and set span status to ERROR.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@packages/opentelemetry-instrumentation-sagemaker/tests/test_error_handling.py` around lines 15 - 30, Add a parallel test for streaming invocations that mirrors test_sagemaker_invoke_error_sets_span_status: create a new test (e.g., test_sagemaker_invoke_stream_error_sets_span_status) that uses the same fixtures (instrument_legacy, smrt, span_exporter), patch botocore.endpoint.URLLib3Session.send to raise the same ClientError, call smrt.invoke_endpoint_with_response_stream with the same parameters and consume/iterate the response (to force the error to surface), catch/expect ClientError with pytest.raises, then fetch spans from span_exporter and assert there is one finished span with span.status.status_code == StatusCode.ERROR and an "exception" event whose attributes["exception.message"] contains "ModelError".packages/opentelemetry-instrumentation-bedrock/tests/traces/test_error_handling.py (1)
28-30: ⚡ Quick winAdd
error.typeassertion to lock in cross-instrumentation parityThese tests verify status/events but not the
error.typeattribute. Please assert it in both tests so Bedrock enforces the same failure contract as the other instrumentations in this PR.Suggested assertion
assert span.status.status_code == StatusCode.ERROR + assert span.attributes.get("error.type") is not None events = [e for e in span.events if e.name == "exception"] assert len(events) == 1Also applies to: 44-46
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@packages/opentelemetry-instrumentation-bedrock/tests/traces/test_error_handling.py` around lines 28 - 30, The tests currently assert span.status and that there is one "exception" event but omit checking the event's error type; update both tests to assert that the exception event's attributes include the correct error type by checking events[0].attributes.get("error.type") equals the raised exception's class name (e.g., type(err).__name__ or the literal class name used in the test), using the existing span and events variables so the assertion mirrors other instrumentations' failure contract.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In
`@packages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/__init__.py`:
- Around line 225-230: The span exception handlers record exceptions and set
Status but omit the error.type attribute; update each exception block (the
try/except around the wrapped call to fn that uses span.record_exception and
span.set_status) to also set the span attribute error.type to the exception
class name (e.g., span.set_attribute("error.type", type(e).__name__)), and apply
this same change to the other three exception blocks referenced (the handlers at
the other wrapper sites).
In
`@packages/opentelemetry-instrumentation-langchain/opentelemetry/instrumentation/langchain/callback_handler.py`:
- Around line 1065-1070: The error handling path in _handle_error (used by
on_llm_error, on_chain_error, on_agent_error, on_retriever_error) sets status
and records the exception but doesn't set the SpanAttributes.ERROR_TYPE like
on_tool_error does; update _handle_error (and those non-tool error callbacks if
they delegate elsewhere) to call _set_span_attribute(span,
SpanAttributes.ERROR_TYPE, type(error).__name__) (or similar using
error.__class__.__name__) before setting status/recording exception so the span
always contains error.type for all error callbacks.
In
`@packages/opentelemetry-instrumentation-ollama/opentelemetry/instrumentation/ollama/__init__.py`:
- Around line 314-318: In both exception handlers that call
span.record_exception(e), span.set_status(Status(StatusCode.ERROR, str(e))), and
span.end() (i.e., the except Exception as e blocks where span.record_exception
is used), add a span attribute for the error type before ending the span — e.g.,
call span.set_attribute("error.type", type(e).__name__) (or str(type(e))) right
after recording the exception and before span.end(); make the same change in the
second exception path that mirrors these calls so both paths consistently set
"error.type".
In `@packages/opentelemetry-instrumentation-ollama/tests/test_error_handling.py`:
- Around line 39-40: The test currently only asserts that an "exception" event
exists (events = [e for e in span.events if e.name == "exception"]; assert
len(events) == 1) but doesn't verify the exception payload; update the test to
also assert that events[0].attributes["exception.message"] equals the same
expected exception message used in the corresponding sync test so async
exception message capture is covered (locate the span/events usage in this test
and add the assertion against events[0].attributes["exception.message"] with the
shared expected string).
In
`@packages/opentelemetry-instrumentation-together/opentelemetry/instrumentation/together/__init__.py`:
- Around line 127-129: The exception handler in the together instrumentation
uses span.record_exception(e) and span.set_status(...) but does not set the
error.type span attribute; update the except block handling the exception (the
code around span.record_exception, span.set_status, and the exception variable
e) to also call span.set_attribute(ERROR_TYPE, e.__class__.__name__) (or
span.set_attribute("error.type", e.__class__.__name__) if ERROR_TYPE constant is
not available) so the canonical exception class name is recorded per
OpenTelemetry semantic conventions for GenAI error spans.
In
`@packages/opentelemetry-instrumentation-together/tests/test_error_handling.py`:
- Line 8: The test currently patches requests.Session.request but Together uses
httpx; update the patch target in tests/test_error_handling.py to patch
"httpx.Client.request" (or the exact httpx method Together calls) so the HTTP
call is intercepted. Replace the context manager that uses
patch("requests.Session.request", side_effect=...) with
patch("httpx.Client.request", side_effect=...) and run the test to confirm the
exception path in your error handling (adjust any import/mocking scope if the
test uses a different httpx client instance).
---
Outside diff comments:
In
`@packages/opentelemetry-instrumentation-sagemaker/opentelemetry/instrumentation/sagemaker/__init__.py`:
- Around line 114-129: The streaming wrapper
_instrumented_endpoint_invoke_with_response_stream should mirror the
non-streaming error handling: wrap the call to fn(*args, **kwargs) in a
try/except/finally inside with_instrumentation (located in
_instrumented_endpoint_invoke_with_response_stream) so that any exception is
recorded on the span, the span status is set to ERROR, and the span is ended; on
success still call _handle_stream_call when span.is_recording() and end the span
in finally. Ensure you still respect
context_api.get_value(SUPPRESS_LANGUAGE_MODEL_INSTRUMENTATION_KEY) and use the
provided tracer/span created via tracer.start_span("sagemaker.completion",
kind=SpanKind.CLIENT).
---
Nitpick comments:
In
`@packages/opentelemetry-instrumentation-bedrock/tests/traces/test_error_handling.py`:
- Around line 28-30: The tests currently assert span.status and that there is
one "exception" event but omit checking the event's error type; update both
tests to assert that the exception event's attributes include the correct error
type by checking events[0].attributes.get("error.type") equals the raised
exception's class name (e.g., type(err).__name__ or the literal class name used
in the test), using the existing span and events variables so the assertion
mirrors other instrumentations' failure contract.
In
`@packages/opentelemetry-instrumentation-sagemaker/tests/test_error_handling.py`:
- Around line 15-30: Add a parallel test for streaming invocations that mirrors
test_sagemaker_invoke_error_sets_span_status: create a new test (e.g.,
test_sagemaker_invoke_stream_error_sets_span_status) that uses the same fixtures
(instrument_legacy, smrt, span_exporter), patch
botocore.endpoint.URLLib3Session.send to raise the same ClientError, call
smrt.invoke_endpoint_with_response_stream with the same parameters and
consume/iterate the response (to force the error to surface), catch/expect
ClientError with pytest.raises, then fetch spans from span_exporter and assert
there is one finished span with span.status.status_code == StatusCode.ERROR and
an "exception" event whose attributes["exception.message"] contains
"ModelError".
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 89211e6f-d815-4d3e-862e-0d1da3b3562b
⛔ Files ignored due to path filters (10)
packages/opentelemetry-instrumentation-anthropic/uv.lockis excluded by!**/*.lockpackages/opentelemetry-instrumentation-bedrock/uv.lockis excluded by!**/*.lockpackages/opentelemetry-instrumentation-groq/uv.lockis excluded by!**/*.lockpackages/opentelemetry-instrumentation-mistralai/uv.lockis excluded by!**/*.lockpackages/opentelemetry-instrumentation-ollama/uv.lockis excluded by!**/*.lockpackages/opentelemetry-instrumentation-openai-agents/uv.lockis excluded by!**/*.lockpackages/opentelemetry-instrumentation-sagemaker/uv.lockis excluded by!**/*.lockpackages/opentelemetry-instrumentation-together/uv.lockis excluded by!**/*.lockpackages/sample-app/uv.lockis excluded by!**/*.lockpackages/traceloop-sdk/uv.lockis excluded by!**/*.lock
📒 Files selected for processing (17)
packages/opentelemetry-instrumentation-anthropic/opentelemetry/instrumentation/anthropic/__init__.pypackages/opentelemetry-instrumentation-anthropic/tests/test_messages.pypackages/opentelemetry-instrumentation-anthropic/tests/test_prompt_caching.pypackages/opentelemetry-instrumentation-anthropic/tests/test_thinking.pypackages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/__init__.pypackages/opentelemetry-instrumentation-bedrock/tests/traces/test_error_handling.pypackages/opentelemetry-instrumentation-groq/opentelemetry/instrumentation/groq/__init__.pypackages/opentelemetry-instrumentation-groq/tests/traces/test_error_handling.pypackages/opentelemetry-instrumentation-langchain/opentelemetry/instrumentation/langchain/callback_handler.pypackages/opentelemetry-instrumentation-mistralai/opentelemetry/instrumentation/mistralai/__init__.pypackages/opentelemetry-instrumentation-mistralai/tests/test_error_handling.pypackages/opentelemetry-instrumentation-ollama/opentelemetry/instrumentation/ollama/__init__.pypackages/opentelemetry-instrumentation-ollama/tests/test_error_handling.pypackages/opentelemetry-instrumentation-sagemaker/opentelemetry/instrumentation/sagemaker/__init__.pypackages/opentelemetry-instrumentation-sagemaker/tests/test_error_handling.pypackages/opentelemetry-instrumentation-together/opentelemetry/instrumentation/together/__init__.pypackages/opentelemetry-instrumentation-together/tests/test_error_handling.py
|
|
||
|
|
||
| def test_together_error_sets_span_status(instrument_legacy, together_client, span_exporter): | ||
| with patch("requests.Session.request", side_effect=Exception("API connection error")): |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Check what HTTP client the Together library uses
# First, find Together library installation to inspect its dependencies
python -c "import together; print(together.__file__)"
# Search for HTTP client imports in Together library
rg -n 'import (requests|httpx|aiohttp|urllib)' --type py -g 'together/**/*.py' 2>/dev/null || \
echo "Together package not found in current repository; checking installed package..."
# Check Together's dependencies
pip show together | grep -i "requires"Repository: traceloop/openllmetry
Length of output: 325
🌐 Web query:
Together AI Python library HTTP client implementation requests or httpx
💡 Result:
Together’s Python SDK uses httpx as its HTTP client backend (not requests). [1][2] Evidence: - The Together Python library documentation states that it “offers both synchronous and asynchronous clients powered by httpx,” and that the async client uses httpx by default, with aiohttp as an optional alternative backend. [2] - The Python v2 migration guide explicitly notes: “Modern HTTP Client: Uses httpx instead of requests.” [1] - The SDK v2 announcement likewise says it “uses httpx instead of requests.” [3] Implication for implementation: - If you use the Together SDK (e.g., from together import Together), you can typically pass an httpx.Client(...) (via the http_client parameter) or rely on the SDK’s default httpx-based client. [1][2]
Citations:
- 1: https://docs.together.ai/docs/pythonv2-migration-guide
- 2: https://docs.together.ai/python-library
- 3: https://www.together.ai/blog/together-python-sdk-2-0
Update patch target to use httpx.Client.request instead of requests.Session.request.
The Together library uses httpx as its HTTP client backend, not requests. Patching requests.Session.request won't intercept the HTTP call, so this test won't actually validate error handling. Change the patch target to httpx.Client.request or the appropriate httpx method that Together uses internally.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@packages/opentelemetry-instrumentation-together/tests/test_error_handling.py`
at line 8, The test currently patches requests.Session.request but Together uses
httpx; update the patch target in tests/test_error_handling.py to patch
"httpx.Client.request" (or the exact httpx method Together calls) so the HTTP
call is intercepted. Replace the context manager that uses
patch("requests.Session.request", side_effect=...) with
patch("httpx.Client.request", side_effect=...) and run the test to confirm the
exception path in your error handling (adjust any import/mocking scope if the
test uses a different httpx client instance).
There was a problem hiding this comment.
this is a false positive for the version under test. This instrumentation pins together >= 1.2.0, < 2 (see _instruments in init.py and the together>=1.2.0,<2 entry in pyproject.toml), and together v1.x uses requests.Session() as its HTTP backend (see
▎ together/abstract/api_requestor.py in the installed package: import requests and requests.Session() in _make_session). The httpx switch you cited landed in together v2.0, which is excluded from our supported range here. The patch target is correct, and the test passes because requests.Session.request
▎ does intercept the real HTTP call. Happy to revisit if/when this instrumentation adds v2 support.
There was a problem hiding this comment.
@dvirski, you're absolutely right — thank you for the detailed explanation. I missed that this instrumentation explicitly pins together >= 1.2.0, < 2, and v1.x uses requests.Session() as its HTTP backend. The web search I relied on returned v2.x documentation, which doesn't apply here. patch("requests.Session.request", ...) is the correct target for the supported version range, and I apologize for the noise.
✏️ Learnings added
Learnt from: dvirski
Repo: traceloop/openllmetry PR: 4101
File: packages/opentelemetry-instrumentation-together/tests/test_error_handling.py:8-8
Timestamp: 2026-05-14T13:27:45.655Z
Learning: In the traceloop/openllmetry repository, the `opentelemetry-instrumentation-together` package pins `together >= 1.2.0, < 2`. Together v1.x uses `requests.Session()` as its HTTP backend (via `together/abstract/api_requestor.py`). The switch to `httpx` only landed in together v2.0. Therefore, test patches targeting `requests.Session.request` are correct and should not be flagged as wrong patch targets for this instrumentation.
Note: Learnings are effective only in the context of similar code segments. To apply general coding guidelines, please configure review instructions. You can manage existing learnings in the app.
🧠 Learnings used
Learnt from: banlor
Repo: traceloop/openllmetry PR: 4089
File: packages/opentelemetry-instrumentation-dspy/opentelemetry/instrumentation/dspy/instrumentation.py:185-192
Timestamp: 2026-05-10T21:59:30.384Z
Learning: In this repository (traceloop/openllmetry), do not flag DSPy/OpenTelemetry instrumentation that handles GenAI client exceptions using `span.record_exception()` as a violation of GenAI semantic conventions. The newer GenAI-specific event `gen_ai.client.operation.exception` is intentionally not used yet (still in Development), and a repo-wide migration is deferred to a future PR. Only treat `record_exception()` as problematic if it is clearly unintentional for GenAI client spans (i.e., not part of this repo’s intentionally supported exception-handling pattern).
|
Thanks for consolidating #3970 and #4005 — the langchain 1.
|
| Package | record_exception | set_status | ERROR_TYPE |
|---|---|---|---|
| anthropic | ✅ | ✅ | ✅ |
| groq | ✅ | ✅ | ✅ |
| mistralai | ✅ | ✅ | ✅ |
| bedrock | ✅ | ✅ | ❌ |
| ollama | ✅ | ✅ | ❌ |
| sagemaker | ✅ | ✅ | ❌ |
| together | ✅ | ✅ | ❌ |
langchain _handle_error |
✅ | ✅ | ❌ |
error.type is the canonical attribute backends use to bucket failures per the OTel GenAI semconv. The tests already reflect this drift — groq/mistralai assert it, bedrock/ollama/sagemaker/together don't. Please add span.set_attribute(ERROR_TYPE, e.__class__.__name__) everywhere (and to _handle_error in the langchain callback handler).
2. Sagemaker streaming wrapper has no error handling
Only _instrumented_endpoint_invoke got the try/except — _instrumented_endpoint_invoke_with_response_stream is untouched. Errors from streaming invocations will still produce UNSET-status spans, so issue #412 is only partly closed for sagemaker. Bedrock's streaming wrappers correctly got the same treatment; sagemaker's should too.
3. Together test patches requests.Session.request while everything else patches httpx.Client.send
Together SDK ≥1.2 does use requests, so this is probably correct — but please confirm by running the test locally and verifying it actually fails without the instrumentation change (a green test against an unpatched path is a silent regression risk). If it does exercise the right path, leave it; otherwise switch to httpx.
4. raise e vs bare raise inconsistency
raise e: anthropic, groqraise: mistralai, ollama, sagemaker, bedrock, together
Bare raise is the idiomatic re-raise and preserves the original traceback. Recommend raise everywhere.
5. Groq's is_recording() guard is only partial
span.set_attribute(ERROR_TYPE, e.__class__.__name__) # unguarded
span.record_exception(e) # unguarded
if span.is_recording():
span.set_status(Status(StatusCode.ERROR, str(e))) # guarded
span.end()Either guard all three or none. OpenAI's reference (the pattern this PR claims to follow) guards none — recommend dropping the guard for symmetry with the rest of the PR.
6. Anthropic test churn — defensive filter added to ~25 unrelated tests
spans = [s for s in span_exporter.get_finished_spans() if s.status.status_code != StatusCode.ERROR]This pattern is repeated across test_messages.py, test_prompt_caching.py, test_thinking.py. The reason it's necessary: many of those tests have try: ... except Exception: pass blocks that previously hid failures silently, and now produce error spans the assertions trip over. The filter works but masks the real question — are those tests actually exercising the success path they claim? Two ways to make this less smelly:
- Extract a
successful_spans(exporter)helper to DRY the filter - Better: fix the tests that are silently swallowing exceptions, so they don't need a filter at all
Re: the goal of "simple and robust" — items 1, 2, and 4 are the ones that matter most. Once they're addressed, the pattern across all 8 packages becomes a single rule: on exception → set ERROR_TYPE, record_exception, set_status(ERROR, msg), end span, bare raise. That's the version that's easy to reason about and easy to extend to the next instrumentation.
cbac7ec to
88f021b
Compare
|
@doronkopit5 Addressed all PR comments. Production code — error handling
Tests — new dedicated error-handling coverage Tests — assertion additions Tests — anthropic cleanup (the big one)
Why the anthropic test cleanup was needed Doron flagged that the original PR added a defensive filter ([s for s in spans if s.status.status_code != StatusCode.ERROR]) in 65 places across the anthropic test files. The filter was bridging a try: The original intent was a smoke test — "verify our instrumentation doesn't crash when the SDK rejects bad input." But on inspection it didn't actually test that, because:
The cleanup removes both the cause and the symptom: the bad-call blocks are deleted (they verified nothing), so no error spans get produced, so no filter is needed, and the legacy len(spans) == N verify_metrics was also touched (item 19) because it had grown a coupling to the bad-call pattern: it required the LLM_ANTHROPIC_COMPLETION_EXCEPTIONS metric to have been incremented and had a Verified but not changed
Total: 51 bad-kwarg blocks removed, 51 defensive filters removed, ~440 net lines deleted in test cleanup, plus the production fixes and new error-span coverage. |
There was a problem hiding this comment.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
packages/opentelemetry-instrumentation-anthropic/opentelemetry/instrumentation/anthropic/streaming.py (1)
258-263:⚠️ Potential issue | 🟠 Major | ⚡ Quick winSync streaming exceptions still don’t mark the span as failed.
On Line 258–Line 263, the sync stream path only increments exception metrics and re-raises. It still misses the failed-span updates (
record_exception,StatusCode.ERROR, and ending the span), so stream errors can remain underreported compared to the async path.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@packages/opentelemetry-instrumentation-anthropic/opentelemetry/instrumentation/anthropic/streaming.py` around lines 258 - 263, The sync streaming exception handler currently only updates metrics and re-raises, so update the except block to mirror the async path by recording the exception on the span, setting the span status to StatusCode.ERROR, and ending the span before re-raising; specifically, in the except Exception as e handler where error_metrics_attributes(e) and self._exception_counter are used, call self._span.record_exception(e), set self._span.set_status(StatusCode.ERROR) (import StatusCode if needed), then call self._span.end() and only then re-raise the exception so the span reflects the failure like the async path does.
🧹 Nitpick comments (2)
packages/opentelemetry-instrumentation-bedrock/tests/traces/test_error_handling.py (1)
28-31: ⚡ Quick winAssert the span status description to lock in
set_status(ERROR, message)behavior.These tests currently verify
StatusCode.ERRORbut not the error message/description. Adding that assertion would catch regressions where status description gets dropped (the same failure mode this PR addresses elsewhere).Suggested test assertion additions
assert span.status.status_code == StatusCode.ERROR + assert "Rate exceeded" in (span.status.description or "") assert span.attributes.get("error.type") == "ClientError" events = [e for e in span.events if e.name == "exception"] assert len(events) == 1 @@ assert span.status.status_code == StatusCode.ERROR + assert "Rate exceeded" in (span.status.description or "") assert span.attributes.get("error.type") == "ClientError" events = [e for e in span.events if e.name == "exception"] assert len(events) == 1Also applies to: 45-48
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@packages/opentelemetry-instrumentation-bedrock/tests/traces/test_error_handling.py` around lines 28 - 31, Add an assertion that the span's status description matches the error message used when calling set_status(ERROR, message): after asserting span.status.status_code == StatusCode.ERROR and before checking exception events, assert span.status.description == error_message (or the literal message used in the test) for the first span block (span variable) and do the same for the second block around lines 45-48 so the tests lock in that the status description is preserved.packages/opentelemetry-instrumentation-mistralai/tests/test_error_handling.py (1)
7-43: ⚡ Quick winAdd stream/stream_async failure regression tests for parity with wrapped methods.
These tests cover
complete/complete_asynconly. Since instrumentation also wraps streaming methods, add failure-path assertions forchat.streamandchat.stream_asyncto prevent regressions in stream exception handling.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@packages/opentelemetry-instrumentation-mistralai/tests/test_error_handling.py` around lines 7 - 43, Add two new tests mirroring the existing complete/complete_async ones but exercising the streaming wrappers: create test_mistralai_stream_error_sets_span_status that patches "httpx.Client.send" to raise Exception("API connection error"), calls mistralai_client.chat.stream(...) inside pytest.raises, then asserts one finished span with StatusCode.ERROR, span.status.description contains the error message, span.attributes["error.type"] == "Exception", and that there is one exception event whose "exception.message" contains the error; likewise add test_async_mistralai_stream_error_sets_span_status that patches "httpx.AsyncClient.send" to raise Exception("Async API error"), awaits mistralai_async_client.chat.stream_async(...) inside pytest.raises, and performs the same span and event assertions for the async case.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In
`@packages/opentelemetry-instrumentation-mistralai/opentelemetry/instrumentation/mistralai/__init__.py`:
- Around line 418-425: The streaming accumulator functions
_accumulate_streaming_response and _aaccumulate_streaming_response currently
iterate without handling exceptions, so exceptions during iteration skip
span.set_attribute(ERROR_TYPE,...), span.record_exception(...),
span.set_status(Status(...)), and span.end(); update both functions to wrap
their for/async for loops in a try-except-else-finally: in except Exception as e
set span.set_attribute(ERROR_TYPE, e.__class__.__name__),
span.record_exception(e), span.set_status(Status(StatusCode.ERROR, str(e))) and
re-raise, in else call the existing _handle_response() logic, and in finally
always call span.end() to ensure the span is finalized regardless of success or
error.
---
Outside diff comments:
In
`@packages/opentelemetry-instrumentation-anthropic/opentelemetry/instrumentation/anthropic/streaming.py`:
- Around line 258-263: The sync streaming exception handler currently only
updates metrics and re-raises, so update the except block to mirror the async
path by recording the exception on the span, setting the span status to
StatusCode.ERROR, and ending the span before re-raising; specifically, in the
except Exception as e handler where error_metrics_attributes(e) and
self._exception_counter are used, call self._span.record_exception(e), set
self._span.set_status(StatusCode.ERROR) (import StatusCode if needed), then call
self._span.end() and only then re-raise the exception so the span reflects the
failure like the async path does.
---
Nitpick comments:
In
`@packages/opentelemetry-instrumentation-bedrock/tests/traces/test_error_handling.py`:
- Around line 28-31: Add an assertion that the span's status description matches
the error message used when calling set_status(ERROR, message): after asserting
span.status.status_code == StatusCode.ERROR and before checking exception
events, assert span.status.description == error_message (or the literal message
used in the test) for the first span block (span variable) and do the same for
the second block around lines 45-48 so the tests lock in that the status
description is preserved.
In
`@packages/opentelemetry-instrumentation-mistralai/tests/test_error_handling.py`:
- Around line 7-43: Add two new tests mirroring the existing
complete/complete_async ones but exercising the streaming wrappers: create
test_mistralai_stream_error_sets_span_status that patches "httpx.Client.send" to
raise Exception("API connection error"), calls mistralai_client.chat.stream(...)
inside pytest.raises, then asserts one finished span with StatusCode.ERROR,
span.status.description contains the error message,
span.attributes["error.type"] == "Exception", and that there is one exception
event whose "exception.message" contains the error; likewise add
test_async_mistralai_stream_error_sets_span_status that patches
"httpx.AsyncClient.send" to raise Exception("Async API error"), awaits
mistralai_async_client.chat.stream_async(...) inside pytest.raises, and performs
the same span and event assertions for the async case.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 719527e3-e290-4d92-a2fd-27d595c6551e
⛔ Files ignored due to path filters (7)
packages/opentelemetry-instrumentation-bedrock/uv.lockis excluded by!**/*.lockpackages/opentelemetry-instrumentation-groq/uv.lockis excluded by!**/*.lockpackages/opentelemetry-instrumentation-mistralai/uv.lockis excluded by!**/*.lockpackages/opentelemetry-instrumentation-ollama/uv.lockis excluded by!**/*.lockpackages/opentelemetry-instrumentation-openai-agents/uv.lockis excluded by!**/*.lockpackages/opentelemetry-instrumentation-sagemaker/uv.lockis excluded by!**/*.lockpackages/opentelemetry-instrumentation-together/uv.lockis excluded by!**/*.lock
📒 Files selected for processing (20)
packages/opentelemetry-instrumentation-anthropic/opentelemetry/instrumentation/anthropic/__init__.pypackages/opentelemetry-instrumentation-anthropic/opentelemetry/instrumentation/anthropic/streaming.pypackages/opentelemetry-instrumentation-anthropic/tests/test_completion.pypackages/opentelemetry-instrumentation-anthropic/tests/test_messages.pypackages/opentelemetry-instrumentation-anthropic/tests/test_prompt_caching.pypackages/opentelemetry-instrumentation-anthropic/tests/test_thinking.pypackages/opentelemetry-instrumentation-anthropic/tests/utils.pypackages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/__init__.pypackages/opentelemetry-instrumentation-bedrock/tests/traces/test_error_handling.pypackages/opentelemetry-instrumentation-groq/opentelemetry/instrumentation/groq/__init__.pypackages/opentelemetry-instrumentation-groq/tests/traces/test_error_handling.pypackages/opentelemetry-instrumentation-langchain/opentelemetry/instrumentation/langchain/callback_handler.pypackages/opentelemetry-instrumentation-mistralai/opentelemetry/instrumentation/mistralai/__init__.pypackages/opentelemetry-instrumentation-mistralai/tests/test_error_handling.pypackages/opentelemetry-instrumentation-ollama/opentelemetry/instrumentation/ollama/__init__.pypackages/opentelemetry-instrumentation-ollama/tests/test_error_handling.pypackages/opentelemetry-instrumentation-sagemaker/opentelemetry/instrumentation/sagemaker/__init__.pypackages/opentelemetry-instrumentation-sagemaker/tests/test_error_handling.pypackages/opentelemetry-instrumentation-together/opentelemetry/instrumentation/together/__init__.pypackages/opentelemetry-instrumentation-together/tests/test_error_handling.py
💤 Files with no reviewable changes (5)
- packages/opentelemetry-instrumentation-anthropic/tests/test_completion.py
- packages/opentelemetry-instrumentation-anthropic/tests/utils.py
- packages/opentelemetry-instrumentation-anthropic/tests/test_thinking.py
- packages/opentelemetry-instrumentation-anthropic/tests/test_messages.py
- packages/opentelemetry-instrumentation-anthropic/tests/test_prompt_caching.py
🚧 Files skipped from review as they are similar to previous changes (8)
- packages/opentelemetry-instrumentation-ollama/opentelemetry/instrumentation/ollama/init.py
- packages/opentelemetry-instrumentation-groq/tests/traces/test_error_handling.py
- packages/opentelemetry-instrumentation-langchain/opentelemetry/instrumentation/langchain/callback_handler.py
- packages/opentelemetry-instrumentation-together/tests/test_error_handling.py
- packages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/init.py
- packages/opentelemetry-instrumentation-groq/opentelemetry/instrumentation/groq/init.py
- packages/opentelemetry-instrumentation-together/opentelemetry/instrumentation/together/init.py
- packages/opentelemetry-instrumentation-anthropic/opentelemetry/instrumentation/anthropic/init.py
88f021b to
9b8fa3d
Compare
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In
`@packages/opentelemetry-instrumentation-mistralai/tests/test_error_handling.py`:
- Around line 41-43: The test currently only checks that one exception event was
recorded; also assert that the exception payload includes the expected async
message by checking events[0].attributes["exception.message"] contains "Async
API error". Update the test in tests/test_error_handling.py to add an assertion
after the existing length check that verifies
events[0].attributes["exception.message"] includes the substring "Async API
error" so it matches the sync regression test parity.
In
`@packages/opentelemetry-instrumentation-ollama/opentelemetry/instrumentation/ollama/__init__.py`:
- Around line 145-156: The code computes response_data from last_response and
then merges it with accumulated_response using the dict-union operator, which
raises TypeError when last_response is None; update the logic in the block that
sets response_data (and in the other symmetric spot where the same merge occurs)
to coerce None to an empty dict (e.g., response_data = response_data or {})
before calling _handle_response so that response=response_data |
accumulated_response never attempts to union None with a dict; look for the
variable names last_response, response_data, accumulated_response and the
_handle_response(...) call to apply the fix.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 5f64a663-344d-4a42-ae5b-818ba77f1247
📒 Files selected for processing (20)
packages/opentelemetry-instrumentation-anthropic/opentelemetry/instrumentation/anthropic/__init__.pypackages/opentelemetry-instrumentation-anthropic/opentelemetry/instrumentation/anthropic/streaming.pypackages/opentelemetry-instrumentation-anthropic/tests/test_completion.pypackages/opentelemetry-instrumentation-anthropic/tests/test_messages.pypackages/opentelemetry-instrumentation-anthropic/tests/test_prompt_caching.pypackages/opentelemetry-instrumentation-anthropic/tests/test_thinking.pypackages/opentelemetry-instrumentation-anthropic/tests/utils.pypackages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/__init__.pypackages/opentelemetry-instrumentation-bedrock/tests/traces/test_error_handling.pypackages/opentelemetry-instrumentation-groq/opentelemetry/instrumentation/groq/__init__.pypackages/opentelemetry-instrumentation-groq/tests/traces/test_error_handling.pypackages/opentelemetry-instrumentation-langchain/opentelemetry/instrumentation/langchain/callback_handler.pypackages/opentelemetry-instrumentation-mistralai/opentelemetry/instrumentation/mistralai/__init__.pypackages/opentelemetry-instrumentation-mistralai/tests/test_error_handling.pypackages/opentelemetry-instrumentation-ollama/opentelemetry/instrumentation/ollama/__init__.pypackages/opentelemetry-instrumentation-ollama/tests/test_error_handling.pypackages/opentelemetry-instrumentation-sagemaker/opentelemetry/instrumentation/sagemaker/__init__.pypackages/opentelemetry-instrumentation-sagemaker/tests/test_error_handling.pypackages/opentelemetry-instrumentation-together/opentelemetry/instrumentation/together/__init__.pypackages/opentelemetry-instrumentation-together/tests/test_error_handling.py
💤 Files with no reviewable changes (5)
- packages/opentelemetry-instrumentation-anthropic/tests/utils.py
- packages/opentelemetry-instrumentation-anthropic/tests/test_completion.py
- packages/opentelemetry-instrumentation-anthropic/tests/test_prompt_caching.py
- packages/opentelemetry-instrumentation-anthropic/tests/test_messages.py
- packages/opentelemetry-instrumentation-anthropic/tests/test_thinking.py
🚧 Files skipped from review as they are similar to previous changes (8)
- packages/opentelemetry-instrumentation-anthropic/opentelemetry/instrumentation/anthropic/streaming.py
- packages/opentelemetry-instrumentation-together/tests/test_error_handling.py
- packages/opentelemetry-instrumentation-sagemaker/opentelemetry/instrumentation/sagemaker/init.py
- packages/opentelemetry-instrumentation-sagemaker/tests/test_error_handling.py
- packages/opentelemetry-instrumentation-anthropic/opentelemetry/instrumentation/anthropic/init.py
- packages/opentelemetry-instrumentation-together/opentelemetry/instrumentation/together/init.py
- packages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/init.py
- packages/opentelemetry-instrumentation-bedrock/tests/traces/test_error_handling.py
| if self._exception_counter: | ||
| self._exception_counter.add(1, attributes=attributes) | ||
| raise e | ||
| raise |
There was a problem hiding this comment.
Why this change ?
Isnt is safer to rase the exception itself ?
There was a problem hiding this comment.
@nina-kollman
We need bare raise here so the logged exception exact real line will show (if for example it fails in the providers sdk code), and not in our instrumentation wrapper.
safety wise they are identical. e is just a name bound to the exception that's already active. Bare raise re-raises that same exception.
this follows Doron suggestion
- raise e vs bare raise inconsistency
raise e: anthropic, groq
raise: mistralai, ollama, sagemaker, bedrock, together
Bare raise is the idiomatic re-raise and preserves the original traceback. Recommend raise everywhere.
| metric_params.exception_counter.add(1, attributes=attributes) | ||
|
|
||
| raise e | ||
| raise |
…her): record exceptions and set ERROR status on failed spans
…alai, groq, and ollama
e9e5af5 to
6408ee9
Compare
Fixes #412.
Applies consistent error handling pattern for:
langchain, anthropic, groq, mistralai, bedrock, ollama, sagemaker, together.
record_exception + set_status(ERROR) + ERROR_TYPE attribute on all
instrumented call sites. Also fixes pre-existing bug in LangChain where
set_status() was called with wrong argument order, silently dropping the
error description from spans. Adds regression tests for each package.
Supersedes PRs #3970 and #4005.
Summary by CodeRabbit
Bug Fixes
Tests