fix(langchain,anthropic,groq,mistralai,bedrock,ollama,sagemaker,together): record exceptions and set ERROR status on failed spans by dvirski · Pull Request #4101 · traceloop/openllmetry

dvirski · 2026-05-10T12:49:57Z

Fixes #412.
Applies consistent error handling pattern for:
langchain, anthropic, groq, mistralai, bedrock, ollama, sagemaker, together.
record_exception + set_status(ERROR) + ERROR_TYPE attribute on all
instrumented call sites. Also fixes pre-existing bug in LangChain where
set_status() was called with wrong argument order, silently dropping the
error description from spans. Adds regression tests for each package.
Supersedes PRs #3970 and #4005.

Summary by CodeRabbit

Bug Fixes
- Instrumentations now consistently record exception type and message on spans, set span status to error, preserve proper exception propagation, and ensure streaming spans are finalized on failures.
Tests
- Added coverage asserting spans include error details and are finished on failure; removed silent exception-workarounds and updated assertions to match explicit error handling.

CLAassistant · 2026-05-10T12:50:03Z

All committers have signed the CLA.

coderabbitai · 2026-05-10T12:50:13Z

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

@coderabbitai resume to resume automatic reviews.
@coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

▶️ Resume reviews
🔍 Trigger review

📝 Walkthrough

Walkthrough

Adds explicit span error handling across instrumentations: import ERROR_TYPE, set error.type on exceptions, record_exception, set StatusCode.ERROR with message, ensure spans end on stream failures; tests added/updated to assert a single errored span, error.type, and one exception event.

Changes

Cross‑package span error instrumentation and tests

Layer / File(s)	Summary
Wrapper imports and non‑streaming handling `packages/opentelemetry-instrumentation-/opentelemetry/instrumentation//__init__.py`, `packages/opentelemetry-instrumentation-langchain/opentelemetry/instrumentation/langchain/callback_handler.py`	Imports of `ERROR_TYPE`/`Status`/`StatusCode` and updates to non-streaming wrappers to set `error.type`, call `span.record_exception(e)`, set `StatusCode.ERROR` with exception message, end spans, and re-raise.
Streaming iteration error handling & finalization `packages//opentelemetry/instrumentation//streaming`, `packages//opentelemetry/instrumentation//__init__.py`	Streaming processors refactored to `try/except/else/finally` around iteration to set `error.type`, record exceptions, set `StatusCode.ERROR`, ensure `span.end()`, and preserve tracebacks on re-raise.
New/updated tests asserting errored spans `packages//tests/test_error_handling.py`, `packages/opentelemetry-instrumentation-groq/tests/traces/`, `packages/opentelemetry-instrumentation-ollama/tests/`	Adds tests that patch transports/clients or construct failing generators to assert exactly one finished span with `StatusCode.ERROR`, `error.type`, and one `"exception"` event (and status.description contains the message).
Anthropic tests cleanup and metrics helper `packages/opentelemetry-instrumentation-anthropic/tests/*`, `packages/opentelemetry-instrumentation-anthropic/tests/utils.py`	Removes redundant exception-swallowing `unknown_parameter` setup calls across Anthropic tests and removes the assertion that an exception metric was present in `verify_metrics`.

Estimated code review effort:
🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs:

traceloop/openllmetry#4135: Modifies Bedrock span wrapper logic for the same invoke_model*/converse* entry points and error/stream handling.

Suggested reviewers:

galzilber
doronkopit5
max-deygin-traceloop
netanel-tl
nina-kollman

"I nibble through traces with whiskers bright,
When errors leap out in the dead of night.
I stamp 'ERROR' on spans with a hop and a wink,
Record exceptions, then quietly blink.
Hooray — observability's carrot is in sight!" 🐰

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 19.05% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title clearly summarizes the main change: adding consistent exception recording and ERROR status setting across multiple instrumentation packages.
Linked Issues check	✅ Passed	All code changes implement the core requirement from `#412`: instrumentations now record exceptions, set ERROR_TYPE attribute, set span status to ERROR, and re-raise exceptions consistently across all affected packages.
Out of Scope Changes check	✅ Passed	All changes align with PR objectives. Test removals of try/except blocks and defensive filters were needed to eliminate patterns that masked exceptions. No unrelated changes detected.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch dr/fix(langchain,anthropic,groq,mistralai,bedrock,ollama,sagemaker,together)-record-exceptions-and-set-ERROR-status-on-failed-spans

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 6

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

packages/opentelemetry-instrumentation-sagemaker/opentelemetry/instrumentation/sagemaker/__init__.py (1)

114-129: ⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

Streaming endpoint invocations lack error handling.

The _instrumented_endpoint_invoke_with_response_stream function does not implement the same error handling pattern as the non-streaming version. If fn(*args, **kwargs) raises an exception on line 122:

The exception will not be recorded on the span
The span status will not be set to ERROR
The span will remain open (never ended)

This creates inconsistent error visibility between streaming and non-streaming SageMaker invocations.

🔧 Proposed fix to add error handling to streaming invocations

 def _instrumented_endpoint_invoke_with_response_stream(fn, tracer, event_logger):
     `@wraps`(fn)
     def with_instrumentation(*args, **kwargs):
         if context_api.get_value(SUPPRESS_LANGUAGE_MODEL_INSTRUMENTATION_KEY):
             return fn(*args, **kwargs)
 
         span = tracer.start_span("sagemaker.completion", kind=SpanKind.CLIENT)
 
-        response = fn(*args, **kwargs)
+        try:
+            response = fn(*args, **kwargs)
+        except Exception as e:
+            span.record_exception(e)
+            span.set_status(Status(StatusCode.ERROR, str(e)))
+            span.end()
+            raise
 
         if span.is_recording():
             _handle_stream_call(span, event_logger, kwargs, response)
 
         return response
 
     return with_instrumentation

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In
`@packages/opentelemetry-instrumentation-sagemaker/opentelemetry/instrumentation/sagemaker/__init__.py`
around lines 114 - 129, The streaming wrapper
_instrumented_endpoint_invoke_with_response_stream should mirror the
non-streaming error handling: wrap the call to fn(*args, **kwargs) in a
try/except/finally inside with_instrumentation (located in
_instrumented_endpoint_invoke_with_response_stream) so that any exception is
recorded on the span, the span status is set to ERROR, and the span is ended; on
success still call _handle_stream_call when span.is_recording() and end the span
in finally. Ensure you still respect
context_api.get_value(SUPPRESS_LANGUAGE_MODEL_INSTRUMENTATION_KEY) and use the
provided tracer/span created via tracer.start_span("sagemaker.completion",
kind=SpanKind.CLIENT).

🧹 Nitpick comments (2)

packages/opentelemetry-instrumentation-sagemaker/tests/test_error_handling.py (1)

15-30: ⚡ Quick win

Consider adding a test for streaming endpoint error handling.

Once error handling is implemented for invoke_endpoint_with_response_stream (see the issue flagged in __init__.py), add a corresponding test to ensure streaming invocations also properly record exceptions and set span status to ERROR.

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In
`@packages/opentelemetry-instrumentation-sagemaker/tests/test_error_handling.py`
around lines 15 - 30, Add a parallel test for streaming invocations that mirrors
test_sagemaker_invoke_error_sets_span_status: create a new test (e.g.,
test_sagemaker_invoke_stream_error_sets_span_status) that uses the same fixtures
(instrument_legacy, smrt, span_exporter), patch
botocore.endpoint.URLLib3Session.send to raise the same ClientError, call
smrt.invoke_endpoint_with_response_stream with the same parameters and
consume/iterate the response (to force the error to surface), catch/expect
ClientError with pytest.raises, then fetch spans from span_exporter and assert
there is one finished span with span.status.status_code == StatusCode.ERROR and
an "exception" event whose attributes["exception.message"] contains
"ModelError".

packages/opentelemetry-instrumentation-bedrock/tests/traces/test_error_handling.py (1)

28-30: ⚡ Quick win

Add error.type assertion to lock in cross-instrumentation parity

These tests verify status/events but not the error.type attribute. Please assert it in both tests so Bedrock enforces the same failure contract as the other instrumentations in this PR.

Suggested assertion

     assert span.status.status_code == StatusCode.ERROR
+    assert span.attributes.get("error.type") is not None
     events = [e for e in span.events if e.name == "exception"]
     assert len(events) == 1

Also applies to: 44-46

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In
`@packages/opentelemetry-instrumentation-bedrock/tests/traces/test_error_handling.py`
around lines 28 - 30, The tests currently assert span.status and that there is
one "exception" event but omit checking the event's error type; update both
tests to assert that the exception event's attributes include the correct error
type by checking events[0].attributes.get("error.type") equals the raised
exception's class name (e.g., type(err).__name__ or the literal class name used
in the test), using the existing span and events variables so the assertion
mirrors other instrumentations' failure contract.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In
`@packages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/__init__.py`:
- Around line 225-230: The span exception handlers record exceptions and set
Status but omit the error.type attribute; update each exception block (the
try/except around the wrapped call to fn that uses span.record_exception and
span.set_status) to also set the span attribute error.type to the exception
class name (e.g., span.set_attribute("error.type", type(e).__name__)), and apply
this same change to the other three exception blocks referenced (the handlers at
the other wrapper sites).

In
`@packages/opentelemetry-instrumentation-langchain/opentelemetry/instrumentation/langchain/callback_handler.py`:
- Around line 1065-1070: The error handling path in _handle_error (used by
on_llm_error, on_chain_error, on_agent_error, on_retriever_error) sets status
and records the exception but doesn't set the SpanAttributes.ERROR_TYPE like
on_tool_error does; update _handle_error (and those non-tool error callbacks if
they delegate elsewhere) to call _set_span_attribute(span,
SpanAttributes.ERROR_TYPE, type(error).__name__) (or similar using
error.__class__.__name__) before setting status/recording exception so the span
always contains error.type for all error callbacks.

In
`@packages/opentelemetry-instrumentation-ollama/opentelemetry/instrumentation/ollama/__init__.py`:
- Around line 314-318: In both exception handlers that call
span.record_exception(e), span.set_status(Status(StatusCode.ERROR, str(e))), and
span.end() (i.e., the except Exception as e blocks where span.record_exception
is used), add a span attribute for the error type before ending the span — e.g.,
call span.set_attribute("error.type", type(e).__name__) (or str(type(e))) right
after recording the exception and before span.end(); make the same change in the
second exception path that mirrors these calls so both paths consistently set
"error.type".

In `@packages/opentelemetry-instrumentation-ollama/tests/test_error_handling.py`:
- Around line 39-40: The test currently only asserts that an "exception" event
exists (events = [e for e in span.events if e.name == "exception"]; assert
len(events) == 1) but doesn't verify the exception payload; update the test to
also assert that events[0].attributes["exception.message"] equals the same
expected exception message used in the corresponding sync test so async
exception message capture is covered (locate the span/events usage in this test
and add the assertion against events[0].attributes["exception.message"] with the
shared expected string).

In
`@packages/opentelemetry-instrumentation-together/opentelemetry/instrumentation/together/__init__.py`:
- Around line 127-129: The exception handler in the together instrumentation
uses span.record_exception(e) and span.set_status(...) but does not set the
error.type span attribute; update the except block handling the exception (the
code around span.record_exception, span.set_status, and the exception variable
e) to also call span.set_attribute(ERROR_TYPE, e.__class__.__name__) (or
span.set_attribute("error.type", e.__class__.__name__) if ERROR_TYPE constant is
not available) so the canonical exception class name is recorded per
OpenTelemetry semantic conventions for GenAI error spans.

In
`@packages/opentelemetry-instrumentation-together/tests/test_error_handling.py`:
- Line 8: The test currently patches requests.Session.request but Together uses
httpx; update the patch target in tests/test_error_handling.py to patch
"httpx.Client.request" (or the exact httpx method Together calls) so the HTTP
call is intercepted. Replace the context manager that uses
patch("requests.Session.request", side_effect=...) with
patch("httpx.Client.request", side_effect=...) and run the test to confirm the
exception path in your error handling (adjust any import/mocking scope if the
test uses a different httpx client instance).

---

Outside diff comments:
In
`@packages/opentelemetry-instrumentation-sagemaker/opentelemetry/instrumentation/sagemaker/__init__.py`:
- Around line 114-129: The streaming wrapper
_instrumented_endpoint_invoke_with_response_stream should mirror the
non-streaming error handling: wrap the call to fn(*args, **kwargs) in a
try/except/finally inside with_instrumentation (located in
_instrumented_endpoint_invoke_with_response_stream) so that any exception is
recorded on the span, the span status is set to ERROR, and the span is ended; on
success still call _handle_stream_call when span.is_recording() and end the span
in finally. Ensure you still respect
context_api.get_value(SUPPRESS_LANGUAGE_MODEL_INSTRUMENTATION_KEY) and use the
provided tracer/span created via tracer.start_span("sagemaker.completion",
kind=SpanKind.CLIENT).

---

Nitpick comments:
In
`@packages/opentelemetry-instrumentation-bedrock/tests/traces/test_error_handling.py`:
- Around line 28-30: The tests currently assert span.status and that there is
one "exception" event but omit checking the event's error type; update both
tests to assert that the exception event's attributes include the correct error
type by checking events[0].attributes.get("error.type") equals the raised
exception's class name (e.g., type(err).__name__ or the literal class name used
in the test), using the existing span and events variables so the assertion
mirrors other instrumentations' failure contract.

In
`@packages/opentelemetry-instrumentation-sagemaker/tests/test_error_handling.py`:
- Around line 15-30: Add a parallel test for streaming invocations that mirrors
test_sagemaker_invoke_error_sets_span_status: create a new test (e.g.,
test_sagemaker_invoke_stream_error_sets_span_status) that uses the same fixtures
(instrument_legacy, smrt, span_exporter), patch
botocore.endpoint.URLLib3Session.send to raise the same ClientError, call
smrt.invoke_endpoint_with_response_stream with the same parameters and
consume/iterate the response (to force the error to surface), catch/expect
ClientError with pytest.raises, then fetch spans from span_exporter and assert
there is one finished span with span.status.status_code == StatusCode.ERROR and
an "exception" event whose attributes["exception.message"] contains
"ModelError".

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 89211e6f-d815-4d3e-862e-0d1da3b3562b

📥 Commits

Reviewing files that changed from the base of the PR and between 6d3e696 and cbac7ec.

⛔ Files ignored due to path filters (10)

packages/opentelemetry-instrumentation-anthropic/uv.lock is excluded by !**/*.lock
packages/opentelemetry-instrumentation-bedrock/uv.lock is excluded by !**/*.lock
packages/opentelemetry-instrumentation-groq/uv.lock is excluded by !**/*.lock
packages/opentelemetry-instrumentation-mistralai/uv.lock is excluded by !**/*.lock
packages/opentelemetry-instrumentation-ollama/uv.lock is excluded by !**/*.lock
packages/opentelemetry-instrumentation-openai-agents/uv.lock is excluded by !**/*.lock
packages/opentelemetry-instrumentation-sagemaker/uv.lock is excluded by !**/*.lock
packages/opentelemetry-instrumentation-together/uv.lock is excluded by !**/*.lock
packages/sample-app/uv.lock is excluded by !**/*.lock
packages/traceloop-sdk/uv.lock is excluded by !**/*.lock

📒 Files selected for processing (17)

packages/opentelemetry-instrumentation-anthropic/opentelemetry/instrumentation/anthropic/__init__.py
packages/opentelemetry-instrumentation-anthropic/tests/test_messages.py
packages/opentelemetry-instrumentation-anthropic/tests/test_prompt_caching.py
packages/opentelemetry-instrumentation-anthropic/tests/test_thinking.py
packages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/__init__.py
packages/opentelemetry-instrumentation-bedrock/tests/traces/test_error_handling.py
packages/opentelemetry-instrumentation-groq/opentelemetry/instrumentation/groq/__init__.py
packages/opentelemetry-instrumentation-groq/tests/traces/test_error_handling.py
packages/opentelemetry-instrumentation-langchain/opentelemetry/instrumentation/langchain/callback_handler.py
packages/opentelemetry-instrumentation-mistralai/opentelemetry/instrumentation/mistralai/__init__.py
packages/opentelemetry-instrumentation-mistralai/tests/test_error_handling.py
packages/opentelemetry-instrumentation-ollama/opentelemetry/instrumentation/ollama/__init__.py
packages/opentelemetry-instrumentation-ollama/tests/test_error_handling.py
packages/opentelemetry-instrumentation-sagemaker/opentelemetry/instrumentation/sagemaker/__init__.py
packages/opentelemetry-instrumentation-sagemaker/tests/test_error_handling.py
packages/opentelemetry-instrumentation-together/opentelemetry/instrumentation/together/__init__.py
packages/opentelemetry-instrumentation-together/tests/test_error_handling.py

coderabbitai · 2026-05-10T12:53:39Z

+
+
+def test_together_error_sets_span_status(instrument_legacy, together_client, span_exporter):
+    with patch("requests.Session.request", side_effect=Exception("API connection error")):


⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash # Check what HTTP client the Together library uses # First, find Together library installation to inspect its dependencies python -c "import together; print(together.__file__)" # Search for HTTP client imports in Together library rg -n 'import (requests|httpx|aiohttp|urllib)' --type py -g 'together/**/*.py' 2>/dev/null || \ echo "Together package not found in current repository; checking installed package..." # Check Together's dependencies pip show together | grep -i "requires"

Repository: traceloop/openllmetry

Length of output: 325

🌐 Web query:

Together AI Python library HTTP client implementation requests or httpx

💡 Result:

Together’s Python SDK uses httpx as its HTTP client backend (not requests). [1][2] Evidence: - The Together Python library documentation states that it “offers both synchronous and asynchronous clients powered by httpx,” and that the async client uses httpx by default, with aiohttp as an optional alternative backend. [2] - The Python v2 migration guide explicitly notes: “Modern HTTP Client: Uses httpx instead of requests.” [1] - The SDK v2 announcement likewise says it “uses httpx instead of requests.” [3] Implication for implementation: - If you use the Together SDK (e.g., from together import Together), you can typically pass an httpx.Client(...) (via the http_client parameter) or rely on the SDK’s default httpx-based client. [1][2]

Citations:

1: https://docs.together.ai/docs/pythonv2-migration-guide

2: https://docs.together.ai/python-library

3: https://www.together.ai/blog/together-python-sdk-2-0

Update patch target to use httpx.Client.request instead of requests.Session.request.

The Together library uses httpx as its HTTP client backend, not requests. Patching requests.Session.request won't intercept the HTTP call, so this test won't actually validate error handling. Change the patch target to httpx.Client.request or the appropriate httpx method that Together uses internally.

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@packages/opentelemetry-instrumentation-together/tests/test_error_handling.py` at line 8, The test currently patches requests.Session.request but Together uses httpx; update the patch target in tests/test_error_handling.py to patch "httpx.Client.request" (or the exact httpx method Together calls) so the HTTP call is intercepted. Replace the context manager that uses patch("requests.Session.request", side_effect=...) with patch("httpx.Client.request", side_effect=...) and run the test to confirm the exception path in your error handling (adjust any import/mocking scope if the test uses a different httpx client instance).

this is a false positive for the version under test. This instrumentation pins together >= 1.2.0, < 2 (see _instruments in init.py and the together>=1.2.0,<2 entry in pyproject.toml), and together v1.x uses requests.Session() as its HTTP backend (see
▎ together/abstract/api_requestor.py in the installed package: import requests and requests.Session() in _make_session). The httpx switch you cited landed in together v2.0, which is excluded from our supported range here. The patch target is correct, and the test passes because requests.Session.request
▎ does intercept the real HTTP call. Happy to revisit if/when this instrumentation adds v2 support.

@dvirski, you're absolutely right — thank you for the detailed explanation. I missed that this instrumentation explicitly pins together >= 1.2.0, < 2, and v1.x uses requests.Session() as its HTTP backend. The web search I relied on returned v2.x documentation, which doesn't apply here. patch("requests.Session.request", ...) is the correct target for the supported version range, and I apologize for the noise.

✏️ Learnings added

Learnt from: dvirski Repo: traceloop/openllmetry PR: 4101 File: packages/opentelemetry-instrumentation-together/tests/test_error_handling.py:8-8 Timestamp: 2026-05-14T13:27:45.655Z Learning: In the traceloop/openllmetry repository, the `opentelemetry-instrumentation-together` package pins `together >= 1.2.0, < 2`. Together v1.x uses `requests.Session()` as its HTTP backend (via `together/abstract/api_requestor.py`). The switch to `httpx` only landed in together v2.0. Therefore, test patches targeting `requests.Session.request` are correct and should not be flagged as wrong patch targets for this instrumentation.

Note: Learnings are effective only in the context of similar code segments. To apply general coding guidelines, please configure review instructions. You can manage existing learnings in the app.

🧠 Learnings used

Learnt from: banlor Repo: traceloop/openllmetry PR: 4089 File: packages/opentelemetry-instrumentation-dspy/opentelemetry/instrumentation/dspy/instrumentation.py:185-192 Timestamp: 2026-05-10T21:59:30.384Z Learning: In this repository (traceloop/openllmetry), do not flag DSPy/OpenTelemetry instrumentation that handles GenAI client exceptions using `span.record_exception()` as a violation of GenAI semantic conventions. The newer GenAI-specific event `gen_ai.client.operation.exception` is intentionally not used yet (still in Development), and a repo-wide migration is deferred to a future PR. Only treat `record_exception()` as problematic if it is clearly unintentional for GenAI client spans (i.e., not part of this repo’s intentionally supported exception-handling pattern).

doronkopit5 · 2026-05-11T14:31:27Z

Thanks for consolidating #3970 and #4005 — the langchain set_status() arg-order fix is a great catch. A few things I'd like to see tightened before merge so we end up with one consistent pattern across the 8 packages rather than 8 slightly-different ones:

1. `ERROR_TYPE` is inconsistent across packages

The PR description says "consistent error handling pattern", but only 3 of 8 actually set the semconv error.type attribute:

Package	record_exception	set_status	ERROR_TYPE
anthropic	✅	✅	✅
groq	✅	✅	✅
mistralai	✅	✅	✅
bedrock	✅	✅	❌
ollama	✅	✅	❌
sagemaker	✅	✅	❌
together	✅	✅	❌
langchain `_handle_error`	✅	✅	❌

error.type is the canonical attribute backends use to bucket failures per the OTel GenAI semconv. The tests already reflect this drift — groq/mistralai assert it, bedrock/ollama/sagemaker/together don't. Please add span.set_attribute(ERROR_TYPE, e.__class__.__name__) everywhere (and to _handle_error in the langchain callback handler).

2. Sagemaker streaming wrapper has no error handling

Only _instrumented_endpoint_invoke got the try/except — _instrumented_endpoint_invoke_with_response_stream is untouched. Errors from streaming invocations will still produce UNSET-status spans, so issue #412 is only partly closed for sagemaker. Bedrock's streaming wrappers correctly got the same treatment; sagemaker's should too.

3. Together test patches `requests.Session.request` while everything else patches `httpx.Client.send`

Together SDK ≥1.2 does use requests, so this is probably correct — but please confirm by running the test locally and verifying it actually fails without the instrumentation change (a green test against an unpatched path is a silent regression risk). If it does exercise the right path, leave it; otherwise switch to httpx.

4. `raise e` vs bare `raise` inconsistency

raise e: anthropic, groq
raise: mistralai, ollama, sagemaker, bedrock, together

Bare raise is the idiomatic re-raise and preserves the original traceback. Recommend raise everywhere.

5. Groq's `is_recording()` guard is only partial

span.set_attribute(ERROR_TYPE, e.__class__.__name__)   # unguarded
span.record_exception(e)                                # unguarded
if span.is_recording():
    span.set_status(Status(StatusCode.ERROR, str(e)))   # guarded
span.end()

Either guard all three or none. OpenAI's reference (the pattern this PR claims to follow) guards none — recommend dropping the guard for symmetry with the rest of the PR.

6. Anthropic test churn — defensive filter added to ~25 unrelated tests

spans = [s for s in span_exporter.get_finished_spans() if s.status.status_code != StatusCode.ERROR]

This pattern is repeated across test_messages.py, test_prompt_caching.py, test_thinking.py. The reason it's necessary: many of those tests have try: ... except Exception: pass blocks that previously hid failures silently, and now produce error spans the assertions trip over. The filter works but masks the real question — are those tests actually exercising the success path they claim? Two ways to make this less smelly:

Extract a successful_spans(exporter) helper to DRY the filter
Better: fix the tests that are silently swallowing exceptions, so they don't need a filter at all

Re: the goal of "simple and robust" — items 1, 2, and 4 are the ones that matter most. Once they're addressed, the pattern across all 8 packages becomes a single rule: on exception → set ERROR_TYPE, record_exception, set_status(ERROR, msg), end span, bare raise. That's the version that's easy to reason about and easy to extend to the next instrumentation.

dvirski · 2026-05-14T17:28:04Z

@doronkopit5 Addressed all PR comments.
This is a summary of the changes that have been done:

Production code — error handling

bedrock — added ERROR_TYPE import + set on all 4 exception blocks; changed raise e → raise
ollama — added ERROR_TYPE import + set on both sync/async exception blocks
together — added ERROR_TYPE import + set on exception block
sagemaker — added ERROR_TYPE import + set on non-streaming path; added full error handling (try/except, ERROR status, span.end) to the streaming wrapper
langchain — moved ERROR_TYPE set from on_tool_error into shared _handle_error so all 5 error callbacks set it
groq — removed inconsistent is_recording() guard around set_status in error blocks; raise e → raise
anthropic — restored lost ERROR_TYPE import (was dropped in a merge conflict); raise e → raise in both init.py exception blocks
anthropic streaming — raise e → raise

Tests — new dedicated error-handling coverage
9. anthropic — created tests/test_error_handling.py with sync + async tests asserting ERROR status, error.type, exception event, exception.type
10. langchain — created tests/test_error_handling.py building a real RunnableSequence (prompt | llm), patching httpx to raise, asserting at least one langchain span has ERROR + error.type +
exception event (verified via regression check that it actually exercises _handle_error)

Tests — assertion additions
11. bedrock — added error.type assertions to both existing error tests
12. ollama — added error.type to both tests + exception.message parity assertion to async test
13. sagemaker — added error.type assertion + new streaming-path error test
14. together — added error.type assertion to the existing error test

Tests — anthropic cleanup (the big one)

test_messages.py — removed 24 try/except: pass bad-kwarg blocks + 27 defensive filters + unused StatusCode import
test_prompt_caching.py — removed 12 bad-kwarg blocks + 12 filters + unused import
test_thinking.py — removed 12 bad-kwarg blocks + 12 filters + unused import
test_completion.py — removed 3 bad-kwarg blocks
tests/utils.py — cleaned verify_metrics: removed exception-metric enforcement and the or error.type=="TypeError" tolerance clause

Why the anthropic test cleanup was needed

Doron flagged that the original PR added a defensive filter ([s for s in spans if s.status.status_code != StatusCode.ERROR]) in 65 places across the anthropic test files. The filter was bridging a
side effect of this PR — but the deeper issue was a pre-existing test idiom that had been there for years:

try:
anthropic_client.messages.create(unknown_parameter="unknown")
except Exception:
pass

The original intent was a smoke test — "verify our instrumentation doesn't crash when the SDK rejects bad input." But on inspection it didn't actually test that, because:

except Exception: pass swallows everything — the SDK's expected TypeError, any bug in our wrapper, and the case where nothing raises at all. All three scenarios produced identical test behavior.
Pre-PR, the bad call's span was started but never .end()-ed (the wrapper re-raised before ending it), so it never reached the exporter. The existing tests passed because get_finished_spans()
returned only the success span.
This PR's instrumentation now correctly ends the span and marks it ERROR on the exception path → those previously-invisible spans suddenly become visible to get_finished_spans(), breaking tests
that assumed one call = one finished span. The filter was added as a quick fix to hide them.

The cleanup removes both the cause and the symptom: the bad-call blocks are deleted (they verified nothing), so no error spans get produced, so no filter is needed, and the legacy len(spans) == N
assertions remain accurate without modification. The new dedicated test_error_handling.py (item 9) provides the proper test coverage for the error-span behavior — explicitly asserting on ERROR
status, error.type, and exception events using a meaningful failure mode (mocked HTTP failure), rather than the prior accidental smoke test that swallowed everything.

verify_metrics was also touched (item 19) because it had grown a coupling to the bad-call pattern: it required the LLM_ANTHROPIC_COMPLETION_EXCEPTIONS metric to have been incremented and had a
tolerance clause for error.type=="TypeError" duration data points. Both existed only because every caller happened to make a failing call. Now that the bad calls are gone, those assertions become
dead requirements that would block the cleanup.

Verified but not changed

together test patch target — confirmed requests.Session.request is correct for together<2 (the pinned range); ran a regression check showing the test fails without our instrumentation change

Total: 51 bad-kwarg blocks removed, 51 defensive filters removed, ~440 net lines deleted in test cleanup, plus the production fixes and new error-span coverage.

coderabbitai

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

packages/opentelemetry-instrumentation-anthropic/opentelemetry/instrumentation/anthropic/streaming.py (1)
258-263: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Sync streaming exceptions still don’t mark the span as failed.

On Line 258–Line 263, the sync stream path only increments exception metrics and re-raises. It still misses the failed-span updates (record_exception, StatusCode.ERROR, and ending the span), so stream errors can remain underreported compared to the async path.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In
`@packages/opentelemetry-instrumentation-anthropic/opentelemetry/instrumentation/anthropic/streaming.py`
around lines 258 - 263, The sync streaming exception handler currently only
updates metrics and re-raises, so update the except block to mirror the async
path by recording the exception on the span, setting the span status to
StatusCode.ERROR, and ending the span before re-raising; specifically, in the
except Exception as e handler where error_metrics_attributes(e) and
self._exception_counter are used, call self._span.record_exception(e), set
self._span.set_status(StatusCode.ERROR) (import StatusCode if needed), then call
self._span.end() and only then re-raise the exception so the span reflects the
failure like the async path does.

🧹 Nitpick comments (2)

packages/opentelemetry-instrumentation-bedrock/tests/traces/test_error_handling.py (1)

28-31: ⚡ Quick win

Assert the span status description to lock in set_status(ERROR, message) behavior.

These tests currently verify StatusCode.ERROR but not the error message/description. Adding that assertion would catch regressions where status description gets dropped (the same failure mode this PR addresses elsewhere).

Suggested test assertion additions

     assert span.status.status_code == StatusCode.ERROR
+    assert "Rate exceeded" in (span.status.description or "")
     assert span.attributes.get("error.type") == "ClientError"
     events = [e for e in span.events if e.name == "exception"]
     assert len(events) == 1
@@
     assert span.status.status_code == StatusCode.ERROR
+    assert "Rate exceeded" in (span.status.description or "")
     assert span.attributes.get("error.type") == "ClientError"
     events = [e for e in span.events if e.name == "exception"]
     assert len(events) == 1

Also applies to: 45-48

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In
`@packages/opentelemetry-instrumentation-bedrock/tests/traces/test_error_handling.py`
around lines 28 - 31, Add an assertion that the span's status description
matches the error message used when calling set_status(ERROR, message): after
asserting span.status.status_code == StatusCode.ERROR and before checking
exception events, assert span.status.description == error_message (or the
literal message used in the test) for the first span block (span variable) and
do the same for the second block around lines 45-48 so the tests lock in that
the status description is preserved.

packages/opentelemetry-instrumentation-mistralai/tests/test_error_handling.py (1)

7-43: ⚡ Quick win

Add stream/stream_async failure regression tests for parity with wrapped methods.

These tests cover complete/complete_async only. Since instrumentation also wraps streaming methods, add failure-path assertions for chat.stream and chat.stream_async to prevent regressions in stream exception handling.

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In
`@packages/opentelemetry-instrumentation-mistralai/tests/test_error_handling.py`
around lines 7 - 43, Add two new tests mirroring the existing
complete/complete_async ones but exercising the streaming wrappers: create
test_mistralai_stream_error_sets_span_status that patches "httpx.Client.send" to
raise Exception("API connection error"), calls mistralai_client.chat.stream(...)
inside pytest.raises, then asserts one finished span with StatusCode.ERROR,
span.status.description contains the error message,
span.attributes["error.type"] == "Exception", and that there is one exception
event whose "exception.message" contains the error; likewise add
test_async_mistralai_stream_error_sets_span_status that patches
"httpx.AsyncClient.send" to raise Exception("Async API error"), awaits
mistralai_async_client.chat.stream_async(...) inside pytest.raises, and performs
the same span and event assertions for the async case.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In
`@packages/opentelemetry-instrumentation-mistralai/opentelemetry/instrumentation/mistralai/__init__.py`:
- Around line 418-425: The streaming accumulator functions
_accumulate_streaming_response and _aaccumulate_streaming_response currently
iterate without handling exceptions, so exceptions during iteration skip
span.set_attribute(ERROR_TYPE,...), span.record_exception(...),
span.set_status(Status(...)), and span.end(); update both functions to wrap
their for/async for loops in a try-except-else-finally: in except Exception as e
set span.set_attribute(ERROR_TYPE, e.__class__.__name__),
span.record_exception(e), span.set_status(Status(StatusCode.ERROR, str(e))) and
re-raise, in else call the existing _handle_response() logic, and in finally
always call span.end() to ensure the span is finalized regardless of success or
error.

---

Outside diff comments:
In
`@packages/opentelemetry-instrumentation-anthropic/opentelemetry/instrumentation/anthropic/streaming.py`:
- Around line 258-263: The sync streaming exception handler currently only
updates metrics and re-raises, so update the except block to mirror the async
path by recording the exception on the span, setting the span status to
StatusCode.ERROR, and ending the span before re-raising; specifically, in the
except Exception as e handler where error_metrics_attributes(e) and
self._exception_counter are used, call self._span.record_exception(e), set
self._span.set_status(StatusCode.ERROR) (import StatusCode if needed), then call
self._span.end() and only then re-raise the exception so the span reflects the
failure like the async path does.

---

Nitpick comments:
In
`@packages/opentelemetry-instrumentation-bedrock/tests/traces/test_error_handling.py`:
- Around line 28-31: Add an assertion that the span's status description matches
the error message used when calling set_status(ERROR, message): after asserting
span.status.status_code == StatusCode.ERROR and before checking exception
events, assert span.status.description == error_message (or the literal message
used in the test) for the first span block (span variable) and do the same for
the second block around lines 45-48 so the tests lock in that the status
description is preserved.

In
`@packages/opentelemetry-instrumentation-mistralai/tests/test_error_handling.py`:
- Around line 7-43: Add two new tests mirroring the existing
complete/complete_async ones but exercising the streaming wrappers: create
test_mistralai_stream_error_sets_span_status that patches "httpx.Client.send" to
raise Exception("API connection error"), calls mistralai_client.chat.stream(...)
inside pytest.raises, then asserts one finished span with StatusCode.ERROR,
span.status.description contains the error message,
span.attributes["error.type"] == "Exception", and that there is one exception
event whose "exception.message" contains the error; likewise add
test_async_mistralai_stream_error_sets_span_status that patches
"httpx.AsyncClient.send" to raise Exception("Async API error"), awaits
mistralai_async_client.chat.stream_async(...) inside pytest.raises, and performs
the same span and event assertions for the async case.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 719527e3-e290-4d92-a2fd-27d595c6551e

📥 Commits

Reviewing files that changed from the base of the PR and between cbac7ec and 88f021b.

⛔ Files ignored due to path filters (7)

packages/opentelemetry-instrumentation-bedrock/uv.lock is excluded by !**/*.lock
packages/opentelemetry-instrumentation-groq/uv.lock is excluded by !**/*.lock
packages/opentelemetry-instrumentation-mistralai/uv.lock is excluded by !**/*.lock
packages/opentelemetry-instrumentation-ollama/uv.lock is excluded by !**/*.lock
packages/opentelemetry-instrumentation-openai-agents/uv.lock is excluded by !**/*.lock
packages/opentelemetry-instrumentation-sagemaker/uv.lock is excluded by !**/*.lock
packages/opentelemetry-instrumentation-together/uv.lock is excluded by !**/*.lock

📒 Files selected for processing (20)

packages/opentelemetry-instrumentation-anthropic/opentelemetry/instrumentation/anthropic/__init__.py
packages/opentelemetry-instrumentation-anthropic/opentelemetry/instrumentation/anthropic/streaming.py
packages/opentelemetry-instrumentation-anthropic/tests/test_completion.py
packages/opentelemetry-instrumentation-anthropic/tests/test_messages.py
packages/opentelemetry-instrumentation-anthropic/tests/test_prompt_caching.py
packages/opentelemetry-instrumentation-anthropic/tests/test_thinking.py
packages/opentelemetry-instrumentation-anthropic/tests/utils.py
packages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/__init__.py
packages/opentelemetry-instrumentation-bedrock/tests/traces/test_error_handling.py
packages/opentelemetry-instrumentation-groq/opentelemetry/instrumentation/groq/__init__.py
packages/opentelemetry-instrumentation-groq/tests/traces/test_error_handling.py
packages/opentelemetry-instrumentation-langchain/opentelemetry/instrumentation/langchain/callback_handler.py
packages/opentelemetry-instrumentation-mistralai/opentelemetry/instrumentation/mistralai/__init__.py
packages/opentelemetry-instrumentation-mistralai/tests/test_error_handling.py
packages/opentelemetry-instrumentation-ollama/opentelemetry/instrumentation/ollama/__init__.py
packages/opentelemetry-instrumentation-ollama/tests/test_error_handling.py
packages/opentelemetry-instrumentation-sagemaker/opentelemetry/instrumentation/sagemaker/__init__.py
packages/opentelemetry-instrumentation-sagemaker/tests/test_error_handling.py
packages/opentelemetry-instrumentation-together/opentelemetry/instrumentation/together/__init__.py
packages/opentelemetry-instrumentation-together/tests/test_error_handling.py

💤 Files with no reviewable changes (5)

packages/opentelemetry-instrumentation-anthropic/tests/test_completion.py
packages/opentelemetry-instrumentation-anthropic/tests/utils.py
packages/opentelemetry-instrumentation-anthropic/tests/test_thinking.py
packages/opentelemetry-instrumentation-anthropic/tests/test_messages.py
packages/opentelemetry-instrumentation-anthropic/tests/test_prompt_caching.py

🚧 Files skipped from review as they are similar to previous changes (8)

packages/opentelemetry-instrumentation-ollama/opentelemetry/instrumentation/ollama/init.py
packages/opentelemetry-instrumentation-groq/tests/traces/test_error_handling.py
packages/opentelemetry-instrumentation-langchain/opentelemetry/instrumentation/langchain/callback_handler.py
packages/opentelemetry-instrumentation-together/tests/test_error_handling.py
packages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/init.py
packages/opentelemetry-instrumentation-groq/opentelemetry/instrumentation/groq/init.py
packages/opentelemetry-instrumentation-together/opentelemetry/instrumentation/together/init.py
packages/opentelemetry-instrumentation-anthropic/opentelemetry/instrumentation/anthropic/init.py

coderabbitai

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In
`@packages/opentelemetry-instrumentation-mistralai/tests/test_error_handling.py`:
- Around line 41-43: The test currently only checks that one exception event was
recorded; also assert that the exception payload includes the expected async
message by checking events[0].attributes["exception.message"] contains "Async
API error". Update the test in tests/test_error_handling.py to add an assertion
after the existing length check that verifies
events[0].attributes["exception.message"] includes the substring "Async API
error" so it matches the sync regression test parity.

In
`@packages/opentelemetry-instrumentation-ollama/opentelemetry/instrumentation/ollama/__init__.py`:
- Around line 145-156: The code computes response_data from last_response and
then merges it with accumulated_response using the dict-union operator, which
raises TypeError when last_response is None; update the logic in the block that
sets response_data (and in the other symmetric spot where the same merge occurs)
to coerce None to an empty dict (e.g., response_data = response_data or {})
before calling _handle_response so that response=response_data |
accumulated_response never attempts to union None with a dict; look for the
variable names last_response, response_data, accumulated_response and the
_handle_response(...) call to apply the fix.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 5f64a663-344d-4a42-ae5b-818ba77f1247

📥 Commits

Reviewing files that changed from the base of the PR and between 88f021b and 9b8fa3d.

📒 Files selected for processing (20)

packages/opentelemetry-instrumentation-anthropic/opentelemetry/instrumentation/anthropic/__init__.py
packages/opentelemetry-instrumentation-anthropic/opentelemetry/instrumentation/anthropic/streaming.py
packages/opentelemetry-instrumentation-anthropic/tests/test_completion.py
packages/opentelemetry-instrumentation-anthropic/tests/test_messages.py
packages/opentelemetry-instrumentation-anthropic/tests/test_prompt_caching.py
packages/opentelemetry-instrumentation-anthropic/tests/test_thinking.py
packages/opentelemetry-instrumentation-anthropic/tests/utils.py
packages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/__init__.py
packages/opentelemetry-instrumentation-bedrock/tests/traces/test_error_handling.py
packages/opentelemetry-instrumentation-groq/opentelemetry/instrumentation/groq/__init__.py
packages/opentelemetry-instrumentation-groq/tests/traces/test_error_handling.py
packages/opentelemetry-instrumentation-langchain/opentelemetry/instrumentation/langchain/callback_handler.py
packages/opentelemetry-instrumentation-mistralai/opentelemetry/instrumentation/mistralai/__init__.py
packages/opentelemetry-instrumentation-mistralai/tests/test_error_handling.py
packages/opentelemetry-instrumentation-ollama/opentelemetry/instrumentation/ollama/__init__.py
packages/opentelemetry-instrumentation-ollama/tests/test_error_handling.py
packages/opentelemetry-instrumentation-sagemaker/opentelemetry/instrumentation/sagemaker/__init__.py
packages/opentelemetry-instrumentation-sagemaker/tests/test_error_handling.py
packages/opentelemetry-instrumentation-together/opentelemetry/instrumentation/together/__init__.py
packages/opentelemetry-instrumentation-together/tests/test_error_handling.py

💤 Files with no reviewable changes (5)

packages/opentelemetry-instrumentation-anthropic/tests/utils.py
packages/opentelemetry-instrumentation-anthropic/tests/test_completion.py
packages/opentelemetry-instrumentation-anthropic/tests/test_prompt_caching.py
packages/opentelemetry-instrumentation-anthropic/tests/test_messages.py
packages/opentelemetry-instrumentation-anthropic/tests/test_thinking.py

🚧 Files skipped from review as they are similar to previous changes (8)

packages/opentelemetry-instrumentation-anthropic/opentelemetry/instrumentation/anthropic/streaming.py
packages/opentelemetry-instrumentation-together/tests/test_error_handling.py
packages/opentelemetry-instrumentation-sagemaker/opentelemetry/instrumentation/sagemaker/init.py
packages/opentelemetry-instrumentation-sagemaker/tests/test_error_handling.py
packages/opentelemetry-instrumentation-anthropic/opentelemetry/instrumentation/anthropic/init.py
packages/opentelemetry-instrumentation-together/opentelemetry/instrumentation/together/init.py
packages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/init.py
packages/opentelemetry-instrumentation-bedrock/tests/traces/test_error_handling.py

nina-kollman · 2026-05-19T07:22:48Z

            if self._exception_counter:
                self._exception_counter.add(1, attributes=attributes)
-            raise e
+            raise


Why this change ?
Isnt is safer to rase the exception itself ?

@nina-kollman
We need bare raise here so the logged exception exact real line will show (if for example it fails in the providers sdk code), and not in our instrumentation wrapper.

safety wise they are identical. e is just a name bound to the exception that's already active. Bare raise re-raises that same exception.

this follows Doron suggestion

raise e vs bare raise inconsistency
raise e: anthropic, groq
raise: mistralai, ollama, sagemaker, bedrock, together
Bare raise is the idiomatic re-raise and preserves the original traceback. Recommend raise everywhere.

nina-kollman · 2026-05-19T07:23:41Z

                metric_params.exception_counter.add(1, attributes=attributes)

-            raise e
+            raise


…her): record exceptions and set ERROR status on failed spans

…alai, groq, and ollama

dvirski requested review from doronkopit5, galzilber, max-deygin-traceloop, netanel-tl and nina-kollman May 10, 2026 12:49

coderabbitai Bot reviewed May 10, 2026

View reviewed changes

dvirski force-pushed the dr/fix(langchain,anthropic,groq,mistralai,bedrock,ollama,sagemaker,together)-record-exceptions-and-set-ERROR-status-on-failed-spans branch from cbac7ec to 88f021b Compare May 14, 2026 17:27

coderabbitai Bot reviewed May 14, 2026

View reviewed changes

Comment thread .../opentelemetry-instrumentation-mistralai/opentelemetry/instrumentation/mistralai/__init__.py

dvirski force-pushed the dr/fix(langchain,anthropic,groq,mistralai,bedrock,ollama,sagemaker,together)-record-exceptions-and-set-ERROR-status-on-failed-spans branch from 88f021b to 9b8fa3d Compare May 18, 2026 16:06

coderabbitai Bot reviewed May 18, 2026

View reviewed changes

Comment thread packages/opentelemetry-instrumentation-mistralai/tests/test_error_handling.py Outdated

Comment thread packages/opentelemetry-instrumentation-ollama/opentelemetry/instrumentation/ollama/__init__.py

nina-kollman reviewed May 19, 2026

View reviewed changes

dvirski and others added 6 commits May 19, 2026 10:58

fix(langchain,anthropic,groq,mistralai,bedrock,ollama,sagemaker,toget…

d4c4728

…her): record exceptions and set ERROR status on failed spans

CR Changes

5086341

CR Comments fixes

5b7df81

added streaming error handling — wrapped the iteration loops in mistr…

94da9b0

…alai, groq, and ollama

fix(ollama): handle empty streaming response in accumulator

e153479

fixed linting

6408ee9

dvirski force-pushed the dr/fix(langchain,anthropic,groq,mistralai,bedrock,ollama,sagemaker,together)-record-exceptions-and-set-ERROR-status-on-failed-spans branch from e9e5af5 to 6408ee9 Compare May 19, 2026 07:58

nina-kollman approved these changes May 19, 2026

View reviewed changes

dvirski added 3 commits May 19, 2026 11:46

Fixed tests

516e731

fixed tests

5c32981

removed mistralai stream error loggin on spans - breaks tests

9b0560f

dvirski merged commit d6c991c into main May 19, 2026
12 checks passed

This was referenced May 19, 2026

fix(replicate): record errors on spans #4023

Closed

fix(ollama,together,replicate,sagemaker,bedrock): record exceptions on error spans #4005

Closed

This was referenced May 19, 2026

fix: record errors on spans for anthropic, groq, and mistralai #3970

Closed

fix(chromadb,pinecone,weaviate,lancedb,milvus): record exceptions on error spans #4006

Closed



		def test_together_error_sets_span_status(instrument_legacy, together_client, span_exporter):
		with patch("requests.Session.request", side_effect=Exception("API connection error")):

Conversation

dvirski commented May 10, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary by CodeRabbit

Uh oh!

CLAassistant commented May 10, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

coderabbitai Bot commented May 10, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Reviews paused

Walkthrough

Changes

❌ Failed checks (1 warning)

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

coderabbitai Bot May 10, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

dvirski May 14, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot May 14, 2026

Choose a reason for hiding this comment

Uh oh!

doronkopit5 commented May 11, 2026

1. ERROR_TYPE is inconsistent across packages

2. Sagemaker streaming wrapper has no error handling

3. Together test patches requests.Session.request while everything else patches httpx.Client.send

4. raise e vs bare raise inconsistency

5. Groq's is_recording() guard is only partial

6. Anthropic test churn — defensive filter added to ~25 unrelated tests

Uh oh!

dvirski commented May 14, 2026

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

nina-kollman May 19, 2026

Choose a reason for hiding this comment

Uh oh!

dvirski May 19, 2026

Choose a reason for hiding this comment

Uh oh!

nina-kollman May 19, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

dvirski commented May 10, 2026 •

edited by coderabbitai Bot

Loading

CLAassistant commented May 10, 2026 •

edited

Loading

coderabbitai Bot commented May 10, 2026 •

edited

Loading

coderabbitai Bot May 10, 2026 •

edited

Loading

1. `ERROR_TYPE` is inconsistent across packages

3. Together test patches `requests.Session.request` while everything else patches `httpx.Client.send`

4. `raise e` vs bare `raise` inconsistency

5. Groq's `is_recording()` guard is only partial