fix(exporter/otlp/http): log error details from response on export failure#5155
fix(exporter/otlp/http): log error details from response on export failure#5155grvmishra788 wants to merge 5 commits intoopen-telemetry:mainfrom
Conversation
| status_code = None | ||
| else: | ||
| reason = resp.reason | ||
| reason = _parse_response_body(resp, ExportLogsServiceResponse) |
There was a problem hiding this comment.
Is this is not a successful resopnse should not resp being parsed as google.rpc.Status rather than ExportMetricsServiceResponse.
According to https://opentelemetry.io/docs/specs/otlp/#failures-1
"The response body for all HTTP 4xx and HTTP 5xx responses MUST be a Protobuf-encoded Status message that describes the problem."
There was a problem hiding this comment.
Thanks for catching that! Updated _parse_response_body to parse protobuf as google.rpc.Status and extract its message field. Removed the response_class parameter since it's no longer needed.
| if not resp.content: | ||
| return resp.reason | ||
|
|
||
| content_type = resp.headers.get("Content-Type", "") |
There was a problem hiding this comment.
| content_type = resp.headers.get("Content-Type", "") | |
| content_type = ( | |
| resp.headers.get("Content-Type", "") | |
| .split(";", 1)[0] | |
| .strip() | |
| .lower() | |
| ) |
| return resp.text or resp.reason | ||
| if isinstance(body, dict): | ||
| # OTLP partial_success uses camelCase in JSON | ||
| if error_message := body.get("partialSuccess", {}).get( |
There was a problem hiding this comment.
it will crash on
{"partialSuccess": null}or
{"partialSuccess": "x"}Better to cover edge cases
| if rpc_message := body.get("message", ""): | ||
| return rpc_message | ||
|
|
||
| return resp.text or resp.reason |
There was a problem hiding this comment.
| return resp.text or resp.reason | |
| return resp.text.strip() or resp.reason |
Description
Fixes #4526.
Before: When an OTLP HTTP export fails (non-2xx response), all three exporters (trace, metrics, logs) log only
resp.reason— the HTTP reason phrase (e.g."Bad Request"). The response body is never read, so server-provided error details (e.g. a protobuf-encoded error fromtelemetry.googleapis.com) are silently discarded.After: On a non-2xx response, exporters parse the body using the
Content-Typeheader before logging:application/x-protobufExport*ServiceResponse; extractpartial_success.error_messageapplication/jsonpartialSuccess.errorMessageormessage(google.rpc.Status)resp.textresp.reason(previous behaviour, unchanged)Type of change
How Has This Been Tested?
The fix was verified end-to-end by constructing a
400response with a protobuf-encodedExportTraceServiceResponsebody (containingpartial_success.error_message = "Request contains too many spans (limit: 10000)") and a mocked OTLP endpoint (via theresponseslibrary). Before the fix, the logged reason was"Bad Request"; after, it is the server-provided message. The same was confirmed for a JSONgoogle.rpc.Statusbody and for an empty body (fallback toresp.reason, no regression).Run from the repo root:
test_response_body_parsing.py— cover every branch of_parse_response_body: protobuf with/without error message, JSONpartialSuccessandmessagefields,; charset=utf-8content-type variants, unknown content-type, empty body, malformed protobuf, malformed JSON.test_proto_span_exporter.pyandtest_proto_log_exporter.py— verify the parsed message appears in the error log on a real export call.Does This PR Require a Contrib Repo Change?
Checklist: