Summary
dataretrieval/waterdata/utils.py:335 (_error_body) blindly calls resp.json() for any status code other than 429 or 403. When the upstream API returns a non-JSON body (typical for 5xx gateway errors — e.g. an openresty 502 Bad Gateway HTML page), this raises requests.exceptions.JSONDecodeError, which masks the real status code and replaces the actionable diagnostic with a confusing JSON parse traceback.
Repro
Observed in CI on branch add-filter-pitfall-check — test_get_latest_daily hit a transient upstream 502:
self = <Response [502]>
s = '<html>\r\n<head><title>502 Bad Gateway</title></head>...openresty...'
...
dataretrieval/waterdata/utils.py:335: in _error_body
j_txt = resp.json()
...
requests.exceptions.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
The user never sees that it was a 502 — they see a JSON decode error, which suggests a parsing problem in our code rather than an upstream outage.
Proposed fix
Make _error_body resilient to non-JSON bodies: try JSON first, fall back to a truncated raw-text snippet plus the status/reason.
try:
j_txt = resp.json()
return (
f"{status}: {j_txt.get('code', 'Unknown type')}. "
f"{j_txt.get('description', 'No description provided')}."
)
except ValueError:
snippet = (resp.text or "").strip()[:200]
if snippet:
return f"{status}: {resp.reason or 'Error'}. {snippet}"
return f"{status}: {resp.reason or 'Error'}."
Notes
- The triggering CI failure was an upstream flake, not a code bug — re-running CI will likely make it pass. This issue is specifically about the masked-error behavior, which would surface for any non-JSON error response, not just 502s.
- A test for this could mock a
Response with status_code=502 and an HTML body, and assert that the resulting RuntimeError message contains 502 (and does not raise JSONDecodeError).
Summary
dataretrieval/waterdata/utils.py:335(_error_body) blindly callsresp.json()for any status code other than 429 or 403. When the upstream API returns a non-JSON body (typical for 5xx gateway errors — e.g. an openresty502 Bad GatewayHTML page), this raisesrequests.exceptions.JSONDecodeError, which masks the real status code and replaces the actionable diagnostic with a confusing JSON parse traceback.Repro
Observed in CI on branch
add-filter-pitfall-check—test_get_latest_dailyhit a transient upstream 502:The user never sees that it was a 502 — they see a JSON decode error, which suggests a parsing problem in our code rather than an upstream outage.
Proposed fix
Make
_error_bodyresilient to non-JSON bodies: try JSON first, fall back to a truncated raw-text snippet plus the status/reason.Notes
Responsewithstatus_code=502and an HTML body, and assert that the resultingRuntimeErrormessage contains502(and does not raiseJSONDecodeError).