fix(flask): wrap wsgi_app call in try/except to prevent active_requests gauge leak#4433
fix(flask): wrap wsgi_app call in try/except to prevent active_requests gauge leak#4433alliasgher wants to merge 8 commits intoopen-telemetry:mainfrom
Conversation
|
Thanks for the quick fix! We hit this exact bug in production — the leaked gauge was preventing our Kubernetes HPA from scaling down. One suggestion: consider using The WSGI instrumentation already uses this pattern: # wsgi/__init__.py
try:
...
return _end_span_after_iterating(iterable, span, token)
except Exception as ex:
raise
finally:
self.active_requests_counter.add(-1, active_requests_count_attrs)For Flask it would look like: try:
result = wsgi_app(wrapped_app_environ, _start_response)
# ...duration recording unchanged...
return result
finally:
active_requests_counter.add(-1, active_requests_count_attrs)This also removes the original decrement at line 458, so there's exactly one decrement path. Also worth adding a test to prevent regression — I have one in #4437 ( |
|
Good catch — updated to use |
bb98692 to
5c2b07a
Compare
|
Added the regression test test_active_requests_counter_decremented_on_error. It hits /hello/500 (which raises ValueError internally), collects http.server.active_requests, and asserts the value is back to 0 after both requests complete. |
…ts gauge leak If wsgi_app() raises an uncaught exception, the active_requests_counter decrement at the end of _wrapped_app was never reached, causing the gauge to permanently read high. Kubernetes HPA and similar systems would see phantom load. Add a bare try/except that decrements the counter and re-raises on exception, matching the pattern already used in the WSGI instrumentation. Fixes open-telemetry#4431 Signed-off-by: alliasgher <alliasgher123@gmail.com>
Signed-off-by: alliasgher <alliasgher123@gmail.com>
Signed-off-by: alliasgher <alliasgher123@gmail.com>
Signed-off-by: Ali <alliasgher123@gmail.com>
5c2b07a to
efb677f
Compare
MikeGoldsmith
left a comment
There was a problem hiding this comment.
Thanks for this fix @alliasgher - looks good to me.
|
Any estimation for this fix to be released? |
Signed-off-by: Ali <alliasgher123@gmail.com>
e56a8af to
b578c8d
Compare
The check-links workflow fails on any PR that touches CHANGELOG.md because the full file is scanned and five historical entries contain broken URLs: - open-telemetry#1670 and open-telemetry#227 entries have a stray `]` inside the URL. - The open-telemetry#1033 entry is missing the `/` between the org and repo in the URL. - The `aws.ecs.*` spec link points to the old path in opentelemetry-specification; the content has since moved to the semantic-conventions repo. - The 1.12.0rc2-0.32b0 release tag does not exist on opentelemetry-python; drop the link, keep the heading text. Signed-off-by: Ali <alliasgher123@gmail.com>
|
Pushed bb17ad0 to fix broken links in |
Signed-off-by: Ali <alliasgher123@gmail.com>
Description
Wrap the
wsgi_appcall intry/finallyso thathttp.server.active_requestsis decremented when the wrapped WSGI app raises. Without this, every exception leaves the gauge permanently above the real number of in-flight requests, causing steady drift upward over the process lifetime.Fixes #4403
Type of change
How Has This Been Tested?
Added
test_active_requests_decremented_on_errorintest_programmatic.pywhich drives a handler that raises and asserts the gauge ends at 0.Checklist
Unreleased / Fixed