
fix(flask): wrap wsgi_app call in try/except to prevent active_requests gauge leak #4433

Open
alliasgher wants to merge 8 commits into open-telemetry:main from alliasgher:fix-flask-active-requests-gauge-leak

Conversation

@alliasgher (Contributor) commented Apr 14, 2026

Description

Wrap the wsgi_app call in try/finally so that http.server.active_requests is decremented when the wrapped WSGI app raises. Without this, every exception leaves the gauge permanently above the real number of in-flight requests, causing steady drift upward over the process lifetime.

Fixes #4403
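The failure mode can be illustrated with a minimal, self-contained sketch. This is not the instrumentation code itself; FakeCounter is an illustrative stand-in for the OpenTelemetry UpDownCounter backing http.server.active_requests:

```python
# Minimal sketch of the try/finally fix (illustrative names only; the
# real instrumentation uses an OpenTelemetry UpDownCounter).
class FakeCounter:
    def __init__(self):
        self.value = 0

    def add(self, amount):
        self.value += amount


active_requests = FakeCounter()


def wrapped_app(app, environ):
    active_requests.add(1)
    try:
        # Even if app() raises, the finally block still runs, so the
        # gauge cannot drift upward on errors.
        return app(environ)
    finally:
        active_requests.add(-1)


def failing_app(environ):
    raise ValueError("boom")


try:
    wrapped_app(failing_app, {})
except ValueError:
    pass

print(active_requests.value)  # → 0
```

Without the try/finally, the decrement after the app call would be skipped whenever the app raises, leaving the counter one higher per exception.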

Type of change

  • Bug fix (non-breaking change which fixes an issue)

How Has This Been Tested?

Added test_active_requests_decremented_on_error in test_programmatic.py which drives a handler that raises and asserts the gauge ends at 0.

Checklist

  • Changelog entry added under Unreleased / Fixed

@liyaka commented Apr 14, 2026

Thanks for the quick fix! We hit this exact bug in production — the leaked gauge was preventing our Kubernetes HPA from scaling down.

One suggestion: consider using try/finally instead of try/except to keep the decrement in a single code path. With the current approach, active_requests_counter.add(-1, ...) exists in two places (the except block and line 458), which is easy to get out of sync during future refactors.

The WSGI instrumentation already uses this pattern:

# wsgi/__init__.py
try:
    ...
    return _end_span_after_iterating(iterable, span, token)
except Exception as ex:
    raise
finally:
    self.active_requests_counter.add(-1, active_requests_count_attrs)

For Flask it would look like:

try:
    result = wsgi_app(wrapped_app_environ, _start_response)
    # ...duration recording unchanged...
    return result
finally:
    active_requests_counter.add(-1, active_requests_count_attrs)

This also removes the original decrement at line 458, so there's exactly one decrement path.

Also worth adding a test to prevent regression — I have one in #4437 (test_active_requests_counter_decremented_on_error) that could be adapted here.

@alliasgher (Contributor, Author) commented Apr 14, 2026

Good catch — updated to use try/finally so the decrement is in a single code path regardless of whether wsgi_app succeeds or raises @liyaka

@alliasgher force-pushed the fix-flask-active-requests-gauge-leak branch from bb98692 to 5c2b07a on April 14, 2026 at 20:20
@alliasgher (Contributor, Author)

Added the regression test test_active_requests_counter_decremented_on_error. It hits /hello/500 (which raises ValueError internally), collects http.server.active_requests, and asserts the value is back to 0 after both requests complete.
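A simplified version of that regression test can be sketched as follows. Names here are illustrative stand-ins: the real test runs against the Flask instrumentation and reads http.server.active_requests from an in-memory metric reader, while this sketch only models the wrap/decrement logic:

```python
# Simplified sketch of the regression test: two requests, one of which
# raises, and the active-requests gauge must end at 0.
class FakeCounter:
    def __init__(self):
        self.value = 0

    def add(self, amount):
        self.value += amount


def make_wrapped_app(counter, app):
    def wrapped(environ):
        counter.add(1)
        try:
            return app(environ)
        finally:
            # Single decrement path, reached on success and on error.
            counter.add(-1)
    return wrapped


def handler(environ):
    if environ.get("PATH_INFO") == "/hello/500":
        raise ValueError("simulated handler error")
    return "ok"


gauge = FakeCounter()
app = make_wrapped_app(gauge, handler)

app({"PATH_INFO": "/hello/200"})   # successful request
try:
    app({"PATH_INFO": "/hello/500"})  # handler raises
except ValueError:
    pass

print(gauge.value)  # → 0
```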

…ts gauge leak

If wsgi_app() raises an uncaught exception, the active_requests_counter
decrement at the end of _wrapped_app was never reached, causing the gauge
to permanently read high. Kubernetes HPA and similar systems would see
phantom load.

Add a bare try/except that decrements the counter and re-raises on
exception, matching the pattern already used in the WSGI instrumentation.

Fixes open-telemetry#4431

Signed-off-by: alliasgher <alliasgher123@gmail.com>
Signed-off-by: Ali <alliasgher123@gmail.com>
@alliasgher force-pushed the fix-flask-active-requests-gauge-leak branch from 5c2b07a to efb677f on April 15, 2026 at 20:21
@alliasgher requested a review from a team as a code owner on April 15, 2026 at 20:21
@MikeGoldsmith (Member) left a comment


Thanks for this fix @alliasgher - looks good to me.

@github-project-automation bot moved this to Approved PRs in Python PR digest on Apr 15, 2026
@liyaka commented Apr 16, 2026

Any estimate on when this fix will be released?
Thank you!

Signed-off-by: Ali <alliasgher123@gmail.com>
@alliasgher force-pushed the fix-flask-active-requests-gauge-leak branch from e56a8af to b578c8d on April 16, 2026 at 21:14
The check-links workflow fails on any PR that touches CHANGELOG.md
because the full file is scanned and five historical entries contain
broken URLs:

- open-telemetry#1670 and open-telemetry#227 entries have a stray `]` inside the URL.
- The open-telemetry#1033 entry is missing the `/` between the org and repo in the URL.
- The `aws.ecs.*` spec link points to the old path in
  opentelemetry-specification; the content has since moved to the
  semantic-conventions repo.
- The 1.12.0rc2-0.32b0 release tag does not exist on
  opentelemetry-python; drop the link, keep the heading text.

Signed-off-by: Ali <alliasgher123@gmail.com>
@alliasgher (Contributor, Author)

Pushed bb17ad0 to fix broken links in CHANGELOG.md that were failing check-links CI.
