UN-3211 [FEAT] HTTP session lifecycle management for workers API clients #1782
Conversation
Summary by CodeRabbit
Walkthrough

Centralizes API client and StateStore cleanup in finally blocks; adds singleton/shared HTTP session support with task-count reset and observability; introduces Celery signal handlers for lifecycle events; increases API client pool defaults; adds test fixtures and extensive session lifecycle tests.

Changes
Sequence Diagram(s)

```mermaid
sequenceDiagram
    Task->>API: setup_execution_context() => api_client
```

Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
🚥 Pre-merge checks: ✅ Passed checks (3 passed)
- Add _owns_session flag to prevent singleton shared session from being closed by individual clients
- Wire API_CLIENT_POOL_SIZE into HTTPAdapter connection pools
- Add idempotent close() and __del__ destructor to BaseAPIClient
- Add try/finally cleanup in api-deployment and callback tasks
- Add on_worker_process_shutdown hook and early-return guard in postrun
- Add 25 unit tests for session lifecycle behavior

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
7f18370 to 0752a37
Actionable comments posted: 2
🤖 Fix all issues with AI agents
In `@workers/shared/api/internal_client.py`:
- Around line 256-268: The reset_singleton method currently swallows exceptions
when closing cls._shared_session; change this to log the exception details
instead of silently passing so FD leaks/errors are visible—catch the Exception
around cls._shared_session.close() and call logger.exception or logger.error
with the exception/context (mentioning InternalAPIClient.reset_singleton and
cls._shared_session) before continuing to set cls._shared_session,
cls._shared_base_client, cls._initialization_count, and cls._task_counter to
None/0 and logging the reset completion.
- Around line 271-294: The non-atomic update in increment_task_counter can lose
counts under threaded/eventlet/gevent worker pools; make the method thread-safe
by adding a class-level lock (e.g., _task_counter_lock = threading.Lock()) and
wrapping the read/increment/check/reset sequence in a with _task_counter_lock:
block (import threading where needed) so the operations on _task_counter and
_last_reset_time and the call to reset_singleton() are atomic; alternatively, if
you require prefork-only deployments, add a precondition/assertion at the start
of increment_task_counter that the worker is running in prefork mode and skip
changes.
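The two fixes the agent prompt asks for can be sketched together. This is an illustrative stand-in for the real `InternalAPIClient`, not the project's code: `THRESHOLD` and `_resets` are invented names (the real code reads its threshold from `WorkerConfig` and tracks observability elsewhere).

```python
import logging
import threading

logger = logging.getLogger(__name__)


class SingletonSketch:
    """Sketch of both review suggestions: log close() failures in
    reset_singleton() instead of swallowing them, and guard the task
    counter with a class-level lock so threaded/eventlet/gevent pools
    don't lose counts."""

    _shared_session = None
    _task_counter = 0
    _task_counter_lock = threading.Lock()
    _resets = 0          # observability counter, invented for this sketch
    THRESHOLD = 100      # stand-in for WorkerConfig().singleton_reset_task_threshold

    @classmethod
    def reset_singleton(cls):
        if cls._shared_session is not None:
            try:
                cls._shared_session.close()
            except Exception:
                # Surface FD leaks/close errors instead of a bare `pass`
                logger.exception("Failed to close shared session during reset")
        cls._shared_session = None
        cls._task_counter = 0
        cls._resets += 1
        logger.info("Singleton session reset complete")

    @classmethod
    def increment_task_counter(cls):
        # The lock makes the read/increment/check/reset sequence atomic
        with cls._task_counter_lock:
            cls._task_counter += 1
            if cls.THRESHOLD > 0 and cls._task_counter >= cls.THRESHOLD:
                cls.reset_singleton()
```

With the lock in place, concurrent increments cannot interleave between the read and the write, so the threshold fires exactly once per `THRESHOLD` completions.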
🧹 Nitpick comments (2)
workers/api-deployment/tasks.py (1)
263-269: Don’t swallow api_client.close() errors silently.
A debug/warn log helps diagnose lingering sessions if close fails.
🔧 Suggested tweak

```diff
-        try:
-            api_client.close()
-        except Exception:
-            pass
+        try:
+            api_client.close()
+        except Exception as e:
+            logger.debug("api_client.close() failed during cleanup: %s", e)
```

workers/callback/tasks.py (1)
1508-1512: Avoid silent failures when closing API clients.
A debug/warn log helps diagnose leaks if close fails in either finally block.
🔧 Suggested tweak

```diff
-        try:
-            context.api_client.close()
-        except Exception:
-            pass
+        try:
+            context.api_client.close()
+        except Exception as e:
+            logger.debug("api_client.close() failed during callback cleanup: %s", e)
@@
-        try:
-            api_client.close()
-        except Exception:
-            pass
+        try:
+            api_client.close()
+        except Exception as e:
+            logger.debug("api_client.close() failed during API callback cleanup: %s", e)
```

Also applies to: 1781-1784
…afe counter

- Log warning instead of silently swallowing exceptions in reset_singleton()
- Add threading.Lock around task counter increment for thread safety with threads/gevent/eventlet pools

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@workers/shared/api/internal_client.py`:
- Around line 258-296: reset_singleton() can close cls._shared_session while
other threads are mid-request; change to a "swap-then-close" approach: inside
reset_singleton() grab a session lock (use existing _task_counter_lock or add a
dedicated _session_lock), atomically replace cls._shared_session with a new
requests.Session() (or None if you want lazy re-init) and store the old session
in a local variable, then spawn a short-lived background worker (thread or
timer) that waits a configurable grace period and then closes the old session
(catching exceptions); update increment_task_counter to use the same session
lock when reading/swapping to avoid races and ensure any code that reads
_shared_session uses that lock or reads a local reference so in-flight requests
continue using the old session until it is closed after the grace period.
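The swap-then-close approach described above can be sketched as follows. `_FakeSession` is an invented stand-in for `requests.Session`, and the 0.2 s grace period is illustrative (the review suggests making it configurable).

```python
import threading
import time


class _FakeSession:
    """Stand-in for requests.Session in this sketch."""

    def __init__(self):
        self.closed = False

    def close(self):
        self.closed = True


class SwapThenClose:
    """Swap the shared session under a lock, then close the old one on a
    timer after a grace period so in-flight requests can finish."""

    _session_lock = threading.Lock()
    _shared_session = None
    GRACE_SECONDS = 0.2  # illustrative; real code would read config

    @classmethod
    def get_session(cls):
        with cls._session_lock:
            if cls._shared_session is None:
                cls._shared_session = _FakeSession()
            return cls._shared_session

    @classmethod
    def reset_singleton(cls):
        # Atomically detach the old session; callers re-init lazily
        with cls._session_lock:
            old, cls._shared_session = cls._shared_session, None
        if old is not None:
            def _close_old():
                try:
                    old.close()
                except Exception:
                    pass  # log here, per the earlier review comment
            threading.Timer(cls.GRACE_SECONDS, _close_old).start()
```

In-flight callers holding a reference to the old session keep using it during the grace window; only after the timer fires is the socket pool torn down.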
🧹 Nitpick comments (1)
workers/shared/api/internal_client.py (1)
280-296: `WorkerConfig()` is instantiated inside the lock on every task completion.
Line 285 creates a new `WorkerConfig()` (parsing env vars) while holding `_task_counter_lock`. This extends the critical section unnecessarily and allocates an object per task. Consider reading the threshold once (e.g., as a class-level cached value or outside the lock). Also, `cls._task_counter = 0` on line 295 is redundant since `reset_singleton()` (line 269) already resets it.
♻️ Proposed refactor: read config outside the lock, remove redundant reset

```diff
 @classmethod
 def increment_task_counter(cls) -> None:
-    with cls._task_counter_lock:
-        cls._task_counter += 1
-
-        from shared.infrastructure.config.worker_config import WorkerConfig
+    from shared.infrastructure.config.worker_config import WorkerConfig

-        threshold = WorkerConfig().singleton_reset_task_threshold
-        if threshold > 0 and cls._task_counter >= threshold:
-            import time
+    threshold = WorkerConfig().singleton_reset_task_threshold
+    with cls._task_counter_lock:
+        cls._task_counter += 1
+        if threshold > 0 and cls._task_counter >= threshold:
+            import time

-            logger.info(
-                "Task counter reached threshold (%d/%d), resetting singleton session",
-                cls._task_counter,
-                threshold,
-            )
-            cls.reset_singleton()
-            cls._task_counter = 0
-            cls._last_reset_time = time.time()
+            logger.info(
+                "Task counter reached threshold (%d/%d), resetting singleton session",
+                cls._task_counter,
+                threshold,
+            )
+            cls.reset_singleton()
+            cls._last_reset_time = time.time()
```
… document thread-safety

- Move WorkerConfig() instantiation outside lock in increment_task_counter()
- Remove redundant _task_counter=0 (already done inside reset_singleton)
- Document thread-safety caveat in reset_singleton() docstring
- Log close failures in task cleanup instead of silently swallowing

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
workers/callback/tasks.py (1)
1380-1389:⚠️ Potential issue | 🟡 Minor
`api_client` created inside `_extract_callback_parameters` is not covered by the `try/finally` if extraction fails mid-way.
If `_extract_callback_parameters` raises after `create_api_client()` (line 702 or 772 in the extraction function) but before assigning to `context.api_client` (line 846), the newly created client becomes an orphan local variable. The `try/finally` at lines 1389/1507 won't run because the exception occurs before entering that block.
The `__del__` destructor (FR-1) serves as the safety net here, which is the design intent. Just flagging for awareness — if deterministic cleanup is desired, the `try/finally` could be widened to wrap `_extract_callback_parameters` as well.
🤖 Fix all issues with AI agents
In `@workers/shared/tests/test_session_lifecycle.py`:
- Around line 371-396: Replace the inline simulated guard in the tests with
calls to the real signal handler on_task_postrun (imported from workers.worker)
and mock InternalAPIClient.increment_task_counter so the handler's guard,
try/except, and logging paths are exercised; for the singleton-disabled test
call on_task_postrun(sender=None, task_id=None, **{}) and assert
increment_task_counter was not called, and for the singleton-enabled test patch
the same method and call on_task_postrun then assert increment_task_counter was
called once, ensuring the patch target matches the import path used inside
on_task_postrun.
🧹 Nitpick comments (5)
workers/shared/api/internal_client.py (2)
125-179: Singleton initialization creates and immediately discards 7 sessions.
Each specialized client's `__init__` (via `BaseAPIClient.__init__`) creates a fresh `requests.Session` with mounted `HTTPAdapter`, which `_share_session` immediately closes and replaces. For 7 specialized clients, that's 7 throwaway sessions per `InternalAPIClient` instantiation.
This isn't a bug — the sessions are properly closed — but it's wasteful, especially if `InternalAPIClient` is instantiated frequently (e.g., per-task in non-singleton mode). Consider passing an existing session into the specialized client constructors to avoid the create-then-close pattern.
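The session-injection pattern suggested above could look like this sketch. `SubClientSketch` and its constructor signature are hypothetical, not the project's actual client classes; `object()` stands in for a configured `requests.Session`.

```python
class SubClientSketch:
    """Accept an existing session instead of building one that the
    sharing step would immediately discard."""

    def __init__(self, config, session=None):
        self.config = config
        if session is not None:
            self.session = session       # reuse the caller's shared session
            self._owns_session = False   # a sharer must not close it
        else:
            self.session = self._build_session()
            self._owns_session = True

    def _build_session(self):
        # Placeholder for Session() + mounted HTTPAdapter in the real code
        return object()
```

A parent client would build one session, mount its adapters once, and pass it to all seven sub-clients, eliminating the create-then-close churn.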
286-300: `WorkerConfig()` instantiated on every task completion.
`increment_task_counter` is called via `task_postrun` signal after every task. Each call constructs a new `WorkerConfig()`, which reads environment variables. While this keeps the threshold dynamically reconfigurable, it adds overhead on every task completion.
If env-var reading becomes a concern at scale, consider caching the threshold at the class level and only refreshing it on reset.
workers/shared/tests/test_session_lifecycle.py (1)
322-348: `mock_config_singleton` fixtures are required for env setup — Ruff ARG002 is a false positive.
The `mock_config_singleton` parameter in `test_increment_counter` and `test_threshold_triggers_reset` isn't directly referenced in the test body, but it's needed because the fixture patches `os.environ` with `WORKER_SINGLETON_RESET_THRESHOLD=3` and `ENABLE_API_CLIENT_SINGLETON=true`. Without it, `WorkerConfig()` inside `increment_task_counter` would read unpatched env vars.
To silence the Ruff warning while keeping the intent clear, you could prefix with underscore:
Suggested fix

```diff
-    def test_increment_counter(self, mock_config_singleton):
+    def test_increment_counter(self, mock_config_singleton):  # noqa: ARG002
```

Or rename the parameter:

```diff
-    def test_increment_counter(self, mock_config_singleton):
+    def test_increment_counter(self, _mock_config_singleton):
```

workers/callback/tasks.py (2)
1484-1503: Use `logger.exception` for better tracebacks in error handlers.
At lines 1485-1487 and 1503, `logger.error(...)` is used to log exceptions, but `logger.exception(...)` would automatically include the traceback, which is more useful for debugging callback failures.
Suggested fix

```diff
 except Exception as e:
-    logger.error(
+    logger.exception(
         f"Unified batch callback processing failed for execution {context.execution_id}: {e}"
     )
 except Exception as cleanup_error:
-    logger.error(f"Failed to mark execution as failed: {cleanup_error}")
+    logger.exception(f"Failed to mark execution as failed: {cleanup_error}")
```

1757-1777: Same `logger.error` → `logger.exception` opportunity in API callback error path.
Lines 1758-1760 and 1777 use `logger.error` where `logger.exception` would capture the full traceback for easier debugging.
Suggested fix

```diff
 except Exception as e:
-    logger.error(
+    logger.exception(
         f"API callback processing failed for execution {execution_id}: {e}"
     )
 except Exception as update_error:
-    logger.error(f"Failed to update execution status: {update_error}")
+    logger.exception(f"Failed to update execution status: {update_error}")
```
…strun handler

Tests now call the real worker.on_task_postrun() signal handler instead of simulating the guard logic inline, catching divergence if the handler's guard, try/except, or import path changes.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@workers/shared/tests/test_session_lifecycle.py`:
- Around line 243-263: test_default_pool_size uses patch.dict(..., clear=False)
so an externally-set API_CLIENT_POOL_SIZE can leak into the test; ensure the
test removes any inherited value before instantiating WorkerConfig by
deleting/unsetting API_CLIENT_POOL_SIZE in the patched context (so
WorkerConfig().api_client_pool_size is forced to use the hardcoded default).
Locate test_default_pool_size and adjust the patched block to explicitly remove
API_CLIENT_POOL_SIZE from os.environ (e.g., pop if present) before creating
WorkerConfig and asserting api_client_pool_size == 10.
🧹 Nitpick comments (3)
workers/shared/tests/test_session_lifecycle.py (3)
322-348: Prefix unused fixture parameters with `_` to suppress Ruff ARG002.
`mock_config_singleton` is correctly used for its env-patching side effect, but Ruff flags it as unused. Prefixing with `_` is the idiomatic pytest convention for fixtures consumed only for side effects.

```diff
-    def test_increment_counter(self, mock_config_singleton):
+    def test_increment_counter(self, _mock_config_singleton):
-    def test_threshold_triggers_reset(self, mock_config_singleton):
+    def test_threshold_triggers_reset(self, _mock_config_singleton):
```

Alternatively, apply `@pytest.mark.usefixtures("mock_config_singleton")` at the class or method level to avoid the parameter entirely.
415-424: Extract the repeated sub-client attribute list into a constant.
The same 8-element list appears three times in this class. If a sub-client is added or renamed in `InternalAPIClient`, only some lists may get updated, causing silent test gaps.
Suggested refactor
Define once at module or class level:

```python
_SUB_CLIENT_ATTRS = [
    "base_client",
    "execution_client",
    "file_client",
    "webhook_client",
    "organization_client",
    "tool_client",
    "workflow_client",
    "usage_client",
]
```

Then reference `_SUB_CLIENT_ATTRS` in all three test methods.
Also applies to: 466-475, 488-497
523-526: Prefix unused unpacked variables with `_` to suppress Ruff RUF059.
The unpacked values aren't needed in these cleanup-focused tests.

```diff
-    ) as (cfg, client):
+    ) as (_cfg, _client):
```

Also applies to: 541-545
@muhammad-ali-e I also suggest wiring up the tests to run with tox on every PR to catch regressions
…ove close() logging Cache the singleton_reset_task_threshold to avoid re-importing WorkerConfig on every task increment. Promote api_client.close() failure logs from debug to warning for better production visibility. Update tests to reset cached threshold. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…r imports Refactor api-deployment tasks to handle setup failures early with proper cleanup, move shared imports to module level in worker.py, and fix type annotations in client_factory. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Greptile Summary

This PR adds HTTP session lifecycle management to all worker API clients: explicit cleanup in finally blocks plus opt-in shared-session (singleton) support with periodic reset.

Key observations:
Confidence Score: 4/5
Important Files Changed
Sequence Diagram

```mermaid
sequenceDiagram
    participant Task as Celery Task
    participant IAC as InternalAPIClient
    participant BAC as BaseAPIClient
    participant Session as requests.Session
    Note over Task,Session: Per-task execution (default: singleton disabled)
    Task->>IAC: __init__(config)
    IAC->>BAC: BaseAPIClient(config) [_owns_session=True]
    BAC->>Session: Session() + HTTPAdapter(pool_size)
    Task->>Task: execute work
    Task->>IAC: close() [try/finally]
    IAC->>BAC: base_client.close()
    BAC->>Session: session.close() [_closed=True]
    Note over Task,Session: Singleton mode (ENABLE_API_CLIENT_SINGLETON=true)
    Task->>IAC: __init__(config)
    IAC->>BAC: BaseAPIClient(config)
    IAC->>BAC: _owns_session = False (shared)
    BAC-->>IAC: _shared_session stored
    Task->>IAC: close() [try/finally]
    IAC-->>Task: no-op (shared session preserved)
    Note over Task,Session: Periodic reset (WORKER_SINGLETON_RESET_THRESHOLD tasks)
    Task->>IAC: on_task_postrun signal
    IAC->>IAC: increment_task_counter()
    IAC->>IAC: counter >= threshold?
    IAC->>Session: reset_singleton() → session.close()
    IAC->>IAC: _shared_session = None
    Note over Task,Session: Worker shutdown
    Task->>IAC: on_worker_process_shutdown
    IAC->>Session: reset_singleton() → session.close()
    IAC->>BAC: ClientFactory.reset_shared_state() → close()
```
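The singleton-mode no-op `close()` in the diagram hinges on the `_owns_session` flag plus an idempotent `_closed` guard. A minimal sketch of that behavior (not the project's actual `BaseAPIClient`; the session parameter stands in for `requests.Session`):

```python
class BaseClientSketch:
    """Close the session only if this client owns it, exactly once."""

    def __init__(self, session):
        self.session = session
        self._owns_session = True   # set to False when sharing the singleton
        self._closed = False

    def close(self):
        if self._closed:
            return  # idempotent: a second close() is a no-op
        if self._owns_session and self.session is not None:
            self.session.close()
        self._closed = True

    def __del__(self):
        # Safety net if a task skips try/finally cleanup
        try:
            self.close()
        except Exception:
            pass
```

With `_owns_session = False`, per-task `close()` calls leave the shared session's connection pool intact; only `reset_singleton()` tears it down.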
Prompt To Fix All With AI

This is a comment left during a code review.
Path: workers/shared/api/internal_client.py
Line: 136-138
Comment:
**Misleading comment contradicts the code**
The comment on line 137 says "The first client owns the session" but the very next line sets `_owns_session = False`, meaning ownership is explicitly *denied*. This is the opposite of what the comment implies and will confuse future maintainers who read it together with the `_owns_session` docstring in `BaseAPIClient` ("Track whether this client owns its session").
Consider replacing with something that matches the intent:
```suggestion
self.base_client = BaseAPIClient(self.config)
# Defer session ownership to reset_singleton(); no individual client
# should close the shared session on close() or __del__.
self.base_client._owns_session = False
```
How can I resolve this? If you propose a fix, please make it concise.
---
This is a comment left during a code review.
Path: workers/callback/tasks.py
Line: 1718-1722
Comment:
**`api_client` may be undefined when `finally` runs**
`api_client` is assigned at the very start of `process_batch_callback_api` before the `try` block:
```python
api_client = create_api_client(organization_id)
logger.info(f"Created organization-scoped API client: {organization_id}")
try:
...
finally:
try:
api_client.close() # ← safe because api_client is pre-assigned
```
As currently structured this is safe: if `create_api_client()` raises, we never enter the `try` and the `finally` never runs. However, the `logger.info(...)` call between creation and the `try` block is an implicit code gap — any future refactor that moves the `api_client` assignment *inside* the `try` block would silently introduce a `NameError` in the `finally`.
A defensive guard (matching the pattern already used in `_process_batch_callback_core`) would make intent explicit:
```suggestion
finally:
if "api_client" in dir() and api_client is not None:
try:
api_client.close()
except Exception as e:
logger.warning("api_client.close() failed during cleanup: %s", e)
```
Or, more simply, move `api_client = create_api_client(...)` to the first line inside the `try` block and initialize `api_client = None` before it.
How can I resolve this? If you propose a fix, please make it concise.
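The None-initialization option at the end of the comment generalizes to a small pattern. `run_with_cleanup`, `create_client`, and `work` are hypothetical names used only for illustration of the guard the review recommends:

```python
import logging

logger = logging.getLogger(__name__)


def run_with_cleanup(create_client, work):
    """Initialize api_client to None before the try block so the finally
    clause is safe even if client creation itself raises."""
    api_client = None
    try:
        api_client = create_client()
        return work(api_client)
    finally:
        if api_client is not None:
            try:
                api_client.close()
            except Exception as e:
                logger.warning("api_client.close() failed during cleanup: %s", e)
```

If `create_client()` raises, `api_client` is still bound (to `None`), so the `finally` block runs without a `NameError` and simply skips the close.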
---
This is a comment left during a code review.
Path: workers/shared/api/internal_client.py
Line: 289-304
Comment:
**`cls._last_reset_time` set inside the lock but `reset_singleton()` does not clear it**
`cls._last_reset_time = time.time()` is correctly set inside `_task_counter_lock` after calling `reset_singleton()`. However, `reset_singleton()` does not clear `_last_reset_time`, which is inconsistent with its resetting of every other class-level counter/cache field (`_task_counter`, `_cached_reset_threshold`, `_initialization_count`).
This means `_last_reset_time` always reflects the *last* periodic reset, even after a manual `reset_singleton()` call from `on_worker_process_shutdown`. If observability tooling later relies on this value to determine "was the session ever reset?", it will read stale data after a clean shutdown + restart within the same process lifetime.
```python
# In reset_singleton(), alongside the other resets:
cls._last_reset_time = None
```
How can I resolve this? If you propose a fix, please make it concise.

Reviews (2): Last reviewed commit: "Merge branch 'main' into feat/UN-3211-FE..." | Re-trigger Greptile
Test Results

Summary
Runner Tests - Full Report
SDK1 Tests - Full Report
```python
if cls._cached_reset_threshold is None:
    cls._cached_reset_threshold = WorkerConfig().singleton_reset_task_threshold

with cls._task_counter_lock:
    cls._task_counter += 1
    if (
        cls._cached_reset_threshold > 0
        and cls._task_counter >= cls._cached_reset_threshold
    ):
        logger.info(
            "Task counter reached threshold (%d/%d), resetting singleton session",
            cls._task_counter,
            cls._cached_reset_threshold,
        )
        cls.reset_singleton()
        cls._last_reset_time = time.time()
```
cls._last_reset_time set inside the lock but reset_singleton() does not clear it
cls._last_reset_time = time.time() is correctly set inside _task_counter_lock after calling reset_singleton(). However, reset_singleton() does not clear _last_reset_time, which is inconsistent with its resetting of every other class-level counter/cache field (_task_counter, _cached_reset_threshold, _initialization_count).
This means _last_reset_time always reflects the last periodic reset, even after a manual reset_singleton() call from on_worker_process_shutdown. If observability tooling later relies on this value to determine "was the session ever reset?", it will read stale data after a clean shutdown + restart within the same process lifetime.
```python
# In reset_singleton(), alongside the other resets:
cls._last_reset_time = None
```

Prompt To Fix With AI
This is a comment left during a code review.
Path: workers/shared/api/internal_client.py
Line: 289-304
Comment:
**`cls._last_reset_time` set inside the lock but `reset_singleton()` does not clear it**
`cls._last_reset_time = time.time()` is correctly set inside `_task_counter_lock` after calling `reset_singleton()`. However, `reset_singleton()` does not clear `_last_reset_time`, which is inconsistent with its resetting of every other class-level counter/cache field (`_task_counter`, `_cached_reset_threshold`, `_initialization_count`).
This means `_last_reset_time` always reflects the *last* periodic reset, even after a manual `reset_singleton()` call from `on_worker_process_shutdown`. If observability tooling later relies on this value to determine "was the session ever reset?", it will read stale data after a clean shutdown + restart within the same process lifetime.
```python
# In reset_singleton(), alongside the other resets:
cls._last_reset_time = None
```
How can I resolve this? If you propose a fix, please make it concise.There was a problem hiding this comment.
P2 level, can consider later.



What

- Wire `API_CLIENT_POOL_SIZE` into HTTPAdapter connection pools
- Add `_owns_session` flag to prevent singleton shared sessions from being closed by individual clients

Why

- `API_CLIENT_POOL_SIZE` config existed but was never wired into HTTPAdapter (dead config)
- `on_task_postrun` signal handler ran uselessly on every task when singleton mode was disabled

How

- `base_client.py`: Added `_owns_session` flag, idempotent `close()`, `__del__` destructor, wired pool size into HTTPAdapter
- `internal_client.py`: Set `_owns_session=False` on all clients sharing the singleton session
- `api-deployment/tasks.py`: Added try/finally with `api_client.close()` for missing cleanup
- `callback/tasks.py`: try/finally cleanup in callback task functions
- `worker.py`: Early-return guard in `on_task_postrun` when singleton disabled; `on_worker_process_shutdown` hook
- `worker_config.py`: Default pool size 10, singleton reset threshold config

Can this PR break any existing features?
`ENABLE_API_CLIENT_SINGLETON=false` remains the default. Pool size default stays at 10.

Database Migrations

Env Config

- `API_CLIENT_POOL_SIZE` — now actually wired in (default: 10, unchanged)
- `WORKER_SINGLETON_RESET_THRESHOLD` — already documented in sample.env (default: 1000)
- `ENABLE_API_CLIENT_SINGLETON` — existing, unchanged (default: false)

Relevant Docs
Related Issues or PRs
Dependencies Versions
Notes on Testing
`cd workers && PYTHONPATH=.:../unstract .venv/bin/python -m pytest shared/tests/ -v`

🤖 Generated with Claude Code