Fix CloudWatch remote logging for ephemeral lifecycle executor by jason810496 · Pull Request #68779 · apache/airflow

jason810496 · 2026-06-20T05:26:46Z

Why

While trying to set up CloudWatch remote logging in #68709 to persist logs in real time, I encountered the same errors as in the issues listed above.

The root cause I found is the same one pointed out in #66475 (comment): the configure_logging -> dictConfig -> _clearExistingHandlers call chain shuts down the watchtower handler.
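The shutdown can be reproduced with the standard library alone. This is a minimal sketch: FakeStreamingHandler is a hypothetical stand-in for the watchtower handler (not watchtower's actual class) and only records that close() was called. It relies on the fact that a non-incremental dictConfig call closes every live handler registered in logging._handlerList, whether or not the handler is attached to a logger.

```python
import logging
import logging.config


class FakeStreamingHandler(logging.Handler):
    """Hypothetical stand-in for the watchtower handler; it only tracks shutdown."""

    def __init__(self):
        super().__init__()  # registers a weak reference in logging._handlerList
        self.shutting_down = False

    def emit(self, record):
        pass  # a real handler would queue the record for delivery

    def close(self):
        self.shutting_down = True  # mirrors the flag described in this PR
        super().close()


# Keep a strong reference; logging only holds a weak one.
handler = FakeStreamingHandler()

# Non-incremental configuration resets logging and closes existing handlers.
logging.config.dictConfig({"version": 1, "disable_existing_loggers": False})

print(handler.shutting_down)
```

Note that the handler was never added to any logger; being constructed before the dictConfig call is enough for it to be closed.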

How

I took a different approach from #66633. Instead of configuring the processors after the dictConfig call, we can make the CloudWatch remote logger itself self-healing: it creates a fresh handler instance if the previous one was shut down by the dictConfig call, while still honoring the .close semantics by guarding with the _closed state.
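A minimal sketch of the self-healing idea, not the code in this patch: FakeHandler and SelfHealingIO are hypothetical names, and the builds counter exists only to make the behavior observable. The handler property rebuilds when the current handler reports shutting_down, unless close() has already set _closed.

```python
class FakeHandler:
    """Hypothetical stand-in for the watchtower handler."""

    def __init__(self):
        self.shutting_down = False
        self.records = []

    def emit(self, message):
        self.records.append(message)

    def close(self):
        self.shutting_down = True


class SelfHealingIO:
    """Hypothetical remote-log IO that rebuilds a handler closed behind its back."""

    def __init__(self):
        self._handler = None
        self._closed = False  # set only by close(), never by a logging reset
        self.builds = 0  # illustration only: counts handler constructions

    @property
    def handler(self):
        stale = self._handler is None or self._handler.shutting_down
        if stale and not self._closed:
            self._handler = FakeHandler()
            self.builds += 1
        return self._handler

    def close(self):
        self._closed = True  # from now on, never rebuild
        if self._handler is not None:
            self._handler.close()


io = SelfHealingIO()
first = io.handler   # built during logging setup
first.close()        # simulates dictConfig closing the handler mid-setup
second = io.handler  # a fresh handler replaces the closed one
second.emit("task log line")
io.close()           # genuine teardown
late = io.handler    # still the closed handler; nothing new is created
```

In this sketch, a record arriving after close() gets the already-closed handler back, so no new handler or background thread is left behind.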

What

Fix the lifecycle issue of CloudWatch remote logging, verified with the breeze k8s system test, using provider-only changes without touching the Task SDK.

jason810496 · 2026-06-20T06:10:49Z

cc @sarvesh371, @seanghaeli Could you verify this patch on your setup when you have a moment? Since #66633 likely won't make the 3.3 release (we're close to the dev freeze for 3.3), we might release this provider-only patch first. Thanks.

The streaming CloudWatch handler is rebuilt whenever it reports shutting_down, so logs survive configure_logging() closing it. But shutting_down alone cannot tell a mid-task close apart from genuine teardown, so a record arriving after teardown would spin up an orphan handler and its background queue thread that nobody flushes or closes.

The supervisor lifecycle makes the two cases distinguishable in time:

1. configure_logging() builds the handler via remote.processors (processors does `_ = self.handler`), registering it in logging._handlerList.
2. The same call then runs dictConfig, whose non-incremental reset closes that handler -> watchtower sets shutting_down=True.
3. Child log records stream through proc -> self.handler, which sees shutting_down and rebuilds. This is the case we must keep working.
4. At the last possible moment _upload_logs() -> upload() -> close() flushes; nothing logs after this.

shutting_down is watchtower's flag set by dictConfig (step 2); the new _closed flag is ours, set only by close() (step 4). dictConfig never touches _closed, so the rebuild in step 3 still fires, while a late record after step 4 keeps the closed handler instead of orphaning a new one.

close() on the outer CloudwatchTaskHandler now closes the handler the IO is currently using rather than the reference captured in set_context(), which dictConfig may have closed and the IO since rebuilt.
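The stale-reference problem in the last paragraph can be sketched as follows. Every class here (FakeHandler, RebuildingIO, OuterTaskHandler) is a hypothetical simplification written for illustration, not the provider's real code: the outer handler captures a reference in set_context(), the IO later rebuilds, and close() must reach the handler currently in use.

```python
class FakeHandler:
    """Hypothetical stand-in for the watchtower handler."""

    def __init__(self):
        self.shutting_down = False

    def close(self):
        self.shutting_down = True


class RebuildingIO:
    """Hypothetical IO that swaps in a fresh handler after an external close."""

    def __init__(self):
        self._handler = FakeHandler()
        self._closed = False

    @property
    def handler(self):
        if self._handler.shutting_down and not self._closed:
            self._handler = FakeHandler()
        return self._handler

    def close(self):
        self._closed = True
        self._handler.close()  # always the handler in use right now


class OuterTaskHandler:
    """Hypothetical outer handler; set_context() captures a reference."""

    def __init__(self, io):
        self.io = io
        self.captured = None

    def set_context(self):
        self.captured = self.io.handler

    def close(self):
        # Delegate to the IO instead of closing self.captured, which may
        # be a handler that was closed and replaced long ago.
        self.io.close()


io = RebuildingIO()
outer = OuterTaskHandler(io)
outer.set_context()
captured = outer.captured
captured.close()      # simulates dictConfig closing the captured handler
current = io.handler  # the IO rebuilds; captured is now stale
outer.close()         # closes `current`, not just the stale reference
```

Had close() called self.captured.close() instead, the rebuilt handler in this sketch would have been left open at teardown.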

Fix Cloudwatch remote logging for ephemeral lifecycle executor

ee56865

jason810496 requested a review from o-nikolas as a code owner June 20, 2026 05:26

boring-cyborg Bot added area:logging area:providers provider:amazon AWS/Amazon - related issues labels Jun 20, 2026

jason810496 self-assigned this Jun 20, 2026

jason810496 mentioned this pull request Jun 20, 2026

Fix: CloudWatch/Watchtower logs dropped in Task SDK due to handler lifetime bugs #66633

Open

jason810496 requested review from ashb and ferruzzi June 20, 2026 06:07

jason810496 force-pushed the fix/cloudwatch/remote-logging-k8s-executor branch from a92793f to f28925a June 20, 2026 06:18

jason810496 wants to merge 2 commits into apache:main from jason810496:fix/cloudwatch/remote-logging-k8s-executor
