Skip to content

[core] Fix JDBC connection leak in ClientPool and JdbcCatalog#8268

Open
junmuz wants to merge 2 commits into
apache:masterfrom
junmuz:fixing_jdbc_connection_leaks
Open

[core] Fix JDBC connection leak in ClientPool and JdbcCatalog#8268
junmuz wants to merge 2 commits into
apache:masterfrom
junmuz:fixing_jdbc_connection_leaks

Conversation

@junmuz

@junmuz junmuz commented Jun 17, 2026

Copy link
Copy Markdown
Contributor

Purpose

Fixes the Jdbc connection leaks that happen when Flink TMs restart.
Issue: #8267

Tests

There are some tests added specially the testConnectionClosedWhenPoolClosedDuringAction that simulates the scenario.

@junmuz junmuz force-pushed the fixing_jdbc_connection_leaks branch from 7f16f08 to 05dee59 Compare June 17, 2026 16:40
@junmuz

junmuz commented Jun 17, 2026

Copy link
Copy Markdown
Contributor Author

@JingsongLi Can I get a review on the PR?

@JingsongLi

Copy link
Copy Markdown
Contributor

I found one remaining leak race in ClientPool.run. The check at lines 71-72 is not atomic with returning the borrowed client to the deque: a thread can observe this.clients != null, then another thread calls close(), sets this.clients = null and drains the deque, and then the first thread executes clients.addFirst(client) on the old deque. That client was not included in the drain and is no longer reachable from the pool, so it is still left unclosed. This is the same close-during-action case this PR is trying to handle, just with a narrower interleaving. The return-to-pool path and close() need to be made mutually exclusive, or the code needs a post-add closed check/removal that closes the client if the pool was closed concurrently.

Simplify the finally block to always add the client back to the deque
first, then check if close() raced. The deque's internal lock ensures
mutual exclusion between drainTo and remove, preventing both leaks and
double-closes.
@junmuz

junmuz commented Jun 22, 2026

Copy link
Copy Markdown
Contributor Author

I found one remaining leak race in ClientPool.run. The check at lines 71-72 is not atomic with returning the borrowed client to the deque: a thread can observe this.clients != null, then another thread calls close(), sets this.clients = null and drains the deque, and then the first thread executes clients.addFirst(client) on the old deque. That client was not included in the drain and is no longer reachable from the pool, so it is still left unclosed. This is the same close-during-action case this PR is trying to handle, just with a narrower interleaving. The return-to-pool path and close() need to be made mutually exclusive, or the code needs a post-add closed check/removal that closes the client if the pool was closed concurrently.

@JingsongLi Thanks for the comment. I have tried addressing that.

nickdelnano added a commit to nickdelnano/paimon that referenced this pull request Jun 22, 2026
When close() is called while a run() action is in-flight, the connection
is returned to the deque after close() has already drained it. The
connection is then orphaned and never closed.

Fix: after returning the client to the deque, check if close() raced us
(this.clients == null). If so, remove the client we just added and close
it directly.

Credit: extracted from apache#8268.
@nickdelnano

Copy link
Copy Markdown
Contributor

@junmuz I took a different approach in #8323 to reduce jdbc client connections via sharing per JVM. That approach can reduce connections by a large factor for high parallelism jobs. It would affect some of the logic in this PR as with shared connections we would not explicitly call close.

We would still need the fix of 54a23e4 from this branch, which I have ported there.

Let me know if you have any thoughts.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants