Skip to content

[fix][broker] Defer ack state updates until persistence succeeds#25528

Open
nodece wants to merge 2 commits intoapache:masterfrom
nodece:fix-broker-ack-state-updates
Open

[fix][broker] Defer ack state updates until persistence succeeds#25528
nodece wants to merge 2 commits intoapache:masterfrom
nodece:fix-broker-ack-state-updates

Conversation

@nodece
Copy link
Copy Markdown
Member

@nodece nodece commented Apr 15, 2026

Motivation

The broker ack path was updating consumer-side in-memory ack state before the underlying subscription ack path had fully completed. That makes ack failures harder to reason about and can leave broker-side state transitions out of sync with the persisted ack result and ack-receipt flow.

Changes

  • Add an async-first subscription ack path and route broker ack handling through it
  • Defer consumer unacked and pending-ack state updates until the async ack operation completes successfully
  • Keep ack-receipt handling aligned with the async broker ack completion in ServerCnx
  • Update subscription implementations that participate in the shared ack flow, including persistent, non-persistent, compactor, replicated-subscription, dispatcher, and pending-ack paths
  • Refactor Consumer.individualAckNormal and Consumer.individualAckWithTransaction into a single unified individualAck method, eliminating code duplication between the transactional and non-transactional individual ack paths. The unified method preserves the original timing semantics:
    • Non-transactional acks: pending ack state updates remain deferred until persistence succeeds
    • Transactional acks: pending ack state updates are applied immediately (preserving original behavior), with per-position cleanup scheduled on txn storage completion
  • Fix computeAckedCount to return the correct batch size (instead of always 1) when no batch-index ack set is present, ensuring messageAckRate metrics are accurate for batched messages

Known limitation: When a transaction is aborted, the consumer's unacked message count is not restored to its pre-ack value. This is a pre-existing issue unrelated to this refactoring and will be addressed in a follow-up PR.

@codelipenghui
Copy link
Copy Markdown
Contributor

Will the broker stop dispatch messages to the consumer if the consumer has been set to a lower max unack message and the persistence interval of the cursor set to a higher value?

@nodece
Copy link
Copy Markdown
Member Author

nodece commented Apr 16, 2026

@codelipenghui

The broker should not stop dispatching messages just because managedLedgerCursorPositionFlushSeconds is higher.

In this implementation, unackedMessages is decremented when acknowledgeMessageAsync(...) completes, and that completion is driven by the managed cursor ack callback path, not by the periodic cursor flush task itself.

For the normal path, the ack immediately triggers async mark-delete/delete persistence, so it does not wait for managedLedgerCursorPositionFlushSeconds.

managedLedgerCursorPositionFlushSeconds is only a fallback for dirty/rate-limited cursor updates. In that case the cursor callback is completed immediately and the actual flush is deferred, so dispatch is not blocked waiting for that interval either.

@nodece nodece force-pushed the fix-broker-ack-state-updates branch 3 times, most recently from b6b33f1 to 65f7a2a Compare April 17, 2026 10:46
@nodece nodece marked this pull request as draft April 20, 2026 11:09
@nodece nodece force-pushed the fix-broker-ack-state-updates branch 2 times, most recently from 872e835 to 509d095 Compare April 30, 2026 13:03
@nodece nodece marked this pull request as ready for review April 30, 2026 13:07
@nodece nodece force-pushed the fix-broker-ack-state-updates branch from 509d095 to b27a7ae Compare April 30, 2026 13:16
@nodece nodece requested review from dao-jun and lhotari May 1, 2026 09:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants