Skip to content

Fixed RTCEngine.addTrack leaking pendingTrackResolvers on cancellation#920

Open
adrian-niculescu wants to merge 1 commit intolivekit:mainfrom
adrian-niculescu:fix-addtrack-resolver-leak
Open

Fixed RTCEngine.addTrack leaking pendingTrackResolvers on cancellation#920
adrian-niculescu wants to merge 1 commit intolivekit:mainfrom
adrian-niculescu:fix-addtrack-resolver-leak

Conversation

@adrian-niculescu
Copy link
Copy Markdown
Contributor

Summary

RTCEngine.addTrack registers the publish continuation in pendingTrackResolvers[cid] but never cleans it up if the 20s withDeadline fires or the calling coroutine is cancelled — the suspendCancellableCoroutine block has no invokeOnCancellation handler. As a result, retrying the same LocalAudioTrack / LocalVideoTrack (same MediaStreamTrack.id() → same cid) hits the duplicate guard at RTCEngine.kt:395 and throws DuplicateTrackException, blocking republish until any of:

  • the server eventually responds for that cid (clears the entry via onLocalTrackPublished),
  • the signal channel reconnects (onCloseabortPendingPublishTracks), or
  • the engine closes.

For lost AddTrack requests or dropped responses on a healthy signal, the mic/camera publish stays poisoned indefinitely.

Fix

Register a cont.invokeOnCancellation inside the same synchronized(pendingTrackResolvers) block that does the insert. The handler removes the entry only if it still references the same continuation (pendingTrackResolvers[cid] === cont) so a stale handler from a cancelled publish cannot evict a freshly-registered resolver bound to the same cid — necessary because the duplicate guard would still let the retry register before the previous handler runs in concurrent scenarios.

Test plan

  • Added addTrackTimeoutDoesNotPoisonRetry and addTrackCallerCancellationDoesNotPoisonRetry in RTCEngineMockE2ETest. Both unregister the mock auto-responder so the first publish actually times out / stays in flight, then assert that a same-cid retry resolves cleanly when the server eventually replies.
  • Verified both tests fail without the fix with Track with same ID local_cid has already been published! — confirming they catch the original bug.
  • Both tests pass with the fix.
  • Full livekit-android-test:testDebugUnitTest suite passes (242 tests).
  • ./gradlew spotlessApply clean.

When the AddTrack request timed out via withDeadline (20s) or its caller
coroutine was cancelled, the cid stayed in pendingTrackResolvers because
suspendCancellableCoroutine had no invokeOnCancellation cleanup. A retry
of the same track instance (same cid) hit the duplicate guard and threw
DuplicateTrackException, blocking republish until the connection closed,
the signal reconnected, or the server eventually replied for that cid.

Added cont.invokeOnCancellation that removes the entry under the existing
synchronized monitor, with an identity check so a stale handler cannot
evict a freshly-registered resolver bound to the same cid.

Added two regression tests in RTCEngineMockE2ETest covering the timeout
and caller-cancellation paths.
@changeset-bot
Copy link
Copy Markdown

changeset-bot Bot commented Apr 30, 2026

🦋 Changeset detected

Latest commit: 8d335b3

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 1 package
Name Type
client-sdk-android Patch

Not sure what this means? Click here to learn what changesets are.

Click here if you're a maintainer who wants to add another changeset to this PR

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant