Skip to content

Deadlock occurs in Sentry OpenTelemetry Agent under virtual thread environment #5226

@tanyaofei

Description

@tanyaofei

Integration

  • sentry-opentelemetry-agent 8.30.0
  • spring-boot 3.5.9
  • Java 25

Description

We observed that all Redisson threads in our production environment stopped making progress. After collecting thread dumps for all redisson-netty threads, we found that every thread is stuck in Scope.<init>, waiting to acquire the ReentrantLock used for breadcrumbs.

io.netty.util.concurrent.FastThreadLocalThread @ 0x811346a8 : redisson-netty-1-4
  at jdk.internal.misc.Unsafe.park(ZJ)V (Unsafe.java(Native Method))
  at java.util.concurrent.locks.LockSupport.park(Ljava/lang/Object;)V (LockSupport.java:223)
  at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(Ljava/util/concurrent/locks/AbstractQueuedSynchronizer$Node;IZZZJ)I (AbstractQueuedSynchronizer.java:790)
  at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(I)V (AbstractQueuedSynchronizer.java:1030)
  at java.util.concurrent.locks.ReentrantLock$Sync.lock()V (ReentrantLock.java:154)
  at java.util.concurrent.locks.ReentrantLock.lock()V (ReentrantLock.java:323)
  at io.sentry.util.AutoClosableReentrantLock.acquire()Lio/sentry/ISentryLifecycleToken; (AutoClosableReentrantLock.java:12)
  at io.sentry.SynchronizedQueue.toArray([Ljava/lang/Object;)[Ljava/lang/Object; (SynchronizedQueue.java:148)
  at io.sentry.Scope.<init>(Lio/sentry/Scope;)V (Scope.java:145)
  at io.sentry.Scope.clone()Lio/sentry/IScope; (Scope.java:1108)
  at io.sentry.Scopes.forkedScopes(Ljava/lang/String;)Lio/sentry/IScopes; (Scopes.java:114)
  at io.sentry.Sentry.forkedRootScopes(Ljava/lang/String;)Lio/sentry/IScopes; (Sentry.java:146)
  at io.sentry.opentelemetry.SentryContextWrapper.forkCurrentScopeInternal(Lio/opentelemetry/javaagent/shaded/io/opentelemetry/context/Context;Lio/sentry/IScopes;)Lio/sentry/IScopes; (SentryContextWrapper.java:75)
  at io.sentry.opentelemetry.SentryContextWrapper.forkCurrentScope(Lio/opentelemetry/javaagent/shaded/io/opentelemetry/context/Context;)Lio/opentelemetry/javaagent/shaded/io/opentelemetry/context/Context; (SentryContextWrapper.java:46)
  at io.sentry.opentelemetry.SentryContextWrapper.wrap(Lio/opentelemetry/javaagent/shaded/io/opentelemetry/context/Context;)Lio/sentry/opentelemetry/SentryContextWrapper; (SentryContextWrapper.java:94)
  at io.sentry.opentelemetry.SentryContextStorage.root()Lio/opentelemetry/javaagent/shaded/io/opentelemetry/context/Context; (SentryContextStorage.java:44)
  at io.opentelemetry.javaagent.shaded.io.opentelemetry.context.Context.root()Lio/opentelemetry/javaagent/shaded/io/opentelemetry/context/Context; (Context.java:105)
  at io.opentelemetry.javaagent.shaded.io.opentelemetry.context.Context.current()Lio/opentelemetry/javaagent/shaded/io/opentelemetry/context/Context; (Context.java:93)
  at io.sentry.opentelemetry.OtelContextScopesStorage.get()Lio/sentry/IScopes; (OtelContextScopesStorage.java:42)
  at io.sentry.Sentry.getCurrentScopes(Z)Lio/sentry/IScopes; (Sentry.java:120)
  at io.sentry.Sentry.getCurrentScopes()Lio/sentry/IScopes; (Sentry.java:104)
  at io.sentry.ScopesAdapter.getOptions()Lio/sentry/SentryOptions; (ScopesAdapter.java:339)
  at io.sentry.opentelemetry.OtelSentrySpanProcessor.onEnd(Lio/opentelemetry/sdk/trace/ReadableSpan;)V (OtelSentrySpanProcessor.java:179)
  at io.opentelemetry.sdk.trace.MultiSpanProcessor.onEnd(Lio/opentelemetry/sdk/trace/ReadableSpan;)V (MultiSpanProcessor.java:54)
  at io.opentelemetry.sdk.trace.SdkSpan.endInternal(J)V (SdkSpan.java:570)
  at io.opentelemetry.sdk.trace.SdkSpan.end()V (SdkSpan.java:539)
  at io.opentelemetry.javaagent.shaded.instrumentation.api.instrumenter.Instrumenter.doEnd(Lio/opentelemetry/javaagent/shaded/io/opentelemetry/context/Context;Ljava/lang/Object;Ljava/lang/Object;Ljava/lang/Throwable;Ljava/time/Instant;)V (Instrumenter.java:298)
  at io.opentelemetry.javaagent.shaded.instrumentation.api.instrumenter.Instrumenter.end(Lio/opentelemetry/javaagent/shaded/io/opentelemetry/context/Context;Ljava/lang/Object;Ljava/lang/Object;Ljava/lang/Throwable;)V (Instrumenter.java:156)
  at io.opentelemetry.javaagent.instrumentation.redisson.EndOperationListener.accept(Ljava/lang/Object;Ljava/lang/Throwable;)V (EndOperationListener.java:26)
  at io.opentelemetry.javaagent.instrumentation.redisson.CompletableFutureWrapper.lambda$new$0(Lio/opentelemetry/javaagent/shaded/io/opentelemetry/context/Context;Ljava/util/concurrent/CompletableFuture;Ljava/lang/Object;Ljava/lang/Throwable;)V (CompletableFutureWrapper.java:22)
  at io.opentelemetry.javaagent.instrumentation.redisson.CompletableFutureWrapper$$Lambda+0x000000000ae9a1b0.accept(Ljava/lang/Object;Ljava/lang/Object;)V ()
  at java.util.concurrent.CompletableFuture.uniWhenComplete(Ljava/lang/Object;Ljava/util/function/BiConsumer;Ljava/util/concurrent/CompletableFuture$UniWhenComplete;)Z (CompletableFuture.java:884)
  at java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(I)Ljava/util/concurrent/CompletableFuture; (CompletableFuture.java:862)
  at java.util.concurrent.CompletableFuture.postComplete()V (CompletableFuture.java:531)
  at java.util.concurrent.CompletableFuture.complete(Ljava/lang/Object;)Z (CompletableFuture.java:2206)
  at org.redisson.client.handler.CommandDecoder.completeResponse(Lorg/redisson/client/protocol/CommandData;Ljava/lang/Object;)V (CommandDecoder.java:550)
  at org.redisson.client.handler.CommandDecoder.handleResult(Lorg/redisson/client/protocol/CommandData;Ljava/util/List;Ljava/lang/Object;Z)V (CommandDecoder.java:544)
  at org.redisson.client.handler.CommandDecoder.decodeResult(Lorg/redisson/client/protocol/CommandData;Ljava/util/List;Lio/netty/channel/Channel;Ljava/lang/Object;)V (CommandDecoder.java:534)
  at org.redisson.client.handler.CommandPubSubDecoder.decodeResult(Lorg/redisson/client/protocol/CommandData;Ljava/util/List;Lio/netty/channel/Channel;Ljava/lang/Object;)V (CommandPubSubDecoder.java:200)
  at org.redisson.client.handler.CommandDecoder.decodeList(Lio/netty/buffer/ByteBuf;Lorg/redisson/client/protocol/CommandData;Ljava/util/List;Lio/netty/channel/Channel;JLjava/util/List;ZLjava/util/List;Lorg/redisson/client/handler/State;)V (CommandDecoder.java:529)
  at org.redisson.client.handler.CommandDecoder.decode(Lio/netty/buffer/ByteBuf;Lorg/redisson/client/protocol/CommandData;Ljava/util/List;Lio/netty/channel/Channel;ZLjava/util/List;JLorg/redisson/client/handler/State;)V (CommandDecoder.java:464)
  at org.redisson.client.handler.CommandPubSubDecoder.decodeCommand(Lio/netty/channel/Channel;Lio/netty/buffer/ByteBuf;Lorg/redisson/client/protocol/QueueCommand;ILorg/redisson/client/handler/State;)V (CommandPubSubDecoder.java:92)
  at org.redisson.client.handler.CommandDecoder.decode(Lio/netty/channel/ChannelHandlerContext;Lio/netty/buffer/ByteBuf;Lorg/redisson/client/protocol/QueueCommand;ILorg/redisson/client/handler/State;)V (CommandDecoder.java:148)
  at org.redisson.client.handler.CommandDecoder.decode(Lio/netty/channel/ChannelHandlerContext;Lio/netty/buffer/ByteBuf;Ljava/util/List;)V (CommandDecoder.java:124)
  at io.netty.handler.codec.ByteToMessageDecoder.decodeRemovalReentryProtection(Lio/netty/channel/ChannelHandlerContext;Lio/netty/buffer/ByteBuf;Ljava/util/List;)V (ByteToMessageDecoder.java:545)
  at io.netty.handler.codec.ReplayingDecoder.callDecode(Lio/netty/channel/ChannelHandlerContext;Lio/netty/buffer/ByteBuf;Ljava/util/List;)V (ReplayingDecoder.java:366)
  at io.netty.handler.codec.ByteToMessageDecoder.channelRead(Lio/netty/channel/ChannelHandlerContext;Ljava/lang/Object;)V (ByteToMessageDecoder.java:296)
  at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(Ljava/lang/Object;)V (AbstractChannelHandlerContext.java:444)
 ...

While investigating, we tried to identify which thread was holding the lock. However, we observed that in AQS the state = 0 and exclusiveOwnerThread = null, while there are still waiters in the queue. The next waiter is a virtual thread. This indicates that the lock is in the process of being released.

Eventually, we identified the thread that previously held the lock. It is stuck during the lock release process. Its stack trace is as follows:

java.lang.VirtualThread @ 0xde480368 : tomcat-handler-188
  at jdk.internal.misc.Unsafe.park(ZJ)V (Unsafe.java(Native Method))
  at java.lang.VirtualThread.parkOnCarrierThread(ZJ)V (VirtualThread.java:813)
  at java.lang.VirtualThread.park()V (VirtualThread.java:751)
  at java.lang.System$1.parkVirtualThread()V (System.java:2284)
  at java.util.concurrent.locks.LockSupport.park(Ljava/lang/Object;)V (LockSupport.java:221)
  at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(Ljava/util/concurrent/locks/AbstractQueuedSynchronizer$Node;IZZZJ)I (AbstractQueuedSynchronizer.java:790)
  at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(I)V (AbstractQueuedSynchronizer.java:1030)
  at java.util.concurrent.locks.ReentrantLock$Sync.lock()V (ReentrantLock.java:154)
  at java.util.concurrent.locks.ReentrantLock.lock()V (ReentrantLock.java:323)
  at io.sentry.util.AutoClosableReentrantLock.acquire()Lio/sentry/ISentryLifecycleToken; (AutoClosableReentrantLock.java:12)
  at io.sentry.SynchronizedQueue.toArray([Ljava/lang/Object;)[Ljava/lang/Object; (SynchronizedQueue.java:148)
  at io.sentry.Scope.<init>(Lio/sentry/Scope;)V (Scope.java:145)
  at io.sentry.Scope.clone()Lio/sentry/IScope; (Scope.java:1108)
  at io.sentry.Scopes.forkedScopes(Ljava/lang/String;)Lio/sentry/IScopes; (Scopes.java:114)
  at io.sentry.Sentry.forkedRootScopes(Ljava/lang/String;)Lio/sentry/IScopes; (Sentry.java:146)
  at io.sentry.opentelemetry.SentryContextWrapper.forkCurrentScopeInternal(Lio/opentelemetry/javaagent/shaded/io/opentelemetry/context/Context;Lio/sentry/IScopes;)Lio/sentry/IScopes; (SentryContextWrapper.java:75)
  at io.sentry.opentelemetry.SentryContextWrapper.forkCurrentScope(Lio/opentelemetry/javaagent/shaded/io/opentelemetry/context/Context;)Lio/opentelemetry/javaagent/shaded/io/opentelemetry/context/Context; (SentryContextWrapper.java:46)
  at io.sentry.opentelemetry.SentryContextWrapper.wrap(Lio/opentelemetry/javaagent/shaded/io/opentelemetry/context/Context;)Lio/sentry/opentelemetry/SentryContextWrapper; (SentryContextWrapper.java:94)
  at io.sentry.opentelemetry.SentryContextStorage.root()Lio/opentelemetry/javaagent/shaded/io/opentelemetry/context/Context; (SentryContextStorage.java:44)
  at io.opentelemetry.javaagent.shaded.io.opentelemetry.context.Context.root()Lio/opentelemetry/javaagent/shaded/io/opentelemetry/context/Context; (Context.java:105)
  at io.opentelemetry.javaagent.shaded.io.opentelemetry.context.Context.current()Lio/opentelemetry/javaagent/shaded/io/opentelemetry/context/Context; (Context.java:93)
  at io.opentelemetry.javaagent.instrumentation.executors.JavaExecutorInstrumentation$SetExecuteRunnableStateAdvice$ExecuteRunnableAdviceScope.start(Lio/opentelemetry/javaagent/bootstrap/CallDepth;Ljava/lang/Runnable;)Lio/open-telemetry/javaagent/instrumentation/executors/JavaExecutorInstrumentation$SetExecuteRunnableStateAdvice$ExecuteRunnableAdviceScope; (JavaExecutorInstrumentation.java:120)
  at java.util.concurrent.ForkJoinPool.execute(Ljava/lang/Runnable;)V (ForkJoinPool.java:3179)
  at java.lang.VirtualThread.submitRunContinuation(Ljava/util/concurrent/Executor;Z)V (VirtualThread.java:335)
  at java.lang.VirtualThread.submitRunContinuation()V (VirtualThread.java:382)
  at java.lang.VirtualThread.unpark()V (VirtualThread.java:850)
  at java.lang.System$1.unparkVirtualThread(Ljava/lang/Thread;)V (System.java:2301)
  at java.util.concurrent.locks.LockSupport.unpark(Ljava/lang/Thread;)V (LockSupport.java:181)
  at java.util.concurrent.locks.AbstractQueuedSynchronizer.signalNext(Ljava/util/concurrent/locks/AbstractQueuedSynchronizer$Node;)V (AbstractQueuedSynchronizer.java:646)
  at java.util.concurrent.locks.AbstractQueuedSynchronizer.release(I)Z (AbstractQueuedSynchronizer.java:1100)
  at java.util.concurrent.locks.ReentrantLock.unlock()V (ReentrantLock.java:495)
  at io.sentry.util.AutoClosableReentrantLock$AutoClosableReentrantLockLifecycleToken.close()V (AutoClosableReentrantLock.java:26)
  at io.sentry.SynchronizedQueue.toArray([Ljava/lang/Object;)[Ljava/lang/Object; (SynchronizedQueue.java:150)
  at io.sentry.Scope.<init>(Lio/sentry/Scope;)V (Scope.java:145)

We can see the thread attempts to acquire the same lock while releasing the lock.

How this happen

  1. When thread A creates a Scope, which involves copying the parent scope’s breadcrumbs (using a ReentrantLock), so it acquires the lock.
  2. Thread B (a virtual thread) also tries to create a Scope at this moment and attempts to acquire the same lock. Since the lock is already held, it is added to the waiting queue.
  3. After Thread A finishes copying, it attempts to release the lock. Upon release, it needs to wake up the next waiting thread (thread B). Since B is a virtual thread, this results in creating a continuation that is submitted to the ForkJoinPool.
  4. When submitting this continuation to the ForkJoinPool, the Java Executor Instrumentation triggers another Scope creation, which again tries to acquire the same lock.
  5. At this point, thread B has not been successfully resumed, while thread A ends up queuing again to wait for the same lock.
Image

Metadata

Metadata

Assignees

No one assigned
    No fields configured for issues without a type.

    Projects

    Status

    Waiting for: Product Owner

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions