Implement option to use JVMTI/JFR stack-walker for CPU- and wall-clock-profiler #504
Conversation
```cpp
// Detect the HotSpot JFR async stack-trace JVMTI extension. Present on JDK 27+
// builds that expose the functions "com.sun.hotspot.functions.RequestStackTrace"
// and "com.sun.hotspot.functions.InitializeRequestStackTrace". If the user asked
// for jvmtistacks and both are present, initialise now (the JDK requires this
// to happen during Agent_OnLoad). After initialisation, signal handlers may
// delegate stack walks via VM::requestStackTrace().
if (_hotspot) {
    jint ext_count = 0;
    jvmtiExtensionFunctionInfo *ext_functions = nullptr;
    if (_jvmti->GetExtensionFunctions(&ext_count, &ext_functions) == 0) {
        for (jint i = 0; i < ext_count; i++) {
            if (strcmp(ext_functions[i].id,
                       "com.sun.hotspot.functions.RequestStackTrace") == 0) {
                _request_stack_trace = ext_functions[i].func;
            } else if (strcmp(ext_functions[i].id,
                              "com.sun.hotspot.functions.InitializeRequestStackTrace") == 0) {
                _init_request_stack_trace = ext_functions[i].func;
            }
        }
        _jvmti->Deallocate((unsigned char *)ext_functions);
    }

    if (_agent_args._jvmtistacks) {
        if (_request_stack_trace != nullptr && _init_request_stack_trace != nullptr) {
            jvmtiError rc = _init_request_stack_trace(_jvmti);
            if (rc == JVMTI_ERROR_NONE) {
                _request_stack_trace_initialized = true;
                Counters::increment(JVMTI_STACKS_INIT_OK);
            } else {
                Log::warn("InitializeRequestStackTrace failed: %d", rc);
                Counters::increment(JVMTI_STACKS_INIT_FAILED);
            }
        } else {
            Log::warn("jvmtistacks requested but HotSpot RequestStackTrace extension "
                      "is not available on this JVM");
            Counters::increment(JVMTI_STACKS_INIT_FAILED);
        }
    }
}
```
A stylistic nit - can this be extracted to a function?
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 57ab35fed9
```cpp
if (_agent_args._jvmtistacks) {
    if (_request_stack_trace != nullptr && _init_request_stack_trace != nullptr) {
        jvmtiError rc = _init_request_stack_trace(_jvmti);
```
**Initialize RequestStackTrace from runtime jvmtistacks option**
`probeJFRRequestStackTrace()` gates initialization on `_agent_args._jvmtistacks`, which is only populated from agent load options. In command/API flows (e.g. `JavaProfiler.execute("start,jvmtistacks")` after `init0()`), `_agent_args` remains false, so InitializeRequestStackTrace is never called and `VM::canRequestStackTrace()` stays false. As a result, the new jvmtistacks runtime option silently cannot activate the delegated CPU/wall engines unless it was present at JVM/agent startup.
```cpp
Error ITimerJvmti::check(Arguments &args) {
    if (!VM::canRequestStackTrace()) {
        return Error("HotSpot RequestStackTrace JVMTI extension not available");
    }
    return Error::OK;
```
**Validate ITIMER_PROF support in delegated ITimer check**
`ITimerJvmti::check()` only verifies `VM::canRequestStackTrace()` and skips the ITIMER_PROF probe that `ITimer::check()` performs. `selectCpuEngine()` uses `check()` to choose engines, so it can select ITimerJvmti on systems where `setitimer(ITIMER_PROF)` later fails in `start()`, disabling CPU profiling instead of falling back to another available CPU engine.
I think Error::OK is 0, so the condition is correct. But it might be good to use an explicit comparison against Error::OK for posterity.
**Data Race (uninitialized pointers).** Plain non-atomic reads of `_request_stack_trace` and `_request_stack_trace_initialized` in `canRequestStackTrace()` race with the initialization writes in `probeJFRRequestStackTrace()`; signal handlers may see partial initialization on other CPUs due to the lack of memory ordering (CWE-131 pattern: pointer_race_no_sync). Load both fields with `__atomic_load_n(..., __ATOMIC_ACQUIRE)` in `canRequestStackTrace()` and initialize them with `__atomic_store_n` in `probeJFRRequestStackTrace()`.
**False Sharing.** `_sample_seq` is placed adjacent to `_total_samples` (both write-mostly atomics incremented on every sample), so they share a single cache line, doubling coherence traffic on multi-socket/many-core hardware. Separate them with explicit padding (`alignas(DEFAULT_CACHE_LINE_SIZE)` on `_sample_seq` in profiler.h:99), or co-locate `_sample_seq` with a read-mostly field so the dirtied line is not also being read by other paths.
**Contended Atomic.** `_sample_seq` is incremented with `atomicInc` (a full-barrier `__sync_fetch_and_add`) on every CPU/wall sample from every thread, creating a single globally contended cache line in the sample hot path. Combined with the adjacent write-mostly `_total_samples`, this doubles coherence traffic on multi-socket/many-core hardware. Use `atomicIncRelaxed(_sample_seq)` (i.e. `__atomic_fetch_add(..., __ATOMIC_RELAXED)`) as `_total_samples` does; the value is opaque to the JVM and only needs uniqueness, not synchronization. Also separate it with explicit padding (`alignas(DEFAULT_CACHE_LINE_SIZE)` on `_sample_seq` in profiler.h:99) to avoid false sharing with `_total_samples`.
**Integer Overflow.** Correlation-ID generation via `atomicInc(_sample_seq) + 1` can wrap at the u64 boundary: when `_sample_seq` reaches `ULLONG_MAX`, `ULLONG_MAX + 1` overflows to zero, violating the wire-format contract where zero means "no correlation" (CWE-190). This becomes more severe when the profiler restarts and resets `_sample_seq` to 0 (line 1067), potentially reusing correlation IDs that match stale AsyncStackTrace events from prior JFR recordings (replay attack vector). Implement a saturating increment, e.g. `u64 seq = atomicIncRelaxed(_sample_seq); u64 corr = (seq == ULLONG_MAX) ? 1 : (seq + 1);`, so the correlation ID is never zero; alternatively prevent zero by saturating/modular arithmetic on restart.
**Misleading Comment.** The new comment block above `selectWallEngine`'s jvmtistacks branch states "The engine-level _wallclock_sampler knob still takes precedence for users who explicitly request JVMTI/ASGCT", but the code returns `wall_jvmti_engine` before the `switch (args._wallclock_sampler)` is even reached, so jvmtistacks in fact OVERRIDES the knob. Documentation contradicts behavior; rewrite the comment to: "jvmtistacks overrides _wallclock_sampler when the HotSpot extension is available".
**Signal Safety (errno).** `WallClockJvmti::signalHandler` does not save/restore errno across its body, unlike the sibling `CTimerJvmti` and `ITimerJvmti` handlers added in the same diff. Calls into `ProfiledThread`, `OS::threadId`, and `recordSampleDelegated` may set errno and clobber the interrupted thread's errno value. Bracket the body with an errno save on entry and restore on exit.
**Null Pointer Dereference.** `probeJFRRequestStackTrace()` reads `args._jvmtistacks` (line 763) and invokes `_init_request_stack_trace` (line 765) without checking whether the InitializeRequestStackTrace function pointer was actually set. If the extension is not found by `GetExtensionFunctions`, `_init_request_stack_trace` remains `nullptr`, causing a null dereference. Add an explicit null check after line 764: `if (_init_request_stack_trace == nullptr) { Log::warn(...); Counters::increment(JVMTI_STACKS_INIT_FAILED); return; }`.
**Duplication / LSP.** `CTimerJvmti::start` duplicates `CTimer::start` almost verbatim (the max_timers/`_timers` calloc and the registerThread loop) but skips the `OS::primeSignalOriginCheck()` and origin validation that `CTimer::start`/`signalHandler` perform. Without origin validation, `CTimerJvmti::signalHandler` processes every SIGPROF without verifying it originated from `setitimer(ITIMER_PROF)`; a foreign SIGPROF (e.g. from `raise`/`pthread_kill`) will trigger unintended JVMTI stack-trace requests. Either reuse `CTimer::start` (e.g. by parameterizing the signal handler), or call `OS::primeSignalOriginCheck()` in `CTimerJvmti::start()` and guard `CTimerJvmti::signalHandler` with `OS::shouldProcessSignal`.
**Silent Fail.** `CTimerJvmti::start()` returns `Error::OK` unconditionally, discarding the thread-registration failures from lines 219-222; per-thread `registerThread` failures are swallowed and success is reported even when no threads were actually registered. Propagate the registration error at line 226 instead of returning `Error::OK` unconditionally.
**Resource Leak.** `GetExtensionFunctions` returns a table in which each `jvmtiExtensionFunctionInfo` entry owns sub-allocations (`id`, `short_description`, the `params` array, and each param's `name`); only the top-level table is Deallocated, leaking every per-entry allocation for the lifetime of the JVM. Iterate `ext_functions` before deallocating and call `_jvmti->Deallocate` on each entry's `id`, `short_description`, `params[j].name` for j in [0, param_count), and `params` itself, then `Deallocate(ext_functions)`.
**Argument Parsing Fallthrough.** The jvmtistacks argument parser has a logic bug: `case 'f'` falls through to `default`, which sets true, while the comments say 'false'; all non-matched characters implicitly set true as well. Add an explicit break after case 'f' (`case 'f': _jvmtistacks = false; break;`) and replace `default` with explicit error handling for invalid values.
**Lock Contention.** `recordSampleDelegated` attempts `tryLock` on three different lock indices (582-586) before giving up; if all three fail, the sample is dropped (a "dangling AsyncStackTrace"), so under high lock contention the correlation between profiler events and JFR stack traces is silently lost. Add a counter (similar to JVMTI_STACKS_FAILED_OTHER) to track samples dropped due to lock contention, and surface it in the `stop()` diagnostic warnings.
**Return Value Semantics.** `recordSampleDelegated` returns … Either return …
**Documentation Accuracy.** The error message at line 675 suggests the JFR flag syntax `-XX:StartFlightRecording=...,+jdk.AsyncStackTrace#enabled=true`, but standard JFR syntax uses `.enabled=true`, not `#enabled=true`. Replace `+jdk.AsyncStackTrace#enabled=true` with `jdk.AsyncStackTrace.enabled=true`.
Maybe add …
NIT: Add a runtime check in `WallClockJvmti::initialize()`, or add a comment stating that initialization must be guarded by the caller checking `VM::canRequestStackTrace()` first.
**Initialization.** `_sample_seq` is reset to 0 on each profiler start/reset, and no test verifies that the reset avoids correlation-ID collisions when the profiler is stopped and restarted against existing JFR recordings. Add an integration test, or a comment explaining that stale AsyncStackTrace events from a prior recording are expected.
What does this PR do?:
We are working to contribute a JVMTI/JFR-based stack-walker to the upstream JDK (see JEP Draft: Native Profiler Hook for Unbiased Stack Traces (Experimental)). That feature adds a JVMTI extension that triggers a stack trace to be emitted via the JVM's JFR. This PR wires the feature into our CPU and wall-clock profilers: when the option is enabled and the JVMTI extension is detected, they prefer it over the other stack-walkers.
Motivation:
I would like to be able to test the new JDK feature end-to-end. Eventually, when the feature is released in upstream JDK, it will be ready to go on our side.
Additional Notes:
The feature must be enabled via the jvmtistacks option.
The stack traces are emitted via the JVM's JFR stream. We merge the JVM's JFR with the agent's JFR, so the backend receives the combined recording. I'll post a separate PR in the backend which implements the necessary post-processing.
I also made a small unrelated addition to vmStructs to deal with the removal of uncompressed class-pointer support in JDK 27.
How to test the change?:
I manually verified that the CPU and wall-clock profiler produce the desired AsyncStackTrace events.
For Datadog employees:
If this PR handles credentials of any kind, I've requested a review from @DataDog/security-design-and-guidance.
Unsure? Have a question? Request a review!