
Implement option to use JVMTI/JFR stack-walker for CPU- and wall-clock-profiler#504

Open
rkennke wants to merge 3 commits into main from rkennke/PROF-14350

Conversation

@rkennke
Contributor

@rkennke commented Apr 28, 2026

What does this PR do?:

We are working to contribute a JVMTI/JFR-based stack-walker into the upstream JDK (see JEP Draft: Native Profiler Hook for Unbiased Stack Traces (Experimental)). That feature adds a JVMTI extension which triggers a stack-trace to be emitted via the JVM's JFR. This PR wires the feature into our CPU and wall-clock profilers. When the option is enabled and the JVMTI extension is detected, the profilers prefer it over the other stack-walkers.

Motivation:

I would like to be able to test the new JDK feature end-to-end. Eventually, when the feature is released in upstream JDK, it will be ready to go on our side.

Additional Notes:

The feature must be enabled via the jvmtistacks option.

The stack-traces will be emitted via the JVM's JFR stream. We merge the JVM's JFR with the agent's JFR, so the backend will receive the combined JFR. I'll post a separate PR in the backend which implements the necessary post-processing.

I also made a small unrelated addition to vmStructs to deal with the removal of uncompressed class-pointer support in JDK 27.

How to test the change?:

I manually verified that the CPU and wall-clock profiler produce the desired AsyncStackTrace events.

For Datadog employees:

  • If this PR touches code that signs or publishes builds or packages, or handles
    credentials of any kind, I've requested a review from @DataDog/security-design-and-guidance.
  • This PR doesn't touch any of that.
  • JIRA: PROF-14350

Unsure? Have a question? Request a review!

Comment thread ddprof-lib/src/main/cpp/vmEntry.cpp Outdated
Comment on lines +434 to +473
// Detect the HotSpot JFR async stack-trace JVMTI extension. Present on JDK 27+
// builds that expose the functions "com.sun.hotspot.functions.RequestStackTrace"
// and "com.sun.hotspot.functions.InitializeRequestStackTrace". If the user asked
// for jvmtistacks and both are present, initialise now (the JDK requires this
// to happen during Agent_OnLoad). After initialisation, signal handlers may
// delegate stack walks via VM::requestStackTrace().
if (_hotspot) {
  jint ext_count = 0;
  jvmtiExtensionFunctionInfo *ext_functions = nullptr;
  if (_jvmti->GetExtensionFunctions(&ext_count, &ext_functions) == 0) {
    for (jint i = 0; i < ext_count; i++) {
      if (strcmp(ext_functions[i].id,
                 "com.sun.hotspot.functions.RequestStackTrace") == 0) {
        _request_stack_trace = ext_functions[i].func;
      } else if (strcmp(ext_functions[i].id,
                        "com.sun.hotspot.functions.InitializeRequestStackTrace") == 0) {
        _init_request_stack_trace = ext_functions[i].func;
      }
    }
    _jvmti->Deallocate((unsigned char *)ext_functions);
  }

  if (_agent_args._jvmtistacks) {
    if (_request_stack_trace != nullptr && _init_request_stack_trace != nullptr) {
      jvmtiError rc = _init_request_stack_trace(_jvmti);
      if (rc == JVMTI_ERROR_NONE) {
        _request_stack_trace_initialized = true;
        Counters::increment(JVMTI_STACKS_INIT_OK);
      } else {
        Log::warn("InitializeRequestStackTrace failed: %d", rc);
        Counters::increment(JVMTI_STACKS_INIT_FAILED);
      }
    } else {
      Log::warn("jvmtistacks requested but HotSpot RequestStackTrace extension "
                "is not available on this JVM");
      Counters::increment(JVMTI_STACKS_INIT_FAILED);
    }
  }
}

Collaborator


A stylistic nit - can this be extracted to a function?

Comment thread ddprof-lib/src/main/cpp/hotspot/vmStructs.h
@rkennke marked this pull request as ready for review April 28, 2026 11:15
@rkennke requested a review from a team as a code owner April 28, 2026 11:15

@chatgpt-codex-connector (Bot) left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 57ab35fed9


Comment on lines +399 to +401
if (_agent_args._jvmtistacks) {
  if (_request_stack_trace != nullptr && _init_request_stack_trace != nullptr) {
    jvmtiError rc = _init_request_stack_trace(_jvmti);

P1: Initialize RequestStackTrace from runtime jvmtistacks option

probeJFRRequestStackTrace() gates initialization on _agent_args._jvmtistacks, which is only populated from agent load options. In command/API flows (e.g., JavaProfiler.execute("start,jvmtistacks") after init0()), _agent_args remains false, so InitializeRequestStackTrace is never called and VM::canRequestStackTrace() stays false. As a result, the new jvmtistacks runtime option silently cannot activate the delegated CPU/wall engines unless it was present at JVM/agent startup.


Comment on lines +132 to +136
Error ITimerJvmti::check(Arguments &args) {
  if (!VM::canRequestStackTrace()) {
    return Error("HotSpot RequestStackTrace JVMTI extension not available");
  }
  return Error::OK;

P2: Validate ITIMER_PROF support in delegated ITimer check

ITimerJvmti::check() only verifies VM::canRequestStackTrace() and skips the ITIMER_PROF probe that ITimer::check() performs. selectCpuEngine() uses check() to choose engines, so this can select ITimerJvmti on systems where setitimer(ITIMER_PROF) later fails in start(), disabling CPU profiling instead of falling back to another available CPU engine.

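The missing probe the comment describes can be sketched as follows, assuming a Linux target; probeItimerProf is a hypothetical stand-in for the check ITimer::check() performs, not the project's actual code:

```cpp
#include <cstring>
#include <sys/time.h>

// Minimal availability probe for ITIMER_PROF. A zero-valued setitimer()
// call exercises the syscall (and would fail with EINVAL/ENOTSUP where the
// timer is unsupported) without ever arming it, so no SIGPROF is delivered.
static bool probeItimerProf() {
  struct itimerval zero;
  std::memset(&zero, 0, sizeof(zero));
  return setitimer(ITIMER_PROF, &zero, nullptr) == 0;
}
```

Running such a probe in ITimerJvmti::check() alongside VM::canRequestStackTrace() would let selectCpuEngine() fall back to another engine instead of failing later in start().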

@jbachorik
Collaborator

ddprof-lib/src/main/cpp/profiler.cpp:609

I think Error::OK is 0, so the condition is correct. But it might be good to use an explicit comparison against Error::OK for posterity.

@jbachorik
Collaborator

ddprof-lib/src/main/cpp/vmEntry.h:804

Data Race Uninitialized Pointers

Plain non-atomic reads of _request_stack_trace and _request_stack_trace_initialized in canRequestStackTrace() race with initialization writes in probeJFRRequestStackTrace(). Signal handlers may see partial initialization on other CPUs due to lack of memory ordering. CWE-362 (race condition) pattern: pointer_race_no_sync.

Use __atomic_load_n for both _request_stack_trace and _request_stack_trace_initialized with __ATOMIC_ACQUIRE semantics in canRequestStackTrace(). Initialize with __atomic_store_n in probeJFRRequestStackTrace().
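The suggested ordering can be sketched with std::atomic standing in for the GCC __atomic builtins; the names below are illustrative, not the actual vmEntry.h fields:

```cpp
#include <atomic>

// Hypothetical extension-function signature.
using ExtFn = int (*)(void *);

static std::atomic<ExtFn> g_request_stack_trace{nullptr};
static std::atomic<bool> g_initialized{false};

// Writer side (probeJFRRequestStackTrace): publish the pointer first, then
// the flag, both with release semantics.
static void publish(ExtFn fn) {
  g_request_stack_trace.store(fn, std::memory_order_release);
  g_initialized.store(true, std::memory_order_release);
}

// Reader side (canRequestStackTrace, possibly inside a signal handler): the
// acquire load of the flag guarantees the pointer store is also visible.
static bool canRequestStackTrace() {
  return g_initialized.load(std::memory_order_acquire) &&
         g_request_stack_trace.load(std::memory_order_acquire) != nullptr;
}
```

With __atomic_store_n/__atomic_load_n the same release/acquire pairing applies on the raw fields.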

@jbachorik
Collaborator

ddprof-lib/src/main/cpp/profiler.h:99

False Sharing

_sample_seq is placed adjacent to _total_samples (both write-mostly atomics incremented on every sample), sharing a single cache line. This doubles coherence traffic on multi-socket/many-core hardware.

Separate with explicit padding (alignas(DEFAULT_CACHE_LINE_SIZE) on _sample_seq in profiler.h:99) or co-locate _sample_seq with a read-mostly field so the dirtied line is not also being read by other paths.
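The padding suggestion, as a minimal sketch assuming a 64-byte cache line (the review's DEFAULT_CACHE_LINE_SIZE) and illustrative field names:

```cpp
#include <atomic>
#include <cstddef>

constexpr std::size_t kCacheLine = 64; // assumed cache-line size

struct ProfilerCounters {
  std::atomic<unsigned long long> total_samples{0};
  // alignas pushes sample_seq onto its own cache line, so concurrent
  // increments of the two hot counters no longer ping-pong a shared line.
  alignas(kCacheLine) std::atomic<unsigned long long> sample_seq{0};
};
```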

@jbachorik
Collaborator

ddprof-lib/src/main/cpp/profiler.cpp:597

Contended Atomic

_sample_seq is incremented with atomicInc (full barrier __sync_fetch_and_add) on every CPU/wall sample from every thread, creating a single globally-contended cache line in the sample hot path. Combined with adjacent _total_samples and adjacent write-mostly operations, this doubles coherence traffic on multi-socket/many-core hardware.

Use atomicIncRelaxed(_sample_seq) (or __atomic_fetch_add(..., __ATOMIC_RELAXED)) like _total_samples; the value is opaque to the JVM and only needs uniqueness, not synchronization. Separate with explicit padding (alignas(DEFAULT_CACHE_LINE_SIZE) on _sample_seq in profiler.h:99) to avoid false-sharing with _total_samples.

@jbachorik
Collaborator

ddprof-lib/src/main/cpp/profiler.cpp:597

Integer Overflow

Correlation ID generation via atomicInc(_sample_seq) + 1 can wrap at u64 boundaries; when _sample_seq reaches ULLONG_MAX, (ULLONG_MAX + 1) overflows to zero, violating the wire-format contract where zero means 'no correlation'. CWE-190. This becomes more severe when profiler restarts and resets _sample_seq to 0 (line 1067), potentially reusing correlation IDs that match stale AsyncStackTrace events from prior JFR recordings (replay attack vector).

Implement saturating increment: u64 seq = atomicIncRelaxed(_sample_seq); u64 corr = (seq == ULLONG_MAX) ? 1 : (seq + 1); to ensure correlation_id is never zero. Alternatively prevent zero by saturating/modular arithmetic on restart.
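The saturating scheme, sketched as a standalone helper (correlationId is hypothetical; the real code would apply this to the value returned by the _sample_seq increment):

```cpp
#include <climits>

typedef unsigned long long u64;

// Derive a correlation ID that can never be zero, even when the sequence
// counter wraps: seq + 1 would overflow to 0 at ULLONG_MAX, so saturate to
// 1 instead, preserving the wire-format contract "0 means no correlation".
static u64 correlationId(u64 seq) {
  return (seq == ULLONG_MAX) ? 1 : seq + 1;
}
```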

@jbachorik
Collaborator

ddprof-lib/src/main/cpp/profiler.cpp:1032

Misleading Comment

The new comment block above selectWallEngine's jvmtistacks branch states 'The engine-level _wallclock_sampler knob still takes precedence for users who explicitly request JVMTI/ASGCT', but the code returns wall_jvmti_engine before the switch(args._wallclock_sampler) is even reached, so jvmtistacks in fact OVERRIDES the knob. Documentation contradicts behavior.

Rewrite comment to: 'jvmtistacks overrides _wallclock_sampler when the HotSpot extension is available'.

@jbachorik
Collaborator

ddprof-lib/src/main/cpp/wallClock.cpp:878

Signal Safety Errno

WallClockJvmti::signalHandler does not save/restore errno across the body, unlike the sibling CTimerJvmti and ITimerJvmti handlers added in the same diff. Calls into ProfiledThread, OS::threadId, and recordSampleDelegated may set errno and clobber the interrupted thread's errno value.

Bracket the body with int saved_errno = errno; ...; errno = saved_errno; matching the pattern used in CTimerJvmti/ITimerJvmti.
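The save/restore bracket, as a self-contained sketch; doWork stands in for the ProfiledThread/OS::threadId/recordSampleDelegated calls that may clobber errno:

```cpp
#include <cerrno>

// Simulates a call inside the handler body that overwrites errno.
static void doWork() { errno = EAGAIN; }

// Shape of an async-signal-safe handler body: save errno on entry and
// restore it before returning to the interrupted thread, so the thread's
// in-progress syscall error reporting is not corrupted.
static void signalHandlerBody() {
  int saved_errno = errno;
  doWork();
  errno = saved_errno;
}
```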

@jbachorik
Collaborator

ddprof-lib/src/main/cpp/vmEntry.cpp:765

Null Pointer Dereference

probeJFRRequestStackTrace() reads args._jvmtistacks (line 763) and calls _init_request_stack_trace->() (line 765) without checking if the InitializeRequestStackTrace function pointer was actually set. If the extension is not found in GetExtensionFunctions, _init_request_stack_trace remains nullptr, causing a null deref.

Add an explicit null check before calling _init_request_stack_trace: 'if (_init_request_stack_trace == nullptr) { Log::warn(...); Counters::increment(JVMTI_STACKS_INIT_FAILED); return; }' after line 764.

@jbachorik
Collaborator

ddprof-lib/src/main/cpp/ctimer_linux.cpp:187

Duplication And Lsp

CTimerJvmti::start duplicates CTimer::start almost verbatim (max_timers/_timers calloc, registerThread loop) but skips OS::primeSignalOriginCheck() and origin validation that CTimer::start/signalHandler perform. Without origin validation, CTimerJvmti::signalHandler processes every SIGPROF without verifying it originates from setitimer(ITIMER_PROF); foreign SIGPROF (e.g. from raise/pthread_kill) will trigger unintended JVMTI stack-trace requests.

Either reuse CTimer::start (e.g. by parameterizing the signal handler) or add OS::primeSignalOriginCheck() call in CTimerJvmti::start() and OS::shouldProcessSignal guard in CTimerJvmti::signalHandler.

@jbachorik
Collaborator

ddprof-lib/src/main/cpp/ctimer_linux.cpp:226

Silent Fail

CTimerJvmti::start() returns Error::OK unconditionally, discarding thread-registration failures from lines 219-222. Swallows per-thread registerThread failures and reports success even when no threads were actually registered.

Change line 226 to return result; instead of return Error::OK;

@jbachorik
Collaborator

ddprof-lib/src/main/cpp/vmEntry.cpp:760

Resource Leak

GetExtensionFunctions returns a table where each jvmtiExtensionFunctionInfo entry owns sub-allocations (id, short_description, params array, and each param's name); only the top-level table is Deallocated, leaking every per-entry allocation for the lifetime of the JVM.

Iterate ext_functions before deallocating and call _jvmti->Deallocate on each entry's id, short_description, params[j].name for j in [0,param_count), and params itself, then Deallocate(ext_functions).
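A sketch of the deep-deallocation loop; the struct mirrors the owning fields of jvmtiExtensionFunctionInfo, and the Dealloc callback stands in for _jvmti->Deallocate so the example stays self-contained:

```cpp
#include <cstddef>

struct ParamInfo { char *name; };
struct ExtFnInfo {
  char *id;
  char *short_description;
  int param_count;
  ParamInfo *params;
};

typedef void (*Dealloc)(unsigned char *);

// Free every JVMTI-owned sub-allocation of each entry, then the table.
static void freeExtensionTable(ExtFnInfo *fns, int count, Dealloc dealloc) {
  for (int i = 0; i < count; i++) {
    dealloc((unsigned char *)fns[i].id);
    dealloc((unsigned char *)fns[i].short_description);
    for (int j = 0; j < fns[i].param_count; j++) {
      dealloc((unsigned char *)fns[i].params[j].name);
    }
    dealloc((unsigned char *)fns[i].params);
  }
  dealloc((unsigned char *)fns); // finally the top-level table itself
}

// Counting stand-in for _jvmti->Deallocate, used in the usage example.
static int g_dealloc_calls = 0;
static void countingDealloc(unsigned char *) { ++g_dealloc_calls; }
```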

@jbachorik
Collaborator

ddprof-lib/src/main/cpp/arguments.cpp:364

Argument Parsing Fallthrough

jvmtistacks argument parser has a logic bug: case 'f' falls through to 'default' which sets true, but comments say 'false'. All non-matched characters implicitly set true.

Add explicit break after case 'f': case 'f': _jvmtistacks = false; break;. Replace default with explicit error handling for invalid values.
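The explicit-break parser, sketched as a standalone helper; parseJvmtiStacks is hypothetical, and arguments.cpp would presumably fold this into its existing switch:

```cpp
// Parse a jvmtistacks value character. Returns false for invalid input
// instead of silently defaulting to true; 'f' no longer falls through.
static bool parseJvmtiStacks(char value, bool *out) {
  switch (value) {
  case 't':
  case 'T':
    *out = true;
    return true;
  case 'f':
  case 'F':
    *out = false;
    return true; // explicit exit: no fallthrough into default
  default:
    return false; // surface invalid values to the caller
  }
}
```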

@jbachorik
Collaborator

ddprof-lib/src/main/cpp/profiler.cpp:597

Lock Contention

recordSampleDelegated attempts tryLock on three different lock indices (582-586) before falling back. If all three fail, the sample is dropped ('dangling AsyncStackTrace'). Under high lock contention, silently loses correlation between profiler events and JFR stack traces.

Add a counter (similar to JVMTI_STACKS_FAILED_OTHER) to track dropped samples due to lock contention, and surface this in the stop() diagnostic warnings.

@jbachorik
Collaborator

ddprof-lib/src/main/cpp/profiler.cpp:597

Return Value Semantics

recordSampleDelegated returns true even when the lock could not be acquired and the sample event was dropped (only the dangling JVM-side stack-trace request remains). The header comment says 'true if the request was accepted by the JVM ... and the sample event was recorded' — the sample event was NOT recorded in that branch, so callers cannot rely on true meaning what the contract says.

Either return false on the lock-skip path so callers can count it as a drop, or update the header doc to: 'true if the JVM accepted the stack-trace request; the local sample event may still have been dropped due to lock contention'.

@jbachorik
Collaborator

ddprof-lib/src/main/cpp/profiler.cpp:641

Documentation Accuracy

Error message at line 675 suggests JFR flag syntax '-XX:StartFlightRecording=...,+jdk.AsyncStackTrace#enabled=true' but standard JFR syntax uses '.enabled=true' not '#enabled=true'.

Replace '+jdk.AsyncStackTrace#enabled=true' with 'jdk.AsyncStackTrace.enabled=true'.

@jbachorik
Collaborator

ddprof-lib/src/main/cpp/itimer.cpp:131

Maybe add __atomic_store_n(&_enabled, false, __ATOMIC_RELEASE); at the top of ITimerJvmti::stop before disarming the timer to close the in-flight signal window.
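The suggested ordering, sketched with std::atomic and illustrative names (handlerBody stands in for the ITimerJvmti signal handler, stopEngine for its stop method):

```cpp
#include <atomic>

static std::atomic<bool> g_enabled{false};

// Handler path: bail out if the engine was stopped, closing the window for
// signals that were already in flight when stop() began.
static void handlerBody(int *samples) {
  if (!g_enabled.load(std::memory_order_acquire)) {
    return; // late signal after stop(): drop the sample
  }
  ++*samples;
}

static void stopEngine() {
  // Clear the flag with release semantics BEFORE disarming the timer.
  g_enabled.store(false, std::memory_order_release);
  // ... then disarm the timer and unregister threads ...
}
```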

@jbachorik
Collaborator

ddprof-lib/src/main/cpp/wallClock.cpp:843

NIT: Add a runtime check in WallClockJvmti::initialize() or add a comment stating that initialization must be guarded by the caller checking VM::canRequestStackTrace() first.

@jbachorik
Collaborator

ddprof-lib/src/main/cpp/profiler.cpp:1045

Initialization

_sample_seq is reset to 0 on each profiler start/reset. No test verifies that the reset does not cause correlation ID collisions if profiler is stopped and restarted with existing JFR recordings.

Add integration test or comment explaining that stale AsyncStackTrace events from prior recording are expected.
