fix(llmobs): openai-java payload mapping for responses, tool metadata, and prompt tracking by ygree · Pull Request #10644 · DataDog/dd-trace-java

ygree · 2026-02-19T21:45:14Z

What Does This Do

Aligns the OpenAI Java LLMObs span payloads with the expected intake and system-test schema by:

Adding/filling missing LLMObs tags:
- _ml_obs_tag.integration
- _ml_obs_tag.source
- _ml_obs_tag.ddtrace.version
- _ml_obs_tag.error
- _ml_obs_tag.error_type
Ensuring model_name (and stable placeholder output where applicable) is set on error paths for
chat/completions/embeddings/responses.
Expanding Responses instrumentation:
- prompt tracking (input.prompt, variables, chat_template)
- tool definition extraction (tool_definitions)
- tool call/result extraction across function/custom/MCP outputs
- metadata normalization (stream, tool_choice, text.verbosity, etc.)
Updating LLMObs mapper payload shape:
- writes _dd map with span/trace ids
- nests error fields under meta.error
- supports map-based LLM input serialization (messages + prompt)
- remaps tool_definitions into meta.
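
As a rough illustration of the payload shape listed above, the following Java sketch builds the same layout with plain maps instead of the streaming writer the mapper uses. The class and method names are hypothetical, and the key names inside meta.error (message, type) are assumptions made for the example; only status, the nesting under meta.error, and the _dd keys (span_id, trace_id, apm_trace_id) are taken from this PR and its review discussion.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, NOT the real LLMObsSpanMapper: it only mirrors the
// event layout described in this PR using plain maps.
final class LlmObsEventShapeSketch {

  static Map<String, Object> toEvent(
      String spanId,
      String traceId,
      String apmTraceId,
      boolean errored,
      String errorMessage,
      String errorType) {
    Map<String, Object> event = new LinkedHashMap<>();

    // status replaces the former top-level integer error flag
    event.put("status", errored ? "error" : "ok");

    // error details are nested under meta.error instead of the top level
    Map<String, Object> meta = new LinkedHashMap<>();
    if (errored) {
      Map<String, Object> error = new LinkedHashMap<>();
      error.put("message", errorMessage); // assumed key name
      error.put("type", errorType); // assumed key name
      meta.put("error", error);
    }
    event.put("meta", meta);

    // _dd map carrying the three ids discussed in the review
    Map<String, Object> dd = new LinkedHashMap<>();
    dd.put("span_id", spanId);
    dd.put("trace_id", traceId);
    dd.put("apm_trace_id", apmTraceId);
    event.put("_dd", dd);

    return event;
  }
}
```

A successful span produces status "ok" with no error entry in meta, while an errored span produces status "error" with the details under meta.error.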

Motivation

OpenAI/LLMObs system tests exposed schema and tag mismatches in the Java payloads, especially in response spans, tool metadata, error mapping, and the prompt-tracking structure. This change brings the Java output in line with the expected LLMObs intake contract and behavior.

Additional Notes

The minimum supported version for openai-java-3.0 is raised from 3.0.0 to 3.0.1.
The reason is that ResponseTextConfig's fun verbosity(): Optional<Verbosity> was only added in 3.0.1: openai/openai-java@c1de354#diff-6b385fb153d457757ba112e6117593cb59da6af308cce0f9b6f26e3885befc6cR73

DataDog/dd-apm-test-agent#280
DataDog/system-tests#6364

Contributor Checklist

Format the title according to the contribution guidelines
Assign the type: and (comp: or inst:) labels in addition to any other useful labels
Avoid using close, fix, or any linking keywords when referencing an issue
Use solves instead, and assign the PR milestone to the issue
Update the CODEOWNERS file on source file addition, migration, or deletion
Update public documentation with any new configuration flags or behaviors

pr-commenter · 2026-02-19T22:33:17Z

Benchmarks

Startup

Parameters

	Baseline	Candidate
baseline_or_candidate	baseline	candidate
git_branch	master	ygree/llmobs-systest-fixes
git_commit_date	1773939812	1774400065
git_commit_sha	`5580c61`	`3d12515`
release_version	1.61.0-SNAPSHOT~5580c61ac4	1.60.0-SNAPSHOT~3d12515bb7

See matching parameters

	Baseline	Candidate
application	insecure-bank	insecure-bank
ci_job_date	1774401879	1774401879
ci_job_id	1535749070	1535749070
ci_pipeline_id	104249056	104249056
cpu_model	Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz	Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz
kernel_version	Linux runner-zfyrx7zua-project-304-concurrent-0-ptwbs9lz 6.8.0-1031-aws #33~22.04.1-Ubuntu SMP Thu Jun 26 14:22:30 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux	Linux runner-zfyrx7zua-project-304-concurrent-0-ptwbs9lz 6.8.0-1031-aws #33~22.04.1-Ubuntu SMP Thu Jun 26 14:22:30 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
module	Agent	Agent
parent	None	None

Summary

Found 0 performance improvements and 0 performance regressions! Performance is the same for 62 metrics, 9 unstable metrics.

Startup time reports for petclinic

gantt
    title petclinic - global startup overhead: candidate=1.60.0-SNAPSHOT~3d12515bb7, baseline=1.61.0-SNAPSHOT~5580c61ac4

    dateFormat X
    axisFormat %s
section tracing
Agent [baseline] (1.058 s) : 0, 1057721
Total [baseline] (11.134 s) : 0, 11133804
Agent [candidate] (1.076 s) : 0, 1075626
Total [candidate] (11.153 s) : 0, 11153081
section appsec
Agent [baseline] (1.259 s) : 0, 1259346
Total [baseline] (11.201 s) : 0, 11201251
Agent [candidate] (1.249 s) : 0, 1249093
Total [candidate] (11.145 s) : 0, 11144880
section iast
Agent [baseline] (1.239 s) : 0, 1239499
Total [baseline] (11.408 s) : 0, 11407886
Agent [candidate] (1.238 s) : 0, 1237753
Total [candidate] (11.352 s) : 0, 11352262
section profiling
Agent [baseline] (1.192 s) : 0, 1191973
Total [baseline] (11.029 s) : 0, 11029415
Agent [candidate] (1.191 s) : 0, 1190882
Total [candidate] (11.087 s) : 0, 11087453

baseline results

Module	Variant	Duration	Δ tracing
Agent	tracing	1.058 s	-
Agent	appsec	1.259 s	201.625 ms (19.1%)
Agent	iast	1.239 s	181.778 ms (17.2%)
Agent	profiling	1.192 s	134.252 ms (12.7%)
Total	tracing	11.134 s	-
Total	appsec	11.201 s	67.448 ms (0.6%)
Total	iast	11.408 s	274.083 ms (2.5%)
Total	profiling	11.029 s	-104.389 ms (-0.9%)

candidate results

Module	Variant	Duration	Δ tracing
Agent	tracing	1.076 s	-
Agent	appsec	1.249 s	173.467 ms (16.1%)
Agent	iast	1.238 s	162.127 ms (15.1%)
Agent	profiling	1.191 s	115.256 ms (10.7%)
Total	tracing	11.153 s	-
Total	appsec	11.145 s	-8.2 ms (-0.1%)
Total	iast	11.352 s	199.181 ms (1.8%)
Total	profiling	11.087 s	-65.628 ms (-0.6%)

gantt
    title petclinic - break down per module: candidate=1.60.0-SNAPSHOT~3d12515bb7, baseline=1.61.0-SNAPSHOT~5580c61ac4

    dateFormat X
    axisFormat %s
section tracing
crashtracking [baseline] (1.19 ms) : 0, 1190
crashtracking [candidate] (1.213 ms) : 0, 1213
BytebuddyAgent [baseline] (629.278 ms) : 0, 629278
BytebuddyAgent [candidate] (639.784 ms) : 0, 639784
AgentMeter [baseline] (29.387 ms) : 0, 29387
AgentMeter [candidate] (29.945 ms) : 0, 29945
GlobalTracer [baseline] (256.777 ms) : 0, 256777
GlobalTracer [candidate] (260.861 ms) : 0, 260861
AppSec [baseline] (31.656 ms) : 0, 31656
AppSec [candidate] (32.29 ms) : 0, 32290
Debugger [baseline] (60.481 ms) : 0, 60481
Debugger [candidate] (61.308 ms) : 0, 61308
Remote Config [baseline] (591.548 µs) : 0, 592
Remote Config [candidate] (598.512 µs) : 0, 599
Telemetry [baseline] (8.018 ms) : 0, 8018
Telemetry [candidate] (8.174 ms) : 0, 8174
Flare Poller [baseline] (4.269 ms) : 0, 4269
Flare Poller [candidate] (5.14 ms) : 0, 5140
section appsec
crashtracking [baseline] (1.201 ms) : 0, 1201
crashtracking [candidate] (1.184 ms) : 0, 1184
BytebuddyAgent [baseline] (665.509 ms) : 0, 665509
BytebuddyAgent [candidate] (660.488 ms) : 0, 660488
AgentMeter [baseline] (12.266 ms) : 0, 12266
AgentMeter [candidate] (12.15 ms) : 0, 12150
GlobalTracer [baseline] (260.977 ms) : 0, 260977
GlobalTracer [candidate] (258.415 ms) : 0, 258415
AppSec [baseline] (178.806 ms) : 0, 178806
AppSec [candidate] (177.543 ms) : 0, 177543
Debugger [baseline] (66.757 ms) : 0, 66757
Debugger [candidate] (66.325 ms) : 0, 66325
Remote Config [baseline] (638.436 µs) : 0, 638
Remote Config [candidate] (619.631 µs) : 0, 620
Telemetry [baseline] (8.461 ms) : 0, 8461
Telemetry [candidate] (8.258 ms) : 0, 8258
Flare Poller [baseline] (3.63 ms) : 0, 3630
Flare Poller [candidate] (3.615 ms) : 0, 3615
IAST [baseline] (24.653 ms) : 0, 24653
IAST [candidate] (24.202 ms) : 0, 24202
section iast
crashtracking [baseline] (1.211 ms) : 0, 1211
crashtracking [candidate] (1.216 ms) : 0, 1216
BytebuddyAgent [baseline] (804.891 ms) : 0, 804891
BytebuddyAgent [candidate] (803.152 ms) : 0, 803152
AgentMeter [baseline] (11.712 ms) : 0, 11712
AgentMeter [candidate] (11.72 ms) : 0, 11720
GlobalTracer [baseline] (248.815 ms) : 0, 248815
GlobalTracer [candidate] (249.18 ms) : 0, 249180
AppSec [baseline] (26.73 ms) : 0, 26730
AppSec [candidate] (26.798 ms) : 0, 26798
Debugger [baseline] (69.869 ms) : 0, 69869
Debugger [candidate] (69.676 ms) : 0, 69676
Remote Config [baseline] (529.736 µs) : 0, 530
Remote Config [candidate] (518.658 µs) : 0, 519
Telemetry [baseline] (10.306 ms) : 0, 10306
Telemetry [candidate] (10.099 ms) : 0, 10099
Flare Poller [baseline] (3.689 ms) : 0, 3689
Flare Poller [candidate] (3.639 ms) : 0, 3639
IAST [baseline] (25.561 ms) : 0, 25561
IAST [candidate] (25.528 ms) : 0, 25528
section profiling
ProfilingAgent [baseline] (94.071 ms) : 0, 94071
ProfilingAgent [candidate] (93.748 ms) : 0, 93748
crashtracking [baseline] (1.188 ms) : 0, 1188
crashtracking [candidate] (1.183 ms) : 0, 1183
BytebuddyAgent [baseline] (688.536 ms) : 0, 688536
BytebuddyAgent [candidate] (688.462 ms) : 0, 688462
AgentMeter [baseline] (9.115 ms) : 0, 9115
AgentMeter [candidate] (9.075 ms) : 0, 9075
GlobalTracer [baseline] (217.146 ms) : 0, 217146
GlobalTracer [candidate] (216.977 ms) : 0, 216977
AppSec [baseline] (32.391 ms) : 0, 32391
AppSec [candidate] (32.288 ms) : 0, 32288
Debugger [baseline] (65.656 ms) : 0, 65656
Debugger [candidate] (65.47 ms) : 0, 65470
Remote Config [baseline] (559.469 µs) : 0, 559
Remote Config [candidate] (555.13 µs) : 0, 555
Telemetry [baseline] (8.5 ms) : 0, 8500
Telemetry [candidate] (7.683 ms) : 0, 7683
Flare Poller [baseline] (3.473 ms) : 0, 3473
Flare Poller [candidate] (4.235 ms) : 0, 4235
Profiling [baseline] (94.643 ms) : 0, 94643
Profiling [candidate] (94.301 ms) : 0, 94301

Startup time reports for insecure-bank

gantt
    title insecure-bank - global startup overhead: candidate=1.60.0-SNAPSHOT~3d12515bb7, baseline=1.61.0-SNAPSHOT~5580c61ac4

    dateFormat X
    axisFormat %s
section tracing
Agent [baseline] (1.062 s) : 0, 1062273
Total [baseline] (8.888 s) : 0, 8888380
Agent [candidate] (1.061 s) : 0, 1061233
Total [candidate] (8.861 s) : 0, 8861113
section iast
Agent [baseline] (1.235 s) : 0, 1234635
Total [baseline] (9.541 s) : 0, 9541353
Agent [candidate] (1.227 s) : 0, 1226844
Total [candidate] (9.541 s) : 0, 9541287

baseline results

Module	Variant	Duration	Δ tracing
Agent	tracing	1.062 s	-
Agent	iast	1.235 s	172.362 ms (16.2%)
Total	tracing	8.888 s	-
Total	iast	9.541 s	652.973 ms (7.3%)

candidate results

Module	Variant	Duration	Δ tracing
Agent	tracing	1.061 s	-
Agent	iast	1.227 s	165.611 ms (15.6%)
Total	tracing	8.861 s	-
Total	iast	9.541 s	680.174 ms (7.7%)

gantt
    title insecure-bank - break down per module: candidate=1.60.0-SNAPSHOT~3d12515bb7, baseline=1.61.0-SNAPSHOT~5580c61ac4

    dateFormat X
    axisFormat %s
section tracing
crashtracking [baseline] (1.196 ms) : 0, 1196
crashtracking [candidate] (1.203 ms) : 0, 1203
BytebuddyAgent [baseline] (632.575 ms) : 0, 632575
BytebuddyAgent [candidate] (632.625 ms) : 0, 632625
AgentMeter [baseline] (29.437 ms) : 0, 29437
AgentMeter [candidate] (29.564 ms) : 0, 29564
GlobalTracer [baseline] (258.02 ms) : 0, 258020
GlobalTracer [candidate] (257.908 ms) : 0, 257908
AppSec [baseline] (32.025 ms) : 0, 32025
AppSec [candidate] (31.869 ms) : 0, 31869
Debugger [baseline] (59.889 ms) : 0, 59889
Debugger [candidate] (59.896 ms) : 0, 59896
Remote Config [baseline] (584.354 µs) : 0, 584
Remote Config [candidate] (587.177 µs) : 0, 587
Telemetry [baseline] (8.083 ms) : 0, 8083
Telemetry [candidate] (8.043 ms) : 0, 8043
Flare Poller [baseline] (4.335 ms) : 0, 4335
Flare Poller [candidate] (3.511 ms) : 0, 3511
section iast
crashtracking [baseline] (1.209 ms) : 0, 1209
crashtracking [candidate] (1.195 ms) : 0, 1195
BytebuddyAgent [baseline] (800.882 ms) : 0, 800882
BytebuddyAgent [candidate] (797.058 ms) : 0, 797058
AgentMeter [baseline] (11.62 ms) : 0, 11620
AgentMeter [candidate] (11.35 ms) : 0, 11350
GlobalTracer [baseline] (248.864 ms) : 0, 248864
GlobalTracer [candidate] (247.016 ms) : 0, 247016
IAST [baseline] (25.666 ms) : 0, 25666
IAST [candidate] (25.317 ms) : 0, 25317
AppSec [baseline] (26.824 ms) : 0, 26824
AppSec [candidate] (26.377 ms) : 0, 26377
Debugger [baseline] (67.592 ms) : 0, 67592
Debugger [candidate] (68.144 ms) : 0, 68144
Remote Config [baseline] (540.643 µs) : 0, 541
Remote Config [candidate] (522.065 µs) : 0, 522
Telemetry [baseline] (11.285 ms) : 0, 11285
Telemetry [candidate] (10.168 ms) : 0, 10168
Flare Poller [baseline] (3.974 ms) : 0, 3974
Flare Poller [candidate] (3.63 ms) : 0, 3630

Load

Parameters

	Baseline	Candidate
baseline_or_candidate	baseline	candidate
git_branch	master	ygree/llmobs-systest-fixes
git_commit_date	1773939812	1774400065
git_commit_sha	`5580c61`	`3d12515`
release_version	1.61.0-SNAPSHOT~5580c61ac4	1.60.0-SNAPSHOT~3d12515bb7

See matching parameters

	Baseline	Candidate
application	insecure-bank	insecure-bank
ci_job_date	1774402357	1774402357
ci_job_id	1535749071	1535749071
ci_pipeline_id	104249056	104249056
cpu_model	Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz	Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz
kernel_version	Linux runner-zfyrx7zua-project-304-concurrent-1-bkawo5ox 6.8.0-1031-aws #33~22.04.1-Ubuntu SMP Thu Jun 26 14:22:30 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux	Linux runner-zfyrx7zua-project-304-concurrent-1-bkawo5ox 6.8.0-1031-aws #33~22.04.1-Ubuntu SMP Thu Jun 26 14:22:30 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux

Summary

Found 0 performance improvements and 5 performance regressions! Performance is the same for 15 metrics, 16 unstable metrics.

scenario	Δ mean agg_http_req_duration_p50	Δ mean agg_http_req_duration_p95	Δ mean throughput	candidate mean agg_http_req_duration_p50	candidate mean agg_http_req_duration_p95	candidate mean throughput	baseline mean agg_http_req_duration_p50	baseline mean agg_http_req_duration_p95	baseline mean throughput
scenario:load:insecure-bank:iast_GLOBAL:high_load	worse [+233.763µs; +339.352µs] or [+8.732%; +12.677%]	worse [+408.664µs; +821.344µs] or [+5.387%; +10.828%]	unstable [-244.805op/s; +6.118op/s] or [-18.299%; +0.457%]	2.964ms	8.201ms	1218.469op/s	2.677ms	7.586ms	1337.812op/s
scenario:load:insecure-bank:iast_FULL:high_load	worse [+229.983µs; +396.886µs] or [+4.549%; +7.850%]	worse [+553.892µs; +1112.687µs] or [+4.643%; +9.328%]	unstable [-119.437op/s; +26.625op/s] or [-14.645%; +3.265%]	5.369ms	12.762ms	769.125op/s	5.056ms	11.929ms	815.531op/s
scenario:load:petclinic:no_agent:high_load	worse [+0.457ms; +1.857ms] or [+2.622%; +10.647%]	unstable [-0.148ms; +2.987ms] or [-0.501%; +10.089%]	unstable [-45.093op/s; +11.406op/s] or [-17.298%; +4.375%]	18.593ms	31.026ms	243.844op/s	17.436ms	29.606ms	260.688op/s

Request duration reports for petclinic

gantt
    title petclinic - request duration [CI 0.99] : candidate=1.60.0-SNAPSHOT~3d12515bb7, baseline=1.61.0-SNAPSHOT~5580c61ac4
    dateFormat X
    axisFormat %s
section baseline
no_agent (17.899 ms) : 17718, 18081
.   : milestone, 17899,
appsec (18.638 ms) : 18448, 18827
.   : milestone, 18638,
code_origins (17.78 ms) : 17601, 17959
.   : milestone, 17780,
iast (17.392 ms) : 17221, 17564
.   : milestone, 17392,
profiling (18.583 ms) : 18392, 18774
.   : milestone, 18583,
tracing (17.887 ms) : 17708, 18065
.   : milestone, 17887,
section candidate
no_agent (19.141 ms) : 18944, 19338
.   : milestone, 19141,
appsec (19.014 ms) : 18820, 19209
.   : milestone, 19014,
code_origins (17.559 ms) : 17389, 17729
.   : milestone, 17559,
iast (17.711 ms) : 17539, 17882
.   : milestone, 17711,
profiling (18.811 ms) : 18628, 18994
.   : milestone, 18811,
tracing (17.743 ms) : 17566, 17920
.   : milestone, 17743,

baseline results

Variant	Request duration [CI 0.99]	Δ no_agent
no_agent	17.899 ms [17.718 ms, 18.081 ms]	-
appsec	18.638 ms [18.448 ms, 18.827 ms]	738.19 µs (4.1%)
code_origins	17.78 ms [17.601 ms, 17.959 ms]	-119.18 µs (-0.7%)
iast	17.392 ms [17.221 ms, 17.564 ms]	-507.245 µs (-2.8%)
profiling	18.583 ms [18.392 ms, 18.774 ms]	683.594 µs (3.8%)
tracing	17.887 ms [17.708 ms, 18.065 ms]	-12.69 µs (-0.1%)

candidate results

Variant	Request duration [CI 0.99]	Δ no_agent
no_agent	19.141 ms [18.944 ms, 19.338 ms]	-
appsec	19.014 ms [18.82 ms, 19.209 ms]	-126.703 µs (-0.7%)
code_origins	17.559 ms [17.389 ms, 17.729 ms]	-1.582 ms (-8.3%)
iast	17.711 ms [17.539 ms, 17.882 ms]	-1.43 ms (-7.5%)
profiling	18.811 ms [18.628 ms, 18.994 ms]	-329.814 µs (-1.7%)
tracing	17.743 ms [17.566 ms, 17.92 ms]	-1.398 ms (-7.3%)

Request duration reports for insecure-bank

gantt
    title insecure-bank - request duration [CI 0.99] : candidate=1.60.0-SNAPSHOT~3d12515bb7, baseline=1.61.0-SNAPSHOT~5580c61ac4
    dateFormat X
    axisFormat %s
section baseline
no_agent (1.161 ms) : 1150, 1172
.   : milestone, 1161,
iast (3.192 ms) : 3149, 3235
.   : milestone, 3192,
iast_FULL (5.668 ms) : 5613, 5724
.   : milestone, 5668,
iast_GLOBAL (3.425 ms) : 3373, 3476
.   : milestone, 3425,
profiling (2.178 ms) : 2157, 2199
.   : milestone, 2178,
tracing (1.814 ms) : 1799, 1829
.   : milestone, 1814,
section candidate
no_agent (1.184 ms) : 1173, 1196
.   : milestone, 1184,
iast (3.252 ms) : 3209, 3295
.   : milestone, 3252,
iast_FULL (6.014 ms) : 5954, 6074
.   : milestone, 6014,
iast_GLOBAL (3.767 ms) : 3694, 3840
.   : milestone, 3767,
profiling (2.058 ms) : 2038, 2077
.   : milestone, 2058,
tracing (1.8 ms) : 1785, 1814
.   : milestone, 1800,

baseline results

Variant	Request duration [CI 0.99]	Δ no_agent
no_agent	1.161 ms [1.15 ms, 1.172 ms]	-
iast	3.192 ms [3.149 ms, 3.235 ms]	2.031 ms (174.9%)
iast_FULL	5.668 ms [5.613 ms, 5.724 ms]	4.507 ms (388.2%)
iast_GLOBAL	3.425 ms [3.373 ms, 3.476 ms]	2.264 ms (195.0%)
profiling	2.178 ms [2.157 ms, 2.199 ms]	1.017 ms (87.6%)
tracing	1.814 ms [1.799 ms, 1.829 ms]	652.973 µs (56.2%)

candidate results

Variant	Request duration [CI 0.99]	Δ no_agent
no_agent	1.184 ms [1.173 ms, 1.196 ms]	-
iast	3.252 ms [3.209 ms, 3.295 ms]	2.068 ms (174.6%)
iast_FULL	6.014 ms [5.954 ms, 6.074 ms]	4.83 ms (407.8%)
iast_GLOBAL	3.767 ms [3.694 ms, 3.84 ms]	2.583 ms (218.1%)
profiling	2.058 ms [2.038 ms, 2.077 ms]	873.355 µs (73.7%)
tracing	1.8 ms [1.785 ms, 1.814 ms]	615.518 µs (52.0%)

Dacapo

Parameters

	Baseline	Candidate
baseline_or_candidate	baseline	candidate
git_branch	master	ygree/llmobs-systest-fixes
git_commit_date	1773939812	1774400065
git_commit_sha	`5580c61`	`3d12515`
release_version	1.61.0-SNAPSHOT~5580c61ac4	1.60.0-SNAPSHOT~3d12515bb7

See matching parameters

	Baseline	Candidate
application	biojava	biojava
ci_job_date	1774402105	1774402105
ci_job_id	1535749072	1535749072
ci_pipeline_id	104249056	104249056
cpu_model	Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz	Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz
kernel_version	Linux runner-zfyrx7zua-project-304-concurrent-0-d1220nyq 6.8.0-1031-aws #33~22.04.1-Ubuntu SMP Thu Jun 26 14:22:30 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux	Linux runner-zfyrx7zua-project-304-concurrent-0-d1220nyq 6.8.0-1031-aws #33~22.04.1-Ubuntu SMP Thu Jun 26 14:22:30 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux

Summary

Found 0 performance improvements and 0 performance regressions! Performance is the same for 11 metrics, 1 unstable metrics.

Execution time for tomcat

gantt
    title tomcat - execution time [CI 0.99] : candidate=1.60.0-SNAPSHOT~3d12515bb7, baseline=1.61.0-SNAPSHOT~5580c61ac4
    dateFormat X
    axisFormat %s
section baseline
no_agent (1.473 ms) : 1461, 1484
.   : milestone, 1473,
appsec (2.505 ms) : 2450, 2559
.   : milestone, 2505,
iast (2.252 ms) : 2183, 2321
.   : milestone, 2252,
iast_GLOBAL (2.293 ms) : 2225, 2362
.   : milestone, 2293,
profiling (2.093 ms) : 2039, 2148
.   : milestone, 2093,
tracing (2.048 ms) : 1995, 2101
.   : milestone, 2048,
section candidate
no_agent (1.472 ms) : 1461, 1484
.   : milestone, 1472,
appsec (3.781 ms) : 3560, 4003
.   : milestone, 3781,
iast (2.25 ms) : 2182, 2319
.   : milestone, 2250,
iast_GLOBAL (2.294 ms) : 2225, 2362
.   : milestone, 2294,
profiling (2.094 ms) : 2038, 2150
.   : milestone, 2094,
tracing (2.052 ms) : 1999, 2106
.   : milestone, 2052,

baseline results

Variant	Execution Time [CI 0.99]	Δ no_agent
no_agent	1.473 ms [1.461 ms, 1.484 ms]	-
appsec	2.505 ms [2.45 ms, 2.559 ms]	1.032 ms (70.0%)
iast	2.252 ms [2.183 ms, 2.321 ms]	779.596 µs (52.9%)
iast_GLOBAL	2.293 ms [2.225 ms, 2.362 ms]	820.602 µs (55.7%)
profiling	2.093 ms [2.039 ms, 2.148 ms]	620.584 µs (42.1%)
tracing	2.048 ms [1.995 ms, 2.101 ms]	575.47 µs (39.1%)

candidate results

Variant	Execution Time [CI 0.99]	Δ no_agent
no_agent	1.472 ms [1.461 ms, 1.484 ms]	-
appsec	3.781 ms [3.56 ms, 4.003 ms]	2.309 ms (156.8%)
iast	2.25 ms [2.182 ms, 2.319 ms]	778.222 µs (52.9%)
iast_GLOBAL	2.294 ms [2.225 ms, 2.362 ms]	821.365 µs (55.8%)
profiling	2.094 ms [2.038 ms, 2.15 ms]	621.75 µs (42.2%)
tracing	2.052 ms [1.999 ms, 2.106 ms]	580.209 µs (39.4%)

Execution time for biojava

gantt
    title biojava - execution time [CI 0.99] : candidate=1.60.0-SNAPSHOT~3d12515bb7, baseline=1.61.0-SNAPSHOT~5580c61ac4
    dateFormat X
    axisFormat %s
section baseline
no_agent (15.277 s) : 15277000, 15277000
.   : milestone, 15277000,
appsec (14.971 s) : 14971000, 14971000
.   : milestone, 14971000,
iast (18.25 s) : 18250000, 18250000
.   : milestone, 18250000,
iast_GLOBAL (17.676 s) : 17676000, 17676000
.   : milestone, 17676000,
profiling (15.037 s) : 15037000, 15037000
.   : milestone, 15037000,
tracing (14.746 s) : 14746000, 14746000
.   : milestone, 14746000,
section candidate
no_agent (14.851 s) : 14851000, 14851000
.   : milestone, 14851000,
appsec (14.733 s) : 14733000, 14733000
.   : milestone, 14733000,
iast (17.993 s) : 17993000, 17993000
.   : milestone, 17993000,
iast_GLOBAL (18.212 s) : 18212000, 18212000
.   : milestone, 18212000,
profiling (15.075 s) : 15075000, 15075000
.   : milestone, 15075000,
tracing (14.767 s) : 14767000, 14767000
.   : milestone, 14767000,

baseline results

Variant	Execution Time [CI 0.99]	Δ no_agent
no_agent	15.277 s [15.277 s, 15.277 s]	-
appsec	14.971 s [14.971 s, 14.971 s]	-306.0 ms (-2.0%)
iast	18.25 s [18.25 s, 18.25 s]	2.973 s (19.5%)
iast_GLOBAL	17.676 s [17.676 s, 17.676 s]	2.399 s (15.7%)
profiling	15.037 s [15.037 s, 15.037 s]	-240.0 ms (-1.6%)
tracing	14.746 s [14.746 s, 14.746 s]	-531.0 ms (-3.5%)

candidate results

Variant	Execution Time [CI 0.99]	Δ no_agent
no_agent	14.851 s [14.851 s, 14.851 s]	-
appsec	14.733 s [14.733 s, 14.733 s]	-118.0 ms (-0.8%)
iast	17.993 s [17.993 s, 17.993 s]	3.142 s (21.2%)
iast_GLOBAL	18.212 s [18.212 s, 18.212 s]	3.361 s (22.6%)
profiling	15.075 s [15.075 s, 15.075 s]	224.0 ms (1.5%)
tracing	14.767 s [14.767 s, 14.767 s]	-84.0 ms (-0.6%)


… with variables + chat_template, longest-first overlap handling) and support map-based LLM input serialization (messages + prompt) in LLMObs mapper. Also filter empty instruction messages to match system-test expectations.

ygree · 2026-03-17T22:11:55Z

@codex review

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 0c879ba692

...enai-java-3.0/src/main/java/datadog/trace/instrumentation/openai_java/ResponseDecorator.java

Kyle-Verhoog

LLMObs Team Review

Nice work aligning the Java SDK payloads with the intake schema — this is a big step for system test compliance. A few items to address/clarify below (inline), plus some overall notes:

Test Coverage Notes

What's well-covered: LLMObsSpanMapperTest expansion is great — covers _dd map, nested meta.error, map-based input with prompt/chat_template, tool definitions, tool calls + tool results. The decorator tests verify the new tags (source, integration, error, ddtrace.version).

Gaps to consider:

Error paths: No test exercises the error-path defaults (model_name and empty output set during withResponseCreateParams when the HTTP call fails). A test where the response errors out and verifying the span still has model_name and placeholder output would be valuable.
Prompt tracking: enrichInputWithPromptTracking(), extractChatTemplate(), extractPromptFromParams(), and normalizePromptVariable() have no unit tests. Template variable replacement edge cases (overlapping values, empty variables, image/file fallbacks) would increase confidence.
Custom/MCP tool calls: ToolCallExtractor.getToolCall(ResponseCustomToolCall) and getToolCall(McpCall) are new with no unit tests.
JsonValueUtils: New utility class with no dedicated tests for recursive JSON-to-Object conversion.
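
To illustrate the overlapping-values edge case mentioned under prompt tracking, here is a minimal Java sketch of rebuilding a chat template from rendered text by substituting variable values with placeholders, longest value first. It is not the code from ResponseDecorator: the class name, the method name, and the {{name}} placeholder syntax are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of longest-first variable substitution; the real
// extractChatTemplate() in this PR may differ in details.
final class ChatTemplateSketch {

  static String toTemplate(String rendered, Map<String, String> variables) {
    List<Map.Entry<String, String>> entries = new ArrayList<>(variables.entrySet());

    // Skip null or empty values: replacing "" would match between every character.
    entries.removeIf(e -> e.getValue() == null || e.getValue().isEmpty());

    // Longest values first, so a value that contains another value
    // (for example "New York City" vs "New York") is replaced as a whole.
    entries.sort(
        Comparator.comparingInt((Map.Entry<String, String> e) -> e.getValue().length())
            .reversed());

    String template = rendered;
    for (Map.Entry<String, String> e : entries) {
      // Assumed placeholder syntax: {{variable_name}}
      template = template.replace(e.getValue(), "{{" + e.getKey() + "}}");
    }
    return template;
  }
}
```

With city = "New York City" and state = "New York", the text "Hello New York City from New York" becomes "Hello {{city}} from {{state}}". Replacing the shorter value first would instead yield "Hello {{state}} City from {{state}}", which is the kind of case a unit test here could pin down. The sketch does not guard against a short value matching inside an already inserted placeholder.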

Questions

The min version bump from 3.0.0 to 3.0.1 — what API was missing in 3.0.0? This affects which customer versions get instrumented.
For the _dd map — does the intake expect apm_trace_id to equal trace_id? In other SDKs these can differ (APM trace ID vs LLMObs ID).

Kyle-Verhoog · 2026-03-22T04:34:50Z

.github/workflows/run-system-tests.yaml

    # If you change the following comment, update the pattern in the update_system_test_reference.sh script to match.
-    uses: DataDog/system-tests/.github/workflows/system-tests.yml@main # system tests are pinned for releases only
+    uses: DataDog/system-tests/.github/workflows/system-tests.yml@ea458202a7673efbe365e498d64d74a815c0a137 # system tests are pinned for releases only
    secrets:


System tests are pinned to commit ea458202 instead of main. The comment says "system tests are pinned for releases only." This should be reverted to main before merge to avoid blocking future system test updates. If it's intentional for CI during development, just make sure it goes back before merging.

Yes, I also think pinning this is a mistake (which is why my review is limited to this file :) )

Yes, it was intentional and has just been reverted. The pin was needed to verify that the system tests pass with changes that are not yet part of the main branch: DataDog/system-tests#6364. Without it, none of the related tests would run at all.

Kyle-Verhoog · 2026-03-22T04:34:50Z

dd-trace-core/src/main/java/datadog/trace/llmobs/writer/ddintake/LLMObsSpanMapper.java

-      writable.writeString(errored ? "error" : "ok", null);
+      writable.writeUTF8(DD);
+      writable.startMap(3);
+      writable.writeUTF8(SPAN_ID);


The _dd map writes span_id, trace_id, and apm_trace_id. The Python SDK also includes t_id (64-bit trace ID) and s_id in the _dd map. Can you verify this is the correct subset for the Java path? If the intake expects additional fields, spans may be rejected or processed incorrectly.

This is aligned with dd-trace-py: https://github.com/DataDog/dd-trace-py/blob/876c5f1ce4d173815537798a6a7b0ac15b0a4ede/ddtrace/llmobs/_llmobs.py#L618-L622. I can't find any t_id or s_id there.

Kyle-Verhoog · 2026-03-22T04:34:50Z

dd-trace-core/src/main/java/datadog/trace/llmobs/writer/ddintake/LLMObsSpanMapper.java

-
-      boolean errored = span.getError() == 1;
+      writable.writeUTF8(STATUS);
+      writable.writeString(span.getError() == 0 ? "ok" : "error", null);


The top-level error: 0/1 integer field has been removed and replaced with status: "ok"/"error" + error details nested under meta.error. Can you confirm no downstream consumers (EvP remapper, indexer facets, etc.) read error from the top level? This is a payload shape change that could be breaking if anything depends on the old field.

This change is dictated by the TestOpenAiLlmInteractions::test_chat_completion assertion. I assume that the system test assertions are correct. Have they been verified as being compliant with the requirements of downstream consumers?

Kyle-Verhoog · 2026-03-22T04:34:50Z

...enai-java-3.0/src/main/java/datadog/trace/instrumentation/openai_java/ResponseDecorator.java

+        }
+      }
+    } catch (Throwable ignored) {
+      // fall back to raw JSON if typed extraction is unavailable or fails


nit: catch (Throwable) swallows OutOfMemoryError, StackOverflowError, etc. Consider narrowing to catch (Exception) for both fallback paths here.
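
A minimal Java sketch of the narrowing suggested in this comment follows. The helper and its parameters are hypothetical and stand in for the typed-extraction and raw-JSON paths in ResponseDecorator; the point is only that catch (Exception) still covers ordinary failures while letting Error subclasses propagate.

```java
import java.util.function.Supplier;

// Hypothetical helper illustrating the fallback pattern, not the PR's code.
final class FallbackSketch {

  static String extractOrFallback(Supplier<String> typedExtraction, String rawJson) {
    try {
      String typed = typedExtraction.get();
      if (typed != null) {
        return typed;
      }
    } catch (Exception ignored) {
      // Fall back to raw JSON if typed extraction is unavailable or fails.
      // OutOfMemoryError, StackOverflowError, and other Errors are not
      // Exceptions, so they are not swallowed here.
    }
    return rawJson;
  }
}
```

A supplier that throws IllegalStateException yields the raw JSON, while one that throws StackOverflowError propagates the error to the caller.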

Kyle-Verhoog · 2026-03-22T04:34:50Z

dd-trace-core/src/main/java/datadog/trace/llmobs/writer/ddintake/LLMObsSpanMapper.java

+        boolean hasToolCalls = null != toolCalls && !toolCalls.isEmpty();
+        boolean hasToolResults = null != toolResults && !toolResults.isEmpty();
+        boolean hasContent = message.getContent() != null;
+        int mapSize = 1;


Behavioral change: previously content was always written (even as null). Now it's skipped when message.getContent() == null (e.g., tool-call-only messages). This is likely correct and matches Python SDK behavior, but worth confirming the intake handles messages without a content key.

This change is driven by TestOpenAiResponses::test_responses_create_tool_call. When the content is null, it is expected to be missing; otherwise, the assertion fails.
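
The behavior discussed in this thread can be sketched in Java as follows. This is an illustration with plain maps, not the mapper's code: the class and method names are hypothetical, and the key names tool_calls and tool_results are assumed from the PR description. The separate size computation mirrors the mapSize variable in the diff, presumably needed because the writer takes the entry count up front (startMap(n)).

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of serializing a message with optional keys.
final class MessageMapSketch {

  // Number of entries that will be written for this message.
  static int mapSize(String content, List<?> toolCalls, List<?> toolResults) {
    int size = 1; // role is always written
    if (content != null) size++;
    if (toolCalls != null && !toolCalls.isEmpty()) size++;
    if (toolResults != null && !toolResults.isEmpty()) size++;
    return size;
  }

  static Map<String, Object> toMap(
      String role, String content, List<?> toolCalls, List<?> toolResults) {
    Map<String, Object> out = new LinkedHashMap<>();
    out.put("role", role);
    // content is omitted entirely when null (e.g. tool-call-only messages);
    // an empty string is still written.
    if (content != null) {
      out.put("content", content);
    }
    if (toolCalls != null && !toolCalls.isEmpty()) {
      out.put("tool_calls", toolCalls); // assumed key name
    }
    if (toolResults != null && !toolResults.isEmpty()) {
      out.put("tool_results", toolResults); // assumed key name
    }
    return out;
  }
}
```

For a tool-call-only message, the resulting map has role and tool_calls but no content key, which is the shape the system test is described as expecting.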

ygree · 2026-03-24T19:53:07Z

dd-java-agent/instrumentation/openai-java/openai-java-3.0/build.gradle

 apply from: "$rootDir/gradle/java.gradle"

-def minVer = '3.0.0'
+def minVer = '3.0.1'


ResponseTextConfig's fun verbosity(): Optional<Verbosity> was only added in 3.0.1: openai/openai-java@c1de354#diff-6b385fb153d457757ba112e6117593cb59da6af308cce0f9b6f26e3885befc6cR73

ygree · 2026-03-24T22:25:05Z

Questions

The min version bump from 3.0.0 to 3.0.1 — what API was missing in 3.0.0? This affects which customer versions get instrumented.

ResponseTextConfig's fun verbosity(): Optional<Verbosity> was only added in 3.0.1: openai/openai-java@c1de354#diff-6b385fb153d457757ba112e6117593cb59da6af308cce0f9b6f26e3885befc6cR73

For the _dd map — does the intake expect apm_trace_id to equal trace_id? In other SDKs these can differ (APM trace ID vs LLMObs ID).

This is aligned with dd-trace-py https://github.com/DataDog/dd-trace-py/blob/876c5f1ce4d173815537798a6a7b0ac15b0a4ede/ddtrace/llmobs/_llmobs.py#L618-L622.

…and placeholder output set by withResponseCreateParams.

ygree self-assigned this Feb 19, 2026

ygree added the comp: mlobs and type: bug labels Feb 19, 2026

llmobs: set model tag even when llmobs disabled

cbd6226

ygree force-pushed the ygree/llmobs-systest-fixes branch from 5cd257e to cbd6226 Compare February 24, 2026 09:31

ygree changed the title ~~llmobs: set model tag even when llmobs disabled~~ fix(llmobs): set model tag even when llmobs disabled Mar 2, 2026

ygree added 23 commits March 2, 2026 13:30

Set metadata.stream tag regardless of whether it is true or false

4f27673

Set chat/completion CACHE_READ_INPUT_TOKENS tag

d128d6b

Set error and error_type tags

3fc5ceb

Use "" instead of null for the role in CompletionDecorator to comply …

021a9d1

…wthTestOpenAiLlmInteractions::test_completion

Use "" instead of null for the content to comply with TestOpenAiLlmIn…

0637931

…teractions::test_chat_completion_tool_call

Add missing metadata.tool_choice

0cb41e1

Add missing tool_definitions

a42f8aa

Add source:integration tag

6e10255

Add missing _dd attribute to the llmobs span event

34f3a07

Add missing error tags

a0c1139

Remove error from the llmobs span event. It must be part of meta block

effc343

Add missing meta.text.verbosity

c0e3876

Add summaryText and encrypted_content

b000770

Add missing tool_calls and tool_results for responses

53471a2

Always set stream param to produce the same request body to be aligne…

2207c46

…d with python openai instrumentation and system-tests

Fix OpenAI Responses prompt tracking to use response instructions fir…

7d683b6

…st and return [image] (not empty) when stripped input_image URLs are missing, aligning mixed-input chat_template output with expected behavior.

Set LLMObs error-path defaults in Java to always emit model_name and …

2c17ddc

…output.messages from request params so existing error-span tests pass.

Add OpenAI Responses tool definition extraction to populate LLMObs to…

ad3b782

…ol_definitions tags

Fix ChatCompletionServiceTest

1810327

Extract JsonValueUtils

46221e4

Refactor OpenAI responses instrumentation to reuse ToolCallExtractor JSON argument parsing and remove duplicate manual parsing logic from ResponseDecorator.

61ad667

Fix test assertions

f0957b7

ygree added 5 commits March 6, 2026 10:35

Add integration tag

f3f1f75

Add ddtrace.version

668e955

Improve test assertions

d57402e

Merge branch 'master' into ygree/llmobs-systest-fixes

a3051e3

Fix format

0c879ba

ygree changed the title ~~fix(llmobs): set model tag even when llmobs disabled~~ fix(llmobs): openai-java payload mapping for responses, tool metadata, and prompt tracking Mar 6, 2026

ygree added the tag: ai generated (Largely based on code generated by an AI or LLM) and tag: no release notes (Changes to exclude from release notes) labels Mar 6, 2026

ygree marked this pull request as ready for review March 6, 2026 13:46

ygree requested review from a team as code owners March 6, 2026 13:46

chatgpt-codex-connector bot reviewed Mar 17, 2026

...enai-java-3.0/src/main/java/datadog/trace/instrumentation/openai_java/ResponseDecorator.java Outdated

...enai-java-3.0/src/main/java/datadog/trace/instrumentation/openai_java/ResponseDecorator.java Outdated

ygree added 4 commits March 17, 2026 17:35

Include input messages when instructions are present in prompt tracking

f4e3a8b

Fix instructions role to system in prompt tracking

028d64f

Merge branch 'master' into ygree/llmobs-systest-fixes

82f4303

Fix LLMObsSpanMapperTest

717a8f0

ygree requested a review from a team as a code owner March 20, 2026 00:37

ygree requested review from amarziali and removed request for a team March 20, 2026 00:37

ygree removed the tag: no release notes (Changes to exclude from release notes) label Mar 20, 2026

Kyle-Verhoog reviewed Mar 22, 2026

ygree force-pushed the ygree/llmobs-systest-fixes branch from 6dcdaf4 to 717a8f0 Compare March 24, 2026 18:43

ygree commented Mar 24, 2026

Catch Exception, not Throwable

8420f0a

ygree added 2 commits March 24, 2026 16:00

Add JsonValueUtilsTest

91707fa

Test that on HTTP error, the OpenAI response span retains model_name and placeholder output set by withResponseCreateParams.

3d12515

Conversation

ygree commented Feb 19, 2026

What Does This Do

Motivation

Additional Notes

Contributor Checklist

pr-commenter bot commented Feb 19, 2026

Benchmarks

Startup

Parameters

Summary

Load

Parameters

Summary

Dacapo

Parameters

Summary

ygree commented Mar 17, 2026

chatgpt-codex-connector bot left a comment

💡 Codex Review

Kyle-Verhoog left a comment

LLMObs Team Review

Test Coverage Notes

Questions

ygree Mar 24, 2026

ygree Mar 24, 2026

ygree Mar 24, 2026

ygree commented Mar 24, 2026

Questions


3 participants
