Skip to content

skill(add-apm-integrations): R13-R33 + Cat B (context-propagation) rules#11760

Draft
jordan-wong wants to merge 2 commits into
masterfrom
skill/r13-r33-from-toolkit
Draft

skill(add-apm-integrations): R13-R33 + Cat B (context-propagation) rules#11760
jordan-wong wants to merge 2 commits into
masterfrom
skill/r13-r33-from-toolkit

Conversation

@jordan-wong

Copy link
Copy Markdown
Contributor

Summary

Ports reviewer-encoded rules from the toolkit's synced derivative (apm-integrations skill in DataDog/apm-instrumentation-toolkit) back to this canonical add-apm-integrations skill.

Over the past several weeks, expert reviewers (@PerfectSlayer, @amarziali) commented on toolkit-generated integration PRs (#11337 jedis-3.0, #11562 sparkjava-2.3, #11506 RxJava 3, #11709 feign, #11717 commons-httpclient). Their feedback was encoded as 21 numbered R-rules and a new "Step 4.4" section for context-propagation libraries.

Both the toolkit copy and this canonical copy should stay in sync — and they had drifted significantly (toolkit was 563/831 lines; canonical was 246).

What's added

Step 4.4 — Library category: span-creating vs context-propagation (NEW)

Routes reactive/async/executor/future/actor libraries (RxJava, Reactor, Kotlin coroutines, executors, etc.) into the context_propagation codegen path that produces InstrumenterModule.ContextTracking + per-type wrappers, rather than the standard span-creating advice. Includes:

  • Category A (span_creating) vs Category B (context_propagation) decision tree
  • Required output shape (target_kind, context_propagation sibling field)
  • The Flowable subscribe(FlowableSubscriber) overload rule (hook the framework-internal overload, not the public wrapper)

Step 4.5 — Java naming consistency (NEW)

Module-name and integration-name conventions.

Inline R-rules across Steps 4-12:

Where Rule
Step 4 R32 — dir name must end with version OR allowed suffix
Step 5 R13 — no single-type helper for CallDepthThreadLocalMap
Step 5 R30 — preserve master's integration name when regenerating
Step 7 R15/R16/R17 — instrument the single delegate, not all overloads
Step 7 R33 — no NullPointerException catches; use null-check guards
Step 9 R14 — test error/exception scenarios + spotless
Step 9 R18 — muzzle excludes incompatible majors
Step 9 R19 — latestDepTestImplementation range matches instrumented
Step 9 R20 — Java tests only; no new .groovy files
Step 9 R28 — compileOnly / testImplementation version split rationale
Step 9 R29 — register names in supported-configurations.json
Step 9 R31 — assertInverse only when declared min is true min
Step 4.4 R21-R27 — Cat A vs Cat B classification + Cat B output schema

What's NOT added

  • The toolkit's "Quick Reference" navigation section (toolkit-internal)
  • The "Step 11.1 — disclose skill knowledge in every PR description" boilerplate that was added during eval research and later removed as research-process overhead, not a permanent rule
  • The toolkit's ## Step 4 (eval runs) – Use a -generated directory name section (toolkit-specific eval workflow, not relevant to the canonical skill)

Paired with

DataDog/apm-instrumentation-toolkit#472 — same content on the toolkit's synced derivative copy.

Test plan

This is a documentation/skill content PR. No runtime code changes, no CI hooks to run.

  • Both copies of the skill (canonical here + toolkit derivative) describe the same R-rules
  • No removal of existing Step 1-12 content; all additions are additive
  • (Future) re-run the rxjava3 toolkit generation against this updated canonical skill and verify the agent produces a working InstrumenterModule.ContextTracking integration autonomously

Stats

+542 -6 (one file: .claude/skills/add-apm-integrations/SKILL.md)

…ep 4.4

Port reviewer-encoded rules from the toolkit's synced derivative
(`apm-integrations` skill in DataDog/apm-instrumentation-toolkit) back
to the canonical `add-apm-integrations` skill.

The toolkit accumulated 21+ numbered reviewer rules during eval research
(commits flowing through PR #11337 jedis-3.0, PR #11562 sparkjava-2.3,
PR #11717 commons-httpclient, PR #11709 feign, and PR #11506 RxJava 3).
These rules encode real failure modes observed during agent-generated
PRs and the fixes reviewers asked for.

Sections added:

- **Step 4.4** — Library category: span-creating vs context-propagation.
  New axis that routes Cat B libraries (reactive, async, executors,
  futures, actors) into a `context_propagation` codegen path producing
  `InstrumenterModule.ContextTracking` instead of span-creating advice.
  Includes the Flowable subscribe(FlowableSubscriber) overload rule
  (hook the framework-internal overload, not the public wrapper).
- **Step 4.5** — Java naming consistency (module-name conventions).

R-rule placements:

- Step 4:  R32 (dir name must end with version OR allowed suffix)
- Step 5:  R13 (no single-type helper class for CallDepthThreadLocalMap)
           R30 (preserve master's integration name when regenerating)
- Step 7:  R15/R16/R17 (single delegate method, not all overloads)
           R33 (no NullPointerException catches; use null-check guards)
- Step 9:  R14 (test error/exception scenarios + spotless)
           R18 (muzzle excludes incompatible majors)
           R19 (latestDepTestImplementation range matches instrumented)
           R20 (Java tests only; no new .groovy files)
           R28 (compileOnly/testImplementation version split rationale)
           R29 (register names in supported-configurations.json)
           R31 (assertInverse only when declared min is true min)
- Step 4.4: R21-R27 (the Cat A vs Cat B classification + Cat B schema)

This PR pairs with toolkit-side PR DataDog/apm-instrumentation-toolkit#472
which adds the same content to the toolkit's synced derivative copy.
Both copies should stay in sync — this is the canonical home.

Signed-off-by: Jordan Wong <jordan.wong@datadoghq.com>
@dd-octo-sts

dd-octo-sts Bot commented Jun 29, 2026

Copy link
Copy Markdown
Contributor

🟢 Java Benchmark SLOs — All performance SLOs passed

Suite Status
Startup 🟢 pass

SLO thresholds are defined here based on automatically generated metrics. A warning is raised when results are within 5% of the threshold.

PR vs. master results
Scenario Candidate master Δ (95% CI of mean)
startup:insecure-bank:iast:Agent 13.99 s 14.00 s [-0.8%; +0.7%] (no difference)
startup:insecure-bank:tracing:Agent 12.96 s 13.08 s [-1.9%; -0.0%] (maybe better)
startup:petclinic:appsec:Agent 16.96 s 16.75 s [+0.2%; +2.4%] (maybe worse)
startup:petclinic:iast:Agent 16.91 s 16.95 s [-1.1%; +0.5%] (no difference)
startup:petclinic:profiling:Agent 16.59 s 16.87 s [-2.5%; -0.7%] (maybe better)
startup:petclinic:sca:Agent 16.92 s 16.53 s [+1.4%; +3.3%] (significantly worse)
startup:petclinic:tracing:Agent 16.18 s 16.11 s [-0.7%; +1.6%] (no difference)

Commit: de2291cd · CI Pipeline · Benchmarking Platform UI


Load and DaCapo benchmarks can be triggered manually in the GitLab pipeline. Results will appear in the Benchmarking Platform UI after completion.

Two cosmetic passes on the rule encoding ported from the toolkit's
synced derivative copy. Substantive content unchanged.

Pass 1 — strip R-numbering. The R13-R33 numbers are toolkit-internal
traceability tags that tie each rule to a specific reviewer comment on a
specific generated-PR review. They are meaningful to the toolkit-side
eval research workflow but meaningless to readers of the canonical
skill, who have no R1-R12 context here. Removes:

- '#### R<NN> — <title>' prefixes (14 sub-section headings)
- Inline '(R<NN>)', '(R<NN>/R<NN>/R<NN>)' parentheticals on existing
  bullets that already convey the rule in their wording
- A stale cross-reference 'see R29 below' rewritten to 'see
  "Register new integration names"'

The toolkit-side copy keeps the R-numbering — it remains the eval-research
home where the traceability matters. This is a one-way port convention.

Pass 2 — renumber half-steps so the decimals make sense. Was:

  Step 4 → Step 4.4 → Step 4.5 → Step 5

implying missing 4.1, 4.2, 4.3. The original numbering was an artifact
of an earlier toolkit-side draft that had a 4.1-4.4 enumeration which
got collapsed. Renumbered to:

  Step 4 → Step 4.1 → Step 4.2 → Step 5

Main integer steps (1-12) unchanged. Step 7.1 (Multiple advice classes
and @AppliesOn) left as-is since its decimal already makes sense
relative to Step 7.

Signed-off-by: Jordan Wong <jordan.wong@datadoghq.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant